🗣️

VOICEVOX's core was split out into a separate project, so I tried it

Published 2021/09/14

VOICEVOX

idea

I wrote about VOICEVOX before. Its core has now been released as a separate project, which apparently means it can be used on Linux without Wine, so I tried it right away.

[Previous article]

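I'd rather not install it directly on my PC, so I went with the Docker version.
I don't mind if the working files disappear once the Docker build is done, so I'm working in /tmp.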

First, git clone

$ cd /tmp
$ git clone https://github.com/Hiroshiba/voicevox_core.git

Build the Docker image

This takes a while, so just leave it running and be patient

$ cd voicevox_core/
$ docker build -t voicevox_core example/python

Create a directory for the synthesized audio files, then mount it and start the container

$ mkdir /tmp/voice
$ docker run -it -v /tmp/voice:/root/voice voicevox_core bash

Try synthesizing some speech

root@420cf51a9161:/voicevox_core/example/python# python run.py --text おはようございます --speaker_id 1

Move the synthesized audio file to the mounted directory and exit the container

root@420cf51a9161:/voicevox_core/example/python# mv *wav /root/voice/
root@420cf51a9161:/voicevox_core/example/python# exit
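
The interactive steps above (start the container, synthesize, move the WAV, exit) could probably be collapsed into a single non-interactive command. The following is only a minimal sketch: `synth`, `DOCKER`, and `OUT_DIR` are hypothetical names introduced here for illustration, and it assumes the image is tagged `voicevox_core` and that `run.py` writes its WAV into the container's working directory, as in the session above.

```shell
# Hypothetical helper (not part of voicevox_core): synthesize one phrase and
# leave the resulting WAV in the mounted host directory.
synth() {
  text=$1
  speaker=${2:-1}                 # speaker ID, defaulting to 1 as in the article
  out_dir=${OUT_DIR:-/tmp/voice}  # host directory to mount (assumed variable)
  mkdir -p "$out_dir"
  # --rm discards the container when it finishes. The text and speaker ID are
  # passed as positional arguments so the inner shell does not re-parse them.
  "${DOCKER:-docker}" run --rm -v "$out_dir:/root/voice" voicevox_core \
    bash -c 'python run.py --text "$1" --speaker_id "$2" && mv ./*.wav /root/voice/' \
    _ "$text" "$speaker"
}
```

Calling `synth おはようございます 1` should then leave the WAV under /tmp/voice, ready for the playback step on the host.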

Play the generated audio on the host OS

$ aplay /tmp/voice/おはようございます-1.wav
再生中 WAVE '/tmp/voice/おはようございます-1.wav' : Signed 16 bit Little Endian, レート 24000 Hz, モノラル

The actual work log

$ cd /tmp
$ git clone https://github.com/Hiroshiba/voicevox_core.git
Cloning into 'voicevox_core'...
remote: Enumerating objects: 54, done.
remote: Counting objects: 100% (54/54), done.
remote: Compressing objects: 100% (37/37), done.
remote: Total 54 (delta 16), reused 34 (delta 10), pack-reused 0
Unpacking objects: 100% (54/54), 19.79 KiB | 1.10 MiB/s, done.
$ cd voicevox_core/
$ docker build -t voicevox_core example/python
Sending build context to Docker daemon  46.08kB
Step 1/16 : FROM python:3.9.6
3.9.6: Pulling from library/python
4c25b3090c26: Pull complete
1acf565088aa: Pull complete
b95c0dd0dc0d: Pull complete
5cf06daf6561: Pull complete
942374d5c114: Pull complete
64c0f10e4cfa: Pull complete
76571888410b: Pull complete
5e88ca15437b: Pull complete
0ab5ec771994: Pull complete
Digest: sha256:2bd64896cf4ff75bf91a513358457ed09d890715d9aa6bb602323aedbee84d14
Status: Downloaded newer image for python:3.9.6
 ---> 1e76b28bfd4e
Step 2/16 : RUN apt-get update -yqq
 ---> Running in 7c24a76596df
Removing intermediate container 7c24a76596df
 ---> 2d8c0e23cf9c
Step 3/16 : RUN apt-get install -yqq     curl cmake git     unzip jq libsndfile-dev
 ---> Running in a7ae4e0f8ae4
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package cmake-data.
(Reading database ... 23378 files and directories currently installed.)
Preparing to unpack .../00-cmake-data_3.18.4-2_all.deb ...
Unpacking cmake-data (3.18.4-2) ...
Selecting previously unselected package libarchive13:amd64.
Preparing to unpack .../01-libarchive13_3.4.3-2+b1_amd64.deb ...
Unpacking libarchive13:amd64 (3.4.3-2+b1) ...
Selecting previously unselected package libjsoncpp24:amd64.
Preparing to unpack .../02-libjsoncpp24_1.9.4-4_amd64.deb ...
Unpacking libjsoncpp24:amd64 (1.9.4-4) ...
Selecting previously unselected package librhash0:amd64.
Preparing to unpack .../03-librhash0_1.4.1-2_amd64.deb ...
Unpacking librhash0:amd64 (1.4.1-2) ...
Selecting previously unselected package libuv1:amd64.
Preparing to unpack .../04-libuv1_1.40.0-2_amd64.deb ...
Unpacking libuv1:amd64 (1.40.0-2) ...
Selecting previously unselected package cmake.
Preparing to unpack .../05-cmake_3.18.4-2_amd64.deb ...
Unpacking cmake (3.18.4-2) ...
Selecting previously unselected package libonig5:amd64.
Preparing to unpack .../06-libonig5_6.9.6-1.1_amd64.deb ...
Unpacking libonig5:amd64 (6.9.6-1.1) ...
Selecting previously unselected package libjq1:amd64.
Preparing to unpack .../07-libjq1_1.6-2.1_amd64.deb ...
Unpacking libjq1:amd64 (1.6-2.1) ...
Selecting previously unselected package jq.
Preparing to unpack .../08-jq_1.6-2.1_amd64.deb ...
Unpacking jq (1.6-2.1) ...
Selecting previously unselected package libogg0:amd64.
Preparing to unpack .../09-libogg0_1.3.4-0.1_amd64.deb ...
Unpacking libogg0:amd64 (1.3.4-0.1) ...
Selecting previously unselected package libflac8:amd64.
Preparing to unpack .../10-libflac8_1.3.3-2_amd64.deb ...
Unpacking libflac8:amd64 (1.3.3-2) ...
Selecting previously unselected package libogg-dev:amd64.
Preparing to unpack .../11-libogg-dev_1.3.4-0.1_amd64.deb ...
Unpacking libogg-dev:amd64 (1.3.4-0.1) ...
Selecting previously unselected package libflac-dev:amd64.
Preparing to unpack .../12-libflac-dev_1.3.3-2_amd64.deb ...
Unpacking libflac-dev:amd64 (1.3.3-2) ...
Selecting previously unselected package libopus0:amd64.
Preparing to unpack .../13-libopus0_1.3.1-0.1_amd64.deb ...
Unpacking libopus0:amd64 (1.3.1-0.1) ...
Selecting previously unselected package libopus-dev:amd64.
Preparing to unpack .../14-libopus-dev_1.3.1-0.1_amd64.deb ...
Unpacking libopus-dev:amd64 (1.3.1-0.1) ...
Selecting previously unselected package libvorbis0a:amd64.
Preparing to unpack .../15-libvorbis0a_1.3.7-1_amd64.deb ...
Unpacking libvorbis0a:amd64 (1.3.7-1) ...
Selecting previously unselected package libvorbisenc2:amd64.
Preparing to unpack .../16-libvorbisenc2_1.3.7-1_amd64.deb ...
Unpacking libvorbisenc2:amd64 (1.3.7-1) ...
Selecting previously unselected package libsndfile1:amd64.
Preparing to unpack .../17-libsndfile1_1.0.31-2_amd64.deb ...
Unpacking libsndfile1:amd64 (1.0.31-2) ...
Selecting previously unselected package libvorbisfile3:amd64.
Preparing to unpack .../18-libvorbisfile3_1.3.7-1_amd64.deb ...
Unpacking libvorbisfile3:amd64 (1.3.7-1) ...
Selecting previously unselected package libvorbis-dev:amd64.
Preparing to unpack .../19-libvorbis-dev_1.3.7-1_amd64.deb ...
Unpacking libvorbis-dev:amd64 (1.3.7-1) ...
Selecting previously unselected package libsndfile1-dev:amd64.
Preparing to unpack .../20-libsndfile1-dev_1.0.31-2_amd64.deb ...
Unpacking libsndfile1-dev:amd64 (1.0.31-2) ...
Setting up libogg0:amd64 (1.3.4-0.1) ...
Setting up libarchive13:amd64 (3.4.3-2+b1) ...
Setting up libogg-dev:amd64 (1.3.4-0.1) ...
Setting up libflac8:amd64 (1.3.3-2) ...
Setting up libuv1:amd64 (1.40.0-2) ...
Setting up libopus0:amd64 (1.3.1-0.1) ...
Setting up libvorbis0a:amd64 (1.3.7-1) ...
Setting up libjsoncpp24:amd64 (1.9.4-4) ...
Setting up librhash0:amd64 (1.4.1-2) ...
Setting up cmake-data (3.18.4-2) ...
Setting up libonig5:amd64 (6.9.6-1.1) ...
Setting up libvorbisenc2:amd64 (1.3.7-1) ...
Setting up libflac-dev:amd64 (1.3.3-2) ...
Setting up libjq1:amd64 (1.6-2.1) ...
Setting up libopus-dev:amd64 (1.3.1-0.1) ...
Setting up libvorbisfile3:amd64 (1.3.7-1) ...
Setting up jq (1.6-2.1) ...
Setting up cmake (3.18.4-2) ...
Setting up libsndfile1:amd64 (1.0.31-2) ...
Setting up libvorbis-dev:amd64 (1.3.7-1) ...
Setting up libsndfile1-dev:amd64 (1.0.31-2) ...
Processing triggers for libc-bin (2.31-13) ...
Removing intermediate container a7ae4e0f8ae4
 ---> a7966e2b7901
Step 4/16 : RUN curl -sLO https://download.pytorch.org/libtorch/cu111/libtorch-cxx11-abi-shared-with-deps-1.9.0%2Bcu111.zip
 ---> Running in 3947c976d793
Removing intermediate container 3947c976d793
 ---> 4832ce043dd1
Step 5/16 : RUN unzip -q libtorch*.zip && rm libtorch*.zip
 ---> Running in a3fed2e9028a
Removing intermediate container a3fed2e9028a
 ---> c1d9b9ae0128
Step 6/16 : RUN ln -s /libtorch/lib/libnvToolsExt-24de1d56.so.1 /libtorch/lib/libnvToolsExt.so.1
 ---> Running in ae75c00b2e77
Removing intermediate container ae75c00b2e77
 ---> 44dde57e2b52
Step 7/16 : RUN ln -s /libtorch/lib/libcudart-6d56b25a.so.11.0 /libtorch/lib/libcudart.so.11.0
 ---> Running in 6a796643c4ed
Removing intermediate container 6a796643c4ed
 ---> 437d442ee925
Step 8/16 : ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/libtorch/lib/"
 ---> Running in 4d54069b96a7
Removing intermediate container 4d54069b96a7
 ---> a21729f22c25
Step 9/16 : RUN git clone -q --depth 1 https://github.com/Hiroshiba/voicevox_core
 ---> Running in d0d0f1af9c7a
Removing intermediate container d0d0f1af9c7a
 ---> 1215e4ffa813
Step 10/16 : WORKDIR voicevox_core/example/python
 ---> Running in 29a30c7f3856
Removing intermediate container 29a30c7f3856
 ---> 67f3a3198f59
Step 11/16 : RUN curl -sLO "`curl -s https://api.github.com/repos/Hiroshiba/voicevox_core/releases/latest     | jq -r '.assets[]|select(.name=="core.zip")|.browser_download_url'`"
 ---> Running in 59e4683450b3
Removing intermediate container 59e4683450b3
 ---> cb0efc46986b
Step 12/16 : RUN unzip -q core.zip && rm core.zip
 ---> Running in a53492faa972
Removing intermediate container a53492faa972
 ---> c80a84b1c011
Step 13/16 : RUN mv core/* .
 ---> Running in 8204a66a0129
Removing intermediate container 8204a66a0129
 ---> 64dedb092d09
Step 14/16 : RUN pip install -U pip && pip install -q -r requirements.txt
 ---> Running in 40de6bf2c592
Requirement already satisfied: pip in /usr/local/lib/python3.9/site-packages (21.2.4)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Removing intermediate container 40de6bf2c592
 ---> 12122b829fc0
Step 15/16 : RUN python -c 'import pyopenjtalk;pyopenjtalk._lazy_init()'
 ---> Running in 0f93ae8838b0
Downloading: "https://downloads.sourceforge.net/open-jtalk/open_jtalk_dic_utf_8-1.11.tar.gz"
Extracting tar file /usr/local/lib/python3.9/site-packages/pyopenjtalk/dic.tar.gz
Removing intermediate container 0f93ae8838b0
 ---> 768ece317b52
Step 16/16 : RUN LIBRARY_PATH="$LIBRARY_PATH:." python setup.py install
 ---> Running in c6af5e0427ab
/usr/local/lib/python3.9/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /voicevox_core/example/python/core.pxd
  tree = Parsing.p_module(s, pxd, full_module_name)
Compiling core.pyx because it changed.
[1/1] Cythonizing core.pyx
running install
running build
running build_ext
building 'core' extension
creating build
creating build/temp.linux-x86_64-3.9
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/lib/python3.9/site-packages/numpy/core/include -I/usr/local/include/python3.9 -c core.cpp -o build/temp.linux-x86_64-3.9/core.o
In file included from /usr/local/lib/python3.9/site-packages/numpy/core/include/numpy/ndarraytypes.h:1944,
                 from /usr/local/lib/python3.9/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
                 from /usr/local/lib/python3.9/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                 from core.cpp:654:
/usr/local/lib/python3.9/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
   17 | #warning "Using deprecated NumPy API, disable it with " \
      |  ^~~~~~~
creating build/lib.linux-x86_64-3.9
g++ -pthread -shared build/temp.linux-x86_64-3.9/core.o -L/usr/local/lib -lcore -o build/lib.linux-x86_64-3.9/core.cpython-39-x86_64-linux-gnu.so
running install_lib
copying build/lib.linux-x86_64-3.9/core.cpython-39-x86_64-linux-gnu.so -> /usr/local/lib/python3.9/site-packages
running install_egg_info
Writing /usr/local/lib/python3.9/site-packages/core-0.0.0-py3.9.egg-info
Removing intermediate container c6af5e0427ab
 ---> dfd96042c88c
Successfully built dfd96042c88c
Successfully tagged voicevox_core:latest
$ mkdir /tmp/voice
$ docker run -it -v /tmp/voice:/root/voice voicevox_core bash
root@420cf51a9161:/voicevox_core/example/python# python run.py --text おはようございます --speaker_id 1
[W BinaryOps.cpp:467] Warning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (function operator())
root@420cf51a9161:/voicevox_core/example/python# ls *wav
おはようございます-1.wav
root@420cf51a9161:/voicevox_core/example/python# mv *wav /root/voice/
root@420cf51a9161:/voicevox_core/example/python# exit
$ aplay /tmp/voice/おはようございます-1.wav
再生中 WAVE '/tmp/voice/おはようございます-1.wav' : Signed 16 bit Little Endian, レート 24000 Hz, モノラル

And there you have it, one order up!

...

Written up as an article, it looks like everything ran smoothly, but the build actually takes quite a while, so if you want to try this, have something ready to kill time.

All I'm doing is following what's written on GitHub, but
synthesizing "おはようございます" took quite a while: about 32 seconds of wall-clock time.

# time python run.py --text おはようございます --speaker_id 1
[W BinaryOps.cpp:467] Warning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (function operator())

real    0m32.512s
user    0m36.572s
sys     0m8.501s

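Well, I'm running this in a matryoshka-like setup (Docker on Linux on VMware on Windows), so results will probably vary by environment, but it looks like it would be tough without reasonably decent specs.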

I wonder if being able to use a GPU would improve this a lot?

That said, it seems hard to use when you need a response in real time, so it looks better suited to things like adding narration while editing a video.

By the way, this probably isn't limited to Docker, but it eats up a fair amount of disk space.

$ docker images voicevox_core
REPOSITORY      TAG       IMAGE ID       CREATED       SIZE
voicevox_core   latest    dfd96042c88c   2 hours ago   9.73GB
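
If the space is needed back after experimenting, something like the following might work. This is a rough sketch under assumptions: `cleanup_voicevox` and the `DOCKER` variable are hypothetical names used only for illustration, and it assumes the image tag `voicevox_core` from the build step. Containers started with `docker run -it` without `--rm` stay around after exit and block removal of the image, so they are removed first.

```shell
# Hypothetical cleanup helper (not part of voicevox_core).
cleanup_voicevox() {
  d=${DOCKER:-docker}
  # List all containers (including stopped ones) created from the image.
  ids=$("$d" ps -aq --filter ancestor=voicevox_core)
  if [ -n "$ids" ]; then
    # Intentionally unquoted: one container ID per word.
    "$d" rm $ids
  fi
  # Remove the image itself, then report what Docker still occupies on disk.
  "$d" image rm voicevox_core
  "$d" system df
}
```

The `python:3.9.6` base image pulled during the build is a separate image and would remain unless it is removed as well.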

I casually built it thinking "oh nice, it runs on Docker," and it took about 10 GB.

Well then, have a good speech synthesis life~

Shami-shakkiri~

P.S.

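I mostly prefer working from the CLI, so I didn't touch it this time, but the frontend seems to have changed quite a bit too, and settings can apparently be saved now.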

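It looks like it's being improved rapidly, so it's probably not realistic right now, but once it has matured and stabilized to some extent, it would be nice if it could be installed from some package manager, like openjtalk...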

Also, it would be nice if there were a mode that streams the data to standard output instead of always writing a wav file...
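
Until such a mode exists, a wrapper can roughly approximate it by letting the container write the WAV as usual and then dumping it to stdout. This is a sketch, not a feature of voicevox_core: `say_stdout` and `DOCKER` are hypothetical names, and it assumes that `run.py` leaves exactly one `.wav` file in the container's working directory, as in the session above.

```shell
# Hypothetical wrapper (not part of voicevox_core): emit the synthesized WAV
# on standard output.
say_stdout() {
  text=$1
  speaker=${2:-1}
  # No -t flag: a pseudo-TTY could corrupt binary output.
  # Anything run.py prints on stdout is redirected to stderr, so only the
  # WAV bytes from cat reach stdout.
  "${DOCKER:-docker}" run --rm voicevox_core \
    bash -c 'python run.py --text "$1" --speaker_id "$2" >&2 && cat ./*.wav' \
    _ "$text" "$speaker"
}
```

With that defined, `say_stdout おはようございます 1 | aplay` should play the result directly, since aplay reads from standard input when no file is given. Each call still pays the full synthesis time measured earlier.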

