
Trying out voicevox_core from Python

Published on 2023/08/11

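I tried following the instructions on GitHub, looking things up as I went, but had a hard time getting them to work, so here are the steps that ended up working for me.
Environment: WSL2, Ubuntu 20.04.
This installs the CPU version of voicevox_core 0.14.4.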

curl -sSfL https://raw.githubusercontent.com/VOICEVOX/voicevox_core/8cf307df4412dc0db0b03c6957b83b032770c31a/scripts/downloads/download.sh | bash -s
cd voicevox_core/
wget https://github.com/VOICEVOX/voicevox_core/releases/download/0.14.4/voicevox_core-0.14.4+cpu-cp38-abi3-linux_x86_64.whl
pip install voicevox_core-0.14.4+cpu-cp38-abi3-linux_x86_64.whl
wget https://raw.githubusercontent.com/VOICEVOX/voicevox_core/406f6c41408836840b9a38489d0f670fb960f412/example/python/run.py

That should complete the installation.
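
To confirm that the steps above did what they should, a quick check like the following can help. It is only a minimal sketch: `check_setup` is a name I made up, and it assumes you run it from inside the `voicevox_core/` directory, where the dictionary folder `open_jtalk_dic_utf_8-1.11` used later in this post is expected to sit.

```python
from importlib.util import find_spec
from pathlib import Path


def check_setup(dict_dir: str = "./open_jtalk_dic_utf_8-1.11",
                module_name: str = "voicevox_core") -> dict:
    """Report whether the Python package and the Open JTalk dictionary are in place.

    check_setup is a hypothetical helper, not part of voicevox_core.
    """
    return {
        # find_spec returns None when a top-level module cannot be found
        "module_installed": find_spec(module_name) is not None,
        # the dictionary is expected to be a directory, not a single file
        "dict_dir_exists": Path(dict_dir).is_dir(),
    }


if __name__ == "__main__":
    for name, ok in check_setup().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```

If either line prints `missing`, the download script or the `pip install` step probably did not finish, or the script is being run from a different directory.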

At first I was a little confused about how to actually synthesize speech.
From the command line, the following worked:

python3 run.py --mode AUTO --dict-dir ./open_jtalk_dic_utf_8-1.11 --text "こんにちは、ずんだもんなのだ。" --out ./output1.wav --speaker-id 26
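
If you want to call the same command from another script, wrapping it with `subprocess` is one option. This is a rough sketch that only reuses the flags shown in the command above; `build_command` and `synthesize_via_cli` are hypothetical helper names, and it assumes `run.py` and the dictionary folder are in the current directory.

```python
import subprocess
import sys


def build_command(text: str, out_path: str, speaker_id: int,
                  dict_dir: str = "./open_jtalk_dic_utf_8-1.11",
                  mode: str = "AUTO", script: str = "run.py") -> list:
    """Build the argument list for the example run.py (hypothetical helper)."""
    return [
        sys.executable, script,
        "--mode", mode,
        "--dict-dir", dict_dir,
        "--text", text,
        "--out", out_path,
        "--speaker-id", str(speaker_id),
    ]


def synthesize_via_cli(text: str, out_path: str, speaker_id: int) -> None:
    """Run run.py and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(text, out_path, speaker_id), check=True)
```

Passing the arguments as a list avoids shell quoting problems with Japanese text, for example `synthesize_via_cli("こんにちは", "./output1.wav", 26)`.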

From a Python file, the following worked:

from pathlib import Path
from voicevox_core import AccelerationMode, VoicevoxCore
from playsound import playsound

SPEAKER_ID = 26

open_jtalk_dict_dir = './open_jtalk_dic_utf_8-1.11'
text = 'こんにちはかすかべつむぎです。'
out = Path('output2.wav')
acceleration_mode = AccelerationMode.AUTO

def main() -> None:
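    # Initialize the core with the Open JTalk dictionary
    core = VoicevoxCore(
        acceleration_mode=acceleration_mode, open_jtalk_dict_dir=open_jtalk_dict_dir
    )
    # Load the voice model for this speaker before synthesizing
    core.load_model(SPEAKER_ID)
    # Text -> audio query -> WAV bytes
    audio_query = core.audio_query(text, SPEAKER_ID)
    wav = core.synthesis(audio_query, SPEAKER_ID)
    out.write_bytes(wav)
    # Pass a str, since some playsound versions do not accept a Path
    playsound(str(out))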


if __name__ == "__main__":
    main()
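
Since audio playback is not always available under WSL2, it can be handy to inspect the generated bytes instead of listening to them. The sketch below uses only the standard `wave` module; `describe_wav` is a hypothetical helper, and it assumes the bytes written to the `.wav` file in the script above are a normal PCM WAV file.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read basic header fields from in-memory WAV bytes (hypothetical helper)."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            # duration is simply frame count divided by frames per second
            "duration_sec": frames / rate,
        }
```

For example, `print(describe_wav(Path("output2.wav").read_bytes()))` after running the script above should show a non-zero duration if synthesis succeeded.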

Another script that worked calls tts directly with kana=True:

from pathlib import Path
from voicevox_core import VoicevoxCore

core = VoicevoxCore(open_jtalk_dict_dir=Path("open_jtalk_dic_utf_8-1.11"))

speaker_id = 23

text = "ディ'イプ/ラ'アニングワ/バンノオヤクデワアリマセ'ン"
if not core.is_model_loaded(speaker_id):
    core.load_model(speaker_id)
wave_bytes = core.tts(text, speaker_id, kana=True)
with open("output3.wav", "wb") as f:
    f.write(wave_bytes)

The third approach (tts with kana=True) lets you specify the accents yourself.
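
As far as I understand the kana notation used with `kana=True`, accent phrases are separated by `/` (or by `、`, which also inserts a pause), and every accent phrase needs exactly one `'` marking the accent position. Treat that as my assumption rather than a full specification. Under it, a small pre-check like the following can catch a missing accent mark before calling `tts`; both function names are hypothetical.

```python
import re


def split_accent_phrases(kana: str) -> list:
    """Split a kana string into accent phrases on '/' or '、'."""
    return [phrase for phrase in re.split(r"[/、]", kana) if phrase]


def find_invalid_phrases(kana: str) -> list:
    """Return the accent phrases that do not contain exactly one accent mark.

    This checks only the accent-mark count (an assumed rule); it does not
    check that the kana themselves are valid.
    """
    return [p for p in split_accent_phrases(kana) if p.count("'") != 1]


if __name__ == "__main__":
    text = "ディ'イプ/ラ'アニングワ/バンノオヤクデワアリマセ'ン"
    print(split_accent_phrases(text))
    print(find_invalid_phrases(text))
```

An empty list from `find_invalid_phrases` means every phrase has one accent mark; it does not guarantee that voicevox_core will accept the string.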
