Trying Piper, a local TTS that runs even on a Raspberry Pi

Raspberry Pi

TTS

Text to Speech

piper

kun432

GitHub repository
https://github.com/rhasspy/piper

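Piper is a fast, local neural text-to-speech system that delivers good audio quality and is optimized for the Raspberry Pi 4.

Piper is used in a variety of projects.
Listen to voice samples

Video tutorial by Thorsten Müller
The voice models are trained with VITS and exported to ONNX format for onnxruntime.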

Voices

Our goal is to support Home Assistant and the Year of Voice.
Voices can be downloaded for the supported languages:
Arabic (ar_JO)
Catalan (ca_ES)
Czech (cs_CZ)
Welsh (cy_GB)
Danish (da_DK)
German (de_DE)
Greek (el_GR)
English (en_GB, en_US)
Spanish (es_ES, es_MX)
Finnish (fi_FI)
French (fr_FR)
Hungarian (hu_HU)
Icelandic (is_IS)
Italian (it_IT)
Georgian (ka_GE)
Kazakh (kk_KZ)
Luxembourgish (lb_LU)
Nepali (ne_NP)
Dutch (nl_BE, nl_NL)
Norwegian (no_NO)
Polish (pl_PL)
Portuguese (pt_BR, pt_PT)
Romanian (ro_RO)
Russian (ru_RU)
Serbian (sr_RS)
Swedish (sv_SE)
Swahili (sw_CD)
Turkish (tr_TR)
Ukrainian (uk_UA)
Vietnamese (vi_VN)
Chinese (zh_CN)
Each voice requires the following two files:

A .onnx model file (e.g. en_US-lessac-medium.onnx)

A .onnx.json config file (e.g. en_US-lessac-medium.onnx.json)
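The two-file requirement above can be checked before running Piper. This is a minimal sketch, assuming the config file always sits next to the model and is named by appending .json to the model file name, as in the examples above. The function names voice_files and missing_voice_files are hypothetical helpers, not part of Piper.

```python
from pathlib import Path


def voice_files(model_path):
    """Return the (.onnx, .onnx.json) pair for a voice, given the model path."""
    model = Path(model_path)
    # Assumption: the config name is the model file name with ".json" appended.
    config = model.with_name(model.name + ".json")
    return model, config


def missing_voice_files(model_path):
    """Return whichever of the two required files is not present on disk."""
    return [path for path in voice_files(model_path) if not path.is_file()]
```

Calling missing_voice_files("en_US-lessac-medium.onnx") returns an empty list when both files are present.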
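The MODEL_CARD file included with each voice contains important licensing information.

Piper is intended for text-to-speech research and does not impose any additional restrictions on voice models. Some voices may have restrictive licenses, however, so be sure to check them.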

Projects using Piper

Piper has been used in the following projects and papers:
Home Assistant
Rhasspy 3
NVDA - NonVisual Desktop Access
Image Captioning for the Visually Impaired and Blind: A Recipe for Low-Resource Languages
Open Voice Operating System
JetsonGPT
LocalAI
Lernstick EDU / EXAM: read-aloud and multilingual support
Natural Speech - a voice plugin for Runelite
mintPiper
Vim-Piper

kun432

Installation

Piper supports the following platforms:

amd64 (64-bit desktop Linux)
arm64 (64-bit Raspberry Pi 4)
armv7 (32-bit Raspberry Pi 3/4)

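I happen to have a Raspberry Pi 4 on hand, so I'll try it on that.

Create the environment with uv. With Python 3.12 the installation fails with an error (it reports that there is no piper-phonemize==1.1.0 that can be installed in the current environment). Python 3.11 works fine.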

uv init -p 3.11 piper-work && cd piper-work
uv venv

Add the package

uv add piper-tts

Output

(snip)
 + onnxruntime==1.22.0
(snip)
 + piper-phonemize==1.1.0
 + piper-tts==1.2.0
(snip)

The CLI is now available. Check the usage.

uv run piper --help

Output

usage: piper [-h] -m MODEL [-c CONFIG] [-f OUTPUT_FILE] [-d OUTPUT_DIR] [--output-raw]
             [-s SPEAKER] [--length-scale LENGTH_SCALE] [--noise-scale NOISE_SCALE]
             [--noise-w NOISE_W] [--cuda] [--sentence-silence SENTENCE_SILENCE]
             [--data-dir DATA_DIR] [--download-dir DOWNLOAD_DIR] [--update-voices] [--debug]

options:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Path to Onnx model file
  -c CONFIG, --config CONFIG
                        Path to model config file
  -f OUTPUT_FILE, --output-file OUTPUT_FILE, --output_file OUTPUT_FILE
                        Path to output WAV file (default: stdout)
  -d OUTPUT_DIR, --output-dir OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Path to output directory (default: cwd)
  --output-raw, --output_raw
                        Stream raw audio to stdout
  -s SPEAKER, --speaker SPEAKER
                        Id of speaker (default: 0)
  --length-scale LENGTH_SCALE, --length_scale LENGTH_SCALE
                        Phoneme length
  --noise-scale NOISE_SCALE, --noise_scale NOISE_SCALE
                        Generator noise
  --noise-w NOISE_W, --noise_w NOISE_W
                        Phoneme width noise
  --cuda                Use GPU
  --sentence-silence SENTENCE_SILENCE, --sentence_silence SENTENCE_SILENCE
                        Seconds of silence after each sentence
  --data-dir DATA_DIR, --data_dir DATA_DIR
                        Data directory to check for downloaded models (default: current
                        directory)
  --download-dir DOWNLOAD_DIR, --download_dir DOWNLOAD_DIR
                        Directory to download voices into (default: first data dir)
  --update-voices       Download latest voices.json during startup
  --debug               Print DEBUG messages to console

Run it

echo 'Welcome to the world of speech synthesis!' | \
    uv run piper --model en_US-lessac-medium --output_file welcome.wav

The model appears to be downloaded automatically into the directory where the command was run.

Output

INFO:piper.download:Downloaded /home/kun432/piper-work/en_US-lessac-medium.onnx (https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/lessac/medium/en_US-lessac-medium.onnx)
INFO:piper.download:Downloaded /home/kun432/piper-work/en_US-lessac-medium.onnx.json (https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json)
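The two download URLs in the log follow a visible pattern. As a rough sketch, the helper below rebuilds them from a voice name. The pattern is inferred only from these two log lines, so other voices may not follow it, and voice_urls is a hypothetical name rather than a Piper API.

```python
# Base URL copied from the download log above.
BASE_URL = "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0"


def voice_urls(voice):
    """Return the (.onnx, .onnx.json) URLs for a name like 'en_US-lessac-medium'."""
    # Assumption: names have exactly three parts, <locale>-<speaker>-<quality>.
    # Any other shape raises ValueError on unpacking.
    locale, speaker, quality = voice.split("-")
    # "en_US" -> "en"
    language = locale.split("_")[0]
    stem = f"{BASE_URL}/{language}/{locale}/{speaker}/{quality}/{voice}"
    return f"{stem}.onnx", f"{stem}.onnx.json"
```

For en_US-lessac-medium this yields the same two URLs that appear in the log.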

Play the generated WAV file.

aplay welcome.wav

As an aside, when I run it again, it seems to download something again even though the model has already been downloaded.

ls -lt en_US-lessac-medium.onnx*

Output

-rw-rw-r-- 1 kun432 kun432     4885 May 17 20:22 en_US-lessac-medium.onnx.json
-rw-rw-r-- 1 kun432 kun432 63201294 May 17 20:02 en_US-lessac-medium.onnx

Run again

echo 'Welcome to the world of speech synthesis!' | \
    uv run piper --model en_US-lessac-medium --output_file welcome.wav

Output

WARNING:piper.download:Wrong size (expected=7010, actual=4885) for /home/kun432/piper-work/en_US-lessac-medium.onnx.json
INFO:piper.download:Downloaded /home/kun432/piper-work/en_US-lessac-medium.onnx.json (https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json)

ls -lt en_US-lessac-medium.onnx*

Output

-rw-rw-r-- 1 kun432 kun432     4885 May 17 20:24 en_US-lessac-medium.onnx.json
-rw-rw-r-- 1 kun432 kun432 63201294 May 17 20:02 en_US-lessac-medium.onnx

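It complains that the size of the JSON file is wrong, then downloads it again (the size on disk is still 4885 bytes afterwards).

This seems to be the relevant part?

Apparently running piper --update-voices -m <model name> fixes it. Run on its own, though, it just waits for standard input. It is enough to add --update-voices only the first time a model is used.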

echo 'Welcome to the world of speech synthesis!' | \
     uv run piper --update-voices --model en_US-lessac-medium --output_file welcome.wav

After that, the earlier warning no longer appears.
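The warning above reads like a size comparison between the file on disk and an expected value. Since --update-voices is described in the help as downloading the latest voices.json, one plausible reading is that the expected size came from an outdated voices.json. The following is an illustrative sketch of that kind of check, not Piper's actual implementation; needs_download is a hypothetical name.

```python
from pathlib import Path


def needs_download(path, expected_size):
    """Return True when the file is absent or its size differs from expected_size."""
    path = Path(path)
    if not path.is_file():
        return True
    # A stale expected size makes this True on every run, even when the
    # file on disk is complete. That would match the repeated download above.
    return path.stat().st_size != expected_size
```

With the numbers from the log, a 4885-byte file checked against an expected size of 7010 is flagged every time, while the same file checked against 4885 is not.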

The output can also be streamed. Use --output-raw.

echo 'This sentence is spoken first. This sentence is synthesized while the first sentence is spoken.' | \
  uv run piper --model en_US-lessac-medium --output-raw | \
  aplay -r 22050 -f S16_LE -t raw -
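If the raw stream is saved instead of played, it needs a WAV header before most players will open it. Here is a minimal sketch using the standard wave module. It assumes the same format as the aplay command above (22050 Hz, signed 16-bit little-endian) and mono audio, which is aplay's default when no channel count is given. Other voices may use a different sample rate, and raw_to_wav is a hypothetical helper name.

```python
import wave


def raw_to_wav(raw_pcm, wav_path, sample_rate=22050):
    """Wrap raw 16-bit PCM bytes in a WAV container."""
    with wave.open(str(wav_path), "wb") as wav_file:
        wav_file.setnchannels(1)  # assumption: mono
        wav_file.setsampwidth(2)  # 2 bytes per sample, matching S16_LE
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(raw_pcm)
```

For example, redirect the --output-raw stream to a file, read it back as bytes, and pass those bytes to raw_to_wav.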

It also seems to accept JSON input.

Addendum (2025/06/05)

Using the distributed binary is faster than going through Python.

kun432

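There is documentation for training
https://github.com/rhasspy/piper/blob/master/TRAINING.md
From what I can see, it seems to use espeak-ng, so (setting accuracy aside) I would expect Japanese to be doable. But there are this many Issues, and people have actually tried, and it still hasn't happened, so I suppose it is not that simple.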
https://github.com/rhasspy/piper/issues?q=is%3Aissue%20japanese

kun432

If you want to use it from Python, these are useful references
https://noerguerra.com/how-to-read-text-aloud-with-piper-and-python/
https://github.com/broadfield-dev/PyPiperTTS
https://github.com/Reqeique/Dimits
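Independently of the links above, the simplest route from Python is to wrap the CLI invocation used earlier in this scrap. This is a minimal sketch, assuming it runs inside the uv project directory so that uv run piper resolves. It uses only the --model and --output_file options listed in the help output, and the function names are hypothetical.

```python
import subprocess


def build_piper_command(model, output_file):
    """Build the same command line used in the shell examples."""
    return ["uv", "run", "piper", "--model", model, "--output_file", output_file]


def synthesize(text, model="en_US-lessac-medium", output_file="welcome.wav"):
    """Feed text to piper on standard input, as the echo pipeline does."""
    subprocess.run(
        build_piper_command(model, output_file),
        input=text.encode("utf-8"),
        check=True,  # raise CalledProcessError if piper exits with an error
    )
```

Calling synthesize("Welcome to the world of speech synthesis!") should then write welcome.wav, just like the shell pipeline.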

kun432

Compared with recent models it sounds distinctly mechanical, but because it is so easy to use, there are probably use cases where it fits.

kun432

You can listen to the voices here
https://rhasspy.github.io/piper-samples/
Licenses seem to differ per voice model, so checking them on this page is the easiest way.

kun432

Isn't this really impressive?

kun432

It looks like the repository and the name have changed

https://github.com/OHF-Voice/piper1-gpl

This scrap was closed 6 months ago