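Trying out the speech generation model CosyVoice
CosyVoice 2, a multilingual speech synthesis model
I tried out CosyVoice 2, a multilingual large-scale speech synthesis model developed by Alibaba Group.
According to the project, it covers a range of speech generation scenarios, including zero-shot in-context generation, cross-lingual in-context generation, and emotionally expressive speech.
Its main features are as follows.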
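- Ultra-low latency
  CosyVoice 2 unifies offline and streaming modeling and supports bidirectional streaming speech synthesis. First-packet synthesis latency is reportedly as low as 150 ms, with minimal loss in quality.
- High accuracy
  Compared with version 1.0, pronunciation errors are reduced by 30-50%, and the model achieves the lowest character error rate on the hard test set of the Seed-TTS evaluation set.
- Stability
  Zero-shot speech generation and cross-lingual speech synthesis produce a consistent, high-quality voice. Cross-lingual synthesis in particular is greatly improved over version 1.0.
- Natural speech
  CosyVoice 2 substantially improves prosody, audio quality, and emotional alignment. The MOS evaluation score reportedly rose from 5.4 to 5.53, on par with commercial large-scale speech synthesis models. The controllable synthesis features have also been upgraded to support finer-grained emotion control and dialect accent adjustment.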
Let's run it on my own Mac.
Setup
1. Clone the repository
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git && cd "$(basename "$_" .git)"
ysic@m4macmini ~ % git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git && cd "$(basename "$_" .git)"
Cloning into 'CosyVoice'...
remote: Enumerating objects: 1801, done.
remote: Counting objects: 100% (1001/1001), done.
remote: Compressing objects: 100% (377/377), done.
remote: Total 1801 (delta 831), reused 627 (delta 624), pack-reused 800 (from 2)
Receiving objects: 100% (1801/1801), 1.49 MiB | 14.85 MiB/s, done.
Resolving deltas: 100% (1119/1119), done.
Submodule 'third_party/Matcha-TTS' (https://github.com/shivammehta25/Matcha-TTS.git) registered for path 'third_party/Matcha-TTS'
Cloning into '/Users/ysic/CosyVoice/third_party/Matcha-TTS'...
remote: Enumerating objects: 1068, done.
remote: Counting objects: 100% (480/480), done.
remote: Compressing objects: 100% (171/171), done.
remote: Total 1068 (delta 385), reused 318 (delta 309), pack-reused 588 (from 2)
Receiving objects: 100% (1068/1068), 64.11 MiB | 22.33 MiB/s, done.
Resolving deltas: 100% (517/517), done.
Submodule path 'third_party/Matcha-TTS': checked out 'dd9105b34bf2be2230f4aa1e4769fb586a3c824e'
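The `cd "$(basename "$_" .git)"` part of the clone command relies on `$_` holding the last argument of the previous command (here, the repository URL) in zsh and bash. The sketch below shows only the `basename` step, with the URL placed in an explicit variable; the variable names `repo_url` and `repo_dir` are made up for illustration.

```shell
# Repository URL from the clone command above.
repo_url="https://github.com/FunAudioLLM/CosyVoice.git"

# basename drops everything up to the last slash; the second argument
# removes the trailing ".git" suffix, leaving the directory git created.
repo_dir="$(basename "$repo_url" .git)"
echo "$repo_dir"
```

Once inside the directory, `git submodule status` is one way to check that third_party/Matcha-TTS was checked out as the log above reports.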
2. Install Miniconda
brew install --cask miniconda
ysic@m4macmini CosyVoice % brew install --cask miniconda
==> Downloading https://formulae.brew.sh/api/cask.jws.json
==> Caveats
Please run the following to setup your shell:
conda init "$(basename "${SHELL}")"
Alternatively, manually add the following to your shell init:
eval "$(conda "shell.$(basename "${SHELL}")" hook)"
==> Downloading https://repo.anaconda.com/miniconda/Miniconda3-py312_24.11.1-0-MacOSX-arm64.sh
Already downloaded: /Users/ysic/Library/Caches/Homebrew/downloads/855aebb3ebaf629877f030f6fd69b50da4d9a930f0a07c0ec81f47c292298d18--Miniconda3-py312_24.11.1-0-MacOSX-arm64.sh
==> Installing Cask miniconda
==> Running installer script 'Miniconda3-py312_24.11.1-0-MacOSX-arm64.sh'
PREFIX=/opt/homebrew/Caskroom/miniconda/base
Unpacking payload ...
Installing base environment...
Preparing transaction: ...working... done
Executing transaction: ...working...
done
installation finished.
==> Linking Binary 'conda' to '/opt/homebrew/bin/conda'
🍺 miniconda was successfully installed!
3. Enable conda in the current shell
eval "$(conda "shell.$(basename "${SHELL}")" hook)"
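The hook command above is the second option from the Homebrew caveats and only affects the current shell session; the first option there, `conda init`, is the persistent alternative. As a rough sketch of how the argument passed to conda is assembled, the snippet below derives it from `SHELL`. The variable names and the `/bin/zsh` fallback are assumptions made for illustration only.

```shell
# Derive the short shell name from SHELL (falling back to /bin/zsh here
# purely for illustration when SHELL is unset or empty).
shell_name="$(basename "${SHELL:-/bin/zsh}")"

# This is the argument conda receives, e.g. "shell.zsh" or "shell.bash".
hook_arg="shell.${shell_name}"
echo "conda ${hook_arg} hook"
```

Running the printed command on its own shows the shell code that `eval` would execute, which can be handy when the activation does not seem to take effect.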
4. Create a virtual environment with conda
conda create -n cosyvoice -y python=3.10
(base) ysic@m4macmini CosyVoice % conda create -n cosyvoice -y python=3.10
Channels:
- defaults
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice
added / updated specs:
- python=3.10
The following packages will be downloaded:
package | build
---------------------------|-----------------
ca-certificates-2024.12.31 | hca03da5_0 129 KB
pip-25.0 | py310hca03da5_0 2.3 MB
python-3.10.16 | hb885b13_1 12.0 MB
setuptools-75.8.0 | py310hca03da5_0 1.6 MB
tzdata-2025a | h04d1e81_0 117 KB
wheel-0.45.1 | py310hca03da5_0 116 KB
------------------------------------------------------------
Total: 16.3 MB
The following NEW packages will be INSTALLED:
bzip2 pkgs/main/osx-arm64::bzip2-1.0.8-h80987f9_6
ca-certificates pkgs/main/osx-arm64::ca-certificates-2024.12.31-hca03da5_0
libffi pkgs/main/osx-arm64::libffi-3.4.4-hca03da5_1
ncurses pkgs/main/osx-arm64::ncurses-6.4-h313beb8_0
openssl pkgs/main/osx-arm64::openssl-3.0.15-h80987f9_0
pip pkgs/main/osx-arm64::pip-25.0-py310hca03da5_0
python pkgs/main/osx-arm64::python-3.10.16-hb885b13_1
readline pkgs/main/osx-arm64::readline-8.2-h1a28f6b_0
setuptools pkgs/main/osx-arm64::setuptools-75.8.0-py310hca03da5_0
sqlite pkgs/main/osx-arm64::sqlite-3.45.3-h80987f9_0
tk pkgs/main/osx-arm64::tk-8.6.14-h6ba3021_0
tzdata pkgs/main/noarch::tzdata-2025a-h04d1e81_0
wheel pkgs/main/osx-arm64::wheel-0.45.1-py310hca03da5_0
xz pkgs/main/osx-arm64::xz-5.4.6-h80987f9_1
zlib pkgs/main/osx-arm64::zlib-1.2.13-h18a0788_1
Downloading and Extracting Packages:
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate cosyvoice
#
# To deactivate an active environment, use
#
# $ conda deactivate
5. Activate the virtual environment
conda activate cosyvoice
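The remaining steps install packages into whichever environment is active, so it can be worth confirming the activation before continuing. A minimal sketch, assuming conda's behavior of exporting `CONDA_DEFAULT_ENV` with the active environment's name; the function name `in_conda_env` is hypothetical.

```shell
# Succeeds only when the named conda environment is the active one.
# CONDA_DEFAULT_ENV is set by `conda activate` (assumption noted above).
in_conda_env() {
  [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if in_conda_env cosyvoice; then
  echo "cosyvoice environment is active"
else
  echo "run 'conda activate cosyvoice' first"
fi
```

The prompt prefix changing from `(base)` to `(cosyvoice)`, as seen in the transcripts here, is the visual equivalent of this check.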
6. Install pynini from conda-forge
conda install -y -c conda-forge pynini==2.1.5
(cosyvoice) ysic@m4macmini CosyVoice % conda install -y -c conda-forge pynini==2.1.5
Channels:
- conda-forge
- defaults
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice
added / updated specs:
- pynini==2.1.5
The following packages will be downloaded:
package | build
---------------------------|-----------------
atk-1.0-2.38.0 | hd03087b_2 339 KB conda-forge
cairo-1.18.2 | h6a3b0d2_1 874 KB conda-forge
font-ttf-dejavu-sans-mono-2.37| hab24e00_0 388 KB conda-forge
font-ttf-inconsolata-3.000 | h77eed37_0 94 KB conda-forge
font-ttf-source-code-pro-2.038| h77eed37_0 684 KB conda-forge
font-ttf-ubuntu-0.83 | h77eed37_3 1.5 MB conda-forge
fontconfig-2.15.0 | h1383a14_1 229 KB conda-forge
fonts-conda-ecosystem-1 | 0 4 KB conda-forge
fonts-conda-forge-1 | 0 4 KB conda-forge
freetype-2.12.1 | hadb7bae_2 582 KB conda-forge
fribidi-1.0.10 | h27ca646_0 59 KB conda-forge
gdk-pixbuf-2.42.12 | h7ddc832_0 498 KB conda-forge
graphite2-1.3.13 | hebf3989_1003 78 KB conda-forge
graphviz-12.0.0 | hbf8cc41_0 4.8 MB conda-forge
gtk2-2.24.33 | hc5c4cae_7 5.9 MB conda-forge
gts-0.7.6 | he42f4ea_4 297 KB conda-forge
harfbuzz-10.2.0 | ha0dd535_0 1.4 MB conda-forge
icu-75.1 | hfee45f7_0 11.3 MB conda-forge
lerc-4.0.0 | h9a09cb3_0 211 KB conda-forge
libcxx-19.1.7 | ha82da77_0 511 KB conda-forge
libdeflate-1.22 | hd74edd7_0 53 KB conda-forge
libexpat-2.6.4 | h286801f_0 63 KB conda-forge
libgd-2.3.3 | hb2c3a21_11 153 KB conda-forge
libglib-2.82.2 | hdff4504_1 3.5 MB conda-forge
libiconv-1.17 | h0d3ecfb_2 661 KB conda-forge
libintl-0.22.5 | h8414b35_3 79 KB conda-forge
libjpeg-turbo-3.0.0 | hb547adb_1 535 KB conda-forge
libpng-1.6.46 | h3783ad8_0 260 KB conda-forge
librsvg-2.58.4 | h266df6f_2 4.5 MB conda-forge
libsqlite-3.45.2 | h091b4b1_0 806 KB conda-forge
libtiff-4.7.0 | hfce79cd_1 358 KB conda-forge
libwebp-base-1.5.0 | h2471fea_0 283 KB conda-forge
libxml2-2.13.5 | hbbdcc80_0 569 KB conda-forge
libzlib-1.3.1 | h8359307_2 45 KB conda-forge
openfst-1.8.2 | hdb0ca01_2 6.1 MB conda-forge
openssl-3.4.0 | h81ee809_1 2.8 MB conda-forge
pango-1.56.1 | h73f1e88_0 414 KB conda-forge
pcre2-10.44 | h297a79d_2 604 KB conda-forge
pixman-0.44.2 | h2f9eb0b_0 196 KB conda-forge
pynini-2.1.5 | py310h38f39d4_6 1.1 MB conda-forge
python-3.10.13 |h2469fbe_1_cpython 11.1 MB conda-forge
python_abi-3.10 | 5_cp310 6 KB conda-forge
sqlite-3.45.2 | hf2abe2d_0 793 KB conda-forge
tk-8.6.13 | h5083fa2_1 3.0 MB conda-forge
zlib-1.3.1 | h8359307_2 76 KB conda-forge
zstd-1.5.6 | hb46c0d2_0 396 KB conda-forge
------------------------------------------------------------
Total: 68.0 MB
The following NEW packages will be INSTALLED:
atk-1.0 conda-forge/osx-arm64::atk-1.0-2.38.0-hd03087b_2
cairo conda-forge/osx-arm64::cairo-1.18.2-h6a3b0d2_1
font-ttf-dejavu-s~ conda-forge/noarch::font-ttf-dejavu-sans-mono-2.37-hab24e00_0
font-ttf-inconsol~ conda-forge/noarch::font-ttf-inconsolata-3.000-h77eed37_0
font-ttf-source-c~ conda-forge/noarch::font-ttf-source-code-pro-2.038-h77eed37_0
font-ttf-ubuntu conda-forge/noarch::font-ttf-ubuntu-0.83-h77eed37_3
fontconfig conda-forge/osx-arm64::fontconfig-2.15.0-h1383a14_1
fonts-conda-ecosy~ conda-forge/noarch::fonts-conda-ecosystem-1-0
fonts-conda-forge conda-forge/noarch::fonts-conda-forge-1-0
freetype conda-forge/osx-arm64::freetype-2.12.1-hadb7bae_2
fribidi conda-forge/osx-arm64::fribidi-1.0.10-h27ca646_0
gdk-pixbuf conda-forge/osx-arm64::gdk-pixbuf-2.42.12-h7ddc832_0
graphite2 conda-forge/osx-arm64::graphite2-1.3.13-hebf3989_1003
graphviz conda-forge/osx-arm64::graphviz-12.0.0-hbf8cc41_0
gtk2 conda-forge/osx-arm64::gtk2-2.24.33-hc5c4cae_7
gts conda-forge/osx-arm64::gts-0.7.6-he42f4ea_4
harfbuzz conda-forge/osx-arm64::harfbuzz-10.2.0-ha0dd535_0
icu conda-forge/osx-arm64::icu-75.1-hfee45f7_0
lerc conda-forge/osx-arm64::lerc-4.0.0-h9a09cb3_0
libcxx conda-forge/osx-arm64::libcxx-19.1.7-ha82da77_0
libdeflate conda-forge/osx-arm64::libdeflate-1.22-hd74edd7_0
libexpat conda-forge/osx-arm64::libexpat-2.6.4-h286801f_0
libgd conda-forge/osx-arm64::libgd-2.3.3-hb2c3a21_11
libglib conda-forge/osx-arm64::libglib-2.82.2-hdff4504_1
libiconv conda-forge/osx-arm64::libiconv-1.17-h0d3ecfb_2
libintl conda-forge/osx-arm64::libintl-0.22.5-h8414b35_3
libjpeg-turbo conda-forge/osx-arm64::libjpeg-turbo-3.0.0-hb547adb_1
libpng conda-forge/osx-arm64::libpng-1.6.46-h3783ad8_0
librsvg conda-forge/osx-arm64::librsvg-2.58.4-h266df6f_2
libsqlite conda-forge/osx-arm64::libsqlite-3.45.2-h091b4b1_0
libtiff conda-forge/osx-arm64::libtiff-4.7.0-hfce79cd_1
libwebp-base conda-forge/osx-arm64::libwebp-base-1.5.0-h2471fea_0
libxml2 conda-forge/osx-arm64::libxml2-2.13.5-hbbdcc80_0
libzlib conda-forge/osx-arm64::libzlib-1.3.1-h8359307_2
openfst conda-forge/osx-arm64::openfst-1.8.2-hdb0ca01_2
pango conda-forge/osx-arm64::pango-1.56.1-h73f1e88_0
pcre2 conda-forge/osx-arm64::pcre2-10.44-h297a79d_2
pixman conda-forge/osx-arm64::pixman-0.44.2-h2f9eb0b_0
pynini conda-forge/osx-arm64::pynini-2.1.5-py310h38f39d4_6
python_abi conda-forge/osx-arm64::python_abi-3.10-5_cp310
zstd conda-forge/osx-arm64::zstd-1.5.6-hb46c0d2_0
The following packages will be UPDATED:
openssl pkgs/main::openssl-3.0.15-h80987f9_0 --> conda-forge::openssl-3.4.0-h81ee809_1
zlib pkgs/main::zlib-1.2.13-h18a0788_1 --> conda-forge::zlib-1.3.1-h8359307_2
The following packages will be SUPERSEDED by a higher-priority channel:
python pkgs/main::python-3.10.16-hb885b13_1 --> conda-forge::python-3.10.13-h2469fbe_1_cpython
sqlite pkgs/main::sqlite-3.45.3-h80987f9_0 --> conda-forge::sqlite-3.45.2-hf2abe2d_0
tk pkgs/main::tk-8.6.14-h6ba3021_0 --> conda-forge::tk-8.6.13-h5083fa2_1
Downloading and Extracting Packages:
Preparing transaction: done
Verifying transaction: done
Executing transaction: |
/
done
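To confirm that pynini landed in the environment at the pinned version, one option is to query Python's package metadata. This is a minimal sketch, assuming the cosyvoice environment is active so that `python3` resolves to its interpreter; the helper name `pkg_version` is hypothetical.

```shell
# Print the installed version of a Python package, or "not installed".
# Uses importlib.metadata from the standard library (Python 3.8+).
pkg_version() {
  python3 -c '
import sys
from importlib import metadata
try:
    print(metadata.version(sys.argv[1]))
except metadata.PackageNotFoundError:
    print("not installed")
' "$1"
}

pkg_version pynini
```

In the environment built above this should report the version pinned in the install command. `conda list pynini` is an alternative that also shows which channel the package came from.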
7. Install the Python dependencies with pip
pip install -r requirements.txt
(cosyvoice) ysic@m4macmini CosyVoice % pip install -r requirements.txt
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu121, https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
Ignoring deepspeed: markers 'sys_platform == "linux"' don't match your environment
Ignoring onnxruntime-gpu: markers 'sys_platform == "linux"' don't match your environment
Ignoring tensorrt-cu12: markers 'sys_platform == "linux"' don't match your environment
Ignoring tensorrt-cu12-bindings: markers 'sys_platform == "linux"' don't match your environment
Ignoring tensorrt-cu12-libs: markers 'sys_platform == "linux"' don't match your environment
Collecting conformer==0.3.2 (from -r requirements.txt (line 3))
Using cached conformer-0.3.2-py3-none-any.whl.metadata (631 bytes)
Collecting diffusers==0.29.0 (from -r requirements.txt (line 5))
Using cached diffusers-0.29.0-py3-none-any.whl.metadata (19 kB)
Collecting gdown==5.1.0 (from -r requirements.txt (line 6))
Using cached gdown-5.1.0-py3-none-any.whl.metadata (5.7 kB)
Collecting gradio==5.4.0 (from -r requirements.txt (line 7))
Using cached gradio-5.4.0-py3-none-any.whl.metadata (16 kB)
Collecting grpcio==1.57.0 (from -r requirements.txt (line 8))
Using cached grpcio-1.57.0-cp310-cp310-macosx_12_0_universal2.whl.metadata (4.0 kB)
Collecting grpcio-tools==1.57.0 (from -r requirements.txt (line 9))
Using cached grpcio_tools-1.57.0-cp310-cp310-macosx_12_0_universal2.whl.metadata (6.2 kB)
Collecting hydra-core==1.3.2 (from -r requirements.txt (line 10))
Using cached hydra_core-1.3.2-py3-none-any.whl.metadata (5.5 kB)
Collecting HyperPyYAML==1.2.2 (from -r requirements.txt (line 11))
Using cached HyperPyYAML-1.2.2-py3-none-any.whl.metadata (7.6 kB)
Collecting inflect==7.3.1 (from -r requirements.txt (line 12))
Using cached inflect-7.3.1-py3-none-any.whl.metadata (21 kB)
Collecting librosa==0.10.2 (from -r requirements.txt (line 13))
Using cached librosa-0.10.2-py3-none-any.whl.metadata (8.6 kB)
Collecting lightning==2.2.4 (from -r requirements.txt (line 14))
Using cached lightning-2.2.4-py3-none-any.whl.metadata (53 kB)
Collecting matplotlib==3.7.5 (from -r requirements.txt (line 15))
Using cached matplotlib-3.7.5-cp310-cp310-macosx_11_0_arm64.whl.metadata (5.7 kB)
Collecting modelscope==1.15.0 (from -r requirements.txt (line 16))
Using cached modelscope-1.15.0-py3-none-any.whl.metadata (33 kB)
Collecting networkx==3.1 (from -r requirements.txt (line 17))
Using cached networkx-3.1-py3-none-any.whl.metadata (5.3 kB)
Collecting omegaconf==2.3.0 (from -r requirements.txt (line 18))
Using cached omegaconf-2.3.0-py3-none-any.whl.metadata (3.9 kB)
Collecting onnx==1.16.0 (from -r requirements.txt (line 19))
Using cached onnx-1.16.0-cp310-cp310-macosx_10_15_universal2.whl.metadata (16 kB)
Collecting onnxruntime==1.18.0 (from -r requirements.txt (line 21))
Using cached onnxruntime-1.18.0-cp310-cp310-macosx_11_0_universal2.whl.metadata (4.2 kB)
Collecting openai-whisper==20231117 (from -r requirements.txt (line 22))
Using cached openai_whisper-20231117-py3-none-any.whl
Collecting protobuf==4.25 (from -r requirements.txt (line 23))
Using cached protobuf-4.25.0-cp37-abi3-macosx_10_9_universal2.whl.metadata (541 bytes)
Collecting pydantic==2.7.0 (from -r requirements.txt (line 24))
Using cached pydantic-2.7.0-py3-none-any.whl.metadata (103 kB)
Collecting pyworld==0.3.4 (from -r requirements.txt (line 25))
Using cached pyworld-0.3.4-cp310-cp310-macosx_11_0_arm64.whl
Collecting rich==13.7.1 (from -r requirements.txt (line 26))
Using cached rich-13.7.1-py3-none-any.whl.metadata (18 kB)
Collecting soundfile==0.12.1 (from -r requirements.txt (line 27))
Using cached soundfile-0.12.1-py2.py3-none-macosx_11_0_arm64.whl.metadata (14 kB)
Collecting tensorboard==2.14.0 (from -r requirements.txt (line 28))
Using cached tensorboard-2.14.0-py3-none-any.whl.metadata (1.8 kB)
Collecting torch==2.3.1 (from -r requirements.txt (line 32))
Using cached torch-2.3.1-cp310-none-macosx_11_0_arm64.whl.metadata (26 kB)
Collecting torchaudio==2.3.1 (from -r requirements.txt (line 33))
Using cached torchaudio-2.3.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.4 kB)
Collecting transformers==4.40.1 (from -r requirements.txt (line 34))
Using cached transformers-4.40.1-py3-none-any.whl.metadata (137 kB)
Collecting uvicorn==0.30.0 (from -r requirements.txt (line 35))
Using cached uvicorn-0.30.0-py3-none-any.whl.metadata (6.3 kB)
Collecting wget==3.2 (from -r requirements.txt (line 36))
Using cached wget-3.2-py3-none-any.whl
Collecting fastapi==0.115.6 (from -r requirements.txt (line 37))
Using cached fastapi-0.115.6-py3-none-any.whl.metadata (27 kB)
Collecting fastapi-cli==0.0.4 (from -r requirements.txt (line 38))
Using cached fastapi_cli-0.0.4-py3-none-any.whl.metadata (7.0 kB)
Collecting WeTextProcessing==1.0.3 (from -r requirements.txt (line 39))
Using cached WeTextProcessing-1.0.3-py3-none-any.whl.metadata (7.2 kB)
Collecting einops>=0.6.1 (from conformer==0.3.2->-r requirements.txt (line 3))
Using cached einops-0.8.0-py3-none-any.whl.metadata (12 kB)
Collecting importlib-metadata (from diffusers==0.29.0->-r requirements.txt (line 5))
Using cached importlib_metadata-8.6.1-py3-none-any.whl.metadata (4.7 kB)
Collecting filelock (from diffusers==0.29.0->-r requirements.txt (line 5))
Using cached filelock-3.17.0-py3-none-any.whl.metadata (2.9 kB)
Collecting huggingface-hub>=0.23.2 (from diffusers==0.29.0->-r requirements.txt (line 5))
Downloading huggingface_hub-0.28.0-py3-none-any.whl.metadata (13 kB)
Collecting numpy (from diffusers==0.29.0->-r requirements.txt (line 5))
Using cached numpy-2.2.2-cp310-cp310-macosx_14_0_arm64.whl.metadata (62 kB)
Collecting regex!=2019.12.17 (from diffusers==0.29.0->-r requirements.txt (line 5))
Using cached regex-2024.11.6-cp310-cp310-macosx_11_0_arm64.whl.metadata (40 kB)
Collecting requests (from diffusers==0.29.0->-r requirements.txt (line 5))
Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting safetensors>=0.3.1 (from diffusers==0.29.0->-r requirements.txt (line 5))
Using cached safetensors-0.5.2-cp38-abi3-macosx_11_0_arm64.whl.metadata (3.8 kB)
Collecting Pillow (from diffusers==0.29.0->-r requirements.txt (line 5))
Using cached pillow-11.1.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (9.1 kB)
Collecting beautifulsoup4 (from gdown==5.1.0->-r requirements.txt (line 6))
Using cached beautifulsoup4-4.12.3-py3-none-any.whl.metadata (3.8 kB)
Collecting tqdm (from gdown==5.1.0->-r requirements.txt (line 6))
Using cached tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Collecting aiofiles<24.0,>=22.0 (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached aiofiles-23.2.1-py3-none-any.whl.metadata (9.7 kB)
Collecting anyio<5.0,>=3.0 (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached anyio-4.8.0-py3-none-any.whl.metadata (4.6 kB)
Collecting ffmpy (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached ffmpy-0.5.0-py3-none-any.whl.metadata (3.0 kB)
Collecting gradio-client==1.4.2 (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached gradio_client-1.4.2-py3-none-any.whl.metadata (7.1 kB)
Collecting httpx>=0.24.1 (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached httpx-0.28.1-py3-none-any.whl.metadata (7.1 kB)
Collecting jinja2<4.0 (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached jinja2-3.1.5-py3-none-any.whl.metadata (2.6 kB)
Collecting markupsafe~=2.0 (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached https://download.pytorch.org/whl/MarkupSafe-2.1.5-cp310-cp310-macosx_10_9_universal2.whl (18 kB)
Collecting orjson~=3.0 (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached orjson-3.10.15-cp310-cp310-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl.metadata (41 kB)
Collecting packaging (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached packaging-24.2-py3-none-any.whl.metadata (3.2 kB)
Collecting pandas<3.0,>=1.0 (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached pandas-2.2.3-cp310-cp310-macosx_11_0_arm64.whl.metadata (89 kB)
Collecting pydub (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting python-multipart==0.0.12 (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached python_multipart-0.0.12-py3-none-any.whl.metadata (1.9 kB)
Collecting pyyaml<7.0,>=5.0 (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached PyYAML-6.0.2-cp310-cp310-macosx_11_0_arm64.whl.metadata (2.1 kB)
Collecting ruff>=0.2.2 (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached ruff-0.9.3-py3-none-macosx_11_0_arm64.whl.metadata (25 kB)
Collecting safehttpx<1.0,>=0.1.1 (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached safehttpx-0.1.6-py3-none-any.whl.metadata (4.2 kB)
Collecting semantic-version~=2.0 (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached semantic_version-2.10.0-py2.py3-none-any.whl.metadata (9.7 kB)
Collecting starlette<1.0,>=0.40.0 (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached starlette-0.45.3-py3-none-any.whl.metadata (6.3 kB)
Collecting tomlkit==0.12.0 (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached tomlkit-0.12.0-py3-none-any.whl.metadata (2.7 kB)
Collecting typer<1.0,>=0.12 (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached typer-0.15.1-py3-none-any.whl.metadata (15 kB)
Collecting typing-extensions~=4.0 (from gradio==5.4.0->-r requirements.txt (line 7))
Downloading https://download.pytorch.org/whl/typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Requirement already satisfied: setuptools in /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages (from grpcio-tools==1.57.0->-r requirements.txt (line 9)) (75.8.0)
Collecting antlr4-python3-runtime==4.9.* (from hydra-core==1.3.2->-r requirements.txt (line 10))
Using cached antlr4_python3_runtime-4.9.3-py3-none-any.whl
Collecting ruamel.yaml>=0.17.28 (from HyperPyYAML==1.2.2->-r requirements.txt (line 11))
Using cached ruamel.yaml-0.18.10-py3-none-any.whl.metadata (23 kB)
Collecting more-itertools>=8.5.0 (from inflect==7.3.1->-r requirements.txt (line 12))
Using cached more_itertools-10.6.0-py3-none-any.whl.metadata (37 kB)
Collecting typeguard>=4.0.1 (from inflect==7.3.1->-r requirements.txt (line 12))
Using cached typeguard-4.4.1-py3-none-any.whl.metadata (3.7 kB)
Collecting audioread>=2.1.9 (from librosa==0.10.2->-r requirements.txt (line 13))
Using cached audioread-3.0.1-py3-none-any.whl.metadata (8.4 kB)
Collecting scipy>=1.2.0 (from librosa==0.10.2->-r requirements.txt (line 13))
Using cached scipy-1.15.1-cp310-cp310-macosx_14_0_arm64.whl.metadata (61 kB)
Collecting scikit-learn>=0.20.0 (from librosa==0.10.2->-r requirements.txt (line 13))
Using cached scikit_learn-1.6.1-cp310-cp310-macosx_12_0_arm64.whl.metadata (31 kB)
Collecting joblib>=0.14 (from librosa==0.10.2->-r requirements.txt (line 13))
Using cached joblib-1.4.2-py3-none-any.whl.metadata (5.4 kB)
Collecting decorator>=4.3.0 (from librosa==0.10.2->-r requirements.txt (line 13))
Using cached decorator-5.1.1-py3-none-any.whl.metadata (4.0 kB)
Collecting numba>=0.51.0 (from librosa==0.10.2->-r requirements.txt (line 13))
Using cached numba-0.61.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (2.7 kB)
Collecting pooch>=1.1 (from librosa==0.10.2->-r requirements.txt (line 13))
Using cached pooch-1.8.2-py3-none-any.whl.metadata (10 kB)
Collecting soxr>=0.3.2 (from librosa==0.10.2->-r requirements.txt (line 13))
Using cached soxr-0.5.0.post1-cp310-cp310-macosx_11_0_arm64.whl.metadata (5.6 kB)
Collecting lazy-loader>=0.1 (from librosa==0.10.2->-r requirements.txt (line 13))
Using cached lazy_loader-0.4-py3-none-any.whl.metadata (7.6 kB)
Collecting msgpack>=1.0 (from librosa==0.10.2->-r requirements.txt (line 13))
Using cached msgpack-1.1.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (8.4 kB)
Collecting fsspec<2025.0,>=2022.5.0 (from fsspec[http]<2025.0,>=2022.5.0->lightning==2.2.4->-r requirements.txt (line 14))
Using cached fsspec-2024.12.0-py3-none-any.whl.metadata (11 kB)
Collecting lightning-utilities<2.0,>=0.8.0 (from lightning==2.2.4->-r requirements.txt (line 14))
Using cached lightning_utilities-0.11.9-py3-none-any.whl.metadata (5.2 kB)
Collecting torchmetrics<3.0,>=0.7.0 (from lightning==2.2.4->-r requirements.txt (line 14))
Using cached torchmetrics-1.6.1-py3-none-any.whl.metadata (21 kB)
Collecting pytorch-lightning (from lightning==2.2.4->-r requirements.txt (line 14))
Using cached pytorch_lightning-2.5.0.post0-py3-none-any.whl.metadata (21 kB)
Collecting contourpy>=1.0.1 (from matplotlib==3.7.5->-r requirements.txt (line 15))
Using cached contourpy-1.3.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (5.4 kB)
Collecting cycler>=0.10 (from matplotlib==3.7.5->-r requirements.txt (line 15))
Using cached cycler-0.12.1-py3-none-any.whl.metadata (3.8 kB)
Collecting fonttools>=4.22.0 (from matplotlib==3.7.5->-r requirements.txt (line 15))
Downloading fonttools-4.55.8-cp310-cp310-macosx_10_9_universal2.whl.metadata (101 kB)
Collecting kiwisolver>=1.0.1 (from matplotlib==3.7.5->-r requirements.txt (line 15))
Using cached kiwisolver-1.4.8-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.2 kB)
Collecting numpy (from diffusers==0.29.0->-r requirements.txt (line 5))
Using cached numpy-1.26.4-cp310-cp310-macosx_11_0_arm64.whl.metadata (61 kB)
Collecting pyparsing>=2.3.1 (from matplotlib==3.7.5->-r requirements.txt (line 15))
Using cached pyparsing-3.2.1-py3-none-any.whl.metadata (5.0 kB)
Collecting python-dateutil>=2.7 (from matplotlib==3.7.5->-r requirements.txt (line 15))
Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl.metadata (8.4 kB)
Collecting addict (from modelscope==1.15.0->-r requirements.txt (line 16))
Using cached addict-2.4.0-py3-none-any.whl.metadata (1.0 kB)
Collecting attrs (from modelscope==1.15.0->-r requirements.txt (line 16))
Using cached attrs-25.1.0-py3-none-any.whl.metadata (10 kB)
Collecting datasets<2.19.0,>=2.16.0 (from modelscope==1.15.0->-r requirements.txt (line 16))
Using cached datasets-2.18.0-py3-none-any.whl.metadata (20 kB)
Collecting gast>=0.2.2 (from modelscope==1.15.0->-r requirements.txt (line 16))
Using cached gast-0.6.0-py3-none-any.whl.metadata (1.3 kB)
Collecting oss2 (from modelscope==1.15.0->-r requirements.txt (line 16))
Using cached oss2-2.19.1-py3-none-any.whl
Collecting pyarrow!=9.0.0,>=6.0.0 (from modelscope==1.15.0->-r requirements.txt (line 16))
Using cached pyarrow-19.0.0-cp310-cp310-macosx_12_0_arm64.whl.metadata (3.3 kB)
Collecting simplejson>=3.3.0 (from modelscope==1.15.0->-r requirements.txt (line 16))
Using cached simplejson-3.19.3-cp310-cp310-macosx_11_0_arm64.whl.metadata (3.2 kB)
Collecting sortedcontainers>=1.5.9 (from modelscope==1.15.0->-r requirements.txt (line 16))
Using cached sortedcontainers-2.4.0-py2.py3-none-any.whl.metadata (10 kB)
Collecting urllib3>=1.26 (from modelscope==1.15.0->-r requirements.txt (line 16))
Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB)
Collecting yapf (from modelscope==1.15.0->-r requirements.txt (line 16))
Using cached yapf-0.43.0-py3-none-any.whl.metadata (46 kB)
Collecting coloredlogs (from onnxruntime==1.18.0->-r requirements.txt (line 21))
Downloading https://aiinfra.pkgs.visualstudio.com/2692857e-05ef-43b4-ba9c-ccf1c22c437c/_packaging/9387c3aa-d9ad-4513-968c-383f6f7f53b8/pypi/download/coloredlogs/15.0.1/coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
Collecting flatbuffers (from onnxruntime==1.18.0->-r requirements.txt (line 21))
Using cached flatbuffers-25.1.24-py2.py3-none-any.whl.metadata (875 bytes)
Collecting sympy (from onnxruntime==1.18.0->-r requirements.txt (line 21))
Downloading https://aiinfra.pkgs.visualstudio.com/2692857e-05ef-43b4-ba9c-ccf1c22c437c/_packaging/9387c3aa-d9ad-4513-968c-383f6f7f53b8/pypi/download/sympy/1.13.3/sympy-1.13.3-py3-none-any.whl (6.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 37.8 MB/s eta 0:00:00
Collecting tiktoken (from openai-whisper==20231117->-r requirements.txt (line 22))
Using cached tiktoken-0.8.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.6 kB)
Collecting annotated-types>=0.4.0 (from pydantic==2.7.0->-r requirements.txt (line 24))
Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.18.1 (from pydantic==2.7.0->-r requirements.txt (line 24))
Using cached pydantic_core-2.18.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.5 kB)
Collecting cython>=0.24 (from pyworld==0.3.4->-r requirements.txt (line 25))
Using cached Cython-3.0.11-py2.py3-none-any.whl.metadata (3.2 kB)
Collecting markdown-it-py>=2.2.0 (from rich==13.7.1->-r requirements.txt (line 26))
Using cached markdown_it_py-3.0.0-py3-none-any.whl.metadata (6.9 kB)
Collecting pygments<3.0.0,>=2.13.0 (from rich==13.7.1->-r requirements.txt (line 26))
Using cached pygments-2.19.1-py3-none-any.whl.metadata (2.5 kB)
Collecting cffi>=1.0 (from soundfile==0.12.1->-r requirements.txt (line 27))
Using cached cffi-1.17.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (1.5 kB)
Collecting absl-py>=0.4 (from tensorboard==2.14.0->-r requirements.txt (line 28))
Using cached absl_py-2.1.0-py3-none-any.whl.metadata (2.3 kB)
Collecting google-auth<3,>=1.6.3 (from tensorboard==2.14.0->-r requirements.txt (line 28))
Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB)
Collecting google-auth-oauthlib<1.1,>=0.5 (from tensorboard==2.14.0->-r requirements.txt (line 28))
Using cached google_auth_oauthlib-1.0.0-py2.py3-none-any.whl.metadata (2.7 kB)
Collecting markdown>=2.6.8 (from tensorboard==2.14.0->-r requirements.txt (line 28))
Using cached Markdown-3.7-py3-none-any.whl.metadata (7.0 kB)
Collecting tensorboard-data-server<0.8.0,>=0.7.0 (from tensorboard==2.14.0->-r requirements.txt (line 28))
Using cached tensorboard_data_server-0.7.2-py3-none-any.whl.metadata (1.1 kB)
Collecting werkzeug>=1.0.1 (from tensorboard==2.14.0->-r requirements.txt (line 28))
Using cached werkzeug-3.1.3-py3-none-any.whl.metadata (3.7 kB)
Requirement already satisfied: wheel>=0.26 in /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages (from tensorboard==2.14.0->-r requirements.txt (line 28)) (0.45.1)
Collecting tokenizers<0.20,>=0.19 (from transformers==4.40.1->-r requirements.txt (line 34))
Using cached tokenizers-0.19.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.7 kB)
Collecting click>=7.0 (from uvicorn==0.30.0->-r requirements.txt (line 35))
Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting h11>=0.8 (from uvicorn==0.30.0->-r requirements.txt (line 35))
Using cached h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Collecting starlette<1.0,>=0.40.0 (from gradio==5.4.0->-r requirements.txt (line 7))
Using cached starlette-0.41.3-py3-none-any.whl.metadata (6.0 kB)
Requirement already satisfied: pynini==2.1.5 in /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages (from WeTextProcessing==1.0.3->-r requirements.txt (line 39)) (2.1.5)
Collecting importlib-resources (from WeTextProcessing==1.0.3->-r requirements.txt (line 39))
Using cached importlib_resources-6.5.2-py3-none-any.whl.metadata (3.9 kB)
Collecting websockets<13.0,>=10.0 (from gradio-client==1.4.2->gradio==5.4.0->-r requirements.txt (line 7))
Using cached websockets-12.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.6 kB)
Collecting exceptiongroup>=1.0.2 (from anyio<5.0,>=3.0->gradio==5.4.0->-r requirements.txt (line 7))
Using cached exceptiongroup-1.2.2-py3-none-any.whl.metadata (6.6 kB)
Collecting idna>=2.8 (from anyio<5.0,>=3.0->gradio==5.4.0->-r requirements.txt (line 7))
Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting sniffio>=1.1 (from anyio<5.0,>=3.0->gradio==5.4.0->-r requirements.txt (line 7))
Using cached sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting pycparser (from cffi>=1.0->soundfile==0.12.1->-r requirements.txt (line 27))
Using cached pycparser-2.22-py3-none-any.whl.metadata (943 bytes)
Collecting pyarrow-hotfix (from datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached pyarrow_hotfix-0.6-py3-none-any.whl.metadata (3.6 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached xxhash-3.5.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (12 kB)
Collecting multiprocess (from datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached multiprocess-0.70.17-py310-none-any.whl.metadata (7.2 kB)
Collecting fsspec<2025.0,>=2022.5.0 (from fsspec[http]<2025.0,>=2022.5.0->lightning==2.2.4->-r requirements.txt (line 14))
Using cached https://download.pytorch.org/whl/fsspec-2024.2.0-py3-none-any.whl (170 kB)
Collecting aiohttp (from datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached aiohttp-3.11.11-cp310-cp310-macosx_11_0_arm64.whl.metadata (7.7 kB)
Collecting cachetools<6.0,>=2.0.0 (from google-auth<3,>=1.6.3->tensorboard==2.14.0->-r requirements.txt (line 28))
Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB)
Collecting pyasn1-modules>=0.2.1 (from google-auth<3,>=1.6.3->tensorboard==2.14.0->-r requirements.txt (line 28))
Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB)
Collecting rsa<5,>=3.1.4 (from google-auth<3,>=1.6.3->tensorboard==2.14.0->-r requirements.txt (line 28))
Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB)
Collecting requests-oauthlib>=0.7.0 (from google-auth-oauthlib<1.1,>=0.5->tensorboard==2.14.0->-r requirements.txt (line 28))
Using cached requests_oauthlib-2.0.0-py2.py3-none-any.whl.metadata (11 kB)
Collecting certifi (from httpx>=0.24.1->gradio==5.4.0->-r requirements.txt (line 7))
Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB)
Collecting httpcore==1.* (from httpx>=0.24.1->gradio==5.4.0->-r requirements.txt (line 7))
Using cached httpcore-1.0.7-py3-none-any.whl.metadata (21 kB)
Collecting mdurl~=0.1 (from markdown-it-py>=2.2.0->rich==13.7.1->-r requirements.txt (line 26))
Using cached mdurl-0.1.2-py3-none-any.whl.metadata (1.6 kB)
Collecting llvmlite<0.45,>=0.44.0dev0 (from numba>=0.51.0->librosa==0.10.2->-r requirements.txt (line 13))
Using cached llvmlite-0.44.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (4.8 kB)
Collecting pytz>=2020.1 (from pandas<3.0,>=1.0->gradio==5.4.0->-r requirements.txt (line 7))
Using cached pytz-2024.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas<3.0,>=1.0->gradio==5.4.0->-r requirements.txt (line 7))
Using cached tzdata-2025.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting platformdirs>=2.5.0 (from pooch>=1.1->librosa==0.10.2->-r requirements.txt (line 13))
Using cached platformdirs-4.3.6-py3-none-any.whl.metadata (11 kB)
Collecting six>=1.5 (from python-dateutil>=2.7->matplotlib==3.7.5->-r requirements.txt (line 15))
Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB)
Collecting charset-normalizer<4,>=2 (from requests->diffusers==0.29.0->-r requirements.txt (line 5))
Using cached charset_normalizer-3.4.1-cp310-cp310-macosx_10_9_universal2.whl.metadata (35 kB)
Collecting ruamel.yaml.clib>=0.2.7 (from ruamel.yaml>=0.17.28->HyperPyYAML==1.2.2->-r requirements.txt (line 11))
Using cached ruamel.yaml.clib-0.2.12-cp310-cp310-macosx_13_0_arm64.whl.metadata (1.2 kB)
Collecting threadpoolctl>=3.1.0 (from scikit-learn>=0.20.0->librosa==0.10.2->-r requirements.txt (line 13))
Using cached threadpoolctl-3.5.0-py3-none-any.whl.metadata (13 kB)
Collecting shellingham>=1.3.0 (from typer<1.0,>=0.12->gradio==5.4.0->-r requirements.txt (line 7))
Using cached shellingham-1.5.4-py2.py3-none-any.whl.metadata (3.5 kB)
Collecting soupsieve>1.2 (from beautifulsoup4->gdown==5.1.0->-r requirements.txt (line 6))
Using cached soupsieve-2.6-py3-none-any.whl.metadata (4.6 kB)
Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime==1.18.0->-r requirements.txt (line 21))
Downloading https://aiinfra.pkgs.visualstudio.com/2692857e-05ef-43b4-ba9c-ccf1c22c437c/_packaging/9387c3aa-d9ad-4513-968c-383f6f7f53b8/pypi/download/humanfriendly/10/humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
Collecting zipp>=3.20 (from importlib-metadata->diffusers==0.29.0->-r requirements.txt (line 5))
Using cached zipp-3.21.0-py3-none-any.whl.metadata (3.7 kB)
Collecting crcmod>=1.7 (from oss2->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached crcmod-1.7-cp310-cp310-macosx_11_0_arm64.whl
Collecting pycryptodome>=3.4.7 (from oss2->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached pycryptodome-3.21.0-cp36-abi3-macosx_10_9_universal2.whl.metadata (3.4 kB)
Collecting aliyun-python-sdk-kms>=2.4.1 (from oss2->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached aliyun_python_sdk_kms-2.16.5-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting aliyun-python-sdk-core>=2.13.12 (from oss2->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached aliyun_python_sdk_core-2.16.0-py3-none-any.whl
Collecting PySocks!=1.5.7,>=1.5.6 (from requests[socks]->gdown==5.1.0->-r requirements.txt (line 6))
Using cached PySocks-1.7.1-py3-none-any.whl.metadata (13 kB)
Collecting mpmath<1.4,>=1.1.0 (from sympy->onnxruntime==1.18.0->-r requirements.txt (line 21))
Downloading https://aiinfra.pkgs.visualstudio.com/2692857e-05ef-43b4-ba9c-ccf1c22c437c/_packaging/9387c3aa-d9ad-4513-968c-383f6f7f53b8/pypi/download/mpmath/1.3/mpmath-1.3.0-py3-none-any.whl (536 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 9.8 MB/s eta 0:00:00
Collecting tomli>=2.0.1 (from yapf->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached tomli-2.2.1-py3-none-any.whl.metadata (10 kB)
Collecting aiohappyeyeballs>=2.3.0 (from aiohttp->datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached aiohappyeyeballs-2.4.4-py3-none-any.whl.metadata (6.1 kB)
Collecting aiosignal>=1.1.2 (from aiohttp->datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached aiosignal-1.3.2-py2.py3-none-any.whl.metadata (3.8 kB)
Collecting async-timeout<6.0,>=4.0 (from aiohttp->datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached async_timeout-5.0.1-py3-none-any.whl.metadata (5.1 kB)
Collecting frozenlist>=1.1.1 (from aiohttp->datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached frozenlist-1.5.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (13 kB)
Collecting multidict<7.0,>=4.5 (from aiohttp->datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached multidict-6.1.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (5.0 kB)
Collecting propcache>=0.2.0 (from aiohttp->datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached propcache-0.2.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (9.2 kB)
Collecting yarl<2.0,>=1.17.0 (from aiohttp->datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached yarl-1.18.3-cp310-cp310-macosx_11_0_arm64.whl.metadata (69 kB)
Collecting jmespath<1.0.0,>=0.9.3 (from aliyun-python-sdk-core>=2.13.12->oss2->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached jmespath-0.10.0-py2.py3-none-any.whl.metadata (8.0 kB)
Collecting cryptography>=3.0.0 (from aliyun-python-sdk-core>=2.13.12->oss2->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached cryptography-44.0.0-cp39-abi3-macosx_10_9_universal2.whl.metadata (5.7 kB)
Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard==2.14.0->-r requirements.txt (line 28))
Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB)
Collecting oauthlib>=3.0.0 (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard==2.14.0->-r requirements.txt (line 28))
Using cached oauthlib-3.2.2-py3-none-any.whl.metadata (7.5 kB)
INFO: pip is looking at multiple versions of multiprocess to determine which version is compatible with other requirements. This could take a while.
Collecting multiprocess (from datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
Using cached multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Using cached conformer-0.3.2-py3-none-any.whl (4.3 kB)
Using cached diffusers-0.29.0-py3-none-any.whl (2.2 MB)
Using cached gdown-5.1.0-py3-none-any.whl (17 kB)
Using cached gradio-5.4.0-py3-none-any.whl (56.7 MB)
Using cached grpcio-1.57.0-cp310-cp310-macosx_12_0_universal2.whl (9.0 MB)
Using cached grpcio_tools-1.57.0-cp310-cp310-macosx_12_0_universal2.whl (4.6 MB)
Using cached hydra_core-1.3.2-py3-none-any.whl (154 kB)
Using cached HyperPyYAML-1.2.2-py3-none-any.whl (16 kB)
Using cached inflect-7.3.1-py3-none-any.whl (34 kB)
Using cached librosa-0.10.2-py3-none-any.whl (260 kB)
Using cached lightning-2.2.4-py3-none-any.whl (2.0 MB)
Using cached matplotlib-3.7.5-cp310-cp310-macosx_11_0_arm64.whl (7.3 MB)
Using cached modelscope-1.15.0-py3-none-any.whl (5.7 MB)
Using cached networkx-3.1-py3-none-any.whl (2.1 MB)
Using cached omegaconf-2.3.0-py3-none-any.whl (79 kB)
Using cached onnx-1.16.0-cp310-cp310-macosx_10_15_universal2.whl (16.5 MB)
Using cached onnxruntime-1.18.0-cp310-cp310-macosx_11_0_universal2.whl (15.9 MB)
Using cached protobuf-4.25.0-cp37-abi3-macosx_10_9_universal2.whl (393 kB)
Using cached pydantic-2.7.0-py3-none-any.whl (407 kB)
Using cached rich-13.7.1-py3-none-any.whl (240 kB)
Using cached soundfile-0.12.1-py2.py3-none-macosx_11_0_arm64.whl (1.1 MB)
Using cached tensorboard-2.14.0-py3-none-any.whl (5.5 MB)
Using cached torch-2.3.1-cp310-none-macosx_11_0_arm64.whl (61.0 MB)
Using cached torchaudio-2.3.1-cp310-cp310-macosx_11_0_arm64.whl (1.8 MB)
Using cached transformers-4.40.1-py3-none-any.whl (9.0 MB)
Using cached uvicorn-0.30.0-py3-none-any.whl (62 kB)
Using cached fastapi-0.115.6-py3-none-any.whl (94 kB)
Using cached fastapi_cli-0.0.4-py3-none-any.whl (9.5 kB)
Using cached WeTextProcessing-1.0.3-py3-none-any.whl (2.0 MB)
Using cached gradio_client-1.4.2-py3-none-any.whl (319 kB)
Using cached pydantic_core-2.18.1-cp310-cp310-macosx_11_0_arm64.whl (1.8 MB)
Using cached python_multipart-0.0.12-py3-none-any.whl (23 kB)
Using cached tomlkit-0.12.0-py3-none-any.whl (37 kB)
Using cached absl_py-2.1.0-py3-none-any.whl (133 kB)
Using cached aiofiles-23.2.1-py3-none-any.whl (15 kB)
Using cached annotated_types-0.7.0-py3-none-any.whl (13 kB)
Using cached anyio-4.8.0-py3-none-any.whl (96 kB)
Using cached audioread-3.0.1-py3-none-any.whl (23 kB)
Using cached cffi-1.17.1-cp310-cp310-macosx_11_0_arm64.whl (178 kB)
Using cached click-8.1.8-py3-none-any.whl (98 kB)
Using cached contourpy-1.3.1-cp310-cp310-macosx_11_0_arm64.whl (253 kB)
Using cached cycler-0.12.1-py3-none-any.whl (8.3 kB)
Using cached Cython-3.0.11-py2.py3-none-any.whl (1.2 MB)
Using cached datasets-2.18.0-py3-none-any.whl (510 kB)
Using cached decorator-5.1.1-py3-none-any.whl (9.1 kB)
Using cached einops-0.8.0-py3-none-any.whl (43 kB)
Using cached filelock-3.17.0-py3-none-any.whl (16 kB)
Downloading fonttools-4.55.8-cp310-cp310-macosx_10_9_universal2.whl (2.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.8/2.8 MB 55.4 MB/s eta 0:00:00
Using cached gast-0.6.0-py3-none-any.whl (21 kB)
Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB)
Using cached google_auth_oauthlib-1.0.0-py2.py3-none-any.whl (18 kB)
Using cached h11-0.14.0-py3-none-any.whl (58 kB)
Using cached httpx-0.28.1-py3-none-any.whl (73 kB)
Using cached httpcore-1.0.7-py3-none-any.whl (78 kB)
Downloading huggingface_hub-0.28.0-py3-none-any.whl (464 kB)
Using cached jinja2-3.1.5-py3-none-any.whl (134 kB)
Using cached joblib-1.4.2-py3-none-any.whl (301 kB)
Using cached kiwisolver-1.4.8-cp310-cp310-macosx_11_0_arm64.whl (65 kB)
Using cached lazy_loader-0.4-py3-none-any.whl (12 kB)
Using cached lightning_utilities-0.11.9-py3-none-any.whl (28 kB)
Using cached Markdown-3.7-py3-none-any.whl (106 kB)
Using cached markdown_it_py-3.0.0-py3-none-any.whl (87 kB)
Using cached more_itertools-10.6.0-py3-none-any.whl (63 kB)
Using cached msgpack-1.1.0-cp310-cp310-macosx_11_0_arm64.whl (81 kB)
Using cached numba-0.61.0-cp310-cp310-macosx_11_0_arm64.whl (2.8 MB)
Using cached numpy-1.26.4-cp310-cp310-macosx_11_0_arm64.whl (14.0 MB)
Using cached orjson-3.10.15-cp310-cp310-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl (249 kB)
Using cached packaging-24.2-py3-none-any.whl (65 kB)
Using cached pandas-2.2.3-cp310-cp310-macosx_11_0_arm64.whl (11.3 MB)
Using cached pillow-11.1.0-cp310-cp310-macosx_11_0_arm64.whl (3.1 MB)
Using cached pooch-1.8.2-py3-none-any.whl (64 kB)
Using cached pyarrow-19.0.0-cp310-cp310-macosx_12_0_arm64.whl (30.7 MB)
Using cached pygments-2.19.1-py3-none-any.whl (1.2 MB)
Using cached pyparsing-3.2.1-py3-none-any.whl (107 kB)
Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Using cached PyYAML-6.0.2-cp310-cp310-macosx_11_0_arm64.whl (171 kB)
Using cached regex-2024.11.6-cp310-cp310-macosx_11_0_arm64.whl (284 kB)
Using cached requests-2.32.3-py3-none-any.whl (64 kB)
Using cached ruamel.yaml-0.18.10-py3-none-any.whl (117 kB)
Using cached ruff-0.9.3-py3-none-macosx_11_0_arm64.whl (11.0 MB)
Using cached safehttpx-0.1.6-py3-none-any.whl (8.7 kB)
Using cached safetensors-0.5.2-cp38-abi3-macosx_11_0_arm64.whl (408 kB)
Using cached scikit_learn-1.6.1-cp310-cp310-macosx_12_0_arm64.whl (11.1 MB)
Using cached scipy-1.15.1-cp310-cp310-macosx_14_0_arm64.whl (24.8 MB)
Using cached semantic_version-2.10.0-py2.py3-none-any.whl (15 kB)
Using cached simplejson-3.19.3-cp310-cp310-macosx_11_0_arm64.whl (75 kB)
Using cached sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Using cached soxr-0.5.0.post1-cp310-cp310-macosx_11_0_arm64.whl (160 kB)
Using cached starlette-0.41.3-py3-none-any.whl (73 kB)
Using cached tensorboard_data_server-0.7.2-py3-none-any.whl (2.4 kB)
Using cached tokenizers-0.19.1-cp310-cp310-macosx_11_0_arm64.whl (2.4 MB)
Using cached torchmetrics-1.6.1-py3-none-any.whl (927 kB)
Using cached tqdm-4.67.1-py3-none-any.whl (78 kB)
Using cached typeguard-4.4.1-py3-none-any.whl (35 kB)
Using cached typer-0.15.1-py3-none-any.whl (44 kB)
Downloading https://download.pytorch.org/whl/typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Using cached urllib3-2.3.0-py3-none-any.whl (128 kB)
Using cached werkzeug-3.1.3-py3-none-any.whl (224 kB)
Using cached addict-2.4.0-py3-none-any.whl (3.8 kB)
Using cached attrs-25.1.0-py3-none-any.whl (63 kB)
Using cached beautifulsoup4-4.12.3-py3-none-any.whl (147 kB)
Using cached ffmpy-0.5.0-py3-none-any.whl (6.0 kB)
Using cached flatbuffers-25.1.24-py2.py3-none-any.whl (30 kB)
Using cached importlib_metadata-8.6.1-py3-none-any.whl (26 kB)
Using cached importlib_resources-6.5.2-py3-none-any.whl (37 kB)
Using cached pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Using cached pytorch_lightning-2.5.0.post0-py3-none-any.whl (819 kB)
Using cached tiktoken-0.8.0-cp310-cp310-macosx_11_0_arm64.whl (982 kB)
Using cached yapf-0.43.0-py3-none-any.whl (256 kB)
Using cached aiohttp-3.11.11-cp310-cp310-macosx_11_0_arm64.whl (455 kB)
Using cached aliyun_python_sdk_kms-2.16.5-py2.py3-none-any.whl (99 kB)
Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB)
Using cached certifi-2024.12.14-py3-none-any.whl (164 kB)
Using cached charset_normalizer-3.4.1-cp310-cp310-macosx_10_9_universal2.whl (198 kB)
Using cached dill-0.3.8-py3-none-any.whl (116 kB)
Using cached exceptiongroup-1.2.2-py3-none-any.whl (16 kB)
Using cached idna-3.10-py3-none-any.whl (70 kB)
Using cached llvmlite-0.44.0-cp310-cp310-macosx_11_0_arm64.whl (26.2 MB)
Using cached mdurl-0.1.2-py3-none-any.whl (10.0 kB)
Using cached platformdirs-4.3.6-py3-none-any.whl (18 kB)
Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB)
Using cached pycryptodome-3.21.0-cp36-abi3-macosx_10_9_universal2.whl (2.5 MB)
Using cached PySocks-1.7.1-py3-none-any.whl (16 kB)
Using cached pytz-2024.2-py2.py3-none-any.whl (508 kB)
Using cached requests_oauthlib-2.0.0-py2.py3-none-any.whl (24 kB)
Using cached rsa-4.9-py3-none-any.whl (34 kB)
Using cached ruamel.yaml.clib-0.2.12-cp310-cp310-macosx_13_0_arm64.whl (131 kB)
Using cached shellingham-1.5.4-py2.py3-none-any.whl (9.8 kB)
Using cached six-1.17.0-py2.py3-none-any.whl (11 kB)
Using cached sniffio-1.3.1-py3-none-any.whl (10 kB)
Using cached soupsieve-2.6-py3-none-any.whl (36 kB)
Using cached threadpoolctl-3.5.0-py3-none-any.whl (18 kB)
Using cached tomli-2.2.1-py3-none-any.whl (14 kB)
Using cached tzdata-2025.1-py2.py3-none-any.whl (346 kB)
Using cached websockets-12.0-cp310-cp310-macosx_11_0_arm64.whl (121 kB)
Using cached zipp-3.21.0-py3-none-any.whl (9.6 kB)
Using cached multiprocess-0.70.16-py310-none-any.whl (134 kB)
Using cached pyarrow_hotfix-0.6-py3-none-any.whl (7.9 kB)
Using cached pycparser-2.22-py3-none-any.whl (117 kB)
Using cached xxhash-3.5.0-cp310-cp310-macosx_11_0_arm64.whl (30 kB)
Using cached aiohappyeyeballs-2.4.4-py3-none-any.whl (14 kB)
Using cached aiosignal-1.3.2-py2.py3-none-any.whl (7.6 kB)
Using cached async_timeout-5.0.1-py3-none-any.whl (6.2 kB)
Using cached cryptography-44.0.0-cp39-abi3-macosx_10_9_universal2.whl (6.5 MB)
Using cached frozenlist-1.5.0-cp310-cp310-macosx_11_0_arm64.whl (52 kB)
Using cached jmespath-0.10.0-py2.py3-none-any.whl (24 kB)
Using cached multidict-6.1.0-cp310-cp310-macosx_11_0_arm64.whl (29 kB)
Using cached oauthlib-3.2.2-py3-none-any.whl (151 kB)
Using cached propcache-0.2.1-cp310-cp310-macosx_11_0_arm64.whl (45 kB)
Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB)
Using cached yarl-1.18.3-cp310-cp310-macosx_11_0_arm64.whl (92 kB)
Installing collected packages: wget, sortedcontainers, pytz, pydub, mpmath, flatbuffers, crcmod, antlr4-python3-runtime, addict, zipp, xxhash, websockets, urllib3, tzdata, typing-extensions, tqdm, tomlkit, tomli, threadpoolctl, tensorboard-data-server, sympy, soupsieve, sniffio, six, simplejson, shellingham, semantic-version, safetensors, ruff, ruamel.yaml.clib, regex, pyyaml, python-multipart, PySocks, pyparsing, pygments, pycryptodome, pycparser, pyasn1, pyarrow-hotfix, pyarrow, protobuf, propcache, platformdirs, Pillow, packaging, orjson, oauthlib, numpy, networkx, msgpack, more-itertools, mdurl, markupsafe, markdown, llvmlite, kiwisolver, joblib, jmespath, importlib-resources, idna, humanfriendly, h11, grpcio, gast, fsspec, frozenlist, fonttools, filelock, ffmpy, exceptiongroup, einops, dill, decorator, cython, cycler, click, charset-normalizer, certifi, cachetools, audioread, attrs, async-timeout, annotated-types, aiohappyeyeballs, aiofiles, absl-py, yapf, werkzeug, uvicorn, typeguard, soxr, scipy, ruamel.yaml, rsa, requests, pyworld, python-dateutil, pydantic-core, pyasn1-modules, onnx, omegaconf, numba, multiprocess, multidict, markdown-it-py, lightning-utilities, lazy-loader, jinja2, importlib-metadata, httpcore, grpcio-tools, contourpy, coloredlogs, cffi, beautifulsoup4, anyio, aiosignal, yarl, WeTextProcessing, torch, tiktoken, starlette, soundfile, scikit-learn, rich, requests-oauthlib, pydantic, pooch, pandas, onnxruntime, matplotlib, inflect, HyperPyYAML, hydra-core, huggingface-hub, httpx, google-auth, cryptography, typer, torchmetrics, torchaudio, tokenizers, safehttpx, openai-whisper, librosa, gradio-client, google-auth-oauthlib, gdown, fastapi, diffusers, conformer, aliyun-python-sdk-core, aiohttp, transformers, tensorboard, gradio, fastapi-cli, aliyun-python-sdk-kms, pytorch-lightning, oss2, datasets, modelscope, lightning
Successfully installed HyperPyYAML-1.2.2 Pillow-11.1.0 PySocks-1.7.1 WeTextProcessing-1.0.3 absl-py-2.1.0 addict-2.4.0 aiofiles-23.2.1 aiohappyeyeballs-2.4.4 aiohttp-3.11.11 aiosignal-1.3.2 aliyun-python-sdk-core-2.16.0 aliyun-python-sdk-kms-2.16.5 annotated-types-0.7.0 antlr4-python3-runtime-4.9.3 anyio-4.8.0 async-timeout-5.0.1 attrs-25.1.0 audioread-3.0.1 beautifulsoup4-4.12.3 cachetools-5.5.1 certifi-2024.12.14 cffi-1.17.1 charset-normalizer-3.4.1 click-8.1.8 coloredlogs-15.0.1 conformer-0.3.2 contourpy-1.3.1 crcmod-1.7 cryptography-44.0.0 cycler-0.12.1 cython-3.0.11 datasets-2.18.0 decorator-5.1.1 diffusers-0.29.0 dill-0.3.8 einops-0.8.0 exceptiongroup-1.2.2 fastapi-0.115.6 fastapi-cli-0.0.4 ffmpy-0.5.0 filelock-3.17.0 flatbuffers-25.1.24 fonttools-4.55.8 frozenlist-1.5.0 fsspec-2024.2.0 gast-0.6.0 gdown-5.1.0 google-auth-2.38.0 google-auth-oauthlib-1.0.0 gradio-5.4.0 gradio-client-1.4.2 grpcio-1.57.0 grpcio-tools-1.57.0 h11-0.14.0 httpcore-1.0.7 httpx-0.28.1 huggingface-hub-0.28.0 humanfriendly-10.0 hydra-core-1.3.2 idna-3.10 importlib-metadata-8.6.1 importlib-resources-6.5.2 inflect-7.3.1 jinja2-3.1.5 jmespath-0.10.0 joblib-1.4.2 kiwisolver-1.4.8 lazy-loader-0.4 librosa-0.10.2 lightning-2.2.4 lightning-utilities-0.11.9 llvmlite-0.44.0 markdown-3.7 markdown-it-py-3.0.0 markupsafe-2.1.5 matplotlib-3.7.5 mdurl-0.1.2 modelscope-1.15.0 more-itertools-10.6.0 mpmath-1.3.0 msgpack-1.1.0 multidict-6.1.0 multiprocess-0.70.16 networkx-3.1 numba-0.61.0 numpy-1.26.4 oauthlib-3.2.2 omegaconf-2.3.0 onnx-1.16.0 onnxruntime-1.18.0 openai-whisper-20231117 orjson-3.10.15 oss2-2.19.1 packaging-24.2 pandas-2.2.3 platformdirs-4.3.6 pooch-1.8.2 propcache-0.2.1 protobuf-4.25.0 pyarrow-19.0.0 pyarrow-hotfix-0.6 pyasn1-0.6.1 pyasn1-modules-0.4.1 pycparser-2.22 pycryptodome-3.21.0 pydantic-2.7.0 pydantic-core-2.18.1 pydub-0.25.1 pygments-2.19.1 pyparsing-3.2.1 python-dateutil-2.9.0.post0 python-multipart-0.0.12 pytorch-lightning-2.5.0.post0 pytz-2024.2 pyworld-0.3.4 pyyaml-6.0.2 
regex-2024.11.6 requests-2.32.3 requests-oauthlib-2.0.0 rich-13.7.1 rsa-4.9 ruamel.yaml-0.18.10 ruamel.yaml.clib-0.2.12 ruff-0.9.3 safehttpx-0.1.6 safetensors-0.5.2 scikit-learn-1.6.1 scipy-1.15.1 semantic-version-2.10.0 shellingham-1.5.4 simplejson-3.19.3 six-1.17.0 sniffio-1.3.1 sortedcontainers-2.4.0 soundfile-0.12.1 soupsieve-2.6 soxr-0.5.0.post1 starlette-0.41.3 sympy-1.13.3 tensorboard-2.14.0 tensorboard-data-server-0.7.2 threadpoolctl-3.5.0 tiktoken-0.8.0 tokenizers-0.19.1 tomli-2.2.1 tomlkit-0.12.0 torch-2.3.1 torchaudio-2.3.1 torchmetrics-1.6.1 tqdm-4.67.1 transformers-4.40.1 typeguard-4.4.1 typer-0.15.1 typing-extensions-4.12.2 tzdata-2025.1 urllib3-2.3.0 uvicorn-0.30.0 websockets-12.0 werkzeug-3.1.3 wget-3.2 xxhash-3.5.0 yapf-0.43.0 yarl-1.18.3 zipp-3.21.0
8. Download the model
This step uses Git LFS. If it is not installed yet, run brew install git-lfs && git lfs install first.
(cosyvoice) ysic@m4macmini CosyVoice % mkdir -p pretrained_models
(cosyvoice) ysic@m4macmini CosyVoice % git clone https://www.modelscope.cn/iic/CosyVoice2-0.5B.git pretrained_models/CosyVoice2-0.5B
Cloning into 'pretrained_models/CosyVoice2-0.5B'...
remote: Enumerating objects: 77, done.
remote: Counting objects: 100% (77/77), done.
remote: Compressing objects: 100% (61/61), done.
remote: Total 77 (delta 26), reused 51 (delta 13), pack-reused 0
Receiving objects: 100% (77/77), 1.72 MiB | 4.33 MiB/s, done.
Resolving deltas: 100% (26/26), done.
Filtering content: 100% (12/12), 4.80 GiB | 73.96 MiB/s, done.
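If Git LFS is not set up when the clone runs, the large weight files are left as small text pointer stubs instead of real data, and the model then fails to load. The sketch below is one way to check for that, assuming the model sits in the pretrained_models/CosyVoice2-0.5B directory used above. The helper name find_lfs_pointers is hypothetical and not part of CosyVoice. It relies only on the fact that Git LFS pointer files begin with the line "version https://git-lfs.github.com/spec/v1".

```python
from pathlib import Path

# First bytes of a Git LFS pointer file (from the Git LFS pointer format).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir):
    """Return files under model_dir that are still Git LFS pointer stubs.

    Hypothetical helper for illustration; it is not part of CosyVoice.
    """
    root = Path(model_dir)
    stubs = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside the repository's .git folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        with path.open("rb") as f:
            head = f.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            stubs.append(path)
    return stubs


if __name__ == "__main__":
    # Directory name taken from the git clone command above.
    model_dir = Path("pretrained_models/CosyVoice2-0.5B")
    if not model_dir.is_dir():
        print(f"{model_dir} not found; clone the model first")
    else:
        stubs = find_lfs_pointers(model_dir)
        if stubs:
            print("Still LFS pointers (try `git lfs pull` inside the model directory):")
            for stub in stubs:
                print(f"  {stub}")
        else:
            print("No LFS pointer stubs found")
```

An empty result means every file was replaced by its real content, which matches the "Filtering content" line in the clone output.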
Generating speech
The code below was written with the README as a reference.
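import sys

# Matcha-TTS is checked out as a git submodule, so add it to the import path.
sys.path.append("third_party/Matcha-TTS")

from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav
import torchaudio

cosyvoice = CosyVoice2(
    "pretrained_models/CosyVoice2-0.5B", load_jit=False, load_trt=False, fp16=False
)

# Reference (prompt) audio, loaded at 16 kHz.
prompt_speech_16k = load_wav(
    "./asset/ITAcorpus_amitaro_yofukashi_1.1/44.1k/emo/EMOTION100_058.wav", 16000
)


def text_generator():
    # Text to synthesize, handed to the model one phrase at a time.
    yield "あのイーハトーヴォのすきとおったかぜ、"
    yield "夏でも底に冷たさをもつ青いそら、"
    yield "うつくしい森でかざられたモリーオ市、"
    yield "郊外のぎらぎらひかる草の波。"


for i, j in enumerate(
    cosyvoice.inference_zero_shot(
        text_generator(),
        "すみません、この辺に詳しくないんです。",  # prompt text paired with the prompt audio
        prompt_speech_16k,
        stream=False,
    )
):
    # The "{}" placeholder gives each yielded result its own file. The original
    # "test.wav".format(i) had no placeholder, so the index was ignored.
    torchaudio.save("test_{}.wav".format(i), j["tts_speech"], cosyvoice.sample_rate)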
The reference audio is a reading of the ITA corpus, borrowed from あみたろの声素材工房.
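Before synthesis, it can be useful to confirm what the reference clip actually contains. The directory name suggests a 44.1 kHz recording, while load_wav is asked for 16000 Hz. The following minimal sketch reads the header with the standard-library wave module. The function name describe_wav is hypothetical, and the sketch assumes the clip is uncompressed PCM WAV, which is the only format the wave module reads.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file.

    Hypothetical helper for illustration; it only reads the file header.
    """
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


if __name__ == "__main__":
    import os

    # Path copied from the script above.
    prompt = "./asset/ITAcorpus_amitaro_yofukashi_1.1/44.1k/emo/EMOTION100_058.wav"
    if os.path.exists(prompt):
        rate, channels, duration = describe_wav(prompt)
        print(f"{rate} Hz, {channels} channel(s), {duration:.2f} s")
    else:
        print(f"{prompt} not found")
```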
Running the script produces the following output:
(cosyvoice) ysic@m4macmini CosyVoice % python test.py
2025-01-30 16:44:21,566 - modelscope - INFO - PyTorch version 2.3.1 Found.
2025-01-30 16:44:21,566 - modelscope - INFO - Loading ast index from /Users/ysic/.cache/modelscope/ast_indexer
2025-01-30 16:44:21,587 - modelscope - INFO - Loading done! Current index file version is 1.15.0, with md5 a146a4c7ddc066c8ec70acfe431b42ad and a total number of 980 components indexed
failed to import ttsfrd, use WeTextProcessing instead
/opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
2025-01-30 16:44:24,698 INFO input frame rate=25
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2025-01-30 16:44:25,374 WETEXT INFO building fst for zh_normalizer ...
2025-01-30 16:44:25,374 INFO building fst for zh_normalizer ...
2025-01-30 16:44:32,946 WETEXT INFO done
2025-01-30 16:44:32,946 INFO done
2025-01-30 16:44:32,946 WETEXT INFO fst path: /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/tn/zh_tn_tagger.fst
2025-01-30 16:44:32,946 INFO fst path: /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/tn/zh_tn_tagger.fst
2025-01-30 16:44:32,946 WETEXT INFO /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/tn/zh_tn_verbalizer.fst
2025-01-30 16:44:32,946 INFO /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/tn/zh_tn_verbalizer.fst
2025-01-30 16:44:32,948 WETEXT INFO found existing fst: /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/tn/en_tn_tagger.fst
2025-01-30 16:44:32,948 INFO found existing fst: /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/tn/en_tn_tagger.fst
2025-01-30 16:44:32,948 WETEXT INFO /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/tn/en_tn_verbalizer.fst
2025-01-30 16:44:32,948 INFO /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/tn/en_tn_verbalizer.fst
2025-01-30 16:44:32,948 WETEXT INFO skip building fst for en_normalizer ...
2025-01-30 16:44:32,948 INFO skip building fst for en_normalizer ...
2025-01-30 16:44:33,610 INFO get tts_text generator, will skip text_normalize!
0%| | 0/1 [00:00<?, ?it/s]2025-01-30 16:44:33,753 INFO get tts_text generator, will return _extract_text_token_generator!
2025-01-30 16:44:33,987 INFO synthesis text <generator object text_generator at 0x37e8ebca0>
2025-01-30 16:44:33,988 INFO append 5 text token 15 speech token
2025-01-30 16:44:33,988 INFO append 5 text token 15 speech token
2025-01-30 16:44:33,988 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,988 INFO append 5 text token 15 speech token
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO append 5 text token 15 speech token
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO append 5 text token 15 speech token
2025-01-30 16:44:34,171 INFO fill_token index 0 next fill_token index 16
2025-01-30 16:44:34,171 INFO get fill token, need to append more text token
2025-01-30 16:44:34,171 INFO not enough text token to decode, wait for more
2025-01-30 16:44:34,171 INFO get fill token, need to append more text token
2025-01-30 16:44:34,171 INFO not enough text token to decode, wait for more
2025-01-30 16:44:34,171 INFO get fill token, need to append more text token
2025-01-30 16:44:34,171 INFO not enough text token to decode, wait for more
2025-01-30 16:44:34,171 INFO get fill token, need to append more text token
2025-01-30 16:44:34,171 INFO not enough text token to decode, wait for more
2025-01-30 16:44:34,171 INFO get fill token, need to append more text token
2025-01-30 16:44:34,171 INFO append 5 text token
2025-01-30 16:44:34,657 INFO fill_token index 16 next fill_token index 32
2025-01-30 16:44:34,657 INFO get fill token, need to append more text token
2025-01-30 16:44:34,657 INFO not enough text token to decode, wait for more
2025-01-30 16:44:34,657 INFO get fill token, need to append more text token
2025-01-30 16:44:34,657 INFO not enough text token to decode, wait for more
2025-01-30 16:44:34,657 INFO get fill token, need to append more text token
2025-01-30 16:44:34,657 INFO not enough text token to decode, wait for more
2025-01-30 16:44:34,657 INFO get fill token, need to append more text token
2025-01-30 16:44:34,657 INFO not enough text token to decode, wait for more
2025-01-30 16:44:34,657 INFO get fill token, need to append more text token
2025-01-30 16:44:34,657 INFO append 5 text token
2025-01-30 16:44:35,160 INFO fill_token index 32 next fill_token index 48
2025-01-30 16:44:35,160 INFO get fill token, need to append more text token
2025-01-30 16:44:35,160 INFO not enough text token to decode, wait for more
2025-01-30 16:44:35,160 INFO get fill token, need to append more text token
2025-01-30 16:44:35,160 INFO not enough text token to decode, wait for more
2025-01-30 16:44:35,160 INFO get fill token, need to append more text token
2025-01-30 16:44:35,161 INFO not enough text token to decode, wait for more
2025-01-30 16:44:35,161 INFO get fill token, need to append more text token
2025-01-30 16:44:35,161 INFO not enough text token to decode, wait for more
2025-01-30 16:44:35,161 INFO get fill token, need to append more text token
2025-01-30 16:44:35,161 INFO append 5 text token
2025-01-30 16:44:35,661 INFO fill_token index 48 next fill_token index 64
2025-01-30 16:44:35,661 INFO get fill token, need to append more text token
2025-01-30 16:44:35,661 INFO not enough text token to decode, wait for more
2025-01-30 16:44:35,661 INFO get fill token, need to append more text token
2025-01-30 16:44:35,661 INFO not enough text token to decode, wait for more
2025-01-30 16:44:35,661 INFO get fill token, need to append more text token
2025-01-30 16:44:35,661 INFO not enough text token to decode, wait for more
2025-01-30 16:44:35,661 INFO get fill token, need to append more text token
2025-01-30 16:44:35,661 INFO not enough text token to decode, wait for more
2025-01-30 16:44:35,661 INFO get fill token, need to append more text token
2025-01-30 16:44:35,661 INFO append 5 text token
2025-01-30 16:44:36,152 INFO fill_token index 64 next fill_token index 80
2025-01-30 16:44:36,152 INFO get fill token, need to append more text token
2025-01-30 16:44:36,152 INFO not enough text token to decode, wait for more
2025-01-30 16:44:36,152 INFO get fill token, need to append more text token
2025-01-30 16:44:36,152 INFO not enough text token to decode, wait for more
2025-01-30 16:44:36,152 INFO get fill token, need to append more text token
2025-01-30 16:44:36,152 INFO not enough text token to decode, wait for more
2025-01-30 16:44:36,152 INFO get fill token, need to append more text token
2025-01-30 16:44:36,152 INFO not enough text token to decode, wait for more
2025-01-30 16:44:36,152 INFO get fill token, need to append more text token
2025-01-30 16:44:36,152 INFO append 5 text token
2025-01-30 16:44:36,657 INFO fill_token index 80 next fill_token index 96
2025-01-30 16:44:36,657 INFO get fill token, need to append more text token
2025-01-30 16:44:36,657 INFO not enough text token to decode, wait for more
2025-01-30 16:44:36,657 INFO get fill token, need to append more text token
2025-01-30 16:44:36,657 INFO not enough text token to decode, wait for more
2025-01-30 16:44:36,657 INFO get fill token, need to append more text token
2025-01-30 16:44:36,657 INFO not enough text token to decode, wait for more
2025-01-30 16:44:36,657 INFO get fill token, need to append more text token
2025-01-30 16:44:36,657 INFO not enough text token to decode, wait for more
2025-01-30 16:44:36,657 INFO get fill token, need to append more text token
2025-01-30 16:44:36,657 INFO append 5 text token
2025-01-30 16:44:37,193 INFO fill_token index 96 next fill_token index 112
2025-01-30 16:44:37,193 INFO get fill token, need to append more text token
2025-01-30 16:44:37,193 INFO not enough text token to decode, wait for more
2025-01-30 16:44:37,193 INFO get fill token, need to append more text token
2025-01-30 16:44:37,193 INFO not enough text token to decode, wait for more
2025-01-30 16:44:37,193 INFO get fill token, need to append more text token
2025-01-30 16:44:37,193 INFO not enough text token to decode, wait for more
2025-01-30 16:44:37,193 INFO get fill token, need to append more text token
2025-01-30 16:44:37,193 INFO not enough text token to decode, wait for more
2025-01-30 16:44:37,193 INFO get fill token, need to append more text token
2025-01-30 16:44:37,193 INFO append 5 text token
2025-01-30 16:44:37,722 INFO fill_token index 112 next fill_token index 128
2025-01-30 16:44:37,722 INFO get fill token, need to append more text token
2025-01-30 16:44:37,722 INFO not enough text token to decode, wait for more
2025-01-30 16:44:37,722 INFO get fill token, need to append more text token
2025-01-30 16:44:37,722 INFO not enough text token to decode, wait for more
2025-01-30 16:44:37,722 INFO get fill token, need to append more text token
2025-01-30 16:44:37,722 INFO not enough text token to decode, wait for more
2025-01-30 16:44:37,722 INFO get fill token, need to append more text token
2025-01-30 16:44:37,722 INFO not enough text token to decode, wait for more
2025-01-30 16:44:37,722 INFO get fill token, need to append more text token
2025-01-30 16:44:37,722 INFO append 5 text token
2025-01-30 16:44:38,304 INFO fill_token index 128 next fill_token index 144
2025-01-30 16:44:38,304 INFO get fill token, need to append more text token
2025-01-30 16:44:38,304 INFO not enough text token to decode, wait for more
2025-01-30 16:44:38,304 INFO get fill token, need to append more text token
2025-01-30 16:44:38,304 INFO not enough text token to decode, wait for more
2025-01-30 16:44:38,304 INFO get fill token, need to append more text token
2025-01-30 16:44:38,304 INFO not enough text token to decode, wait for more
2025-01-30 16:44:38,304 INFO get fill token, need to append more text token
2025-01-30 16:44:38,304 INFO not enough text token to decode, wait for more
2025-01-30 16:44:38,304 INFO get fill token, need to append more text token
2025-01-30 16:44:38,304 INFO append 5 text token
2025-01-30 16:44:39,038 INFO fill_token index 144 next fill_token index 160
2025-01-30 16:44:39,039 INFO get fill token, need to append more text token
2025-01-30 16:44:39,039 INFO not enough text token to decode, wait for more
2025-01-30 16:44:39,039 INFO get fill token, need to append more text token
2025-01-30 16:44:39,039 INFO not enough text token to decode, wait for more
2025-01-30 16:44:39,039 INFO no more text token, decode until met eos
2025-01-30 16:44:51,109 INFO yield speech len 10.96, rtf 1.5622105911700395
100%|██████████████████████████████████████████████| 1/1 [00:17<00:00, 17.36s/it]
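The last log line reports `rtf 1.56...`. As a rough sanity check, here is a minimal sketch that assumes `rtf` means real-time factor, that is, synthesis time divided by the duration of the generated audio. The helper `real_time_factor` and the variable names are my own for illustration and are not part of CosyVoice; only the two numbers come from the log.

```python
def real_time_factor(elapsed_s: float, audio_s: float) -> float:
    """Return processing time divided by audio duration (assumed definition of rtf)."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return elapsed_s / audio_s


# Values copied from the final "yield speech len ..., rtf ..." log line.
speech_len_s = 10.96
logged_rtf = 1.5622105911700395

# Invert the assumed definition to recover the synthesis time in seconds.
elapsed_s = logged_rtf * speech_len_s

print(f"synthesis time: {elapsed_s:.2f} s for {speech_len_s} s of audio")
print("faster than real time" if logged_rtf < 1 else "slower than real time")
```

Under that assumption, the synthesis time comes out to about 17.1 seconds, which is close to the 17.36 s/it shown in the progress bar. An RTF above 1 means that generation on this Mac took longer than the audio it produced.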
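Here is the generated audio file.
Summary
I did not get to try it this time, but CosyVoice 2 can apparently also generate speech that mixes multiple languages, as well as speech that conveys emotion.
Some of the official demos made me think "it can do that!?", while other parts still feel like a work in progress, so I am looking forward to seeing where it goes.
Next time, I plan to try Intelligent Speech Interaction, which is offered on Alibaba Cloud.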