
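Trying out CosyVoice, a speech generation model

Published 2025/02/05

CosyVoice 2, a multilingual speech synthesis model

I tried out CosyVoice 2, a large multilingual speech synthesis model developed by Alibaba Group.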

https://funaudiollm.github.io/cosyvoice2/

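According to the project page, it supports a range of speech generation scenarios, including zero-shot in-context generation, cross-lingual in-context generation, and emotionally expressive speech generation.

The project lists the following features.

  • Ultra-low latency
    CosyVoice 2 unifies offline and streaming modeling and supports bidirectional streaming speech synthesis. First-packet synthesis latency is reported to be as low as 150 ms, with minimal loss in quality.
  • High accuracy
    Pronunciation errors are reduced by 30-50% compared with version 1.0, and the model achieves the lowest character error rate on the hard test set of the Seed-TTS evaluation set.
  • Stability
    Voice timbre stays consistent and high in quality for zero-shot speech generation and cross-lingual speech synthesis. Cross-lingual synthesis is substantially improved over version 1.0.
  • Natural speech
    Prosody, sound quality, and emotional alignment are substantially improved in CosyVoice 2. The MOS evaluation score rose from 5.4 to 5.53, a score reported to be on par with commercial large speech synthesis models. Controllable speech synthesis has also been upgraded to support finer-grained emotion control and dialect/accent adjustment.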

https://github.com/FunAudioLLM/CosyVoice

I'll run it on my own Mac.

Setup

1. Clone the repository

git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git && cd "$(basename "$_" .git)"

ysic@m4macmini ~ % git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git && cd "$(basename "$_" .git)"
Cloning into 'CosyVoice'...
remote: Enumerating objects: 1801, done.
remote: Counting objects: 100% (1001/1001), done.
remote: Compressing objects: 100% (377/377), done.
remote: Total 1801 (delta 831), reused 627 (delta 624), pack-reused 800 (from 2)
Receiving objects: 100% (1801/1801), 1.49 MiB | 14.85 MiB/s, done.
Resolving deltas: 100% (1119/1119), done.
Submodule 'third_party/Matcha-TTS' (https://github.com/shivammehta25/Matcha-TTS.git) registered for path 'third_party/Matcha-TTS'
Cloning into '/Users/ysic/CosyVoice/third_party/Matcha-TTS'...
remote: Enumerating objects: 1068, done.
remote: Counting objects: 100% (480/480), done.
remote: Compressing objects: 100% (171/171), done.
remote: Total 1068 (delta 385), reused 318 (delta 309), pack-reused 588 (from 2)
Receiving objects: 100% (1068/1068), 64.11 MiB | 22.33 MiB/s, done.
Resolving deltas: 100% (517/517), done.
Submodule path 'third_party/Matcha-TTS': checked out 'dd9105b34bf2be2230f4aa1e4769fb586a3c824e'
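
In the clone command above, `$_` expands to the last argument of the preceding command (the repository URL), and `basename ... .git` strips the path and the `.git` suffix to give the directory to `cd` into. A minimal sketch of that derivation, where `repo_dir_from_url` is a hypothetical helper name used only for illustration:

```shell
# Derive the checkout directory name from a clone URL.
# repo_dir_from_url is a hypothetical helper, not part of git or CosyVoice.
repo_dir_from_url() {
  # basename drops everything up to the last "/" and the trailing ".git"
  basename "$1" .git
}

repo_url="https://github.com/FunAudioLLM/CosyVoice.git"
repo_dir="$(repo_dir_from_url "$repo_url")"
echo "$repo_dir"   # prints CosyVoice
```

Writing the directory name out explicitly (`cd CosyVoice`) works just as well. The `$_` form only saves retyping.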

2. Install Miniconda

brew install --cask miniconda

ysic@m4macmini CosyVoice % brew install --cask miniconda
==> Downloading https://formulae.brew.sh/api/cask.jws.json
==> Caveats
Please run the following to setup your shell:
  conda init "$(basename "${SHELL}")"

Alternatively, manually add the following to your shell init:
  eval "$(conda "shell.$(basename "${SHELL}")" hook)"

==> Downloading https://repo.anaconda.com/miniconda/Miniconda3-py312_24.11.1-0-MacOSX-arm64.sh
Already downloaded: /Users/ysic/Library/Caches/Homebrew/downloads/855aebb3ebaf629877f030f6fd69b50da4d9a930f0a07c0ec81f47c292298d18--Miniconda3-py312_24.11.1-0-MacOSX-arm64.sh
==> Installing Cask miniconda
==> Running installer script 'Miniconda3-py312_24.11.1-0-MacOSX-arm64.sh'
PREFIX=/opt/homebrew/Caskroom/miniconda/base
Unpacking payload ...

Installing base environment...

Preparing transaction: ...working... done
Executing transaction: ...working...
done
installation finished.
==> Linking Binary 'conda' to '/opt/homebrew/bin/conda'
🍺  miniconda was successfully installed!

3. Enable conda in the current shell

eval "$(conda "shell.$(basename "${SHELL}")" hook)"
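
The hook command above builds its first argument from the name of the login shell, so under zsh it runs `conda shell.zsh hook` and evaluates the output. A small sketch of how that argument is formed, with a guarded variant of the step. `hook_target` and `enable_conda` are hypothetical helper names, and the guard is my addition rather than part of the Homebrew caveat:

```shell
# Build the "shell.<name>" argument from a shell path such as $SHELL.
# hook_target is a hypothetical helper used only in this sketch.
hook_target() {
  printf 'shell.%s\n' "$(basename "$1")"
}

hook_target /bin/zsh    # prints shell.zsh
hook_target /bin/bash   # prints shell.bash

# Guarded variant of step 3: evaluate the hook only if conda is on PATH.
# Defined here but not called; run `enable_conda` in an interactive shell.
enable_conda() {
  if command -v conda >/dev/null 2>&1; then
    eval "$(conda "$(hook_target "$SHELL")" hook)"
  else
    echo "conda not found on PATH" >&2
    return 1
  fi
}
```

To make this permanent, the Homebrew caveat in the install log suggests running `conda init` for your shell instead.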

4. Create a virtual environment with conda

conda create -n cosyvoice -y python=3.10

(base) ysic@m4macmini CosyVoice % conda create -n cosyvoice -y python=3.10
Channels:
 - defaults
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice

  added / updated specs:
    - python=3.10

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2024.12.31 |       hca03da5_0         129 KB
    pip-25.0                   |  py310hca03da5_0         2.3 MB
    python-3.10.16             |       hb885b13_1        12.0 MB
    setuptools-75.8.0          |  py310hca03da5_0         1.6 MB
    tzdata-2025a               |       h04d1e81_0         117 KB
    wheel-0.45.1               |  py310hca03da5_0         116 KB
    ------------------------------------------------------------
                                           Total:        16.3 MB

The following NEW packages will be INSTALLED:

  bzip2              pkgs/main/osx-arm64::bzip2-1.0.8-h80987f9_6
  ca-certificates    pkgs/main/osx-arm64::ca-certificates-2024.12.31-hca03da5_0
  libffi             pkgs/main/osx-arm64::libffi-3.4.4-hca03da5_1
  ncurses            pkgs/main/osx-arm64::ncurses-6.4-h313beb8_0
  openssl            pkgs/main/osx-arm64::openssl-3.0.15-h80987f9_0
  pip                pkgs/main/osx-arm64::pip-25.0-py310hca03da5_0
  python             pkgs/main/osx-arm64::python-3.10.16-hb885b13_1
  readline           pkgs/main/osx-arm64::readline-8.2-h1a28f6b_0
  setuptools         pkgs/main/osx-arm64::setuptools-75.8.0-py310hca03da5_0
  sqlite             pkgs/main/osx-arm64::sqlite-3.45.3-h80987f9_0
  tk                 pkgs/main/osx-arm64::tk-8.6.14-h6ba3021_0
  tzdata             pkgs/main/noarch::tzdata-2025a-h04d1e81_0
  wheel              pkgs/main/osx-arm64::wheel-0.45.1-py310hca03da5_0
  xz                 pkgs/main/osx-arm64::xz-5.4.6-h80987f9_1
  zlib               pkgs/main/osx-arm64::zlib-1.2.13-h18a0788_1

Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate cosyvoice
#
# To deactivate an active environment, use
#
#     $ conda deactivate

5. Activate the virtual environment

conda activate cosyvoice
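
After activating, it can be worth confirming that `python` now resolves to the 3.10 interpreter from the `cosyvoice` environment rather than the base one. A minimal sketch, where `is_python_310` is a hypothetical helper and the messages are illustrative:

```shell
# Succeeds when the given `python --version` string reports a 3.10.x interpreter.
# is_python_310 is a hypothetical helper used only in this sketch.
is_python_310() {
  case "$1" in
    "Python 3.10."*) return 0 ;;
    *) return 1 ;;
  esac
}

if command -v python >/dev/null 2>&1; then
  version="$(python --version 2>&1)"
  if is_python_310 "$version"; then
    echo "OK: $version"
  else
    echo "Unexpected interpreter: $version (is the cosyvoice env active?)"
  fi
else
  echo "python not found on PATH"
fi
```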

6. Install pynini from conda-forge

conda install -y -c conda-forge pynini==2.1.5

(cosyvoice) ysic@m4macmini CosyVoice % conda install -y -c conda-forge pynini==2.1.5
Channels:
 - conda-forge
 - defaults
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice

  added / updated specs:
    - pynini==2.1.5

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    atk-1.0-2.38.0             |       hd03087b_2         339 KB  conda-forge
    cairo-1.18.2               |       h6a3b0d2_1         874 KB  conda-forge
    font-ttf-dejavu-sans-mono-2.37|       hab24e00_0         388 KB  conda-forge
    font-ttf-inconsolata-3.000 |       h77eed37_0          94 KB  conda-forge
    font-ttf-source-code-pro-2.038|       h77eed37_0         684 KB  conda-forge
    font-ttf-ubuntu-0.83       |       h77eed37_3         1.5 MB  conda-forge
    fontconfig-2.15.0          |       h1383a14_1         229 KB  conda-forge
    fonts-conda-ecosystem-1    |                0           4 KB  conda-forge
    fonts-conda-forge-1        |                0           4 KB  conda-forge
    freetype-2.12.1            |       hadb7bae_2         582 KB  conda-forge
    fribidi-1.0.10             |       h27ca646_0          59 KB  conda-forge
    gdk-pixbuf-2.42.12         |       h7ddc832_0         498 KB  conda-forge
    graphite2-1.3.13           |    hebf3989_1003          78 KB  conda-forge
    graphviz-12.0.0            |       hbf8cc41_0         4.8 MB  conda-forge
    gtk2-2.24.33               |       hc5c4cae_7         5.9 MB  conda-forge
    gts-0.7.6                  |       he42f4ea_4         297 KB  conda-forge
    harfbuzz-10.2.0            |       ha0dd535_0         1.4 MB  conda-forge
    icu-75.1                   |       hfee45f7_0        11.3 MB  conda-forge
    lerc-4.0.0                 |       h9a09cb3_0         211 KB  conda-forge
    libcxx-19.1.7              |       ha82da77_0         511 KB  conda-forge
    libdeflate-1.22            |       hd74edd7_0          53 KB  conda-forge
    libexpat-2.6.4             |       h286801f_0          63 KB  conda-forge
    libgd-2.3.3                |      hb2c3a21_11         153 KB  conda-forge
    libglib-2.82.2             |       hdff4504_1         3.5 MB  conda-forge
    libiconv-1.17              |       h0d3ecfb_2         661 KB  conda-forge
    libintl-0.22.5             |       h8414b35_3          79 KB  conda-forge
    libjpeg-turbo-3.0.0        |       hb547adb_1         535 KB  conda-forge
    libpng-1.6.46              |       h3783ad8_0         260 KB  conda-forge
    librsvg-2.58.4             |       h266df6f_2         4.5 MB  conda-forge
    libsqlite-3.45.2           |       h091b4b1_0         806 KB  conda-forge
    libtiff-4.7.0              |       hfce79cd_1         358 KB  conda-forge
    libwebp-base-1.5.0         |       h2471fea_0         283 KB  conda-forge
    libxml2-2.13.5             |       hbbdcc80_0         569 KB  conda-forge
    libzlib-1.3.1              |       h8359307_2          45 KB  conda-forge
    openfst-1.8.2              |       hdb0ca01_2         6.1 MB  conda-forge
    openssl-3.4.0              |       h81ee809_1         2.8 MB  conda-forge
    pango-1.56.1               |       h73f1e88_0         414 KB  conda-forge
    pcre2-10.44                |       h297a79d_2         604 KB  conda-forge
    pixman-0.44.2              |       h2f9eb0b_0         196 KB  conda-forge
    pynini-2.1.5               |  py310h38f39d4_6         1.1 MB  conda-forge
    python-3.10.13             |h2469fbe_1_cpython        11.1 MB  conda-forge
    python_abi-3.10            |          5_cp310           6 KB  conda-forge
    sqlite-3.45.2              |       hf2abe2d_0         793 KB  conda-forge
    tk-8.6.13                  |       h5083fa2_1         3.0 MB  conda-forge
    zlib-1.3.1                 |       h8359307_2          76 KB  conda-forge
    zstd-1.5.6                 |       hb46c0d2_0         396 KB  conda-forge
    ------------------------------------------------------------
                                           Total:        68.0 MB

The following NEW packages will be INSTALLED:

  atk-1.0            conda-forge/osx-arm64::atk-1.0-2.38.0-hd03087b_2
  cairo              conda-forge/osx-arm64::cairo-1.18.2-h6a3b0d2_1
  font-ttf-dejavu-s~ conda-forge/noarch::font-ttf-dejavu-sans-mono-2.37-hab24e00_0
  font-ttf-inconsol~ conda-forge/noarch::font-ttf-inconsolata-3.000-h77eed37_0
  font-ttf-source-c~ conda-forge/noarch::font-ttf-source-code-pro-2.038-h77eed37_0
  font-ttf-ubuntu    conda-forge/noarch::font-ttf-ubuntu-0.83-h77eed37_3
  fontconfig         conda-forge/osx-arm64::fontconfig-2.15.0-h1383a14_1
  fonts-conda-ecosy~ conda-forge/noarch::fonts-conda-ecosystem-1-0
  fonts-conda-forge  conda-forge/noarch::fonts-conda-forge-1-0
  freetype           conda-forge/osx-arm64::freetype-2.12.1-hadb7bae_2
  fribidi            conda-forge/osx-arm64::fribidi-1.0.10-h27ca646_0
  gdk-pixbuf         conda-forge/osx-arm64::gdk-pixbuf-2.42.12-h7ddc832_0
  graphite2          conda-forge/osx-arm64::graphite2-1.3.13-hebf3989_1003
  graphviz           conda-forge/osx-arm64::graphviz-12.0.0-hbf8cc41_0
  gtk2               conda-forge/osx-arm64::gtk2-2.24.33-hc5c4cae_7
  gts                conda-forge/osx-arm64::gts-0.7.6-he42f4ea_4
  harfbuzz           conda-forge/osx-arm64::harfbuzz-10.2.0-ha0dd535_0
  icu                conda-forge/osx-arm64::icu-75.1-hfee45f7_0
  lerc               conda-forge/osx-arm64::lerc-4.0.0-h9a09cb3_0
  libcxx             conda-forge/osx-arm64::libcxx-19.1.7-ha82da77_0
  libdeflate         conda-forge/osx-arm64::libdeflate-1.22-hd74edd7_0
  libexpat           conda-forge/osx-arm64::libexpat-2.6.4-h286801f_0
  libgd              conda-forge/osx-arm64::libgd-2.3.3-hb2c3a21_11
  libglib            conda-forge/osx-arm64::libglib-2.82.2-hdff4504_1
  libiconv           conda-forge/osx-arm64::libiconv-1.17-h0d3ecfb_2
  libintl            conda-forge/osx-arm64::libintl-0.22.5-h8414b35_3
  libjpeg-turbo      conda-forge/osx-arm64::libjpeg-turbo-3.0.0-hb547adb_1
  libpng             conda-forge/osx-arm64::libpng-1.6.46-h3783ad8_0
  librsvg            conda-forge/osx-arm64::librsvg-2.58.4-h266df6f_2
  libsqlite          conda-forge/osx-arm64::libsqlite-3.45.2-h091b4b1_0
  libtiff            conda-forge/osx-arm64::libtiff-4.7.0-hfce79cd_1
  libwebp-base       conda-forge/osx-arm64::libwebp-base-1.5.0-h2471fea_0
  libxml2            conda-forge/osx-arm64::libxml2-2.13.5-hbbdcc80_0
  libzlib            conda-forge/osx-arm64::libzlib-1.3.1-h8359307_2
  openfst            conda-forge/osx-arm64::openfst-1.8.2-hdb0ca01_2
  pango              conda-forge/osx-arm64::pango-1.56.1-h73f1e88_0
  pcre2              conda-forge/osx-arm64::pcre2-10.44-h297a79d_2
  pixman             conda-forge/osx-arm64::pixman-0.44.2-h2f9eb0b_0
  pynini             conda-forge/osx-arm64::pynini-2.1.5-py310h38f39d4_6
  python_abi         conda-forge/osx-arm64::python_abi-3.10-5_cp310
  zstd               conda-forge/osx-arm64::zstd-1.5.6-hb46c0d2_0

The following packages will be UPDATED:

  openssl              pkgs/main::openssl-3.0.15-h80987f9_0 --> conda-forge::openssl-3.4.0-h81ee809_1
  zlib                    pkgs/main::zlib-1.2.13-h18a0788_1 --> conda-forge::zlib-1.3.1-h8359307_2

The following packages will be SUPERSEDED by a higher-priority channel:

  python               pkgs/main::python-3.10.16-hb885b13_1 --> conda-forge::python-3.10.13-h2469fbe_1_cpython
  sqlite                pkgs/main::sqlite-3.45.3-h80987f9_0 --> conda-forge::sqlite-3.45.2-hf2abe2d_0
  tk                        pkgs/main::tk-8.6.14-h6ba3021_0 --> conda-forge::tk-8.6.13-h5083fa2_1

Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
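
Note that the log shows conda-forge taking priority and replacing Python 3.10.16 with 3.10.13 inside the environment. A quick way to check that pynini is usable from that interpreter is to try importing it. In this sketch, `check_import` is a hypothetical helper and the messages are illustrative:

```shell
# Succeeds if MODULE can be imported by the given PYTHON executable.
# check_import is a hypothetical helper used only in this sketch.
check_import() {
  "$1" -c "import $2" >/dev/null 2>&1
}

if check_import python pynini; then
  echo "pynini is importable"
else
  echo "pynini is not importable from the current python"
fi
```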

7. Install the Python dependencies with pip

pip install -r requirements.txt
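
The pip log for this step reports deepspeed, onnxruntime-gpu, and the tensorrt-cu12 packages as ignored because their `sys_platform == "linux"` markers do not match macOS. The sketch below approximates that decision from the shell. `sys_platform_for` is a hypothetical helper, and mapping `uname -s` output to pip's `sys_platform` values is a simplification that covers only Linux and macOS:

```shell
# Map `uname -s` output to the value pip compares against in sys_platform markers.
# sys_platform_for is a hypothetical helper; only Linux and macOS are mapped.
sys_platform_for() {
  case "$1" in
    Linux)  echo "linux" ;;
    Darwin) echo "darwin" ;;
    *)      echo "other" ;;
  esac
}

platform="$(sys_platform_for "$(uname -s)")"
if [ "$platform" = "linux" ]; then
  echo "Linux-only requirements would be installed"
else
  echo "Linux-only requirements will be skipped on: $platform"
fi
```

On an Apple Silicon Mac this takes the second branch, consistent with the "Ignoring ..." lines in the log.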

(cosyvoice) ysic@m4macmini CosyVoice % pip install -r requirements.txt
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu121, https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
Ignoring deepspeed: markers 'sys_platform == "linux"' don't match your environment
Ignoring onnxruntime-gpu: markers 'sys_platform == "linux"' don't match your environment
Ignoring tensorrt-cu12: markers 'sys_platform == "linux"' don't match your environment
Ignoring tensorrt-cu12-bindings: markers 'sys_platform == "linux"' don't match your environment
Ignoring tensorrt-cu12-libs: markers 'sys_platform == "linux"' don't match your environment
Collecting conformer==0.3.2 (from -r requirements.txt (line 3))
  Using cached conformer-0.3.2-py3-none-any.whl.metadata (631 bytes)
Collecting diffusers==0.29.0 (from -r requirements.txt (line 5))
  Using cached diffusers-0.29.0-py3-none-any.whl.metadata (19 kB)
Collecting gdown==5.1.0 (from -r requirements.txt (line 6))
  Using cached gdown-5.1.0-py3-none-any.whl.metadata (5.7 kB)
Collecting gradio==5.4.0 (from -r requirements.txt (line 7))
  Using cached gradio-5.4.0-py3-none-any.whl.metadata (16 kB)
Collecting grpcio==1.57.0 (from -r requirements.txt (line 8))
  Using cached grpcio-1.57.0-cp310-cp310-macosx_12_0_universal2.whl.metadata (4.0 kB)
Collecting grpcio-tools==1.57.0 (from -r requirements.txt (line 9))
  Using cached grpcio_tools-1.57.0-cp310-cp310-macosx_12_0_universal2.whl.metadata (6.2 kB)
Collecting hydra-core==1.3.2 (from -r requirements.txt (line 10))
  Using cached hydra_core-1.3.2-py3-none-any.whl.metadata (5.5 kB)
Collecting HyperPyYAML==1.2.2 (from -r requirements.txt (line 11))
  Using cached HyperPyYAML-1.2.2-py3-none-any.whl.metadata (7.6 kB)
Collecting inflect==7.3.1 (from -r requirements.txt (line 12))
  Using cached inflect-7.3.1-py3-none-any.whl.metadata (21 kB)
Collecting librosa==0.10.2 (from -r requirements.txt (line 13))
  Using cached librosa-0.10.2-py3-none-any.whl.metadata (8.6 kB)
Collecting lightning==2.2.4 (from -r requirements.txt (line 14))
  Using cached lightning-2.2.4-py3-none-any.whl.metadata (53 kB)
Collecting matplotlib==3.7.5 (from -r requirements.txt (line 15))
  Using cached matplotlib-3.7.5-cp310-cp310-macosx_11_0_arm64.whl.metadata (5.7 kB)
Collecting modelscope==1.15.0 (from -r requirements.txt (line 16))
  Using cached modelscope-1.15.0-py3-none-any.whl.metadata (33 kB)
Collecting networkx==3.1 (from -r requirements.txt (line 17))
  Using cached networkx-3.1-py3-none-any.whl.metadata (5.3 kB)
Collecting omegaconf==2.3.0 (from -r requirements.txt (line 18))
  Using cached omegaconf-2.3.0-py3-none-any.whl.metadata (3.9 kB)
Collecting onnx==1.16.0 (from -r requirements.txt (line 19))
  Using cached onnx-1.16.0-cp310-cp310-macosx_10_15_universal2.whl.metadata (16 kB)
Collecting onnxruntime==1.18.0 (from -r requirements.txt (line 21))
  Using cached onnxruntime-1.18.0-cp310-cp310-macosx_11_0_universal2.whl.metadata (4.2 kB)
Collecting openai-whisper==20231117 (from -r requirements.txt (line 22))
  Using cached openai_whisper-20231117-py3-none-any.whl
Collecting protobuf==4.25 (from -r requirements.txt (line 23))
  Using cached protobuf-4.25.0-cp37-abi3-macosx_10_9_universal2.whl.metadata (541 bytes)
Collecting pydantic==2.7.0 (from -r requirements.txt (line 24))
  Using cached pydantic-2.7.0-py3-none-any.whl.metadata (103 kB)
Collecting pyworld==0.3.4 (from -r requirements.txt (line 25))
  Using cached pyworld-0.3.4-cp310-cp310-macosx_11_0_arm64.whl
Collecting rich==13.7.1 (from -r requirements.txt (line 26))
  Using cached rich-13.7.1-py3-none-any.whl.metadata (18 kB)
Collecting soundfile==0.12.1 (from -r requirements.txt (line 27))
  Using cached soundfile-0.12.1-py2.py3-none-macosx_11_0_arm64.whl.metadata (14 kB)
Collecting tensorboard==2.14.0 (from -r requirements.txt (line 28))
  Using cached tensorboard-2.14.0-py3-none-any.whl.metadata (1.8 kB)
Collecting torch==2.3.1 (from -r requirements.txt (line 32))
  Using cached torch-2.3.1-cp310-none-macosx_11_0_arm64.whl.metadata (26 kB)
Collecting torchaudio==2.3.1 (from -r requirements.txt (line 33))
  Using cached torchaudio-2.3.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.4 kB)
Collecting transformers==4.40.1 (from -r requirements.txt (line 34))
  Using cached transformers-4.40.1-py3-none-any.whl.metadata (137 kB)
Collecting uvicorn==0.30.0 (from -r requirements.txt (line 35))
  Using cached uvicorn-0.30.0-py3-none-any.whl.metadata (6.3 kB)
Collecting wget==3.2 (from -r requirements.txt (line 36))
  Using cached wget-3.2-py3-none-any.whl
Collecting fastapi==0.115.6 (from -r requirements.txt (line 37))
  Using cached fastapi-0.115.6-py3-none-any.whl.metadata (27 kB)
Collecting fastapi-cli==0.0.4 (from -r requirements.txt (line 38))
  Using cached fastapi_cli-0.0.4-py3-none-any.whl.metadata (7.0 kB)
Collecting WeTextProcessing==1.0.3 (from -r requirements.txt (line 39))
  Using cached WeTextProcessing-1.0.3-py3-none-any.whl.metadata (7.2 kB)
Collecting einops>=0.6.1 (from conformer==0.3.2->-r requirements.txt (line 3))
  Using cached einops-0.8.0-py3-none-any.whl.metadata (12 kB)
Collecting importlib-metadata (from diffusers==0.29.0->-r requirements.txt (line 5))
  Using cached importlib_metadata-8.6.1-py3-none-any.whl.metadata (4.7 kB)
Collecting filelock (from diffusers==0.29.0->-r requirements.txt (line 5))
  Using cached filelock-3.17.0-py3-none-any.whl.metadata (2.9 kB)
Collecting huggingface-hub>=0.23.2 (from diffusers==0.29.0->-r requirements.txt (line 5))
  Downloading huggingface_hub-0.28.0-py3-none-any.whl.metadata (13 kB)
Collecting numpy (from diffusers==0.29.0->-r requirements.txt (line 5))
  Using cached numpy-2.2.2-cp310-cp310-macosx_14_0_arm64.whl.metadata (62 kB)
Collecting regex!=2019.12.17 (from diffusers==0.29.0->-r requirements.txt (line 5))
  Using cached regex-2024.11.6-cp310-cp310-macosx_11_0_arm64.whl.metadata (40 kB)
Collecting requests (from diffusers==0.29.0->-r requirements.txt (line 5))
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting safetensors>=0.3.1 (from diffusers==0.29.0->-r requirements.txt (line 5))
  Using cached safetensors-0.5.2-cp38-abi3-macosx_11_0_arm64.whl.metadata (3.8 kB)
Collecting Pillow (from diffusers==0.29.0->-r requirements.txt (line 5))
  Using cached pillow-11.1.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (9.1 kB)
Collecting beautifulsoup4 (from gdown==5.1.0->-r requirements.txt (line 6))
  Using cached beautifulsoup4-4.12.3-py3-none-any.whl.metadata (3.8 kB)
Collecting tqdm (from gdown==5.1.0->-r requirements.txt (line 6))
  Using cached tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Collecting aiofiles<24.0,>=22.0 (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached aiofiles-23.2.1-py3-none-any.whl.metadata (9.7 kB)
Collecting anyio<5.0,>=3.0 (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached anyio-4.8.0-py3-none-any.whl.metadata (4.6 kB)
Collecting ffmpy (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached ffmpy-0.5.0-py3-none-any.whl.metadata (3.0 kB)
Collecting gradio-client==1.4.2 (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached gradio_client-1.4.2-py3-none-any.whl.metadata (7.1 kB)
Collecting httpx>=0.24.1 (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached httpx-0.28.1-py3-none-any.whl.metadata (7.1 kB)
Collecting jinja2<4.0 (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached jinja2-3.1.5-py3-none-any.whl.metadata (2.6 kB)
Collecting markupsafe~=2.0 (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached https://download.pytorch.org/whl/MarkupSafe-2.1.5-cp310-cp310-macosx_10_9_universal2.whl (18 kB)
Collecting orjson~=3.0 (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached orjson-3.10.15-cp310-cp310-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl.metadata (41 kB)
Collecting packaging (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached packaging-24.2-py3-none-any.whl.metadata (3.2 kB)
Collecting pandas<3.0,>=1.0 (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached pandas-2.2.3-cp310-cp310-macosx_11_0_arm64.whl.metadata (89 kB)
Collecting pydub (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting python-multipart==0.0.12 (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached python_multipart-0.0.12-py3-none-any.whl.metadata (1.9 kB)
Collecting pyyaml<7.0,>=5.0 (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached PyYAML-6.0.2-cp310-cp310-macosx_11_0_arm64.whl.metadata (2.1 kB)
Collecting ruff>=0.2.2 (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached ruff-0.9.3-py3-none-macosx_11_0_arm64.whl.metadata (25 kB)
Collecting safehttpx<1.0,>=0.1.1 (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached safehttpx-0.1.6-py3-none-any.whl.metadata (4.2 kB)
Collecting semantic-version~=2.0 (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached semantic_version-2.10.0-py2.py3-none-any.whl.metadata (9.7 kB)
Collecting starlette<1.0,>=0.40.0 (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached starlette-0.45.3-py3-none-any.whl.metadata (6.3 kB)
Collecting tomlkit==0.12.0 (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached tomlkit-0.12.0-py3-none-any.whl.metadata (2.7 kB)
Collecting typer<1.0,>=0.12 (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached typer-0.15.1-py3-none-any.whl.metadata (15 kB)
Collecting typing-extensions~=4.0 (from gradio==5.4.0->-r requirements.txt (line 7))
  Downloading https://download.pytorch.org/whl/typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Requirement already satisfied: setuptools in /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages (from grpcio-tools==1.57.0->-r requirements.txt (line 9)) (75.8.0)
Collecting antlr4-python3-runtime==4.9.* (from hydra-core==1.3.2->-r requirements.txt (line 10))
  Using cached antlr4_python3_runtime-4.9.3-py3-none-any.whl
Collecting ruamel.yaml>=0.17.28 (from HyperPyYAML==1.2.2->-r requirements.txt (line 11))
  Using cached ruamel.yaml-0.18.10-py3-none-any.whl.metadata (23 kB)
Collecting more-itertools>=8.5.0 (from inflect==7.3.1->-r requirements.txt (line 12))
  Using cached more_itertools-10.6.0-py3-none-any.whl.metadata (37 kB)
Collecting typeguard>=4.0.1 (from inflect==7.3.1->-r requirements.txt (line 12))
  Using cached typeguard-4.4.1-py3-none-any.whl.metadata (3.7 kB)
Collecting audioread>=2.1.9 (from librosa==0.10.2->-r requirements.txt (line 13))
  Using cached audioread-3.0.1-py3-none-any.whl.metadata (8.4 kB)
Collecting scipy>=1.2.0 (from librosa==0.10.2->-r requirements.txt (line 13))
  Using cached scipy-1.15.1-cp310-cp310-macosx_14_0_arm64.whl.metadata (61 kB)
Collecting scikit-learn>=0.20.0 (from librosa==0.10.2->-r requirements.txt (line 13))
  Using cached scikit_learn-1.6.1-cp310-cp310-macosx_12_0_arm64.whl.metadata (31 kB)
Collecting joblib>=0.14 (from librosa==0.10.2->-r requirements.txt (line 13))
  Using cached joblib-1.4.2-py3-none-any.whl.metadata (5.4 kB)
Collecting decorator>=4.3.0 (from librosa==0.10.2->-r requirements.txt (line 13))
  Using cached decorator-5.1.1-py3-none-any.whl.metadata (4.0 kB)
Collecting numba>=0.51.0 (from librosa==0.10.2->-r requirements.txt (line 13))
  Using cached numba-0.61.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (2.7 kB)
Collecting pooch>=1.1 (from librosa==0.10.2->-r requirements.txt (line 13))
  Using cached pooch-1.8.2-py3-none-any.whl.metadata (10 kB)
Collecting soxr>=0.3.2 (from librosa==0.10.2->-r requirements.txt (line 13))
  Using cached soxr-0.5.0.post1-cp310-cp310-macosx_11_0_arm64.whl.metadata (5.6 kB)
Collecting lazy-loader>=0.1 (from librosa==0.10.2->-r requirements.txt (line 13))
  Using cached lazy_loader-0.4-py3-none-any.whl.metadata (7.6 kB)
Collecting msgpack>=1.0 (from librosa==0.10.2->-r requirements.txt (line 13))
  Using cached msgpack-1.1.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (8.4 kB)
Collecting fsspec<2025.0,>=2022.5.0 (from fsspec[http]<2025.0,>=2022.5.0->lightning==2.2.4->-r requirements.txt (line 14))
  Using cached fsspec-2024.12.0-py3-none-any.whl.metadata (11 kB)
Collecting lightning-utilities<2.0,>=0.8.0 (from lightning==2.2.4->-r requirements.txt (line 14))
  Using cached lightning_utilities-0.11.9-py3-none-any.whl.metadata (5.2 kB)
Collecting torchmetrics<3.0,>=0.7.0 (from lightning==2.2.4->-r requirements.txt (line 14))
  Using cached torchmetrics-1.6.1-py3-none-any.whl.metadata (21 kB)
Collecting pytorch-lightning (from lightning==2.2.4->-r requirements.txt (line 14))
  Using cached pytorch_lightning-2.5.0.post0-py3-none-any.whl.metadata (21 kB)
Collecting contourpy>=1.0.1 (from matplotlib==3.7.5->-r requirements.txt (line 15))
  Using cached contourpy-1.3.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (5.4 kB)
Collecting cycler>=0.10 (from matplotlib==3.7.5->-r requirements.txt (line 15))
  Using cached cycler-0.12.1-py3-none-any.whl.metadata (3.8 kB)
Collecting fonttools>=4.22.0 (from matplotlib==3.7.5->-r requirements.txt (line 15))
  Downloading fonttools-4.55.8-cp310-cp310-macosx_10_9_universal2.whl.metadata (101 kB)
Collecting kiwisolver>=1.0.1 (from matplotlib==3.7.5->-r requirements.txt (line 15))
  Using cached kiwisolver-1.4.8-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.2 kB)
Collecting numpy (from diffusers==0.29.0->-r requirements.txt (line 5))
  Using cached numpy-1.26.4-cp310-cp310-macosx_11_0_arm64.whl.metadata (61 kB)
Collecting pyparsing>=2.3.1 (from matplotlib==3.7.5->-r requirements.txt (line 15))
  Using cached pyparsing-3.2.1-py3-none-any.whl.metadata (5.0 kB)
Collecting python-dateutil>=2.7 (from matplotlib==3.7.5->-r requirements.txt (line 15))
  Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl.metadata (8.4 kB)
Collecting addict (from modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached addict-2.4.0-py3-none-any.whl.metadata (1.0 kB)
Collecting attrs (from modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached attrs-25.1.0-py3-none-any.whl.metadata (10 kB)
Collecting datasets<2.19.0,>=2.16.0 (from modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached datasets-2.18.0-py3-none-any.whl.metadata (20 kB)
Collecting gast>=0.2.2 (from modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached gast-0.6.0-py3-none-any.whl.metadata (1.3 kB)
Collecting oss2 (from modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached oss2-2.19.1-py3-none-any.whl
Collecting pyarrow!=9.0.0,>=6.0.0 (from modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached pyarrow-19.0.0-cp310-cp310-macosx_12_0_arm64.whl.metadata (3.3 kB)
Collecting simplejson>=3.3.0 (from modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached simplejson-3.19.3-cp310-cp310-macosx_11_0_arm64.whl.metadata (3.2 kB)
Collecting sortedcontainers>=1.5.9 (from modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached sortedcontainers-2.4.0-py2.py3-none-any.whl.metadata (10 kB)
Collecting urllib3>=1.26 (from modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB)
Collecting yapf (from modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached yapf-0.43.0-py3-none-any.whl.metadata (46 kB)
Collecting coloredlogs (from onnxruntime==1.18.0->-r requirements.txt (line 21))
  Downloading https://aiinfra.pkgs.visualstudio.com/2692857e-05ef-43b4-ba9c-ccf1c22c437c/_packaging/9387c3aa-d9ad-4513-968c-383f6f7f53b8/pypi/download/coloredlogs/15.0.1/coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
Collecting flatbuffers (from onnxruntime==1.18.0->-r requirements.txt (line 21))
  Using cached flatbuffers-25.1.24-py2.py3-none-any.whl.metadata (875 bytes)
Collecting sympy (from onnxruntime==1.18.0->-r requirements.txt (line 21))
  Downloading https://aiinfra.pkgs.visualstudio.com/2692857e-05ef-43b4-ba9c-ccf1c22c437c/_packaging/9387c3aa-d9ad-4513-968c-383f6f7f53b8/pypi/download/sympy/1.13.3/sympy-1.13.3-py3-none-any.whl (6.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 37.8 MB/s eta 0:00:00
Collecting tiktoken (from openai-whisper==20231117->-r requirements.txt (line 22))
  Using cached tiktoken-0.8.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.6 kB)
Collecting annotated-types>=0.4.0 (from pydantic==2.7.0->-r requirements.txt (line 24))
  Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.18.1 (from pydantic==2.7.0->-r requirements.txt (line 24))
  Using cached pydantic_core-2.18.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.5 kB)
Collecting cython>=0.24 (from pyworld==0.3.4->-r requirements.txt (line 25))
  Using cached Cython-3.0.11-py2.py3-none-any.whl.metadata (3.2 kB)
Collecting markdown-it-py>=2.2.0 (from rich==13.7.1->-r requirements.txt (line 26))
  Using cached markdown_it_py-3.0.0-py3-none-any.whl.metadata (6.9 kB)
Collecting pygments<3.0.0,>=2.13.0 (from rich==13.7.1->-r requirements.txt (line 26))
  Using cached pygments-2.19.1-py3-none-any.whl.metadata (2.5 kB)
Collecting cffi>=1.0 (from soundfile==0.12.1->-r requirements.txt (line 27))
  Using cached cffi-1.17.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (1.5 kB)
Collecting absl-py>=0.4 (from tensorboard==2.14.0->-r requirements.txt (line 28))
  Using cached absl_py-2.1.0-py3-none-any.whl.metadata (2.3 kB)
Collecting google-auth<3,>=1.6.3 (from tensorboard==2.14.0->-r requirements.txt (line 28))
  Using cached google_auth-2.38.0-py2.py3-none-any.whl.metadata (4.8 kB)
Collecting google-auth-oauthlib<1.1,>=0.5 (from tensorboard==2.14.0->-r requirements.txt (line 28))
  Using cached google_auth_oauthlib-1.0.0-py2.py3-none-any.whl.metadata (2.7 kB)
Collecting markdown>=2.6.8 (from tensorboard==2.14.0->-r requirements.txt (line 28))
  Using cached Markdown-3.7-py3-none-any.whl.metadata (7.0 kB)
Collecting tensorboard-data-server<0.8.0,>=0.7.0 (from tensorboard==2.14.0->-r requirements.txt (line 28))
  Using cached tensorboard_data_server-0.7.2-py3-none-any.whl.metadata (1.1 kB)
Collecting werkzeug>=1.0.1 (from tensorboard==2.14.0->-r requirements.txt (line 28))
  Using cached werkzeug-3.1.3-py3-none-any.whl.metadata (3.7 kB)
Requirement already satisfied: wheel>=0.26 in /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages (from tensorboard==2.14.0->-r requirements.txt (line 28)) (0.45.1)
Collecting tokenizers<0.20,>=0.19 (from transformers==4.40.1->-r requirements.txt (line 34))
  Using cached tokenizers-0.19.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.7 kB)
Collecting click>=7.0 (from uvicorn==0.30.0->-r requirements.txt (line 35))
  Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting h11>=0.8 (from uvicorn==0.30.0->-r requirements.txt (line 35))
  Using cached h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Collecting starlette<1.0,>=0.40.0 (from gradio==5.4.0->-r requirements.txt (line 7))
  Using cached starlette-0.41.3-py3-none-any.whl.metadata (6.0 kB)
Requirement already satisfied: pynini==2.1.5 in /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages (from WeTextProcessing==1.0.3->-r requirements.txt (line 39)) (2.1.5)
Collecting importlib-resources (from WeTextProcessing==1.0.3->-r requirements.txt (line 39))
  Using cached importlib_resources-6.5.2-py3-none-any.whl.metadata (3.9 kB)
Collecting websockets<13.0,>=10.0 (from gradio-client==1.4.2->gradio==5.4.0->-r requirements.txt (line 7))
  Using cached websockets-12.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (6.6 kB)
Collecting exceptiongroup>=1.0.2 (from anyio<5.0,>=3.0->gradio==5.4.0->-r requirements.txt (line 7))
  Using cached exceptiongroup-1.2.2-py3-none-any.whl.metadata (6.6 kB)
Collecting idna>=2.8 (from anyio<5.0,>=3.0->gradio==5.4.0->-r requirements.txt (line 7))
  Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting sniffio>=1.1 (from anyio<5.0,>=3.0->gradio==5.4.0->-r requirements.txt (line 7))
  Using cached sniffio-1.3.1-py3-none-any.whl.metadata (3.9 kB)
Collecting pycparser (from cffi>=1.0->soundfile==0.12.1->-r requirements.txt (line 27))
  Using cached pycparser-2.22-py3-none-any.whl.metadata (943 bytes)
Collecting pyarrow-hotfix (from datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached pyarrow_hotfix-0.6-py3-none-any.whl.metadata (3.6 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached xxhash-3.5.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (12 kB)
Collecting multiprocess (from datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached multiprocess-0.70.17-py310-none-any.whl.metadata (7.2 kB)
Collecting fsspec<2025.0,>=2022.5.0 (from fsspec[http]<2025.0,>=2022.5.0->lightning==2.2.4->-r requirements.txt (line 14))
  Using cached https://download.pytorch.org/whl/fsspec-2024.2.0-py3-none-any.whl (170 kB)
Collecting aiohttp (from datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached aiohttp-3.11.11-cp310-cp310-macosx_11_0_arm64.whl.metadata (7.7 kB)
Collecting cachetools<6.0,>=2.0.0 (from google-auth<3,>=1.6.3->tensorboard==2.14.0->-r requirements.txt (line 28))
  Using cached cachetools-5.5.1-py3-none-any.whl.metadata (5.4 kB)
Collecting pyasn1-modules>=0.2.1 (from google-auth<3,>=1.6.3->tensorboard==2.14.0->-r requirements.txt (line 28))
  Using cached pyasn1_modules-0.4.1-py3-none-any.whl.metadata (3.5 kB)
Collecting rsa<5,>=3.1.4 (from google-auth<3,>=1.6.3->tensorboard==2.14.0->-r requirements.txt (line 28))
  Using cached rsa-4.9-py3-none-any.whl.metadata (4.2 kB)
Collecting requests-oauthlib>=0.7.0 (from google-auth-oauthlib<1.1,>=0.5->tensorboard==2.14.0->-r requirements.txt (line 28))
  Using cached requests_oauthlib-2.0.0-py2.py3-none-any.whl.metadata (11 kB)
Collecting certifi (from httpx>=0.24.1->gradio==5.4.0->-r requirements.txt (line 7))
  Using cached certifi-2024.12.14-py3-none-any.whl.metadata (2.3 kB)
Collecting httpcore==1.* (from httpx>=0.24.1->gradio==5.4.0->-r requirements.txt (line 7))
  Using cached httpcore-1.0.7-py3-none-any.whl.metadata (21 kB)
Collecting mdurl~=0.1 (from markdown-it-py>=2.2.0->rich==13.7.1->-r requirements.txt (line 26))
  Using cached mdurl-0.1.2-py3-none-any.whl.metadata (1.6 kB)
Collecting llvmlite<0.45,>=0.44.0dev0 (from numba>=0.51.0->librosa==0.10.2->-r requirements.txt (line 13))
  Using cached llvmlite-0.44.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (4.8 kB)
Collecting pytz>=2020.1 (from pandas<3.0,>=1.0->gradio==5.4.0->-r requirements.txt (line 7))
  Using cached pytz-2024.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas<3.0,>=1.0->gradio==5.4.0->-r requirements.txt (line 7))
  Using cached tzdata-2025.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting platformdirs>=2.5.0 (from pooch>=1.1->librosa==0.10.2->-r requirements.txt (line 13))
  Using cached platformdirs-4.3.6-py3-none-any.whl.metadata (11 kB)
Collecting six>=1.5 (from python-dateutil>=2.7->matplotlib==3.7.5->-r requirements.txt (line 15))
  Using cached six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB)
Collecting charset-normalizer<4,>=2 (from requests->diffusers==0.29.0->-r requirements.txt (line 5))
  Using cached charset_normalizer-3.4.1-cp310-cp310-macosx_10_9_universal2.whl.metadata (35 kB)
Collecting ruamel.yaml.clib>=0.2.7 (from ruamel.yaml>=0.17.28->HyperPyYAML==1.2.2->-r requirements.txt (line 11))
  Using cached ruamel.yaml.clib-0.2.12-cp310-cp310-macosx_13_0_arm64.whl.metadata (1.2 kB)
Collecting threadpoolctl>=3.1.0 (from scikit-learn>=0.20.0->librosa==0.10.2->-r requirements.txt (line 13))
  Using cached threadpoolctl-3.5.0-py3-none-any.whl.metadata (13 kB)
Collecting shellingham>=1.3.0 (from typer<1.0,>=0.12->gradio==5.4.0->-r requirements.txt (line 7))
  Using cached shellingham-1.5.4-py2.py3-none-any.whl.metadata (3.5 kB)
Collecting soupsieve>1.2 (from beautifulsoup4->gdown==5.1.0->-r requirements.txt (line 6))
  Using cached soupsieve-2.6-py3-none-any.whl.metadata (4.6 kB)
Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime==1.18.0->-r requirements.txt (line 21))
  Downloading https://aiinfra.pkgs.visualstudio.com/2692857e-05ef-43b4-ba9c-ccf1c22c437c/_packaging/9387c3aa-d9ad-4513-968c-383f6f7f53b8/pypi/download/humanfriendly/10/humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
Collecting zipp>=3.20 (from importlib-metadata->diffusers==0.29.0->-r requirements.txt (line 5))
  Using cached zipp-3.21.0-py3-none-any.whl.metadata (3.7 kB)
Collecting crcmod>=1.7 (from oss2->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached crcmod-1.7-cp310-cp310-macosx_11_0_arm64.whl
Collecting pycryptodome>=3.4.7 (from oss2->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached pycryptodome-3.21.0-cp36-abi3-macosx_10_9_universal2.whl.metadata (3.4 kB)
Collecting aliyun-python-sdk-kms>=2.4.1 (from oss2->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached aliyun_python_sdk_kms-2.16.5-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting aliyun-python-sdk-core>=2.13.12 (from oss2->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached aliyun_python_sdk_core-2.16.0-py3-none-any.whl
Collecting PySocks!=1.5.7,>=1.5.6 (from requests[socks]->gdown==5.1.0->-r requirements.txt (line 6))
  Using cached PySocks-1.7.1-py3-none-any.whl.metadata (13 kB)
Collecting mpmath<1.4,>=1.1.0 (from sympy->onnxruntime==1.18.0->-r requirements.txt (line 21))
  Downloading https://aiinfra.pkgs.visualstudio.com/2692857e-05ef-43b4-ba9c-ccf1c22c437c/_packaging/9387c3aa-d9ad-4513-968c-383f6f7f53b8/pypi/download/mpmath/1.3/mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 9.8 MB/s eta 0:00:00
Collecting tomli>=2.0.1 (from yapf->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached tomli-2.2.1-py3-none-any.whl.metadata (10 kB)
Collecting aiohappyeyeballs>=2.3.0 (from aiohttp->datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached aiohappyeyeballs-2.4.4-py3-none-any.whl.metadata (6.1 kB)
Collecting aiosignal>=1.1.2 (from aiohttp->datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached aiosignal-1.3.2-py2.py3-none-any.whl.metadata (3.8 kB)
Collecting async-timeout<6.0,>=4.0 (from aiohttp->datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached async_timeout-5.0.1-py3-none-any.whl.metadata (5.1 kB)
Collecting frozenlist>=1.1.1 (from aiohttp->datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached frozenlist-1.5.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (13 kB)
Collecting multidict<7.0,>=4.5 (from aiohttp->datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached multidict-6.1.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (5.0 kB)
Collecting propcache>=0.2.0 (from aiohttp->datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached propcache-0.2.1-cp310-cp310-macosx_11_0_arm64.whl.metadata (9.2 kB)
Collecting yarl<2.0,>=1.17.0 (from aiohttp->datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached yarl-1.18.3-cp310-cp310-macosx_11_0_arm64.whl.metadata (69 kB)
Collecting jmespath<1.0.0,>=0.9.3 (from aliyun-python-sdk-core>=2.13.12->oss2->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached jmespath-0.10.0-py2.py3-none-any.whl.metadata (8.0 kB)
Collecting cryptography>=3.0.0 (from aliyun-python-sdk-core>=2.13.12->oss2->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached cryptography-44.0.0-cp39-abi3-macosx_10_9_universal2.whl.metadata (5.7 kB)
Collecting pyasn1<0.7.0,>=0.4.6 (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard==2.14.0->-r requirements.txt (line 28))
  Using cached pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB)
Collecting oauthlib>=3.0.0 (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard==2.14.0->-r requirements.txt (line 28))
  Using cached oauthlib-3.2.2-py3-none-any.whl.metadata (7.5 kB)
INFO: pip is looking at multiple versions of multiprocess to determine which version is compatible with other requirements. This could take a while.
Collecting multiprocess (from datasets<2.19.0,>=2.16.0->modelscope==1.15.0->-r requirements.txt (line 16))
  Using cached multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Using cached conformer-0.3.2-py3-none-any.whl (4.3 kB)
Using cached diffusers-0.29.0-py3-none-any.whl (2.2 MB)
Using cached gdown-5.1.0-py3-none-any.whl (17 kB)
Using cached gradio-5.4.0-py3-none-any.whl (56.7 MB)
Using cached grpcio-1.57.0-cp310-cp310-macosx_12_0_universal2.whl (9.0 MB)
Using cached grpcio_tools-1.57.0-cp310-cp310-macosx_12_0_universal2.whl (4.6 MB)
Using cached hydra_core-1.3.2-py3-none-any.whl (154 kB)
Using cached HyperPyYAML-1.2.2-py3-none-any.whl (16 kB)
Using cached inflect-7.3.1-py3-none-any.whl (34 kB)
Using cached librosa-0.10.2-py3-none-any.whl (260 kB)
Using cached lightning-2.2.4-py3-none-any.whl (2.0 MB)
Using cached matplotlib-3.7.5-cp310-cp310-macosx_11_0_arm64.whl (7.3 MB)
Using cached modelscope-1.15.0-py3-none-any.whl (5.7 MB)
Using cached networkx-3.1-py3-none-any.whl (2.1 MB)
Using cached omegaconf-2.3.0-py3-none-any.whl (79 kB)
Using cached onnx-1.16.0-cp310-cp310-macosx_10_15_universal2.whl (16.5 MB)
Using cached onnxruntime-1.18.0-cp310-cp310-macosx_11_0_universal2.whl (15.9 MB)
Using cached protobuf-4.25.0-cp37-abi3-macosx_10_9_universal2.whl (393 kB)
Using cached pydantic-2.7.0-py3-none-any.whl (407 kB)
Using cached rich-13.7.1-py3-none-any.whl (240 kB)
Using cached soundfile-0.12.1-py2.py3-none-macosx_11_0_arm64.whl (1.1 MB)
Using cached tensorboard-2.14.0-py3-none-any.whl (5.5 MB)
Using cached torch-2.3.1-cp310-none-macosx_11_0_arm64.whl (61.0 MB)
Using cached torchaudio-2.3.1-cp310-cp310-macosx_11_0_arm64.whl (1.8 MB)
Using cached transformers-4.40.1-py3-none-any.whl (9.0 MB)
Using cached uvicorn-0.30.0-py3-none-any.whl (62 kB)
Using cached fastapi-0.115.6-py3-none-any.whl (94 kB)
Using cached fastapi_cli-0.0.4-py3-none-any.whl (9.5 kB)
Using cached WeTextProcessing-1.0.3-py3-none-any.whl (2.0 MB)
Using cached gradio_client-1.4.2-py3-none-any.whl (319 kB)
Using cached pydantic_core-2.18.1-cp310-cp310-macosx_11_0_arm64.whl (1.8 MB)
Using cached python_multipart-0.0.12-py3-none-any.whl (23 kB)
Using cached tomlkit-0.12.0-py3-none-any.whl (37 kB)
Using cached absl_py-2.1.0-py3-none-any.whl (133 kB)
Using cached aiofiles-23.2.1-py3-none-any.whl (15 kB)
Using cached annotated_types-0.7.0-py3-none-any.whl (13 kB)
Using cached anyio-4.8.0-py3-none-any.whl (96 kB)
Using cached audioread-3.0.1-py3-none-any.whl (23 kB)
Using cached cffi-1.17.1-cp310-cp310-macosx_11_0_arm64.whl (178 kB)
Using cached click-8.1.8-py3-none-any.whl (98 kB)
Using cached contourpy-1.3.1-cp310-cp310-macosx_11_0_arm64.whl (253 kB)
Using cached cycler-0.12.1-py3-none-any.whl (8.3 kB)
Using cached Cython-3.0.11-py2.py3-none-any.whl (1.2 MB)
Using cached datasets-2.18.0-py3-none-any.whl (510 kB)
Using cached decorator-5.1.1-py3-none-any.whl (9.1 kB)
Using cached einops-0.8.0-py3-none-any.whl (43 kB)
Using cached filelock-3.17.0-py3-none-any.whl (16 kB)
Downloading fonttools-4.55.8-cp310-cp310-macosx_10_9_universal2.whl (2.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.8/2.8 MB 55.4 MB/s eta 0:00:00
Using cached gast-0.6.0-py3-none-any.whl (21 kB)
Using cached google_auth-2.38.0-py2.py3-none-any.whl (210 kB)
Using cached google_auth_oauthlib-1.0.0-py2.py3-none-any.whl (18 kB)
Using cached h11-0.14.0-py3-none-any.whl (58 kB)
Using cached httpx-0.28.1-py3-none-any.whl (73 kB)
Using cached httpcore-1.0.7-py3-none-any.whl (78 kB)
Downloading huggingface_hub-0.28.0-py3-none-any.whl (464 kB)
Using cached jinja2-3.1.5-py3-none-any.whl (134 kB)
Using cached joblib-1.4.2-py3-none-any.whl (301 kB)
Using cached kiwisolver-1.4.8-cp310-cp310-macosx_11_0_arm64.whl (65 kB)
Using cached lazy_loader-0.4-py3-none-any.whl (12 kB)
Using cached lightning_utilities-0.11.9-py3-none-any.whl (28 kB)
Using cached Markdown-3.7-py3-none-any.whl (106 kB)
Using cached markdown_it_py-3.0.0-py3-none-any.whl (87 kB)
Using cached more_itertools-10.6.0-py3-none-any.whl (63 kB)
Using cached msgpack-1.1.0-cp310-cp310-macosx_11_0_arm64.whl (81 kB)
Using cached numba-0.61.0-cp310-cp310-macosx_11_0_arm64.whl (2.8 MB)
Using cached numpy-1.26.4-cp310-cp310-macosx_11_0_arm64.whl (14.0 MB)
Using cached orjson-3.10.15-cp310-cp310-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl (249 kB)
Using cached packaging-24.2-py3-none-any.whl (65 kB)
Using cached pandas-2.2.3-cp310-cp310-macosx_11_0_arm64.whl (11.3 MB)
Using cached pillow-11.1.0-cp310-cp310-macosx_11_0_arm64.whl (3.1 MB)
Using cached pooch-1.8.2-py3-none-any.whl (64 kB)
Using cached pyarrow-19.0.0-cp310-cp310-macosx_12_0_arm64.whl (30.7 MB)
Using cached pygments-2.19.1-py3-none-any.whl (1.2 MB)
Using cached pyparsing-3.2.1-py3-none-any.whl (107 kB)
Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Using cached PyYAML-6.0.2-cp310-cp310-macosx_11_0_arm64.whl (171 kB)
Using cached regex-2024.11.6-cp310-cp310-macosx_11_0_arm64.whl (284 kB)
Using cached requests-2.32.3-py3-none-any.whl (64 kB)
Using cached ruamel.yaml-0.18.10-py3-none-any.whl (117 kB)
Using cached ruff-0.9.3-py3-none-macosx_11_0_arm64.whl (11.0 MB)
Using cached safehttpx-0.1.6-py3-none-any.whl (8.7 kB)
Using cached safetensors-0.5.2-cp38-abi3-macosx_11_0_arm64.whl (408 kB)
Using cached scikit_learn-1.6.1-cp310-cp310-macosx_12_0_arm64.whl (11.1 MB)
Using cached scipy-1.15.1-cp310-cp310-macosx_14_0_arm64.whl (24.8 MB)
Using cached semantic_version-2.10.0-py2.py3-none-any.whl (15 kB)
Using cached simplejson-3.19.3-cp310-cp310-macosx_11_0_arm64.whl (75 kB)
Using cached sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Using cached soxr-0.5.0.post1-cp310-cp310-macosx_11_0_arm64.whl (160 kB)
Using cached starlette-0.41.3-py3-none-any.whl (73 kB)
Using cached tensorboard_data_server-0.7.2-py3-none-any.whl (2.4 kB)
Using cached tokenizers-0.19.1-cp310-cp310-macosx_11_0_arm64.whl (2.4 MB)
Using cached torchmetrics-1.6.1-py3-none-any.whl (927 kB)
Using cached tqdm-4.67.1-py3-none-any.whl (78 kB)
Using cached typeguard-4.4.1-py3-none-any.whl (35 kB)
Using cached typer-0.15.1-py3-none-any.whl (44 kB)
Downloading https://download.pytorch.org/whl/typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Using cached urllib3-2.3.0-py3-none-any.whl (128 kB)
Using cached werkzeug-3.1.3-py3-none-any.whl (224 kB)
Using cached addict-2.4.0-py3-none-any.whl (3.8 kB)
Using cached attrs-25.1.0-py3-none-any.whl (63 kB)
Using cached beautifulsoup4-4.12.3-py3-none-any.whl (147 kB)
Using cached ffmpy-0.5.0-py3-none-any.whl (6.0 kB)
Using cached flatbuffers-25.1.24-py2.py3-none-any.whl (30 kB)
Using cached importlib_metadata-8.6.1-py3-none-any.whl (26 kB)
Using cached importlib_resources-6.5.2-py3-none-any.whl (37 kB)
Using cached pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Using cached pytorch_lightning-2.5.0.post0-py3-none-any.whl (819 kB)
Using cached tiktoken-0.8.0-cp310-cp310-macosx_11_0_arm64.whl (982 kB)
Using cached yapf-0.43.0-py3-none-any.whl (256 kB)
Using cached aiohttp-3.11.11-cp310-cp310-macosx_11_0_arm64.whl (455 kB)
Using cached aliyun_python_sdk_kms-2.16.5-py2.py3-none-any.whl (99 kB)
Using cached cachetools-5.5.1-py3-none-any.whl (9.5 kB)
Using cached certifi-2024.12.14-py3-none-any.whl (164 kB)
Using cached charset_normalizer-3.4.1-cp310-cp310-macosx_10_9_universal2.whl (198 kB)
Using cached dill-0.3.8-py3-none-any.whl (116 kB)
Using cached exceptiongroup-1.2.2-py3-none-any.whl (16 kB)
Using cached idna-3.10-py3-none-any.whl (70 kB)
Using cached llvmlite-0.44.0-cp310-cp310-macosx_11_0_arm64.whl (26.2 MB)
Using cached mdurl-0.1.2-py3-none-any.whl (10.0 kB)
Using cached platformdirs-4.3.6-py3-none-any.whl (18 kB)
Using cached pyasn1_modules-0.4.1-py3-none-any.whl (181 kB)
Using cached pycryptodome-3.21.0-cp36-abi3-macosx_10_9_universal2.whl (2.5 MB)
Using cached PySocks-1.7.1-py3-none-any.whl (16 kB)
Using cached pytz-2024.2-py2.py3-none-any.whl (508 kB)
Using cached requests_oauthlib-2.0.0-py2.py3-none-any.whl (24 kB)
Using cached rsa-4.9-py3-none-any.whl (34 kB)
Using cached ruamel.yaml.clib-0.2.12-cp310-cp310-macosx_13_0_arm64.whl (131 kB)
Using cached shellingham-1.5.4-py2.py3-none-any.whl (9.8 kB)
Using cached six-1.17.0-py2.py3-none-any.whl (11 kB)
Using cached sniffio-1.3.1-py3-none-any.whl (10 kB)
Using cached soupsieve-2.6-py3-none-any.whl (36 kB)
Using cached threadpoolctl-3.5.0-py3-none-any.whl (18 kB)
Using cached tomli-2.2.1-py3-none-any.whl (14 kB)
Using cached tzdata-2025.1-py2.py3-none-any.whl (346 kB)
Using cached websockets-12.0-cp310-cp310-macosx_11_0_arm64.whl (121 kB)
Using cached zipp-3.21.0-py3-none-any.whl (9.6 kB)
Using cached multiprocess-0.70.16-py310-none-any.whl (134 kB)
Using cached pyarrow_hotfix-0.6-py3-none-any.whl (7.9 kB)
Using cached pycparser-2.22-py3-none-any.whl (117 kB)
Using cached xxhash-3.5.0-cp310-cp310-macosx_11_0_arm64.whl (30 kB)
Using cached aiohappyeyeballs-2.4.4-py3-none-any.whl (14 kB)
Using cached aiosignal-1.3.2-py2.py3-none-any.whl (7.6 kB)
Using cached async_timeout-5.0.1-py3-none-any.whl (6.2 kB)
Using cached cryptography-44.0.0-cp39-abi3-macosx_10_9_universal2.whl (6.5 MB)
Using cached frozenlist-1.5.0-cp310-cp310-macosx_11_0_arm64.whl (52 kB)
Using cached jmespath-0.10.0-py2.py3-none-any.whl (24 kB)
Using cached multidict-6.1.0-cp310-cp310-macosx_11_0_arm64.whl (29 kB)
Using cached oauthlib-3.2.2-py3-none-any.whl (151 kB)
Using cached propcache-0.2.1-cp310-cp310-macosx_11_0_arm64.whl (45 kB)
Using cached pyasn1-0.6.1-py3-none-any.whl (83 kB)
Using cached yarl-1.18.3-cp310-cp310-macosx_11_0_arm64.whl (92 kB)
Installing collected packages: wget, sortedcontainers, pytz, pydub, mpmath, flatbuffers, crcmod, antlr4-python3-runtime, addict, zipp, xxhash, websockets, urllib3, tzdata, typing-extensions, tqdm, tomlkit, tomli, threadpoolctl, tensorboard-data-server, sympy, soupsieve, sniffio, six, simplejson, shellingham, semantic-version, safetensors, ruff, ruamel.yaml.clib, regex, pyyaml, python-multipart, PySocks, pyparsing, pygments, pycryptodome, pycparser, pyasn1, pyarrow-hotfix, pyarrow, protobuf, propcache, platformdirs, Pillow, packaging, orjson, oauthlib, numpy, networkx, msgpack, more-itertools, mdurl, markupsafe, markdown, llvmlite, kiwisolver, joblib, jmespath, importlib-resources, idna, humanfriendly, h11, grpcio, gast, fsspec, frozenlist, fonttools, filelock, ffmpy, exceptiongroup, einops, dill, decorator, cython, cycler, click, charset-normalizer, certifi, cachetools, audioread, attrs, async-timeout, annotated-types, aiohappyeyeballs, aiofiles, absl-py, yapf, werkzeug, uvicorn, typeguard, soxr, scipy, ruamel.yaml, rsa, requests, pyworld, python-dateutil, pydantic-core, pyasn1-modules, onnx, omegaconf, numba, multiprocess, multidict, markdown-it-py, lightning-utilities, lazy-loader, jinja2, importlib-metadata, httpcore, grpcio-tools, contourpy, coloredlogs, cffi, beautifulsoup4, anyio, aiosignal, yarl, WeTextProcessing, torch, tiktoken, starlette, soundfile, scikit-learn, rich, requests-oauthlib, pydantic, pooch, pandas, onnxruntime, matplotlib, inflect, HyperPyYAML, hydra-core, huggingface-hub, httpx, google-auth, cryptography, typer, torchmetrics, torchaudio, tokenizers, safehttpx, openai-whisper, librosa, gradio-client, google-auth-oauthlib, gdown, fastapi, diffusers, conformer, aliyun-python-sdk-core, aiohttp, transformers, tensorboard, gradio, fastapi-cli, aliyun-python-sdk-kms, pytorch-lightning, oss2, datasets, modelscope, lightning
Successfully installed HyperPyYAML-1.2.2 Pillow-11.1.0 PySocks-1.7.1 WeTextProcessing-1.0.3 absl-py-2.1.0 addict-2.4.0 aiofiles-23.2.1 aiohappyeyeballs-2.4.4 aiohttp-3.11.11 aiosignal-1.3.2 aliyun-python-sdk-core-2.16.0 aliyun-python-sdk-kms-2.16.5 annotated-types-0.7.0 antlr4-python3-runtime-4.9.3 anyio-4.8.0 async-timeout-5.0.1 attrs-25.1.0 audioread-3.0.1 beautifulsoup4-4.12.3 cachetools-5.5.1 certifi-2024.12.14 cffi-1.17.1 charset-normalizer-3.4.1 click-8.1.8 coloredlogs-15.0.1 conformer-0.3.2 contourpy-1.3.1 crcmod-1.7 cryptography-44.0.0 cycler-0.12.1 cython-3.0.11 datasets-2.18.0 decorator-5.1.1 diffusers-0.29.0 dill-0.3.8 einops-0.8.0 exceptiongroup-1.2.2 fastapi-0.115.6 fastapi-cli-0.0.4 ffmpy-0.5.0 filelock-3.17.0 flatbuffers-25.1.24 fonttools-4.55.8 frozenlist-1.5.0 fsspec-2024.2.0 gast-0.6.0 gdown-5.1.0 google-auth-2.38.0 google-auth-oauthlib-1.0.0 gradio-5.4.0 gradio-client-1.4.2 grpcio-1.57.0 grpcio-tools-1.57.0 h11-0.14.0 httpcore-1.0.7 httpx-0.28.1 huggingface-hub-0.28.0 humanfriendly-10.0 hydra-core-1.3.2 idna-3.10 importlib-metadata-8.6.1 importlib-resources-6.5.2 inflect-7.3.1 jinja2-3.1.5 jmespath-0.10.0 joblib-1.4.2 kiwisolver-1.4.8 lazy-loader-0.4 librosa-0.10.2 lightning-2.2.4 lightning-utilities-0.11.9 llvmlite-0.44.0 markdown-3.7 markdown-it-py-3.0.0 markupsafe-2.1.5 matplotlib-3.7.5 mdurl-0.1.2 modelscope-1.15.0 more-itertools-10.6.0 mpmath-1.3.0 msgpack-1.1.0 multidict-6.1.0 multiprocess-0.70.16 networkx-3.1 numba-0.61.0 numpy-1.26.4 oauthlib-3.2.2 omegaconf-2.3.0 onnx-1.16.0 onnxruntime-1.18.0 openai-whisper-20231117 orjson-3.10.15 oss2-2.19.1 packaging-24.2 pandas-2.2.3 platformdirs-4.3.6 pooch-1.8.2 propcache-0.2.1 protobuf-4.25.0 pyarrow-19.0.0 pyarrow-hotfix-0.6 pyasn1-0.6.1 pyasn1-modules-0.4.1 pycparser-2.22 pycryptodome-3.21.0 pydantic-2.7.0 pydantic-core-2.18.1 pydub-0.25.1 pygments-2.19.1 pyparsing-3.2.1 python-dateutil-2.9.0.post0 python-multipart-0.0.12 pytorch-lightning-2.5.0.post0 pytz-2024.2 pyworld-0.3.4 pyyaml-6.0.2 
regex-2024.11.6 requests-2.32.3 requests-oauthlib-2.0.0 rich-13.7.1 rsa-4.9 ruamel.yaml-0.18.10 ruamel.yaml.clib-0.2.12 ruff-0.9.3 safehttpx-0.1.6 safetensors-0.5.2 scikit-learn-1.6.1 scipy-1.15.1 semantic-version-2.10.0 shellingham-1.5.4 simplejson-3.19.3 six-1.17.0 sniffio-1.3.1 sortedcontainers-2.4.0 soundfile-0.12.1 soupsieve-2.6 soxr-0.5.0.post1 starlette-0.41.3 sympy-1.13.3 tensorboard-2.14.0 tensorboard-data-server-0.7.2 threadpoolctl-3.5.0 tiktoken-0.8.0 tokenizers-0.19.1 tomli-2.2.1 tomlkit-0.12.0 torch-2.3.1 torchaudio-2.3.1 torchmetrics-1.6.1 tqdm-4.67.1 transformers-4.40.1 typeguard-4.4.1 typer-0.15.1 typing-extensions-4.12.2 tzdata-2025.1 urllib3-2.3.0 uvicorn-0.30.0 websockets-12.0 werkzeug-3.1.3 wget-3.2 xxhash-3.5.0 yapf-0.43.0 yarl-1.18.3 zipp-3.21.0

8. Download the model

Here I use Git LFS (if it is not installed yet, run brew install git-lfs && git lfs install first).

(cosyvoice) ysic@m4macmini CosyVoice % mkdir -p pretrained_models
(cosyvoice) ysic@m4macmini CosyVoice % git clone https://www.modelscope.cn/iic/CosyVoice2-0.5B.git pretrained_models/CosyVoice2-0.5B
Cloning into 'pretrained_models/CosyVoice2-0.5B'...
remote: Enumerating objects: 77, done.
remote: Counting objects: 100% (77/77), done.
remote: Compressing objects: 100% (61/61), done.
remote: Total 77 (delta 26), reused 51 (delta 13), pack-reused 0
Receiving objects: 100% (77/77), 1.72 MiB | 4.33 MiB/s, done.
Resolving deltas: 100% (26/26), done.
Filtering content: 100% (12/12), 4.80 GiB | 73.96 MiB/s, done.
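
If Git LFS is not set up at clone time, the large weight files can be left behind as small text pointer stubs instead of the real data, and the "Filtering content" line above will not appear. The following is a minimal sketch for checking this after the clone. It is not part of the CosyVoice repository: `find_lfs_pointers` is a hypothetical helper name, and the 1024-byte size cutoff is an assumption chosen only to skip files that are clearly too large to be pointers.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir):
    """Return files under model_dir that are still Git LFS pointer stubs."""
    root = Path(model_dir)
    pointers = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Skip Git's own metadata directory.
        if ".git" in path.relative_to(root).parts:
            continue
        # Assumption: pointer stubs are tiny, so larger files are skipped unread.
        if path.stat().st_size > 1024:
            continue
        with path.open("rb") as f:
            if f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                pointers.append(path)
    return pointers
```

Calling `find_lfs_pointers("pretrained_models/CosyVoice2-0.5B")` should return an empty list when all weights were fetched. If it lists any files, running `git lfs pull` inside the model directory should fetch the real content.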

Generating speech

Here is the code I wrote, based on the README:

import sys

# Make the bundled Matcha-TTS submodule importable.
sys.path.append("third_party/Matcha-TTS")
from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav
import torchaudio

cosyvoice = CosyVoice2(
    "pretrained_models/CosyVoice2-0.5B", load_jit=False, load_trt=False, fp16=False
)

# Reference (prompt) audio, loaded at 16 kHz.
prompt_speech_16k = load_wav(
    "./asset/ITAcorpus_amitaro_yofukashi_1.1/44.1k/emo/EMOTION100_058.wav", 16000
)

# Text to synthesize, passed to the model as a generator of phrases.
def text_generator():
    yield "あのイーハトーヴォのすきとおったかぜ、"
    yield "夏でも底に冷たさをもつ青いそら、"
    yield "うつくしい森でかざられたモリーオ市、"
    yield "郊外のぎらぎらひかる草の波。"

for i, j in enumerate(
    cosyvoice.inference_zero_shot(
        text_generator(),
        # Transcript of the prompt audio.
        "すみません、この辺に詳しくないんです。",
        prompt_speech_16k,
        stream=False,
    )
):
    # Include the index in the file name so multiple outputs do not overwrite each other.
    torchaudio.save("test_{}.wav".format(i), j["tts_speech"], cosyvoice.sample_rate)

The reference audio is an ITA corpus reading borrowed from あみたろの声素材工房 (Amitaro's Voice Material Studio).
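
The `text_generator` in the script yields phrases that were split by hand. For longer passages, the same kind of generator can be built automatically. The sketch below is a hypothetical helper (`split_phrases` is not a CosyVoice function) that cuts a string after each Japanese comma or full stop. Splitting at exactly these two marks is an assumption made to mirror the hand-written example, not something CosyVoice is known to require.

```python
import re


def split_phrases(text):
    """Yield phrases from text, each ending at a Japanese comma or full stop."""
    # A run of non-punctuation characters, plus the 、 or 。 that closes it (if any).
    for match in re.finditer(r"[^、。]+[、。]?", text):
        phrase = match.group().strip()
        if phrase:
            yield phrase
```

Passing `split_phrases(long_text)` in place of `text_generator()` as the first argument of `inference_zero_shot` should feed the model the same kind of phrase stream as the script.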

Running test.py produces the following output:

(cosyvoice) ysic@m4macmini CosyVoice % python test.py
2025-01-30 16:44:21,566 - modelscope - INFO - PyTorch version 2.3.1 Found.
2025-01-30 16:44:21,566 - modelscope - INFO - Loading ast index from /Users/ysic/.cache/modelscope/ast_indexer
2025-01-30 16:44:21,587 - modelscope - INFO - Loading done! Current index file version is 1.15.0, with md5 a146a4c7ddc066c8ec70acfe431b42ad and a total number of 980 components indexed
failed to import ttsfrd, use WeTextProcessing instead
/opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
2025-01-30 16:44:24,698 INFO input frame rate=25
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2025-01-30 16:44:25,374 WETEXT INFO building fst for zh_normalizer ...
2025-01-30 16:44:25,374 INFO building fst for zh_normalizer ...
2025-01-30 16:44:32,946 WETEXT INFO done
2025-01-30 16:44:32,946 INFO done
2025-01-30 16:44:32,946 WETEXT INFO fst path: /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/tn/zh_tn_tagger.fst
2025-01-30 16:44:32,946 INFO fst path: /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/tn/zh_tn_tagger.fst
2025-01-30 16:44:32,946 WETEXT INFO           /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/tn/zh_tn_verbalizer.fst
2025-01-30 16:44:32,946 INFO           /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/tn/zh_tn_verbalizer.fst
2025-01-30 16:44:32,948 WETEXT INFO found existing fst: /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/tn/en_tn_tagger.fst
2025-01-30 16:44:32,948 INFO found existing fst: /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/tn/en_tn_tagger.fst
2025-01-30 16:44:32,948 WETEXT INFO                     /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/tn/en_tn_verbalizer.fst
2025-01-30 16:44:32,948 INFO                     /opt/homebrew/Caskroom/miniconda/base/envs/cosyvoice/lib/python3.10/site-packages/tn/en_tn_verbalizer.fst
2025-01-30 16:44:32,948 WETEXT INFO skip building fst for en_normalizer ...
2025-01-30 16:44:32,948 INFO skip building fst for en_normalizer ...
2025-01-30 16:44:33,610 INFO get tts_text generator, will skip text_normalize!
  0%|                                                                                                                                           | 0/1 [00:00<?, ?it/s]2025-01-30 16:44:33,753 INFO get tts_text generator, will return _extract_text_token_generator!
2025-01-30 16:44:33,987 INFO synthesis text <generator object text_generator at 0x37e8ebca0>
2025-01-30 16:44:33,988 INFO append 5 text token 15 speech token
2025-01-30 16:44:33,988 INFO append 5 text token 15 speech token
2025-01-30 16:44:33,988 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,988 INFO append 5 text token 15 speech token
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO append 5 text token 15 speech token
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO not enough text token to decode, wait for more
2025-01-30 16:44:33,989 INFO append 5 text token 15 speech token
2025-01-30 16:44:34,171 INFO fill_token index 0 next fill_token index 16
2025-01-30 16:44:34,171 INFO get fill token, need to append more text token
2025-01-30 16:44:34,171 INFO not enough text token to decode, wait for more
2025-01-30 16:44:34,171 INFO get fill token, need to append more text token
2025-01-30 16:44:34,171 INFO not enough text token to decode, wait for more
2025-01-30 16:44:34,171 INFO get fill token, need to append more text token
2025-01-30 16:44:34,171 INFO not enough text token to decode, wait for more
2025-01-30 16:44:34,171 INFO get fill token, need to append more text token
2025-01-30 16:44:34,171 INFO not enough text token to decode, wait for more
2025-01-30 16:44:34,171 INFO get fill token, need to append more text token
2025-01-30 16:44:34,171 INFO append 5 text token
2025-01-30 16:44:34,657 INFO fill_token index 16 next fill_token index 32
2025-01-30 16:44:34,657 INFO get fill token, need to append more text token
2025-01-30 16:44:34,657 INFO not enough text token to decode, wait for more
2025-01-30 16:44:34,657 INFO get fill token, need to append more text token
2025-01-30 16:44:34,657 INFO not enough text token to decode, wait for more
2025-01-30 16:44:34,657 INFO get fill token, need to append more text token
2025-01-30 16:44:34,657 INFO not enough text token to decode, wait for more
2025-01-30 16:44:34,657 INFO get fill token, need to append more text token
2025-01-30 16:44:34,657 INFO not enough text token to decode, wait for more
2025-01-30 16:44:34,657 INFO get fill token, need to append more text token
2025-01-30 16:44:34,657 INFO append 5 text token
2025-01-30 16:44:35,160 INFO fill_token index 32 next fill_token index 48
2025-01-30 16:44:35,160 INFO get fill token, need to append more text token
2025-01-30 16:44:35,160 INFO not enough text token to decode, wait for more
2025-01-30 16:44:35,160 INFO get fill token, need to append more text token
2025-01-30 16:44:35,160 INFO not enough text token to decode, wait for more
2025-01-30 16:44:35,160 INFO get fill token, need to append more text token
2025-01-30 16:44:35,161 INFO not enough text token to decode, wait for more
2025-01-30 16:44:35,161 INFO get fill token, need to append more text token
2025-01-30 16:44:35,161 INFO not enough text token to decode, wait for more
2025-01-30 16:44:35,161 INFO get fill token, need to append more text token
2025-01-30 16:44:35,161 INFO append 5 text token
2025-01-30 16:44:35,661 INFO fill_token index 48 next fill_token index 64
2025-01-30 16:44:35,661 INFO get fill token, need to append more text token
2025-01-30 16:44:35,661 INFO not enough text token to decode, wait for more
2025-01-30 16:44:35,661 INFO get fill token, need to append more text token
2025-01-30 16:44:35,661 INFO not enough text token to decode, wait for more
2025-01-30 16:44:35,661 INFO get fill token, need to append more text token
2025-01-30 16:44:35,661 INFO not enough text token to decode, wait for more
2025-01-30 16:44:35,661 INFO get fill token, need to append more text token
2025-01-30 16:44:35,661 INFO not enough text token to decode, wait for more
2025-01-30 16:44:35,661 INFO get fill token, need to append more text token
2025-01-30 16:44:35,661 INFO append 5 text token
2025-01-30 16:44:36,152 INFO fill_token index 64 next fill_token index 80
2025-01-30 16:44:36,152 INFO get fill token, need to append more text token
2025-01-30 16:44:36,152 INFO not enough text token to decode, wait for more
2025-01-30 16:44:36,152 INFO get fill token, need to append more text token
2025-01-30 16:44:36,152 INFO not enough text token to decode, wait for more
2025-01-30 16:44:36,152 INFO get fill token, need to append more text token
2025-01-30 16:44:36,152 INFO not enough text token to decode, wait for more
2025-01-30 16:44:36,152 INFO get fill token, need to append more text token
2025-01-30 16:44:36,152 INFO not enough text token to decode, wait for more
2025-01-30 16:44:36,152 INFO get fill token, need to append more text token
2025-01-30 16:44:36,152 INFO append 5 text token
2025-01-30 16:44:36,657 INFO fill_token index 80 next fill_token index 96
2025-01-30 16:44:36,657 INFO get fill token, need to append more text token
2025-01-30 16:44:36,657 INFO not enough text token to decode, wait for more
2025-01-30 16:44:36,657 INFO get fill token, need to append more text token
2025-01-30 16:44:36,657 INFO not enough text token to decode, wait for more
2025-01-30 16:44:36,657 INFO get fill token, need to append more text token
2025-01-30 16:44:36,657 INFO not enough text token to decode, wait for more
2025-01-30 16:44:36,657 INFO get fill token, need to append more text token
2025-01-30 16:44:36,657 INFO not enough text token to decode, wait for more
2025-01-30 16:44:36,657 INFO get fill token, need to append more text token
2025-01-30 16:44:36,657 INFO append 5 text token
2025-01-30 16:44:37,193 INFO fill_token index 96 next fill_token index 112
2025-01-30 16:44:37,193 INFO get fill token, need to append more text token
2025-01-30 16:44:37,193 INFO not enough text token to decode, wait for more
2025-01-30 16:44:37,193 INFO get fill token, need to append more text token
2025-01-30 16:44:37,193 INFO not enough text token to decode, wait for more
2025-01-30 16:44:37,193 INFO get fill token, need to append more text token
2025-01-30 16:44:37,193 INFO not enough text token to decode, wait for more
2025-01-30 16:44:37,193 INFO get fill token, need to append more text token
2025-01-30 16:44:37,193 INFO not enough text token to decode, wait for more
2025-01-30 16:44:37,193 INFO get fill token, need to append more text token
2025-01-30 16:44:37,193 INFO append 5 text token
2025-01-30 16:44:37,722 INFO fill_token index 112 next fill_token index 128
2025-01-30 16:44:37,722 INFO get fill token, need to append more text token
2025-01-30 16:44:37,722 INFO not enough text token to decode, wait for more
2025-01-30 16:44:37,722 INFO get fill token, need to append more text token
2025-01-30 16:44:37,722 INFO not enough text token to decode, wait for more
2025-01-30 16:44:37,722 INFO get fill token, need to append more text token
2025-01-30 16:44:37,722 INFO not enough text token to decode, wait for more
2025-01-30 16:44:37,722 INFO get fill token, need to append more text token
2025-01-30 16:44:37,722 INFO not enough text token to decode, wait for more
2025-01-30 16:44:37,722 INFO get fill token, need to append more text token
2025-01-30 16:44:37,722 INFO append 5 text token
2025-01-30 16:44:38,304 INFO fill_token index 128 next fill_token index 144
2025-01-30 16:44:38,304 INFO get fill token, need to append more text token
2025-01-30 16:44:38,304 INFO not enough text token to decode, wait for more
2025-01-30 16:44:38,304 INFO get fill token, need to append more text token
2025-01-30 16:44:38,304 INFO not enough text token to decode, wait for more
2025-01-30 16:44:38,304 INFO get fill token, need to append more text token
2025-01-30 16:44:38,304 INFO not enough text token to decode, wait for more
2025-01-30 16:44:38,304 INFO get fill token, need to append more text token
2025-01-30 16:44:38,304 INFO not enough text token to decode, wait for more
2025-01-30 16:44:38,304 INFO get fill token, need to append more text token
2025-01-30 16:44:38,304 INFO append 5 text token
2025-01-30 16:44:39,038 INFO fill_token index 144 next fill_token index 160
2025-01-30 16:44:39,039 INFO get fill token, need to append more text token
2025-01-30 16:44:39,039 INFO not enough text token to decode, wait for more
2025-01-30 16:44:39,039 INFO get fill token, need to append more text token
2025-01-30 16:44:39,039 INFO not enough text token to decode, wait for more
2025-01-30 16:44:39,039 INFO no more text token, decode until met eos
2025-01-30 16:44:51,109 INFO yield speech len 10.96, rtf 1.5622105911700395
100%|██████████████████████████████████████████████| 1/1 [00:17<00:00, 17.36s/it]
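
The last log line reports `rtf 1.5622105911700395` for 10.96 seconds of speech. The log itself does not define `rtf`, so the sketch below assumes it is the usual real-time factor: synthesis time divided by the duration of the generated audio. Under that assumption, the elapsed time can be recovered from the two logged numbers. The function name `real_time_factor` is made up for this illustration and is not part of CosyVoice.

```python
def real_time_factor(elapsed_s: float, audio_s: float) -> float:
    """Return processing time divided by audio duration (assumed RTF definition)."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return elapsed_s / audio_s


# Values copied from the log line "yield speech len 10.96, rtf 1.5622105911700395".
speech_len = 10.96            # seconds of generated audio
rtf = 1.5622105911700395      # logged value

# Invert the assumed definition to estimate how long synthesis took.
elapsed = rtf * speech_len
print(f"estimated synthesis time: {elapsed:.2f} s")

# An RTF above 1.0 means synthesis was slower than playback.
print("faster than real time" if rtf < 1.0 else "slower than real time")
```

The estimate of roughly 17.1 seconds is close to the `17.36s/it` shown by the progress bar, which fits the assumed definition. On this machine the run was therefore slower than real time.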

Here is the generated audio file.

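Summary

I did not get to try it this time, but CosyVoice 2 can apparently also generate speech that mixes multiple languages, as well as speech with emotion.

Some of the official demos made me think "it can do that too!?", while other parts are still a work in progress, so I am looking forward to where it goes from here.

Next time, I plan to try Intelligent Speech Interaction, which is offered on Alibaba Cloud.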
