Background

When I run the following script:

from espnet2.bin.tts_inference import Text2Speech
import soundfile
text2speech = Text2Speech.from_pretrained("kan-bayashi/ljspeech_vits")
text = "Hello, this is a text-to-speech test. Does my speech sound good?"
speech = text2speech(text)["wav"]
soundfile.write("output.wav", speech.numpy(), text2speech.fs, "PCM_16")

NLTK data gets downloaded.

(.venv) PS D:\espnet-lab> python .\tts.py
Failed to import Flash Attention, using ESPnet default: No module named 'flash_attn'
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\zzxia\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger.zip.
[nltk_data] Downloading package cmudict to
[nltk_data]     C:\Users\zzxia\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\cmudict.zip.

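If I delete the nltk_data folder, the next run of the script fails with an error about NLTK data.

Right now, rerunning the script does not restart the NLTK data download, and I end up in the state below.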

(.venv) PS D:\espnet-lab> python tts.py
Failed to import Flash Attention, using ESPnet default: No module named 'flash_attn'
D:\espnet-lab\.venv\lib\site-packages\torch\nn\utils\weight_norm.py:143: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
  WeightNorm.apply(module, name, dim)
D:\espnet-lab\.venv\lib\site-packages\espnet2\gan_tts\vits\monotonic_align\__init__.py:19: UserWarning: Cython version is not available. Fallback to 'EXPERIMETAL' numba version. If you want to use the cython version, please build it as follows: `cd espnet2/gan_tts/vits/monotonic_align; python setup.py build_ext --inplace`
  warnings.warn(
WARNING:root:It seems weight norm is not applied in the pretrained model but the current model uses it. To keep the compatibility, we remove the norm from the current model. This may cause unexpected behavior due to the parameter mismatch in finetuning. To avoid this issue, please change the following parameters in config to false:
 - discriminator_params.follow_official_norm
 - discriminator_params.scale_discriminator_params.use_weight_norm
 - discriminator_params.scale_discriminator_params.use_spectral_norm

See also:
 - https://github.com/espnet/espnet/pull/5240
 - https://github.com/espnet/espnet/pull/5249
Traceback (most recent call last):
  File "D:\espnet-lab\tts.py", line 5, in <module>
    speech = text2speech(text)["wav"]
  File "D:\espnet-lab\.venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "D:\espnet-lab\.venv\lib\site-packages\espnet2\bin\tts_inference.py", line 173, in __call__
    text = self.preprocess_fn("<dummy>", dict(text=text))["text"]
  File "D:\espnet-lab\.venv\lib\site-packages\espnet2\train\preprocessor.py", line 548, in __call__
    data = self._text_process(data)
  File "D:\espnet-lab\.venv\lib\site-packages\espnet2\train\preprocessor.py", line 483, in _text_process
    tokens = self.tokenizer.text2tokens(text)
  File "D:\espnet-lab\.venv\lib\site-packages\espnet2\text\phoneme_tokenizer.py", line 623, in text2tokens
    tokens = self.g2p(line)
  File "D:\espnet-lab\.venv\lib\site-packages\espnet2\text\phoneme_tokenizer.py", line 260, in __call__
    phones = self.g2p(text)
  File "D:\espnet-lab\.venv\lib\site-packages\g2p_en\g2p.py", line 162, in __call__
    tokens = pos_tag(words)  # tuples of (word, tag)
  File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\__init__.py", line 168, in pos_tag
    tagger = _get_tagger(lang)
  File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\__init__.py", line 110, in _get_tagger
    tagger = PerceptronTagger()
  File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\perceptron.py", line 183, in __init__
    self.load_from_json(lang)
  File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\perceptron.py", line 273, in load_from_json
    loc = find(f"taggers/averaged_perceptron_tagger_{lang}/")
  File "D:\espnet-lab\.venv\lib\site-packages\nltk\data.py", line 579, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource averaged_perceptron_tagger_eng not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('averaged_perceptron_tagger_eng')

  For more information see: https://www.nltk.org/data.html

  Attempted to load taggers/averaged_perceptron_tagger_eng/

  Searched in:
    - 'C:\\Users\\zzxia/nltk_data'
    - 'D:\\espnet-lab\\.venv\\nltk_data'
    - 'D:\\espnet-lab\\.venv\\share\\nltk_data'
    - 'D:\\espnet-lab\\.venv\\lib\\nltk_data'
    - 'C:\\Users\\zzxia\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
**********************************************************************

I want to find out why.

豚&紙箱

Where does ESPnet use NLTK?

As the call stack of the error above shows, ESPnet uses the g2p_en package when it calls its g2p function, and g2p_en in turn uses nltk.

豚&紙箱

At the top of g2p_en/g2p.py there is code that checks for the NLTK data and downloads it if it is missing. Why didn't it run?
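
The check-then-download pattern described above can be sketched roughly as follows. This is a paraphrase for illustration, not a copy of g2p_en's source. `ensure_resource` is a hypothetical helper, and the finder and downloader are passed in as parameters (in real use they would presumably be `nltk.data.find` and `nltk.download`) so that the sketch runs without NLTK installed. The resource paths in the demo are taken from the download log above.

```python
from typing import Callable, List


def ensure_resource(
    resource_path: str,
    package: str,
    find: Callable[[str], object],
    download: Callable[[str], object],
) -> bool:
    """Download `package` only if looking up `resource_path` raises LookupError.

    Returns True when a download was triggered, False otherwise.
    """
    try:
        find(resource_path)
    except LookupError:
        download(package)
        return True
    return False


if __name__ == "__main__":
    # Stand-ins for nltk.data.find and nltk.download (assumption: same call shape).
    available = {"taggers/averaged_perceptron_tagger.zip"}
    downloaded: List[str] = []

    def fake_find(path: str) -> str:
        if path not in available:
            raise LookupError(path)
        return path

    for path, package in [
        ("taggers/averaged_perceptron_tagger.zip", "averaged_perceptron_tagger"),
        ("corpora/cmudict.zip", "cmudict"),
    ]:
        ensure_resource(path, package, fake_find, downloaded.append)

    # Only the package whose lookup failed ends up in this list.
    print(downloaded)
```

The download is triggered only when the lookup for that exact path fails. A lookup that succeeds skips the download entirely.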

豚&紙箱

When I debugged it, nltk.data.find actually succeeded.

豚&紙箱

C:\Users\zzxia\AppData\Roaming\nltk_data exists.

> ls C:\Users\zzxia\AppData\Roaming\nltk_data


    Directory: C:\Users\zzxia\AppData\Roaming\nltk_data


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d-----          8/1/2025  11:23 PM                corpora
d-----          8/1/2025  11:23 PM                taggers

豚&紙箱

So why does the LookupError shown at the beginning occur?

豚&紙箱

nltk is looking for the nltk_data\taggers\averaged_perceptron_tagger_eng folder.

But no such folder exists.

> ls C:\Users\zzxia\AppData\Roaming\nltk_data\taggers\


    Directory: C:\Users\zzxia\AppData\Roaming\nltk_data\taggers


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d-----          8/1/2025  11:23 PM                averaged_perceptron_tagger
-a----          8/1/2025  11:23 PM        2526731 averaged_perceptron_tagger.zip
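
The same check can be scripted across several search paths at once. Below is a minimal sketch using only the standard library. `find_tagger_dirs` is a hypothetical helper, and the two roots in the demo are copied from the "Searched in" list of the error message. On another machine they will simply not exist.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def find_tagger_dirs(roots: Iterable[str], names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each resource name to the roots where <root>/taggers/<name>/ exists."""
    names = list(names)
    found: Dict[str, List[str]] = {name: [] for name in names}
    for root in roots:
        for name in names:
            if (Path(root) / "taggers" / name).is_dir():
                found[name].append(str(root))
    return found


if __name__ == "__main__":
    roots = [
        r"C:\Users\zzxia\AppData\Roaming\nltk_data",
        r"D:\espnet-lab\.venv\nltk_data",
    ]
    names = ["averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"]
    for name, hits in find_tagger_dirs(roots, names).items():
        print(name, "->", hits if hits else "not found")
```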

豚&紙箱

After I deleted the nltk_data folder and reran the script, the NLTK data download ran again. But the LookupError still occurs.

(.venv) PS D:\espnet-lab> python tts.py
Failed to import Flash Attention, using ESPnet default: No module named 'flash_attn'
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\zzxia\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger.zip.
[nltk_data] Downloading package cmudict to
[nltk_data]     C:\Users\zzxia\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\cmudict.zip.
D:\espnet-lab\.venv\lib\site-packages\torch\nn\utils\weight_norm.py:143: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
  WeightNorm.apply(module, name, dim)
D:\espnet-lab\.venv\lib\site-packages\espnet2\gan_tts\vits\monotonic_align\__init__.py:19: UserWarning: Cython version is not available. Fallback to 'EXPERIMETAL' numba version. If you want to use the cython version, please build it as follows: `cd espnet2/gan_tts/vits/monotonic_align; python setup.py build_ext --inplace`
  warnings.warn(
WARNING:root:It seems weight norm is not applied in the pretrained model but the current model uses it. To keep the compatibility, we remove the norm from the current model. This may cause unexpected behavior due to the parameter mismatch in finetuning. To avoid this issue, please change the following parameters in config to false:
 - discriminator_params.follow_official_norm
 - discriminator_params.scale_discriminator_params.use_weight_norm
 - discriminator_params.scale_discriminator_params.use_spectral_norm

See also:
 - https://github.com/espnet/espnet/pull/5240
 - https://github.com/espnet/espnet/pull/5249
Traceback (most recent call last):
  File "D:\espnet-lab\tts.py", line 5, in <module>
    speech = text2speech(text)["wav"]
  File "D:\espnet-lab\.venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "D:\espnet-lab\.venv\lib\site-packages\espnet2\bin\tts_inference.py", line 173, in __call__
    text = self.preprocess_fn("<dummy>", dict(text=text))["text"]
  File "D:\espnet-lab\.venv\lib\site-packages\espnet2\train\preprocessor.py", line 548, in __call__
    data = self._text_process(data)
  File "D:\espnet-lab\.venv\lib\site-packages\espnet2\train\preprocessor.py", line 483, in _text_process
    tokens = self.tokenizer.text2tokens(text)
  File "D:\espnet-lab\.venv\lib\site-packages\espnet2\text\phoneme_tokenizer.py", line 623, in text2tokens
    tokens = self.g2p(line)
  File "D:\espnet-lab\.venv\lib\site-packages\espnet2\text\phoneme_tokenizer.py", line 260, in __call__
    phones = self.g2p(text)
  File "D:\espnet-lab\.venv\lib\site-packages\g2p_en\g2p.py", line 162, in __call__
    tokens = pos_tag(words)  # tuples of (word, tag)
  File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\__init__.py", line 168, in pos_tag
    tagger = _get_tagger(lang)
  File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\__init__.py", line 110, in _get_tagger
    tagger = PerceptronTagger()
  File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\perceptron.py", line 183, in __init__
    self.load_from_json(lang)
  File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\perceptron.py", line 273, in load_from_json
    loc = find(f"taggers/averaged_perceptron_tagger_{lang}/")
  File "D:\espnet-lab\.venv\lib\site-packages\nltk\data.py", line 579, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource averaged_perceptron_tagger_eng not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('averaged_perceptron_tagger_eng')

  For more information see: https://www.nltk.org/data.html

  Attempted to load taggers/averaged_perceptron_tagger_eng/

  Searched in:
    - 'C:\\Users\\zzxia/nltk_data'
    - 'D:\\espnet-lab\\.venv\\nltk_data'
    - 'D:\\espnet-lab\\.venv\\share\\nltk_data'
    - 'D:\\espnet-lab\\.venv\\lib\\nltk_data'
    - 'C:\\Users\\zzxia\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
**********************************************************************

豚&紙箱

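Cause

Since version 3.8.2, nltk looks for taggers/averaged_perceptron_tagger_{lang}/.
https://github.com/nltk/nltk/blob/0753ee5bb0096b4a4b3dd587e784ce07e7f34dab/nltk/tag/perceptron.py#L273
However, g2p_en has not been updated to match. It still checks for and downloads only averaged_perceptron_tagger, so its check passes while the averaged_perceptron_tagger_eng resource that nltk now needs is never downloaded.
https://github.com/Kyubyong/g2p/blob/c6439c274c42b9724a7fee1dc07ca6a4c68a0538/g2p_en/g2p.py#L20-L23
That is the cause.
nltk 3.8.2 was released in 2024.

g2p_en, on the other hand, has not been updated since 2019.
Downgrading nltk to 3.8.1 might solve the problem.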
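
To make the mismatch concrete, here is a small sketch that maps an nltk version string to the tagger resource name it is expected to look up. The 3.8.2 threshold comes from the note above, and the `_eng` suffix comes from the error message. `release_tuple` and `expected_tagger_resource` are hypothetical helpers, and the version parsing is deliberately simplified: it reads only the leading numeric part and ignores pre-release markers such as `b1`.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release part, e.g. '3.9b1' -> (3, 9)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def expected_tagger_resource(nltk_version: str, lang: str = "eng") -> str:
    """Return the tagger resource name nltk is expected to load.

    Assumption (from the note above): the language-suffixed name is used
    from 3.8.2 onward, and the unsuffixed name before that.
    """
    if release_tuple(nltk_version) >= (3, 8, 2):
        return f"averaged_perceptron_tagger_{lang}"
    return "averaged_perceptron_tagger"


if __name__ == "__main__":
    for version in ("3.8.1", "3.9b1", "3.9.1"):
        print(version, "->", expected_tagger_resource(version))
```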

豚&紙箱

(.venv) PS D:\espnet-lab> pip install nltk==3.8.1
Collecting nltk==3.8.1
  Downloading nltk-3.8.1-py3-none-any.whl.metadata (2.8 kB)
Requirement already satisfied: click in d:\espnet-lab\.venv\lib\site-packages (from nltk==3.8.1) (8.2.1)
Requirement already satisfied: joblib in d:\espnet-lab\.venv\lib\site-packages (from nltk==3.8.1) (1.5.1)
Requirement already satisfied: regex>=2021.8.3 in d:\espnet-lab\.venv\lib\site-packages (from nltk==3.8.1) (2024.11.6)
Requirement already satisfied: tqdm in d:\espnet-lab\.venv\lib\site-packages (from nltk==3.8.1) (4.67.1)
Requirement already satisfied: colorama in d:\espnet-lab\.venv\lib\site-packages (from click->nltk==3.8.1) (0.4.6)
Downloading nltk-3.8.1-py3-none-any.whl (1.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 10.0 MB/s eta 0:00:00
Installing collected packages: nltk
  Attempting uninstall: nltk
    Found existing installation: nltk 3.9.1
    Uninstalling nltk-3.9.1:
      Successfully uninstalled nltk-3.9.1
Successfully installed nltk-3.8.1

豚&紙箱

The error is gone!

(.venv) PS D:\espnet-lab> python tts.py
Failed to import Flash Attention, using ESPnet default: No module named 'flash_attn'
D:\espnet-lab\.venv\lib\site-packages\torch\nn\utils\weight_norm.py:143: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
  WeightNorm.apply(module, name, dim)
D:\espnet-lab\.venv\lib\site-packages\espnet2\gan_tts\vits\monotonic_align\__init__.py:19: UserWarning: Cython version is not available. Fallback to 'EXPERIMETAL' numba version. If you want to use the cython version, please build it as follows: `cd espnet2/gan_tts/vits/monotonic_align; python setup.py build_ext --inplace`
  warnings.warn(
WARNING:root:It seems weight norm is not applied in the pretrained model but the current model uses it. To keep the compatibility, we remove the norm from the current model. This may cause unexpected behavior due to the parameter mismatch in finetuning. To avoid this issue, please change the following parameters in config to false:
 - discriminator_params.follow_official_norm
 - discriminator_params.scale_discriminator_params.use_weight_norm
 - discriminator_params.scale_discriminator_params.use_spectral_norm

See also:
 - https://github.com/espnet/espnet/pull/5240
 - https://github.com/espnet/espnet/pull/5249

豚&紙箱

I'd like to propose a fix to espnet's setup.py. It currently looks like this:
https://github.com/espnet/espnet/blob/ee7109c5cd9dbd94eae1beb1f62ce3f515439605/setup.py#L26
It would be better to change it to nltk>=3.4.5,<3.8.2.
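
As a quick sanity check of the proposed range, the sketch below tests a few version strings against `>=3.4.5,<3.8.2`. It is a simplified stand-in and not a full PEP 440 implementation: `satisfies_proposed_pin` is a hypothetical helper that compares only the numeric release part and ignores pre-release ordering.

```python
import re
from typing import Tuple

LOWER = (3, 4, 5)  # inclusive lower bound of the proposed range
UPPER = (3, 8, 2)  # exclusive upper bound of the proposed range


def _release(version: str) -> Tuple[int, ...]:
    """Leading numeric release part, padded to three components."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    parts = [int(part) for part in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def satisfies_proposed_pin(version: str) -> bool:
    """True if the version falls inside >=3.4.5,<3.8.2 (simplified check)."""
    return LOWER <= _release(version) < UPPER


if __name__ == "__main__":
    for version in ("3.4.5", "3.8.1", "3.9b1", "3.9.1"):
        print(version, satisfies_proposed_pin(version))
```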

豚&紙箱

I tried the next version after NLTK 3.8.1. As expected, the error comes back.

3.8.2 is not available on PyPI, so the next version up is 3.9b1.

(.venv) PS D:\espnet-lab> pip install nltk==3.8.2
ERROR: Ignored the following yanked versions: 3.6.4
ERROR: Could not find a version that satisfies the requirement nltk==3.8.2 (from versions: 2.0b4, 2.0b5, 2.0b6, 2.0b7, 2.0b8, 2.0b9, 2.0.1rc1, 2.0.1rc3, 2.0.1rc4, 2.0.1, 2.0.2, 2.0.3, 2.0.4, 2.0.5, 3.0.0b1, 3.0.0b2, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.0.4, 3.0.5, 3.1, 3.2, 3.2.1, 3.2.2, 3.2.3, 3.2.4, 3.2.5, 3.3.0, 3.4, 3.4.1, 3.4.2, 3.4.3, 3.4.4, 3.4.5, 3.5b1, 3.5, 3.6, 3.6.1, 3.6.2, 3.6.3, 3.6.5, 3.6.6, 3.6.7, 3.7, 3.8, 3.8.1, 3.9b1, 3.9, 3.9.1)
ERROR: No matching distribution found for nltk==3.8.2
(.venv) PS D:\espnet-lab> pip install nltk==3.9b1
Collecting nltk==3.9b1
  Downloading nltk-3.9b1-py3-none-any.whl.metadata (2.9 kB)
Requirement already satisfied: click in d:\espnet-lab\.venv\lib\site-packages (from nltk==3.9b1) (8.2.1)
Requirement already satisfied: joblib in d:\espnet-lab\.venv\lib\site-packages (from nltk==3.9b1) (1.5.1)
Requirement already satisfied: regex>=2021.8.3 in d:\espnet-lab\.venv\lib\site-packages (from nltk==3.9b1) (2024.11.6)
Requirement already satisfied: tqdm in d:\espnet-lab\.venv\lib\site-packages (from nltk==3.9b1) (4.67.1)
Requirement already satisfied: colorama in d:\espnet-lab\.venv\lib\site-packages (from click->nltk==3.9b1) (0.4.6)
Downloading nltk-3.9b1-py3-none-any.whl (1.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 8.0 MB/s eta 0:00:00
Installing collected packages: nltk
  Attempting uninstall: nltk
    Found existing installation: nltk 3.8.1
    Uninstalling nltk-3.8.1:
      Successfully uninstalled nltk-3.8.1
Successfully installed nltk-3.9b1
(.venv) PS D:\espnet-lab> python tts.py
Failed to import Flash Attention, using ESPnet default: No module named 'flash_attn'
D:\espnet-lab\.venv\lib\site-packages\torch\nn\utils\weight_norm.py:143: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
  WeightNorm.apply(module, name, dim)
D:\espnet-lab\.venv\lib\site-packages\espnet2\gan_tts\vits\monotonic_align\__init__.py:19: UserWarning: Cython version is not available. Fallback to 'EXPERIMETAL' numba version. If you want to use the cython version, please build it as follows: `cd espnet2/gan_tts/vits/monotonic_align; python setup.py build_ext --inplace`
  warnings.warn(
WARNING:root:It seems weight norm is not applied in the pretrained model but the current model uses it. To keep the compatibility, we remove the norm from the current model. This may cause unexpected behavior due to the parameter mismatch in finetuning. To avoid this issue, please change the following parameters in config to false:
 - discriminator_params.follow_official_norm
 - discriminator_params.scale_discriminator_params.use_weight_norm
 - discriminator_params.scale_discriminator_params.use_spectral_norm

See also:
 - https://github.com/espnet/espnet/pull/5240
 - https://github.com/espnet/espnet/pull/5249
Traceback (most recent call last):
  File "D:\espnet-lab\tts.py", line 5, in <module>
    speech = text2speech(text)["wav"]
  File "D:\espnet-lab\.venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "D:\espnet-lab\.venv\lib\site-packages\espnet2\bin\tts_inference.py", line 173, in __call__
    text = self.preprocess_fn("<dummy>", dict(text=text))["text"]
  File "D:\espnet-lab\.venv\lib\site-packages\espnet2\train\preprocessor.py", line 548, in __call__
    data = self._text_process(data)
  File "D:\espnet-lab\.venv\lib\site-packages\espnet2\train\preprocessor.py", line 483, in _text_process
    tokens = self.tokenizer.text2tokens(text)
  File "D:\espnet-lab\.venv\lib\site-packages\espnet2\text\phoneme_tokenizer.py", line 623, in text2tokens
    tokens = self.g2p(line)
  File "D:\espnet-lab\.venv\lib\site-packages\espnet2\text\phoneme_tokenizer.py", line 260, in __call__
    phones = self.g2p(text)
  File "D:\espnet-lab\.venv\lib\site-packages\g2p_en\g2p.py", line 162, in __call__
    tokens = pos_tag(words)  # tuples of (word, tag)
  File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\__init__.py", line 165, in pos_tag
    tagger = _get_tagger(lang)
  File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\__init__.py", line 107, in _get_tagger
    tagger = PerceptronTagger()
  File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\perceptron.py", line 183, in __init__
    self.load_from_json(lang)
  File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\perceptron.py", line 273, in load_from_json
    loc = find(f"taggers/averaged_perceptron_tagger_{lang}/")
  File "D:\espnet-lab\.venv\lib\site-packages\nltk\data.py", line 582, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource averaged_perceptron_tagger_eng not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('averaged_perceptron_tagger_eng')

  For more information see: https://www.nltk.org/data.html

  Attempted to load taggers/averaged_perceptron_tagger_eng/

  Searched in:
    - 'C:\\Users\\zzxia/nltk_data'
    - 'D:\\espnet-lab\\.venv\\nltk_data'
    - 'D:\\espnet-lab\\.venv\\share\\nltk_data'
    - 'D:\\espnet-lab\\.venv\\lib\\nltk_data'
    - 'C:\\Users\\zzxia\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
**********************************************************************

豚&紙箱

I submitted a pull request.