NLTK error when running ESPnet inference

Background
When I run the following script,
from espnet2.bin.tts_inference import Text2Speech
import soundfile
text2speech = Text2Speech.from_pretrained("kan-bayashi/ljspeech_vits")
text = "Hello, this is a text-to-speech test. Does my speech sound good?"
speech = text2speech(text)["wav"]
soundfile.write("output.wav", speech.numpy(), text2speech.fs, "PCM_16")
the NLTK data is downloaded:
(.venv) PS D:\espnet-lab> python .\tts.py
Failed to import Flash Attention, using ESPnet default: No module named 'flash_attn'
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] C:\Users\zzxia\AppData\Roaming\nltk_data...
[nltk_data] Unzipping taggers\averaged_perceptron_tagger.zip.
[nltk_data] Downloading package cmudict to
[nltk_data] C:\Users\zzxia\AppData\Roaming\nltk_data...
[nltk_data] Unzipping corpora\cmudict.zip.
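After I deleted the nltk_data folder, rerunning the script produced an error about NLTK data.
At the moment, rerunning the script no longer triggers the NLTK data download, and it ends up in the state shown below.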
(.venv) PS D:\espnet-lab> python tts.py
Failed to import Flash Attention, using ESPnet default: No module named 'flash_attn'
D:\espnet-lab\.venv\lib\site-packages\torch\nn\utils\weight_norm.py:143: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
WeightNorm.apply(module, name, dim)
D:\espnet-lab\.venv\lib\site-packages\espnet2\gan_tts\vits\monotonic_align\__init__.py:19: UserWarning: Cython version is not available. Fallback to 'EXPERIMETAL' numba version. If you want to use the cython version, please build it as follows: `cd espnet2/gan_tts/vits/monotonic_align; python setup.py build_ext --inplace`
warnings.warn(
WARNING:root:It seems weight norm is not applied in the pretrained model but the current model uses it. To keep the compatibility, we remove the norm from the current model. This may cause unexpected behavior due to the parameter mismatch in finetuning. To avoid this issue, please change the following parameters in config to false:
- discriminator_params.follow_official_norm
- discriminator_params.scale_discriminator_params.use_weight_norm
- discriminator_params.scale_discriminator_params.use_spectral_norm
See also:
- https://github.com/espnet/espnet/pull/5240
- https://github.com/espnet/espnet/pull/5249
Traceback (most recent call last):
File "D:\espnet-lab\tts.py", line 5, in <module>
speech = text2speech(text)["wav"]
File "D:\espnet-lab\.venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "D:\espnet-lab\.venv\lib\site-packages\espnet2\bin\tts_inference.py", line 173, in __call__
text = self.preprocess_fn("<dummy>", dict(text=text))["text"]
File "D:\espnet-lab\.venv\lib\site-packages\espnet2\train\preprocessor.py", line 548, in __call__
data = self._text_process(data)
File "D:\espnet-lab\.venv\lib\site-packages\espnet2\train\preprocessor.py", line 483, in _text_process
tokens = self.tokenizer.text2tokens(text)
File "D:\espnet-lab\.venv\lib\site-packages\espnet2\text\phoneme_tokenizer.py", line 623, in text2tokens
tokens = self.g2p(line)
File "D:\espnet-lab\.venv\lib\site-packages\espnet2\text\phoneme_tokenizer.py", line 260, in __call__
phones = self.g2p(text)
File "D:\espnet-lab\.venv\lib\site-packages\g2p_en\g2p.py", line 162, in __call__
tokens = pos_tag(words) # tuples of (word, tag)
File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\__init__.py", line 168, in pos_tag
tagger = _get_tagger(lang)
File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\__init__.py", line 110, in _get_tagger
tagger = PerceptronTagger()
File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\perceptron.py", line 183, in __init__
self.load_from_json(lang)
File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\perceptron.py", line 273, in load_from_json
loc = find(f"taggers/averaged_perceptron_tagger_{lang}/")
File "D:\espnet-lab\.venv\lib\site-packages\nltk\data.py", line 579, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource averaged_perceptron_tagger_eng not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('averaged_perceptron_tagger_eng')
For more information see: https://www.nltk.org/data.html
Attempted to load taggers/averaged_perceptron_tagger_eng/
Searched in:
- 'C:\\Users\\zzxia/nltk_data'
- 'D:\\espnet-lab\\.venv\\nltk_data'
- 'D:\\espnet-lab\\.venv\\share\\nltk_data'
- 'D:\\espnet-lab\\.venv\\lib\\nltk_data'
- 'C:\\Users\\zzxia\\AppData\\Roaming\\nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
**********************************************************************
I want to find out the cause.

Where ESPnet uses NLTK
As the call stack of the error above shows, ESPnet uses the g2p_en package when it calls its g2p function, and g2p_en in turn uses nltk.
At the top of g2p_en/g2p.py there is code that checks for the NLTK data and downloads it if it is missing. So why did the download not run?

When I debugged it, the nltk.data.find call there turned out to succeed.
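
The check-then-download behaviour described above can be sketched as follows. This is a minimal illustration, not the actual g2p_en source: ensure_resource is a hypothetical helper, and the resource path in the usage comment is an assumption based on the download log. The only NLTK-specific fact it relies on is that nltk.data.find raises LookupError when a resource is missing, as the traceback shows.

```python
from typing import Callable


def ensure_resource(
    find: Callable[[str], object],
    download: Callable[[str], object],
    resource_path: str,
    package: str,
) -> bool:
    """Download `package` only if `find(resource_path)` raises LookupError.

    Returns True when a download was triggered, False otherwise.
    """
    try:
        find(resource_path)
    except LookupError:
        download(package)
        return True
    return False


# Possible usage with NLTK installed (the resource path is an assumption):
#   import nltk
#   ensure_resource(nltk.data.find, nltk.download,
#                   "taggers/averaged_perceptron_tagger.zip",
#                   "averaged_perceptron_tagger")
```

Because the lookup for the old resource name succeeds once the data has been downloaded, a check like this never triggers a second download, which would match what was observed here.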

C:\Users\zzxia\AppData\Roaming\nltk_data exists:
> ls C:\Users\zzxia\AppData\Roaming\nltk_data
Directory: C:\Users\zzxia\AppData\Roaming\nltk_data
Mode LastWriteTime Length Name
---- ------------- ------ ----
d----- 8/1/2025 11:23 PM corpora
d----- 8/1/2025 11:23 PM taggers

Then why does the LookupError shown at the beginning appear?

nltk is looking for the folder nltk_data\taggers\averaged_perceptron_tagger_eng.
But no such folder exists:
> ls C:\Users\zzxia\AppData\Roaming\nltk_data\taggers\
Directory: C:\Users\zzxia\AppData\Roaming\nltk_data\taggers
Mode LastWriteTime Length Name
---- ------------- ------ ----
d----- 8/1/2025 11:23 PM averaged_perceptron_tagger
-a---- 8/1/2025 11:23 PM 2526731 averaged_perceptron_tagger.zip
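
The same comparison can be scripted. The sketch below is a hypothetical helper (tagger_status is not part of NLTK) that only inspects the file system and reports which of the two tagger folder names exists under a given nltk_data directory.

```python
from pathlib import Path
from typing import Dict

# Folder names taken from the directory listing and the LookupError message.
TAGGER_NAMES = ("averaged_perceptron_tagger", "averaged_perceptron_tagger_eng")


def tagger_status(nltk_data_dir: Path) -> Dict[str, bool]:
    """Report which perceptron tagger folders exist under one nltk_data dir."""
    taggers = nltk_data_dir / "taggers"
    return {name: (taggers / name).is_dir() for name in TAGGER_NAMES}


# Possible usage on the machine from this post:
#   print(tagger_status(Path(r"C:\Users\zzxia\AppData\Roaming\nltk_data")))
```

For the listing above, this should report the old folder name as present and the _eng name as missing.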

After I deleted the nltk_data folder and reran the script, the NLTK data download started again. But the LookupError still appears:
(.venv) PS D:\espnet-lab> python tts.py
Failed to import Flash Attention, using ESPnet default: No module named 'flash_attn'
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] C:\Users\zzxia\AppData\Roaming\nltk_data...
[nltk_data] Unzipping taggers\averaged_perceptron_tagger.zip.
[nltk_data] Downloading package cmudict to
[nltk_data] C:\Users\zzxia\AppData\Roaming\nltk_data...
[nltk_data] Unzipping corpora\cmudict.zip.
D:\espnet-lab\.venv\lib\site-packages\torch\nn\utils\weight_norm.py:143: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
WeightNorm.apply(module, name, dim)
D:\espnet-lab\.venv\lib\site-packages\espnet2\gan_tts\vits\monotonic_align\__init__.py:19: UserWarning: Cython version is not available. Fallback to 'EXPERIMETAL' numba version. If you want to use the cython version, please build it as follows: `cd espnet2/gan_tts/vits/monotonic_align; python setup.py build_ext --inplace`
warnings.warn(
WARNING:root:It seems weight norm is not applied in the pretrained model but the current model uses it. To keep the compatibility, we remove the norm from the current model. This may cause unexpected behavior due to the parameter mismatch in finetuning. To avoid this issue, please change the following parameters in config to false:
- discriminator_params.follow_official_norm
- discriminator_params.scale_discriminator_params.use_weight_norm
- discriminator_params.scale_discriminator_params.use_spectral_norm
See also:
- https://github.com/espnet/espnet/pull/5240
- https://github.com/espnet/espnet/pull/5249
Traceback (most recent call last):
File "D:\espnet-lab\tts.py", line 5, in <module>
speech = text2speech(text)["wav"]
File "D:\espnet-lab\.venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "D:\espnet-lab\.venv\lib\site-packages\espnet2\bin\tts_inference.py", line 173, in __call__
text = self.preprocess_fn("<dummy>", dict(text=text))["text"]
File "D:\espnet-lab\.venv\lib\site-packages\espnet2\train\preprocessor.py", line 548, in __call__
data = self._text_process(data)
File "D:\espnet-lab\.venv\lib\site-packages\espnet2\train\preprocessor.py", line 483, in _text_process
tokens = self.tokenizer.text2tokens(text)
File "D:\espnet-lab\.venv\lib\site-packages\espnet2\text\phoneme_tokenizer.py", line 623, in text2tokens
tokens = self.g2p(line)
File "D:\espnet-lab\.venv\lib\site-packages\espnet2\text\phoneme_tokenizer.py", line 260, in __call__
phones = self.g2p(text)
File "D:\espnet-lab\.venv\lib\site-packages\g2p_en\g2p.py", line 162, in __call__
tokens = pos_tag(words) # tuples of (word, tag)
File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\__init__.py", line 168, in pos_tag
tagger = _get_tagger(lang)
File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\__init__.py", line 110, in _get_tagger
tagger = PerceptronTagger()
File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\perceptron.py", line 183, in __init__
self.load_from_json(lang)
File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\perceptron.py", line 273, in load_from_json
loc = find(f"taggers/averaged_perceptron_tagger_{lang}/")
File "D:\espnet-lab\.venv\lib\site-packages\nltk\data.py", line 579, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource averaged_perceptron_tagger_eng not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('averaged_perceptron_tagger_eng')
For more information see: https://www.nltk.org/data.html
Attempted to load taggers/averaged_perceptron_tagger_eng/
Searched in:
- 'C:\\Users\\zzxia/nltk_data'
- 'D:\\espnet-lab\\.venv\\nltk_data'
- 'D:\\espnet-lab\\.venv\\share\\nltk_data'
- 'D:\\espnet-lab\\.venv\\lib\\nltk_data'
- 'C:\\Users\\zzxia\\AppData\\Roaming\\nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
**********************************************************************

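Cause
Starting with version 3.8.2, nltk looks for taggers/averaged_perceptron_tagger_{lang}/.
However, g2p_en has not been updated to match, so it still downloads only the old averaged_perceptron_tagger resource.
That is the cause.
nltk 3.8.2 dates from 2024, whereas g2p_en has not been updated since 2019.
Downgrading nltk to 3.8.1 might solve the problem.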

(.venv) PS D:\espnet-lab> pip install nltk==3.8.1
Collecting nltk==3.8.1
Downloading nltk-3.8.1-py3-none-any.whl.metadata (2.8 kB)
Requirement already satisfied: click in d:\espnet-lab\.venv\lib\site-packages (from nltk==3.8.1) (8.2.1)
Requirement already satisfied: joblib in d:\espnet-lab\.venv\lib\site-packages (from nltk==3.8.1) (1.5.1)
Requirement already satisfied: regex>=2021.8.3 in d:\espnet-lab\.venv\lib\site-packages (from nltk==3.8.1) (2024.11.6)
Requirement already satisfied: tqdm in d:\espnet-lab\.venv\lib\site-packages (from nltk==3.8.1) (4.67.1)
Requirement already satisfied: colorama in d:\espnet-lab\.venv\lib\site-packages (from click->nltk==3.8.1) (0.4.6)
Downloading nltk-3.8.1-py3-none-any.whl (1.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 10.0 MB/s eta 0:00:00
Installing collected packages: nltk
Attempting uninstall: nltk
Found existing installation: nltk 3.9.1
Uninstalling nltk-3.9.1:
Successfully uninstalled nltk-3.9.1
Successfully installed nltk-3.8.1

The error is gone!
(.venv) PS D:\espnet-lab> python tts.py
Failed to import Flash Attention, using ESPnet default: No module named 'flash_attn'
D:\espnet-lab\.venv\lib\site-packages\torch\nn\utils\weight_norm.py:143: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
WeightNorm.apply(module, name, dim)
D:\espnet-lab\.venv\lib\site-packages\espnet2\gan_tts\vits\monotonic_align\__init__.py:19: UserWarning: Cython version is not available. Fallback to 'EXPERIMETAL' numba version. If you want to use the cython version, please build it as follows: `cd espnet2/gan_tts/vits/monotonic_align; python setup.py build_ext --inplace`
warnings.warn(
WARNING:root:It seems weight norm is not applied in the pretrained model but the current model uses it. To keep the compatibility, we remove the norm from the current model. This may cause unexpected behavior due to the parameter mismatch in finetuning. To avoid this issue, please change the following parameters in config to false:
- discriminator_params.follow_official_norm
- discriminator_params.scale_discriminator_params.use_weight_norm
- discriminator_params.scale_discriminator_params.use_spectral_norm
See also:
- https://github.com/espnet/espnet/pull/5240
- https://github.com/espnet/espnet/pull/5249
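
The version dependence seen so far (3.9.1 fails, 3.8.1 works) can be summarised in a small helper. This is only a sketch: both function names are hypothetical, the 3.8.1 cutoff is taken from the observations in this post rather than from the NLTK changelog, and pre-release suffixes such as b1 are simply ignored.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Turn '3.9b1' into (3, 9) and '3.8.1' into (3, 8, 1).

    Pre-release suffixes such as 'b1' are ignored; this is a simplification.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def expected_tagger_resource(nltk_version: str) -> str:
    """Guess which tagger resource pos_tag looks for (assumed cutoff: 3.8.1)."""
    if release_tuple(nltk_version) <= (3, 8, 1):
        return "averaged_perceptron_tagger"
    return "averaged_perceptron_tagger_eng"
```

A check like this could be run against the installed NLTK version to tell in advance whether the data downloaded by g2p_en will be enough.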

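I would like to propose a fix to espnet's setup.py. The current NLTK requirement there let pip install 3.9.1 in this environment; it would be better to change it to:
nltk>=3.4.5,<3.8.2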

I also tried the next version after NLTK 3.8.1. As expected, the error comes back.
3.8.2 is not available on PyPI, so the next version up is 3.9b1.
(.venv) PS D:\espnet-lab> pip install nltk==3.8.2
ERROR: Ignored the following yanked versions: 3.6.4
ERROR: Could not find a version that satisfies the requirement nltk==3.8.2 (from versions: 2.0b4, 2.0b5, 2.0b6, 2.0b7, 2.0b8, 2.0b9, 2.0.1rc1, 2.0.1rc3, 2.0.1rc4, 2.0.1, 2.0.2, 2.0.3, 2.0.4, 2.0.5, 3.0.0b1, 3.0.0b2, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.0.4, 3.0.5, 3.1, 3.2, 3.2.1, 3.2.2, 3.2.3, 3.2.4, 3.2.5, 3.3.0, 3.4, 3.4.1, 3.4.2, 3.4.3, 3.4.4, 3.4.5, 3.5b1, 3.5, 3.6, 3.6.1, 3.6.2, 3.6.3, 3.6.5, 3.6.6, 3.6.7, 3.7, 3.8, 3.8.1, 3.9b1, 3.9, 3.9.1)
ERROR: No matching distribution found for nltk==3.8.2
(.venv) PS D:\espnet-lab> pip install nltk==3.9b1
Collecting nltk==3.9b1
Downloading nltk-3.9b1-py3-none-any.whl.metadata (2.9 kB)
Requirement already satisfied: click in d:\espnet-lab\.venv\lib\site-packages (from nltk==3.9b1) (8.2.1)
Requirement already satisfied: joblib in d:\espnet-lab\.venv\lib\site-packages (from nltk==3.9b1) (1.5.1)
Requirement already satisfied: regex>=2021.8.3 in d:\espnet-lab\.venv\lib\site-packages (from nltk==3.9b1) (2024.11.6)
Requirement already satisfied: tqdm in d:\espnet-lab\.venv\lib\site-packages (from nltk==3.9b1) (4.67.1)
Requirement already satisfied: colorama in d:\espnet-lab\.venv\lib\site-packages (from click->nltk==3.9b1) (0.4.6)
Downloading nltk-3.9b1-py3-none-any.whl (1.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 8.0 MB/s eta 0:00:00
Installing collected packages: nltk
Attempting uninstall: nltk
Found existing installation: nltk 3.8.1
Uninstalling nltk-3.8.1:
Successfully uninstalled nltk-3.8.1
Successfully installed nltk-3.9b1
(.venv) PS D:\espnet-lab> python tts.py
Failed to import Flash Attention, using ESPnet default: No module named 'flash_attn'
D:\espnet-lab\.venv\lib\site-packages\torch\nn\utils\weight_norm.py:143: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
WeightNorm.apply(module, name, dim)
D:\espnet-lab\.venv\lib\site-packages\espnet2\gan_tts\vits\monotonic_align\__init__.py:19: UserWarning: Cython version is not available. Fallback to 'EXPERIMETAL' numba version. If you want to use the cython version, please build it as follows: `cd espnet2/gan_tts/vits/monotonic_align; python setup.py build_ext --inplace`
warnings.warn(
WARNING:root:It seems weight norm is not applied in the pretrained model but the current model uses it. To keep the compatibility, we remove the norm from the current model. This may cause unexpected behavior due to the parameter mismatch in finetuning. To avoid this issue, please change the following parameters in config to false:
- discriminator_params.follow_official_norm
- discriminator_params.scale_discriminator_params.use_weight_norm
- discriminator_params.scale_discriminator_params.use_spectral_norm
See also:
- https://github.com/espnet/espnet/pull/5240
- https://github.com/espnet/espnet/pull/5249
Traceback (most recent call last):
File "D:\espnet-lab\tts.py", line 5, in <module>
speech = text2speech(text)["wav"]
File "D:\espnet-lab\.venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "D:\espnet-lab\.venv\lib\site-packages\espnet2\bin\tts_inference.py", line 173, in __call__
text = self.preprocess_fn("<dummy>", dict(text=text))["text"]
File "D:\espnet-lab\.venv\lib\site-packages\espnet2\train\preprocessor.py", line 548, in __call__
data = self._text_process(data)
File "D:\espnet-lab\.venv\lib\site-packages\espnet2\train\preprocessor.py", line 483, in _text_process
tokens = self.tokenizer.text2tokens(text)
File "D:\espnet-lab\.venv\lib\site-packages\espnet2\text\phoneme_tokenizer.py", line 623, in text2tokens
tokens = self.g2p(line)
File "D:\espnet-lab\.venv\lib\site-packages\espnet2\text\phoneme_tokenizer.py", line 260, in __call__
phones = self.g2p(text)
File "D:\espnet-lab\.venv\lib\site-packages\g2p_en\g2p.py", line 162, in __call__
tokens = pos_tag(words) # tuples of (word, tag)
File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\__init__.py", line 165, in pos_tag
tagger = _get_tagger(lang)
File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\__init__.py", line 107, in _get_tagger
tagger = PerceptronTagger()
File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\perceptron.py", line 183, in __init__
self.load_from_json(lang)
File "D:\espnet-lab\.venv\lib\site-packages\nltk\tag\perceptron.py", line 273, in load_from_json
loc = find(f"taggers/averaged_perceptron_tagger_{lang}/")
File "D:\espnet-lab\.venv\lib\site-packages\nltk\data.py", line 582, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource averaged_perceptron_tagger_eng not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('averaged_perceptron_tagger_eng')
For more information see: https://www.nltk.org/data.html
Attempted to load taggers/averaged_perceptron_tagger_eng/
Searched in:
- 'C:\\Users\\zzxia/nltk_data'
- 'D:\\espnet-lab\\.venv\\nltk_data'
- 'D:\\espnet-lab\\.venv\\share\\nltk_data'
- 'D:\\espnet-lab\\.venv\\lib\\nltk_data'
- 'C:\\Users\\zzxia\\AppData\\Roaming\\nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
**********************************************************************
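
For environments where NLTK cannot be downgraded, the error message itself points at another option: downloading averaged_perceptron_tagger_eng explicitly before running inference. This was not tried in this post, so treat the following as an untested sketch. download_packages is a hypothetical helper; nltk.download is the real NLTK function and is imported lazily.

```python
from typing import Callable, Iterable, List, Optional

# Resource name taken from the LookupError message in this post.
DEFAULT_PACKAGES = ("averaged_perceptron_tagger_eng",)


def download_packages(
    packages: Iterable[str] = DEFAULT_PACKAGES,
    download: Optional[Callable[[str], object]] = None,
) -> List[str]:
    """Request each package once and return the names that were requested."""
    if download is None:
        import nltk  # imported lazily so the helper can be tested without NLTK

        download = nltk.download
    requested = []
    for name in packages:
        download(name)
        requested.append(name)
    return requested


# Possible usage at the top of tts.py, before calling text2speech(text):
#   download_packages()
```

Passing the downloader as a parameter keeps the helper testable without network access; by default it falls back to nltk.download.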

I have submitted a pull request.