Trying OpenAI's Whisper on Windows (with a CUDA environment)

Published 2022/09/23

Test environment

Python 3.10.5
Windows10 Pro 21H2 (19044.2006)
CPU:Core i9-11900KF
Mem:32GB
GPU:RTX3060 12GB

Running on the CPU

winget install Git.git
winget install ffmpeg
# Log out or reboot so the updated PATH is picked up
python -m pip install git+https://github.com/openai/whisper.git
python -m whisper --help

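If the help text is displayed, the installation works.

I downloaded an English audio file for the recognition test from the page below. (The browser shows a security warning...)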
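
The same checks can be scripted. This is a minimal sketch that uses only the Python standard library; the function name `check_prerequisites` is hypothetical and not part of Whisper.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report whether the tools installed above are visible to this Python process."""
    return {
        # shutil.which returns None when the executable is not on PATH
        "git": shutil.which("git") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_spec returns None when the package cannot be imported
        "whisper": importlib.util.find_spec("whisper") is not None,
    }


for name, found in check_prerequisites().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If `ffmpeg` is reported as missing right after `winget install`, the PATH has probably not been reloaded yet, which is why logging out or rebooting is needed.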
https://shtooka.net/view_package.php?packid=eng-wims-mary-conversation

python -m whisper eng-d9743fe2.mp3 --model medium
100%|█████████████████████████████████████| 1.42G/1.42G [00:30<00:00, 49.8MiB/s]
C:\Users\user\AppData\Roaming\Python\Python310\site-packages\whisper\transcribe.py:70: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: english
[00:00.000 --> 00:04.640]  I wonder if you could tell us about a challenging technical problem that you've come across
[00:04.640 --> 00:34.000]  and how you addressed it.
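
The same transcription can probably be done from Python instead of the CLI. This is a sketch, assuming the `whisper` package's `load_model` and `transcribe` functions; the helpers `format_timestamp` and `transcribe_on_cpu` are hypothetical names introduced here. Passing `fp16=False` requests FP32 up front, which should avoid the FP16 warning shown above.

```python
def format_timestamp(seconds):
    """Format seconds as MM:SS.mmm, the layout used in the output above."""
    total_ms = round(seconds * 1000)
    minutes, rest_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(rest_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{millis:03d}"


def transcribe_on_cpu(audio_path, model_name="medium"):
    """Transcribe one file on the CPU and return lines in a CLI-like layout."""
    import whisper  # imported lazily so format_timestamp works without it

    model = whisper.load_model(model_name, device="cpu")
    # fp16=False asks for FP32 explicitly, since FP16 is not supported on CPU
    result = model.transcribe(audio_path, fp16=False)
    return [
        f"[{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}] {seg['text']}"
        for seg in result["segments"]
    ]


# Example call (downloads the medium model on first use):
# for line in transcribe_on_cpu("eng-d9743fe2.mp3"):
#     print(line)
```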

Running on CUDA

If CUDA is not installed yet, set up the environment first by following this article:
https://zenn.dev/ryu2021/articles/3d5737408b06fe
I used CUDA 11.7 with cuDNN 8.5.0.96_cuda11.

# Reinstall torch with CUDA support
python -m pip install --upgrade --force-reinstall torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
# Run recognition on CUDA
python -m whisper eng-d9743fe2.mp3 --model medium --device cuda --language en
[00:00.000 --> 00:04.640]  I wonder if you could tell us about a challenging technical problem that you've come across
[00:04.640 --> 00:34.000]  and how you addressed it.
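
Before passing `--device cuda`, it can help to confirm that the reinstalled torch actually sees the GPU. This is a minimal sketch: `pick_device` and `build_whisper_command` are hypothetical helpers, and the only things relied on are `torch.cuda.is_available()` and the CLI flags already used in this article.

```python
import sys


def pick_device():
    """Return "cuda" when a CUDA-enabled torch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # torch is not installed at all
    return "cuda" if torch.cuda.is_available() else "cpu"


def build_whisper_command(audio_path, model="medium", device="cpu", language="en"):
    """Build the same command line as above for the given device."""
    return [
        sys.executable, "-m", "whisper", audio_path,
        "--model", model,
        "--device", device,
        "--language", language,
    ]


command = build_whisper_command("eng-d9743fe2.mp3", device=pick_device())
print(" ".join(command))
# To actually run it: subprocess.run(command, check=True)
```

If this prints `--device cpu` even after the reinstall, the CPU-only torch wheel is likely still the one being imported.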
