Closed (4 days ago) · 3 comments

Running Whisper-Live on WSL2

speechtotext

speech recognition

Whisper

morioka

The source is here.

I'll run it on WSL2 / Ubuntu 22.04 with an RTX 3060 (12 GB).

A few packages were missing, so I added them:

diff --git a/requirements/client.txt b/requirements/client.txt
index 359cc04..163a5a5 100644
--- a/requirements/client.txt
+++ b/requirements/client.txt
@@ -1,4 +1,5 @@
 PyAudio
 ffmpeg-python
 scipy
-websocket-client
\ No newline at end of file
+websocket-client
+sounddevice
diff --git a/scripts/setup.sh b/scripts/setup.sh
index 1b5cb53..330c88c 100644
--- a/scripts/setup.sh
+++ b/scripts/setup.sh
@@ -1,3 +1,5 @@
 #! /bin/bash

 apt-get install portaudio19-dev ffmpeg wget -y
+apt-get install pavucontrol -y
+
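To see which client dependencies are still missing before touching the requirements file, a quick import check can help. This is only a minimal sketch: the mapping from pip package names to module names is my own assumption based on the usual import names for these packages (for example `websocket-client` is imported as `websocket`), and `find_missing` is a hypothetical helper, not part of Whisper-Live.

```python
import importlib.util

# pip package name -> module name to import (assumed from the usual import names).
CLIENT_MODULES = {
    "PyAudio": "pyaudio",
    "ffmpeg-python": "ffmpeg",
    "scipy": "scipy",
    "websocket-client": "websocket",
    "sounddevice": "sounddevice",
}


def find_missing(modules):
    """Return the pip names whose module cannot be found in this environment."""
    return [
        pip_name
        for pip_name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing(CLIENT_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it inside the same virtualenv that the client uses. Anything it lists can then be installed with pip or added to requirements/client.txt as in the diff above.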

morioka

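Until now I had gotten by with only the NVIDIA driver on the Windows side, but it looks like CUDA also needs to be installed on the WSL2 side. Time to finally get around to it.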

CUDA on WSL User Guide

I followed the guide above, specifically "Option 1: Installation of Linux x86 CUDA Toolkit using WSL-Ubuntu Package - Recommended".

CUDA Toolkit 12.6 Update 3 Downloads | NVIDIA Developer

Concretely, these commands:

# remove the old GPG key
sudo apt-key del 7fa2af80

# CUDA Toolkit Installer
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.6.3/local_installers/cuda-repo-wsl-ubuntu-12-6-local_12.6.3-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-6-local_12.6.3-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-6-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-6

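Then, on top of cuda-toolkit, install the rest of CUDA (?) with the command below. (A text-mode prompt asks you to accept the license terms.) After this, CUDA and cuDNN appear to be installed.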

 sudo apt install nvidia-cudnn
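Since "appear to be installed" is a bit vague, here is a rough way to check from inside WSL2 what is actually visible. It is a sketch under assumptions: `tool_output` and `cuda_report` are hypothetical helpers, `nvcc` is frequently not on `PATH` until `/usr/local/cuda/bin` is added, and `ctypes.util.find_library` only finds cuDNN if the linker cache knows about it, so a `None` here does not necessarily mean the install failed.

```python
import ctypes.util
import shutil
import subprocess


def tool_output(cmd, *args):
    """Run `cmd args...` and return its output text, or None if the tool is unavailable."""
    path = shutil.which(cmd)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, *args], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return (result.stdout or result.stderr).strip()


def cuda_report():
    """Collect what this environment can see of the driver, toolkit, and cuDNN."""
    return {
        # GPU name and driver version, as exposed by the Windows-side driver.
        "nvidia-smi": tool_output(
            "nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"
        ),
        # CUDA compiler from the toolkit (None if nvcc is not on PATH).
        "nvcc": tool_output("nvcc", "--version"),
        # Shared library name for cuDNN, if the linker cache can find one.
        "libcudnn": ctypes.util.find_library("cudnn"),
    }


if __name__ == "__main__":
    for name, value in cuda_report().items():
        print(f"{name}: {value if value is not None else 'not found'}")
```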

morioka

The rest is as described in the README. Start the server, then run the client:

python3 run_server.py --port 9090 --backend faster_whisper
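If the client is started from a script right after the server, it may try to connect before the port is open. One way to handle that is to poll the port first. This is a minimal sketch: `wait_for_port` is a hypothetical helper of mine, and an open port only shows that the server is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # 9090 matches the --port value passed to run_server.py above.
    print("server is up" if wait_for_port("localhost", 9090) else "server not reachable")
```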

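from whisper_live.client import TranscriptionClient

# Connect to the server started on localhost:9090.
client = TranscriptionClient(
  "localhost",
  9090,
  lang="ja",          # transcribe as Japanese
  translate=False,    # keep the source language instead of translating
  model="small",
  use_vad=False)      # voice activity detection disabled

# Transcribe an audio file.
client("tests/jfk.wav")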
