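In progress: speaker diarization and speech recognition (without pyannote.audio)
Is there a diarization option other than pyannote.audio? I would rather not have to hold a token issued by Hugging Face.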
(dialization) morioka@legion:~/whisper-diarization$ python diarize.py -a /mnt/e/Users/yasuh/Videos/AGDRec/AGDRec_20241023_120221.mp4
Downloading: "https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th" to /home/morioka/.cache/torch/hub/checkpoints/955717e8-8726e21a.th
100%|██████████████████████████████████████████████████████████████████████████████████| 80.2M/80.2M [00:01<00:00, 68.7MB/s]
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /home/morioka/whisper-diarization/temp_outputs/htdemucs
Separating track /mnt/e/Users/yasuh/Videos/AGDRec/AGDRec_20241023_120221.mp4
100%|██████████████████████████████████████████████████████████████████████| 3334.5/3334.5 [01:38<00:00, 33.69seconds/s]
config.json: 100%|█████████████████████████████████████████████████████████████████████| 2.64k/2.64k [00:00<00:00, 17.1MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████████████| 2.13M/2.13M [00:00<00:00, 5.35MB/s]
vocabulary.txt: 100%|█████████████████████████████████████████████████████████████████████| 422k/422k [00:00<00:00, 902kB/s]
model.bin: 100%|███████████████████████████████████████████████████████████████████████| 1.53G/1.53G [00:36<00:00, 42.0MB/s]
Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so}
Invalid handle. Cannot load symbol cudnnCreateTensorDescriptor
Aborted (core dumped)
It crashes on the GPU. For now, let me try running it on the CPU. ...Progress is painfully slow.
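The crash above happens because none of the listed libcudnn_ops libraries could be loaded. A minimal sketch of a pre-flight check, assuming the library names from the error message are the ones that matter: it asks the dynamic loader for each name and suggests a device. `find_loadable` and `pick_device` are hypothetical helpers, not part of whisper-diarization, and a loadable library does not by itself guarantee that the GPU path will work.

```python
import ctypes

# Library names copied from the error message in the log, most specific first.
CUDNN_CANDIDATES = (
    "libcudnn_ops.so.9.1.0",
    "libcudnn_ops.so.9.1",
    "libcudnn_ops.so.9",
    "libcudnn_ops.so",
)


def find_loadable(candidates=CUDNN_CANDIDATES):
    """Return the first library name the dynamic loader can open, else None."""
    for name in candidates:
        try:
            ctypes.CDLL(name)
        except OSError:
            # Not on the loader search path (or failed to load); try the next.
            continue
        return name
    return None


def pick_device(candidates=CUDNN_CANDIDATES):
    """Hypothetical helper: suggest 'cuda' only if a cuDNN ops library loads."""
    return "cuda" if find_loadable(candidates) else "cpu"


if __name__ == "__main__":
    print("loadable cuDNN ops library:", find_loadable())
    print("suggested device:", pick_device())
```

If this reports no loadable library, the likely fix is installing cuDNN 9 or adding its directory to the loader search path, rather than changing anything in diarize.py.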
Apparently this tool runs speaker diarization with NeMo's diarization model and then applies Whisper.
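Whichever way the two stages are ordered, their outputs have to be joined by time. A minimal sketch of that join, assuming both stages produce lists of segments with `start` and `end` in seconds: each transcript segment gets the speaker whose turn overlaps it the most. The dictionary keys and the `assign_speakers` function are illustrative assumptions, not the actual whisper-diarization or NeMo data structures.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript, turns):
    """Label each transcript segment with the most-overlapping speaker turn.

    transcript: list of {"start", "end", "text"} dicts (assumed format).
    turns:      list of {"start", "end", "speaker"} dicts (assumed format).
    A segment that overlaps no turn gets speaker None.
    """
    labeled = []
    for seg in transcript:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            ov = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if ov > best_overlap:
                best_speaker, best_overlap = turn["speaker"], ov
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


if __name__ == "__main__":
    transcript = [
        {"start": 0.0, "end": 2.0, "text": "hello"},
        {"start": 2.0, "end": 5.0, "text": "hi there"},
    ]
    turns = [
        {"start": 0.0, "end": 2.5, "speaker": "A"},
        {"start": 2.5, "end": 6.0, "speaker": "B"},
    ]
    for row in assign_speakers(transcript, turns):
        print(row["speaker"], row["text"])
```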
No luck on the CPU either.
[NeMo I 2024-11-18 23:10:26 nemo_logging:381] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2024-11-18 23:10:26 nemo_logging:381] Dataset loaded with 10824 items, total duration of 1.47 hours.
[NeMo I 2024-11-18 23:10:26 nemo_logging:381] # 10824 files loaded accounting to # 1 labels
[5/5] extract embeddings: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 170/170 [01:27<00:00, 1.95it/s]
[NeMo I 2024-11-18 23:11:54 nemo_logging:381] Saved embedding files to /home/morioka/whisper-diarization/temp_outputs/speaker_outputs/embeddings
[NeMo W 2024-11-18 23:11:54 nemo_logging:393] cuda=False, using CPU for eigen decomposition. This might slow down the clustering process.
clustering: 0%| | 0/1 [00:00<?, ?it/s]
Intel oneMKL ERROR: Parameter 8 was incorrect on entry to SSYEVD.
Clustering Sub-Windows: 0%| | 0/2 [00:01<?, ?window/s]
clustering: 0%| | 0/1 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/home/morioka/whisper-diarization/diarize.py", line 203, in <module>
    msdd_model.diarize()
  File "/home/morioka/.pyenv/versions/dialization/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/morioka/.pyenv/versions/dialization/lib/python3.11/site-packages/nemo/collections/asr/models/msdd_models.py", line 1183, in diarize
    self.clustering_embedding.prepare_cluster_embs_infer()
...
  File "/home/morioka/.pyenv/versions/dialization/lib/python3.11/site-packages/nemo/collections/asr/parts/utils/offline_clustering.py", line 855, in getSpectralEmbeddings
    _, diffusion_map_ = eigDecompose(laplacian, cuda=cuda, device=affinity_mat.device)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/morioka/.pyenv/versions/dialization/lib/python3.11/site-packages/nemo/collections/asr/parts/utils/offline_clustering.py", line 553, in eigDecompose
    lambdas, diffusion_map = eigh(laplacian)
                             ^^^^^^^^^^^^^^^
RuntimeError: false INTERNAL ASSERT FAILED at "../aten/src/ATen/native/BatchLinearAlgebra.cpp":1538, please report a bug to PyTorch. linalg.eigh: Argument 8 has illegal value. Most certainly there is a bug in the implementation calling the backend library.
...
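In LAPACK's SSYEVD, parameter 8 is LWORK, the size of the work array, and with eigenvectors requested (JOBZ='V', N > 1) it must be at least 1 + 6N + 2N^2. A workspace query returns the size in WORK(1), which is a single-precision real. One hypothesis, not confirmed by this log, is that for a large matrix the required size no longer fits exactly in float32 and gets rounded down, so the value passed back is too small. The sketch below only illustrates that arithmetic; the matrix sizes are hypothetical, and the real size of the Laplacian in this run is not known from the log.

```python
import struct


def ssyevd_min_lwork(n):
    """Minimum LWORK documented for SSYEVD with JOBZ='V' and N > 1."""
    return 1 + 6 * n + 2 * n * n


def through_float32(value):
    """Round an integer to the nearest float32 and convert it back to int."""
    return int(struct.unpack("f", struct.pack("f", float(value)))[0])


if __name__ == "__main__":
    for n in (1000, 10000):  # hypothetical matrix sizes, not from the log
        need = ssyevd_min_lwork(n)
        got = through_float32(need)
        print(f"n={n}: required={need}, after float32={got}, shortfall={need - got}")
```

If this is what is happening, running the decomposition in float64 or reducing the number of embeddings per clustering window might sidestep it, but neither was tried here.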