matsu

OpenAI TTS 比較
https://gotohayato.com/content/575/

matsu

VAD など

matsu

stable-ts、VAD を使って書き起こし

matsu

30秒以上の音声を推論させるには pipeline を使う

matsu

ImportError: cannot import name 'BatchedInferencePipeline' from 'faster_whisper'

matsu

whisper の高速化

matsu

ffmpeg で音声ファイルをたくさん結合する

ffmpeg -safe 0 -f concat -i list.txt -c copy output.mp4

matsu

AmiVoice ステレオ（2チャネル）の場合は1チャネルのみが音声認識対象となる。

チャネルごとに異なる音源が録音されている特殊な音源の場合、片方しか音声認識されないことになるので注意。
参考

https://docs.amivoice.com/amivoice-api/manual/audio-format#channels

https://acp.amivoice.com/blog/2023-04-17-111456/?_ga=2.73570900.751174800.1727262864-331714657.1725514537

matsu

ffmpeg でステレオ音源をモノラルに変換

matsu

sox で2チャネルステレオ音声ファイルをモノラル音声ファイル✕2に変換

matsu

CTranslate2 でモデルを量子化する Python スクリプト

import ctranslate2
from ctranslate2.converters import TransformersConverter

model_name_or_path = "primeline/whisper-large-v3-german "
output_dir = "whisper-large-v3-german --copy_files"

converter = TransformersConverter(model_name_or_path)
converter.convert(output_dir, quantization="float16", force=True)

matsu

Ctranslate2 で量子化した Whisper-large-v3 の推論 model.transcribe() 時に下記エラー
ValueError: Invalid input features shape: expected an input with shape (1, 128, 3000), but got an input with shape (1, 80, 3000) instead
対処方法

faster-whisper リポジトリの large-v3 モデルにある json ファイルを配置してからモデルをロードしたら通った。

https://huggingface.co/Systran/faster-whisper-large-v3/tree/main
参考
I encoutered the same problem. you should update all the json files in the offical one:

https://huggingface.co/Systran/faster-whisper-large-v3/tree/main

it is different with large v2
https://github.com/SYSTRAN/faster-whisper/issues/582

matsu

ssh で github, huggingface に繋がらない場合、ssh-agent がそもそも起動していない可能性

下記コマンドで ssh を使った接続を試行（オプション v でログ出力）

ssh -vT git@github.com
ssh -vT git@hf.co

identity file に作成した秘密鍵が表示されない（読み込まれていない？）

debug1: identity file /Users/[user_name]/.ssh/id_rsa type -1

対応策として ssh-agent に SSH キーの追加を施行したが、ssh-agent が起動していない旨のエラー

ssh-add ~/.ssh/id_ed25519

PowerShell で ssh-agent を起動する方法

Set-Service -Name ssh-agent -StartupType Manual
Start-Service ssh-agent

matsu

Style-Bert-VITS2 のユーザー辞書について

matsu

`ssh -vT git@hf.co` で `Permission denied (publickey)` が出るときの原因

id_rsa 以外の名前で鍵を作ったときは、-i オプションで明示的に秘密鍵を示してあげましょう。

ssh -vT git@hf.co -i C:\Users\taish\.ssh\huggingface-matsuap

matsu

huggingface に ssh で git するときは config の user は git じゃないといけないみたい

Host hf.co
HostName hf.co
IdentityFile ~/.ssh/id_hogehoge <=作った秘密鍵
User git

matsu

 TTS の「いった」「おこなった」問題「東京都に行った」は、「東京/都/に/行った」と分割でき、この場合は「とうきょうとにいった」と読みますが、「東/京都/に/行った」の場合は「ひがしきょうとにいった」と読みます。これらの両方の場合において、「行った」は「いった」と読みますが、「東京都に行った事業の報告をする」という文脈では、2 番目の意味（「何かを実行したこと」）となり、「いった」ではなく「おこなった」と読みます。
https://aws.amazon.com/jp/blogs/news/amazon-polly-japanese-text-optimization/

matsu

Davinci resolve で一つのタイムラインに他のタイムラインを挿入したときに音がならない場合、挿入元タイムラインのオーディオがステレオとモノラルが混在している。

https://soranoue.net/know-how/audio-2/

matsu

Fusionノードを別プロジェクトで使い回したいときはメディアインではなくローダーを使う

matsu

ローダーのフルパス指定はパスマップで省略できる

matsu

huggingface にデプロイした fastapi のエンドポイント

matsu

[VSCode] venvで作成した仮想環境がkernel一覧に表示されないときの対処法

matsu

 Bert-VITS2 に脆弱性が報告されたが、その後修正されているfish.audio の bert-vits2 には、パストラバーサルの脆弱性が存在します。

https://jvndb.jvn.jp/ja/contents/2024/JVNDB-2024-007791.html
bert-vits2の場合、webui_preprocess.pyファイルの特定の箇所でこの脆弱性が確認されている。攻撃者はこの脆弱性を悪用して、サーバー上で任意のコマンドを実行する可能性がある。これにより、機密情報の漏洩やシステムの制御権限の奪取など、深刻な被害が生じる恐れがある。

https://xexeq.jp/blogs/media/topics8651

 修正 PR マージ済みhttps://github.com/fishaudio/Bert-VITS2/pull/393/commits/beee4b7816471d142eec3bba96167a3ff88f6581

matsu

アジャイル型開発における未完成の責任　東京地判令3.11.25（平30ワ25117）

matsu

Style-Bert-VITS2 の事前学習

matsu

 Style-Bert-VITS2 の学習時に使う事前学習済みモデル最後にBert-VITS2の公式事前学習モデルがあるので、使うバージョンに応じてダウンロードする。

ver 2.3の場合

ここからG_0.pth D_0.pth DUR_0.pth WD_0.pthの4つをダウンロードする。

https://ayousanz.hatenadiary.jp/?page=1708753287

最後に、Data/model_name/modelsフォルダを作り、そこに先程ダウンロードしたG_0.pth D_0.pth DUR_0.pthの3つ（ver 2.3の場合はWD_0.pthも含めた4つ）をコピーする。

モデル学習のときにこれらのファイルは上書きされてしまうようなので、別のモデルを学習するときのために、これらのファイルは削除せずにどこかに取っておくと良い。
初期化用スクリプトでは別のリポジトリから .safetensors をダウンロードしているみたい。
def download_pretrained_models():
    files = ["G_0.safetensors", "D_0.safetensors", "DUR_0.safetensors"]
    local_path = Path("pretrained")
    for file in files:
        if not Path(local_path).joinpath(file).exists():
            logger.info(f"Downloading pretrained {file}")
            hf_hub_download(
                "litagin/Style-Bert-VITS2-1.0-base", file, local_dir=local_path
            )


def download_jp_extra_pretrained_models():
    files = ["G_0.safetensors", "D_0.safetensors", "WD_0.safetensors"]
    local_path = Path("pretrained_jp_extra")
    for file in files:
        if not Path(local_path).joinpath(file).exists():
            logger.info(f"Downloading JP-Extra pretrained {file}")
            hf_hub_download(
                "litagin/Style-Bert-VITS2-2.0-base-JP-Extra", file, local_dir=local_path
            )
https://github.com/litagin02/Style-Bert-VITS2/blob/bcc57e0cb9055acc9718c1a74288e943c2916c49/initialize.py#L31

matsu

Early Stopping

matsu

 warmup の効果と数warmup に費やす step 数ですが、全体の10％程度を指定することが多いです。
https://qiita.com/kma-jp/items/81db6d5c549e50707e30#warmup

matsu

 Whisper のファインチューニングでは事前学習よりも小さい学習率を設定した方がよいみたいAccording to Jong Wook Kim, one of the authors of the Whisper paper, a practical learning rate to consider while fine-tuning is a value that is 40x smaller than what has been used for pre-training, and linearly decay it to zero over the course of training. (Discord thread where this has been mentioned)

https://qiita.com/kma-jp/items/81db6d5c549e50707e30#warmup

matsu

 lr_scheduler_type = cosine_with_restarts は warmup あり“cosine_with_restarts” = get_cosine_with_hard_restarts_schedule_with_warmup
https://huggingface.co/docs/transformers/v4.45.2/ja/main_classes/optimizer_schedules#transformers.SchedulerType

matsu

各 lr_scheduler

matsu

 勾配蓄積勾配蓄積メソッドは、GPUのメモリ容量の制約によって課せられる制限を超えた効果的なバッチサイズを実現するために、勾配を小さな増分で計算することを目的としています。このアプローチでは、モデルを順方向および逆方向に小さなバッチで反復的に計算し、その過程で勾配を蓄積します。十分な数の勾配が蓄積されたら、モデルの最適化ステップを実行します。勾配蓄積を使用することで、GPUのメモリ容量による制約を超えて効果的なバッチサイズを増やすことができますが、勾配蓄積によって導入される追加の順方向および逆方向の計算はトレーニングプロセスを遅くする可能性があることに注意が必要です。
https://huggingface.co/docs/transformers/ja/perf_train_gpu_one

matsu

チェックポイントから `WhisperProcessor` をロードするには、抽出機とトークナイザーを個別にロードしてからインスタンス化する

from transformers import WhisperFeatureExtractor
from transformers import WhisperTokenizer
from transformers import WhisperProcessor

feature_extractor = WhisperFeatureExtractor.from_pretrained(checkpoint_path)
tokenizer = WhisperTokenizer.from_pretrained(model_path, language="Japanese", task="transcribe")
processor = WhisperProcessor(feature_extractor, tokenizer)

matsu

`cosine_with_restarts` のピークは `learning_rate`

matsu

 cosine_with_restarts の num_cycles は設定できない仕様Fair point. But we also don't want to provide too many arguments as our training examples are supposed to be beginner-friendly. So, we keep the examples fairly easily customizable.
https://github.com/huggingface/diffusers/issues/3195

matsu

`lr_scheduler_kwargs` で与えられるみたい

matsu

Colab 上で Style-Bert-VITS2

matsu

音素バランス文

https://voice-statistics.github.io/

matsu

 音声コーパスは JVS今後，音声を新規収録される場合には声優統計コーパス付属のバランス文ではなく，句読点情報が付与された JSUT コーパスまたは JVS コーパスの voiceactress100 (以降， voiceactress100 と呼びます) の使用を推奨します．

声優統計コーパスのバランス文と voiceactress100 には一部の単語が異なっており，完全な互換性がありません

voiceactress100 は声優統計コーパスバランス文を音声処理の研究者が洗練したものです

声優統計コーパスの一部の読みはバランス文と一致していません．また，アクセントや句読点位置も統一されていません

上記の理由により，今後，音声を新規収録される場合には声優統計コーパス付属のバランス文ではなく voiceactress100 の使用を推奨します

その際のイントネーションはプロ話者によって発話された JVS コーパスに従うことを推奨します

感情表現を活用したい場合，またはプロ話者かつ高サンプリングレートのデータを使用したい場合のみ声優統計コーパスの音声データを用いるのが良いと私たち (日本声優統計学会) は考えます
https://github.com/voice-statistics/voice-statistics.github.com/blob/master/assets/doc/vs_to_jvs.md

 声優統計コーパス→JSUTコーパス→JVSコーパス→つくよみちゃんコーパスの成り立ちについてhttps://tyc.rei-yumesaki.net/material/corpus/voiceactress100/

 JVS コーパスhttps://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus

 収録時間は音源の約 3 倍気合で読み上げを行った結果、

読み上げ自体にかかった時間: 52分

読み上げた文字数: 3,971文字 (句読点含む)

作成された音声ファイルの合計時間： 967秒 (16.1分)

でした。

上記の結果から、作業全体の時間は録音の3倍くらいの時間が掛かっていることがわかります。これだけ時間がかかった理由として、以下の作業が挙げられます。

自身の発声した録音を確認のために再生する

単純計算で、これだけで発声自体の2倍の時間が掛かります

一人で作業をしているため、ある程度のクオリティを担保するためには、自身が発した音声を聞き直す必要がありました

原稿を読んで発声内容を確認する

少し長いフレーズや読みに難しい単語がある場合、まず頭の中で予行練習をする必要があります

今回のコーパスには外国の地名や人名が含まれていたため、実際に声に出して確認する必要がありました

何らかの理由により再録音する

言いよどみ、読み間違い、発音ミス、イントネーションのミス、無視できないほどのリップノイズなどが起きた場合は、録音し直しました

雑音が入ってしまった場合にも録音し直しました (救急車の音、飼っている犬の足音など)

体感としては、読み間違いや失敗による再録音にかなりの時間を使ってしまったという印象でした。一方で、音声ファイルの管理や原稿の準備にはほとんど時間を使わなかったので、この点は効率化が出来たと思っています。
https://zenn.dev/yag_ays/articles/8d2fc207975953

matsu

DWANGO MEDIA VILLAGE - SEIREN VOICE

https://dmv.nico/ja/
https://seiren-voice.dmv.nico/

matsu

Gradio の DataFrame がスクロールでどんどん大きくなる問題

matsu

Faster-whisper のストリーミング化

この ipynb は Colab 上ではうまく走らなかったので、Windows ローカルで実行した。

検証環境

Windows11
Powershell / WSL2 Ubuntu
ngrok インストール
https://download.ngrok.com/windows?tab=download

ポイント

nc (netcat) コマンドが Windows 11/Powershell にないので、WSL2 の Ubuntu 上でストリーミング送信を実行した。

sox -d -e signed -b 16 -c 1 -r 16000 -t raw - | nc 0.tcp.jp.ngrok.io 18714

whisper_online_server.py を立ち上げるときに、モデルサイズを指定できる
例：

python .\whisper_online_server.py --language ja --min-chunk-size 1 --model tiny

ngrok はクレジットカード登録しないと tcp トンネリング機能が使えない

matsu

 コーパス研究開発の場面でよく使われる音素バランス文としては「ATR503文」「声優統計コーパス」「JSUTコーパス」などが代表的だ。
一つ目が使用料の問題だ。ATR503文の読み上げデータ集は国際電気通信基礎技術研究所（ATR）が発売している商品で、価格は読み上げ音声付きのデータで数十万円に上る。
そこで登場したのが無料の声優統計コーパスやJSUTコーパスだ。声優統計コーパスは声優やアナウンサーの読み上げ音声約2時間、JSUTは女性の読み上げ音声を約10時間収録したデータも無償公開。研究開発のハードルになっていた使用料をなくした。
これらの課題を解決するため、SSSは音素バランス文の新規開発を企画。2020年7月にクラウドファンディングで目標額の261％もの資金を集め、明治大学と九州工業大学の協力の下、音素バランス文の開発に乗り出した。
https://www.itmedia.co.jp/news/articles/2106/18/news099.html

 ATR503https://research.nii.ac.jp/src/ATR503.html

 ITA コーパスhttps://github.com/mmorise/ita-corpus

 Deepgramhttps://deepgram.com/

https://zenn.dev/smartcamp/articles/8b24739ab8cba3

 ROHANhttps://github.com/mmorise/rohan4600

matsu

 話者分離 WhoSpeaks
 Coqui model v2.0.2https://huggingface.co/coqui/XTTS-v2/tree/v2.0.2

matsu

 whisper streaming
 資料https://ufallab.ms.mff.cuni.cz/~machacek/pre-prints/AACL23-2.11.2023-Turning-Whisper-oral.pdf

 エラー対処Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so}
FIXED IT!

under the whisperx_installer environment, running

pip install ctranslate2==4.4.0

fixed it, as per this post m-bain/whisperX#901
https://github.com/lukaszliniewicz/Pandrator/issues/57

matsu

Cursor で Google colab GPU リソースを使う

matsu

 VSCode から Runpod に SSH 接続ssh-keygen して公開鍵を Runpod に登録してから、デプロイしないとつながらないみたい。

https://blog.runpod.io/how-to-connect-vscode-to-runpod/

https://www.youtube.com/watch?v=vEVDoW-uMHI&ab_channel=にゃんたのAI実践チャンネル

 jupyter カーネルが選べない
 解決方法エクステンションで Python と Jupyter をリモートサーバーにインストールする

 jupyter notebook --no-browser --port=8080 & でエラー
 エラーModuleNotFoundError: No module named 'jupyter_server.contents'

 対処pip uninstall traitlets
pip install traitlets==5.9.0
https://github.com/microsoft/azuredatastudio/issues/24436

matsu

 Colab カスタム GCE VM に接続しますDeveloper Manager をオン
https://console.cloud.google.com/dm/deployments

DEPLOY MARKETPLACE SOLUTION から Colab を検索しオン
Colab の VM を作成
Colab の カスタム GCE VM に接続します に情報を入力し接続

https://research.google.com/colaboratory/marketplace.html

https://webbigdata.jp/post-9712/

matsu

dataset の concatenate

from datasets import load_from_disk

target_dataset_names = [
    '0to499',
    '500to999',
    '1000to1499',
    '1500to1999',
]

loaded_datasets = []
for target_dataset_name in target_dataset_names:
    preprocessed_dataset_path = dataset_dir / 'preprocessed' / base_model.split('/')[-1] / target_dataset_name
    loaded_datasets.append(load_from_disk(preprocessed_dataset_path, keep_in_memory=True))

from datasets import DatasetDict
from datasets import concatenate_datasets

finetune_dataset = DatasetDict({
    'train': concatenate_datasets([i['train'] for i in loaded_datasets]),
    'validation': concatenate_datasets([i['validation'] for i in loaded_datasets]),
    'test': concatenate_datasets([i['test'] for i in loaded_datasets]),
})

matsu

https://amemy-blog.com/?p=2718

matsu

 Huggingface で uvicorn アプリがスタート状態でスタックする
 原因port 番号は 7860 でないとだめ。

 参考Well, I think I figured it out for anyone experiencing the same error in the future. As far as I can tell the app has to be hosted on port 7860 and it has to have another response than 404 for ‘/’
https://discuss.huggingface.co/t/space-is-stuck-at-starting-status-hugging-face/92854/14

matsu

ffmpeg で g711 を wav に変換

ffmpeg.exe -f mulaw -ar 44100 -i output.g711u output_file3.wav

matsu

tokenizer.json の中に vocab.json と merges.txt の内容が書かれている

matsu

 ffmpegを使うここに記載の解法
ffmpeg -i ./hoge.webm -c:a pcm_f32le ./hoge.wav
-iに続いて入力ファイル名

-c:aに続いてコーデックの選択

https://trac.ffmpeg.org/wiki/audio typesに一覧あり

matsu

FFmpegで音声/動画ファイルを同間隔に分割する

ffmpeg -i <入力ファイルパス> -f segment -segment_time <間隔（秒）> -c copy <出力ファイル名>_%03d.wav

matsu

ffmpeg 時間分割

https://nico-lab.net/cutting_ffmpeg/

matsu

VSCode のターミナルで自動的に .venv を activate する方法

ワークスペース設定で terminal.integrated.profiles.windows を検索し下記の通り設定する。

{
    "terminal.integrated.profiles.windows": {

        "PowerShell": {
            "source": "PowerShell",
            "icon": "terminal-powershell",
            "args": ["-NoExit", "-Command", ".venv/Scripts/activate"]
        },
        "Command Prompt": {
            "path": [
                "${env:windir}\\Sysnative\\cmd.exe",
                "${env:windir}\\System32\\cmd.exe"
            ],
            "args": [],
            "icon": "terminal-cmd"
        },
        "Git Bash": {
            "source": "Git Bash"
        }
    }
}

matsu

Huggingface の Spaces に PyAudio がインストールできない

portaudio が必要なので下記の packages.txt をアップロード

portaudio19-dev

matsu

cu118 の torch を pip install したのに cudnn エラーが出る

エラー

Error: Could not locate cudnn_ops_infer64_8.dll. Please make sure it is in your library path!

対処方法

 pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124 --force-reinstall --no-cache

参考

matsu

 ImportError: cannot import name 'ModelFilter' from 'huggingface_hub' while importing from nemo.collections.asr.models import EncDecMultiTaskModel対応方法

huggingface-hub==0.23.2 を再インストールする。
I guess you could use huggingface-hub==0.23.2, so that it works
https://github.com/NVIDIA/NeMo/issues/9793

matsu

 Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so}WhisperX は cuDNN==9.1 なので ctranslate2==4.4.0 をインストールするか、BetterWhisperX（ cuDNN==9.2 , ctranlate2==4.5.0 ベース）をインストールする。
https://github.com/m-bain/whisperX/issues/901#issuecomment-2431935668

matsu

 got an unexpected keyword argument 'threshold'"pip3 install --no-deps --force-reinstall "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/refs/heads/master.tar.gz"
https://github.com/jianchang512/pyvideotrans/issues/692

https://github.com/SYSTRAN/faster-whisper/blob/master/faster_whisper/vad.py

matsu

Huggingface リポジトリモデルリストアップ

Loading pipeline from 🤗 hub
A bunch of pretrained pipelines are available on 🤗 Huggingface model hub and can be listed by looking for the pyannote-audio-pipeline tag.

from huggingface_hub import HfApi
available_pipelines = [p.modelId for p in HfApi().list_models(filter="pyannote-audio-pipeline")]
list(filter(lambda p: p.startswith("pyannote/"), available_pipelines))
['pyannote/overlapped-speech-detection',
 'pyannote/speaker-diarization',
 'pyannote/speaker-segmentation',
 'pyannote/voice-activity-detection',
 'pyannote/speaker-diarization-3.0',
 'pyannote/speaker-diarization-3.1']

matsu

最適なチェックポイントのみを保存したいとき

save_total_limit = 2

load_best_model_at_end = True

save_strategy = “epoch” or “steps”

evaluation_strategy = “epoch” or “steps”

matsu

【python】pip installの際に「error: Microsoft Visual C++ 14.0 is required」が発生した場合の対応方法

matsu

virtualenvのactivateで「PSSecurityException」が発生する原因

matsu

libcuda.so が見つからない

export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH

matsu

 WSL2 上で lspci しても GPU (Nvidia) が見えない見えない仕様らしい。
そう、WSL 2 の Linux には、PCI デバイスとしての GPU は直接見えていません。NVIDIA ドライバを Windows にだけインストールしたことからもわかるように、GPU は Windows のデバイスとして管理されており、WSL 2 の Linux からは、それを /dev/dxg というデバイスを通じて利用します。この /dev/dxg は所謂準仮想化デバイスです。ホストの Windows と、WSL の軽量ユーティリティ VM は、VMBus という仮想バスで接続されており、Linux 側での /dev/dxg への操作は、 dxgkrnl という準仮想化ドライバによって VMBus 経由でホストの GPU へ伝わります。
https://v2v.at-peak.jp/recording-meeting/958eb9cb-eba2-422a-8416-140d07778530

matsu

 [cuda_driver.cpp:load_lib@36] libcuda.so lib not found.I think the correct line to add to the .bashrc file is:

export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
https://github.com/microsoft/WSL/issues/8587#issuecomment-1229170859

matsu

 OpenGL.error.Error: Attempt to retrieve context when no valid context
export PYOPENGL_PLATFORM=glx
https://bwgift.hatenadiary.jp/entry/2025/01/03/162711

matsu

ITAコーパスの録音方法まとめ

matsu

Huggingface リポジトリを clone するときは

 git clone git@hf.co:datasets/AtPeak/insurance

matsu

Mac+Cursor+Vimでキーの長押し問題

matsu

MacでKarabiner-Elementsを使う方法と効かない場合

matsu

Mac + US配列キーボードで ` (バッククォート) を打つと § (セクション) が入力される

matsu

 Colab の CUDA とcuDNN のバージョン確認CUDA のバージョンは以下のコマンドで確認できます
!nvcc --version

実行結果(2024年10月4日現在)
nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2023 NVIDIA Corporation

Built on Tue_Aug_15_22:02:13_PDT_2023

Cuda compilation tools, release 12.2, V12.2.140

Build cuda_12.2.r12.2/compiler.33191640_0

release 12.2 とあるので、 CUDA のバージョンは 12.2 です
cuDNN のバージョン確認は以下のコマンドです
!cat /usr/include/cudnn_version.h | grep CUDNN_MAJOR -A 2

実行結果(2024年10月4日現在)

 #define CUDNN_MAJOR 8

#define CUDNN_MINOR 9

#define CUDNN_PATCHLEVEL 6#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
/* cannot use constexpr here since this is a C-only file */

上から3行がメジャーバージョン、マイナーバージョン、パッチバージョンを示します
つまり、上記の記載はバージョン 8.9.6 を示しています
https://qiita.com/RyoWakabayashi/items/005f171568c53b15c2a1

matsu

シナ「プ」スで困った話～Macの濁点、半濁点問題

matsu

Pythonで、Jsonの値に改行が存在する場合に読み込む時の注意

matsu

inference endpoints のホストサーバーの階層

- halt
  - init
  - poweroff
  - reboot
  - runlevel
  - setcap
  - shutdown
  - telinit
  - tzconfig
  - update-ca-certificates
  - update-mime
- srv
- sys
  - kernel
  - power
  - class
  - devices
  - dev
  - hypervisor
  - fs
  - bus
  - firmware
  - block
  - module
- tmp
- usr
  - bin
  - games
  - include
  - lib
  - lib32
  - lib64
  - libexec
  - libx32
  - local
  - sbin
  - share
  - src
- var
  - backups
  - cache
  - lib
  - local
  - lock
  - log
  - mail
  - opt
  - run
  - spool
  - tmp
- repository
  - cache
  - .huggingface
  - .gitattributes
  - config.json
  - handler.py
  - requirements.txt
  - style_vectors.npy
  - model.safetensors
  - hub_model_info.json
  - __pycache__
- app
  - __pycache__
  - huggingface_inference_toolkit
  - entrypoint.sh
  - webservice_starlette.py
  - build
  - src
  - .dockerignore
  - LICENSE
  - MANIFEST.in
  - README.md
  - dockerfiles
  - pyproject.toml
  - scripts
  - setup.cfg
  - setup.py

matsu

 Dockerfile で apt install するときはユーザー切り替えする前にapt はルートユーザー権限を要求するため。
You’re switching to the new user too early, that should probably be one of the last steps you do. Once you specify USER, all subsequent commands will be run as that user which is a problem if the commands in question require root… like apt does.
https://www.reddit.com/r/docker/comments/xf8f8i/e_list_directory_varlibaptlistspartial_is_missing/

matsu

 Get-AzRoleDefinitionimport-module Az
Connect-AzAccount
Get-AzRoleDefinition
https://www.guri2o1667.work/entry/2021/12/06/【Azure】AzurePowershellでのAzureRBAC確認

matsu

Azure OpenAI Whisper/Azure Speech to Textの違い

matsu

 SPXERR_GSTREAMER_NOT_FOUND_ERROR原因

Gestreamer がインストールされていない。

https://qiita.com/Kobe-dad/items/f83525f0f9b5b06fa1b3
インストール方法

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-use-codec-compressed-audio-input-streams?tabs=windows%2Cdebian%2Cjava-android%2Cterminal&pivots=programming-language-python

https://gstreamer.freedesktop.org/download/#windows

matsu

 Sourcetree で git fetch/push/pull が使えない対処方法
ppk を登録する https://qiita.com/reflet/items/4f7b5c4a312bc27df10e
エラー対処 https://qiita.com/Hina_Developer/items/1ec8b5731e9b2e8c04c2

git -c diff.mnemonicprefix=false -c core.quotepath=false --no-optional-locks fetch --no-tags origin

The host key is not cached for this server:

hf.co (port 22)

You have no guarantee that the server is the computer you

think it is.

The server's ssh-ed25519 key fingerprint is:

ssh-ed25519 255 SHA256:dVjzGIdV7d6cwKIeZiCoRMa2gMvSKfGZAvHf4gMiMao

If you trust this host, enter "y" to add the key to Plink's
エラー終了しました。エラーの内容は上記をご覧ください。

cache and carry on connecting.

If you want to carry on connecting just once, without adding

the key to the cache, enter "n".

If you do not trust this host, press Return to abandon the

connection.

Store key in cache? (y/n, Return cancels connection, i for more info) fatal: Could not read from remote repository.
Please make sure you have the correct access rights

and the repository exists.
10 秒 の間、接続エラーとSSHキーの問題について推論

このエラーは、Git が SSH 経由でリモートリポジトリ（この場合は hf.co）に接続しようとしたとき、サーバーのホストキーがまだキャッシュされていないため、正当性を確認できず接続を中断したことを示しています。
詳細な説明

ホストキーがキャッシュされていない:

SSH クライアント（ここでは PuTTY の Plink）が、hf.co のホストキーをまだ信頼済みリスト（known hosts）に登録していません。このため、「サーバーがあなたが想定しているコンピュータかどうか」を確認できず、セキュリティ上の警告が表示されています。
ユーザーへの選択肢:

メッセージでは、信頼できる場合は「y」を入力してホストキーをキャッシュに追加し、接続を続行するよう求めています。一時的に接続する場合は「n」を入力することもできますが、通常は「y」を選択するのが一般的です。
接続失敗の原因:

もし「y」や「n」の入力が行われなかった場合、もしくは適切なアクセス権限やSSHキーが設定されていない場合、接続が確立されず「fatal: Could not read from remote repository.」というエラーが発生します。
解決策

ホストを信頼する場合:

コマンドプロンプトで表示されたプロンプトに「y」と入力してホストキーをキャッシュに追加し、接続を継続してください。
SSH 設定の確認:
使用している SSH キーが正しく設定され、リモートリポジトリへのアクセス権限があるか確認してください。

Windows で PuTTY/Plink を利用している場合は、PuTTY の設定や Pageant（SSH エージェント）が正しく起動しているかも確認してください。

既知のホストに手動で追加:

将来的に毎回プロンプトが表示されるのを防ぐため、hf.co のホストキーを事前に known_hosts ファイルに追加する方法もあります。
以上の対策を講じることで、リモートリポジトリへの接続が正常に行えるようになるはずです。
あなた:

.\plink.exe -ssh -C -i "C:\Users\taish.ssh\id_rsa.ppk" git clone git@hf.co

Unable to open connection:

Host does not exist

16 秒 の間、SSH接続エラーについて推論した

このエラーは、plink の引数の指定方法が原因です。あなたのコマンド
bash

コピーする

.\plink.exe -ssh -C -i "C:\Users\taish.ssh\id_rsa.ppk" git clone git@hf.co

では、plink は「git」をホスト名として解釈し、リモートで「clone git@hf.co」というコマンドを実行しようとしています。つまり、ホスト名「git」が存在しないため「Host does not exist」となっています。
対処方法

SSH 接続のテスト

まず、接続先のホストに正しく接続できるかを確認するため、以下のようにホスト部分を git@hf.co として直接接続テストを行ってみてください：
bash

コピーする

.\plink.exe -ssh -C -i "C:\Users\taish.ssh\id_rsa.ppk" git@hf.co

これにより、hf.co に対して SSH 接続ができるか確認できます。
Git Clone の正しい使用方法

通常、リポジトリのクローンは Git コマンドで実施します。SSH クライアントとして plink を使用する場合、環境変数 GIT_SSH を plink.exe に設定しておくと、以下のようなコマンドで実行できます：
bash

コピーする

git clone git@hf.co:username/repository.git

※ username と repository.git は実際のユーザー名とリポジトリ名に置き換えてください。
このように、直接 plink コマンドに「git clone」を渡すのではなく、Git のリモート URL として正しく指定することで、問題が解決するはずです。

matsu

 requirements.txt --extra-index-urlnumpy<2.0

transformers

gradio

--index-url https://download.pytorch.org/whl/cu118

torch==2.1.2

torchaudio==2.1.2

faster-whisper

ctranslate2==3.24.0
4 秒 の間、理由を述べた依存関係の解決方法

この requirements ファイルでは、以下のような記述になっています:
makefile

コピーする

numpy<2.0

transformers

gradio

--index-url https://download.pytorch.org/whl/cu118

torch==2.1.2

torchaudio==2.1.2

faster-whisper

ctranslate2==3.24.0

ここでの問題は、--index-url https://download.pytorch.org/whl/cu118 の指定が原因です。

このオプションは、pip のデフォルトのパッケージインデックス (PyPI) を上書きし、すべてのパッケージをこの指定先から取得しようとします。しかし、transformers や gradio などのパッケージはこのインデックスには存在しないため、「No matching distribution found for transformers」というエラーが発生します。
対処法

--index-url の代わりに --extra-index-url を使用する

--extra-index-url を使用すると、指定されたインデックスを追加の検索先として利用でき、PyPI はデフォルトのインデックスとして残ります。たとえば、requirements ファイルを以下のように変更します:
diff

コピーする

numpy<2.0

transformers

gradio
--index-url https://download.pytorch.org/whl/cu118
--extra-index-url https://download.pytorch.org/whl/cu118

torch==2.1.2

torchaudio==2.1.2

faster-whisper

ctranslate2==3.24.0

これにより、pip はまず PyPI (公式インデックス) からパッケージを探し、見つからない場合は追加のインデックスを検索するようになります。
必要なパッケージごとに指定する

もし torch や torchaudio など特定のパッケージのみをカスタムインデックスから取得したい場合は、コマンドラインで個別に指定するか、別の方法を検討してください。ただし、一般的には上記の --extra-index-url を使う方法が簡単です。
これらの変更を行えば、transformers や他のパッケージも正しくインストールできるはずです。

matsu

 Azure speechsdk でセグメンテーション設定を Time に設定するとエラーが起きる原因

タイムアウト値の範囲に制限あり。

公式ドキュメントには書いていないが Github issue に記載あり。

https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/2740
参考

https://learn.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/propertyid?view=azure-node-latest

https://learn.microsoft.com/ja-jp/cpp/cognitive-services/speech/microsoft-cognitiveservices-speech-namespace

matsu

https://pypi.org/project/unidic/

matsu

Unidic アンインストール

アンインストール(辞書が残るので手動で消す)

pip install unidic
rm -rf {unidicのフォルダー}

matsu

Huggingface spaces "Error: No API found"

TypeError: argument of type 'bool' is not iterable

matsu

 Gradio の Examples にデータが埋め込めないexamples.dataset にデータを与えるとよい。

dropdown.change(update_examples, dropdown, examples.dataset)

https://github.com/gradio-app/gradio/pull/8745

matsu

 Huggingface spaces Storgae
https://discuss.huggingface.co/t/space-evicted-storage-limit-exceeded/36876/6

matsu

 Huggingface inference endpoints text embedding model batch size issueデフォルトでは 32 で、それ以上のリクエストをするとエラーがでる。

 対処方法デプロイ時のオプションでバッチサイズの指定が可能。


 参考https://github.com/huggingface/text-embeddings-inference
はい、画像にある Inference Endpoints のデプロイ画面で指定できる項目の中に「Container Arguments」や「Container Command」があり、これらを使えばhandler.pyなしでも一部の挙動を制御できる可能性があります。ただし、バッチサイズの変更が可能かは使っているベースイメージによります。

 🔍 画像の該当箇所
 Container Arguments
これは、Hugging Face が提供するベースの推論コンテナに 追加の引数を渡すためのフィールドです。
たとえば --batch-size 1024 のような引数がそのベースコンテナでサポートされていれば、ここで渡すことで変更できます。

 Container Command
実行コマンドそのものを上書きします（エントリーポイント変更）
ただしこちらは handler.py を使っているか、自前イメージをビルドしているとき向けです

 🧪 試せること（ベースイメージ次第）Hugging Face が管理している text-embeddings-inference ベース（Container Configuration で表示されている）を使っている場合、

これは sentence-transformers 系のモデルで、以下の引数が使える可能性があります：
--model-id sentence-transformers/all-MiniLM-L6-v2
--task sentence_embeddings
--batch-size 1024
これを Container Arguments に以下のように設定：
--batch-size 1024

 ⚠️ 注意点この方法が動作するかは、選択した「Task」が Sentence Embeddings であり、バックエンドが text-embeddings-inference コンテナである必要があります。
公式には明示されていませんが、内部的には text-embeddings-inference リポジトリが使われていることが多く、これは --batch-size に対応しています。

 ✅ 結論はい、Container Arguments に --batch-size 1024 を指定すれば、handler.py なしでバッチサイズを上げられる可能性があります。

 ✴ 試してみる入力（例）Container Arguments に以下を入力：
--batch-size 1024
Task は「Sentence Embeddings」を選択済みであることを確認
必要であれば text-embeddings-inference のコマンドライン仕様を確認し、正式なオプションを調べることもできます。試してみますか？

matsu

Jupyter で非同期関数

import nest_asyncio
nest_asyncio.apply()

matsu

FastMCP SSE on cursor

matsu

 Flowise (LangChain) で Bedrock を使うInference profile ID と正しいリージョンを設定する。

ssh で github, huggingface に繋がらない場合、ssh-agent がそもそも起動していない可能性

ssh -vT git@hf.co で Permission denied (publickey) が出るときの原因

huggingface に ssh で git するときは config の user は git じゃないといけないみたい

TTS の「いった」「おこなった」問題

Davinci resolve で一つのタイムラインに他のタイムラインを挿入したときに音がならない場合、挿入元タイムラインのオーディオがステレオとモノラルが混在している。

Fusionノードを別プロジェクトで使い回したいときはメディアインではなくローダーを使う

ローダーのフルパス指定はパスマップで省略できる

huggingface にデプロイした fastapi のエンドポイント

[VSCode] venvで作成した仮想環境がkernel一覧に表示されないときの対処法

Bert-VITS2 に脆弱性が報告されたが、その後修正されている

修正 PR マージ済み

Style-Bert-VITS2 の学習時に使う事前学習済みモデル

Early Stopping

warmup の効果と数

Whisper のファインチューニングでは事前学習よりも小さい学習率を設定した方がよいみたい

lr_scheduler_type = cosine_with_restarts は warmup あり

各 lr_scheduler

勾配蓄積

チェックポイントから WhisperProcessor をロードするには、抽出機とトークナイザーを個別にロードしてからインスタンス化する

cosine_with_restarts のピークは learning_rate

cosine_with_restarts の num_cycles は設定できない仕様

lr_scheduler_kwargs で与えられるみたい

Colab 上で Style-Bert-VITS2

音素バランス文

音声コーパスは JVS

声優統計コーパス→JSUTコーパス→JVSコーパス→つくよみちゃんコーパスの成り立ちについて

JVS コーパス

収録時間は音源の約 3 倍

DWANGO MEDIA VILLAGE - SEIREN VOICE

Gradio の DataFrame がスクロールでどんどん大きくなる問題

Faster-whisper のストリーミング化

検証環境

ポイント

コーパス

ATR503

ITA コーパス

Deepgram

ROHAN

話者分離 WhoSpeaks

Coqui model v2.0.2

whisper streaming

資料

エラー対処

Cursor で Google colab GPU リソースを使う

VSCode から Runpod に SSH 接続

jupyter カーネルが選べない

解決方法

jupyter notebook --no-browser --port=8080 & でエラー

エラー

対処

Colab カスタム GCE VM に接続します

dataset の concatenate

Huggingface で uvicorn アプリがスタート状態でスタックする

原因

参考

ffmpeg で g711 を wav に変換

tokenizer.json の中に vocab.json と merges.txt の内容が書かれている

ffmpegを使う

FFmpegで音声/動画ファイルを同間隔に分割する

ffmpeg 時間分割

VSCode のターミナルで自動的に .venv を activate する方法

Huggingface の Spaces に PyAudio がインストールできない

cu118 の torch を pip install したのに cudnn エラーが出る

ImportError: cannot import name 'ModelFilter' from 'huggingface_hub' while importing from nemo.collections.asr.models import EncDecMultiTaskModel

Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so}

got an unexpected keyword argument 'threshold'"

Huggingface リポジトリモデルリストアップ

最適なチェックポイントのみを保存したいとき

【python】pip installの際に「error: Microsoft Visual C++ 14.0 is required」が発生した場合の対応方法

virtualenvのactivateで「PSSecurityException」が発生する原因

libcuda.so が見つからない

WSL2 上で lspci しても GPU (Nvidia) が見えない

[cuda_driver.cpp:load_lib@36] libcuda.so lib not found.

OpenGL.error.Error: Attempt to retrieve context when no valid context

ITAコーパスの録音方法まとめ

Huggingface リポジトリを clone するときは

Mac+Cursor+Vimでキーの長押し問題

MacでKarabiner-Elementsを使う方法と効かない場合

Mac + US配列キーボード で ` (バッククォート) を打つと § (セクション) が入力される

Colab の CUDA とcuDNN のバージョン確認

`ssh -vT git@hf.co` で `Permission denied (publickey)` が出るときの原因

`lr_scheduler_type` = `cosine_with_restarts` は warmup あり

チェックポイントから `WhisperProcessor` をロードするには、抽出機とトークナイザーを個別にロードしてからインスタンス化する

`cosine_with_restarts` のピークは `learning_rate`

`cosine_with_restarts` の `num_cycles` は設定できない仕様

`lr_scheduler_kwargs` で与えられるみたい

`jupyter notebook --no-browser --port=8080 &` でエラー

WSL2 上で `lspci` しても GPU (Nvidia) が見えない

`OpenGL.error.Error: Attempt to retrieve context when no valid context`

Mac + US配列キーボードで ` (バッククォート) を打つと § (セクション) が入力される

#define CUDNN_MAJOR 8
#define CUDNN_MINOR 9
#define CUDNN_PATCHLEVEL 6

Dockerfile で `apt install` するときはユーザー切り替えする前に

`Container Arguments`

`Container Command`