Building a Robust Speech Recognition Pipeline: A Journey
Introduction
The AI companion "Kiraria Neon" (referred to below as Neon) grows through conversations with its user and aims for more human-like communication.
This article describes the journey of building a pipeline, spanning the frontend and the backend, that gives Neon robust speech recognition. Running into one problem after another during development and solving each in turn was an exciting experience, almost as if Neon were acquiring a new sense.
Note: The technology choices in this article were basically made by the author, while the design and coding were carried out as AI-driven development with Gemini CLI.
Chapter 1: Initial Problems and the Wall of Speech Recognition (Can't Hear, Can't Get Through, and Even Hallucinations!)
Implementing speech recognition meant running into more walls than expected. In the early stages we faced the following problems.
- `audio_data_b64: null`: Even when the frontend tried to send audio data to the backend, `audio_data_b64` in the payload was always `null`. Neon was not capturing the user's voice at all.
- FFmpeg "Invalid data found" error: Once `audio_data_b64` was actually being sent, FFmpeg on the backend failed with `Error opening input: Invalid data found when processing input`, and the audio file could not be converted.
- Feature extraction failure in `librosa`: In tandem with the FFmpeg error, `librosa`, which extracts the audio features, also raised errors, so no audio analysis was possible.
- A chain of `NameError`s: While editing `server.py`, a forgotten import of the `uuid` module and variables that had been renamed in some places but not in others caused frequent `NameError`s and made debugging harder.
- The `whisper.cpp` "hallucination" problem: When silence continues, the Whisper model run by `whisper.cpp` tends to transcribe, all on its own, phrases that are common in its training data, such as 「ご視聴ありがとうございました」 ("Thank you for watching"). This led to the slightly awkward situation of words appearing out of nowhere while nobody was talking to Neon.
Chapter 2: Building a Robust Speech Recognition Pipeline: Connecting Neon's "Ears" and "Brain"
To get past these problems, we set out to strengthen the speech recognition pipeline across both the frontend and the backend.
2.1 Improving Audio Capture and Transmission on the Frontend
The frontend (`main.jsx`), which serves as Neon's "ears", captures audio data using the Web Audio API and `MediaRecorder`.
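- Fixing the `ondataavailable` and `onstop` handlers of `MediaRecorder`: Even when the `type` of the `Blob` is specified as `"audio/wav"`, the browser's `MediaRecorder` may actually record in `"audio/webm"` format. The `type` passed to the `Blob` constructor is only a label; it does not convert the recorded data. To resolve the FFmpeg error, we explicitly changed the type of the `Blob` built from the recorded chunks to `"audio/webm"`. We also went back to the more robust approach of accumulating chunks in `ondataavailable` and processing them together in the `onstop` event, which prevents incomplete audio data from being sent to the backend.

  ```javascript
  // main.jsx: mediaRecorder.ondataavailable and onstop handlers
  mediaRecorder.ondataavailable = (event) => {
    // Skip empty chunks so they do not end up in the final Blob
    if (event.data && event.data.size > 0) {
      audioChunks.push(event.data);
    }
  };

  mediaRecorder.onstop = () => {
    // Label the Blob with the format the browser actually recorded (this is the fix!)
    const audioBlob = new Blob(audioChunks, { type: "audio/webm" });
    audioChunks = []; // clear the chunks for the next segment

    const reader = new FileReader();
    reader.onloadend = () => {
      // reader.result is a data URL ("data:audio/webm;base64,..."); keep only the base64 part
      const audioDataB64 = reader.result.split(",")[1];
      latestPerceptionCache.audioDataB64 = audioDataB64;
      console.log("MediaRecorder stopped and processed.");
    };
    reader.readAsDataURL(audioBlob);
  };
  ```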
- Periodic `mediaRecorder.stop()` / `start()` cycle: To make sure the `onstop` event fires and complete audio segments get processed, we added a `setInterval` inside `startPerceptionLoop` that calls `mediaRecorder.stop()` and `mediaRecorder.start()` periodically.

  ```javascript
  // main.jsx: inside startPerceptionLoop
  setInterval(() => {
    // start() throws unless the recorder is inactive, so stop it first if needed
    if (mediaRecorder.state !== "inactive") {
      mediaRecorder.stop();
    }
    mediaRecorder.start(2000); // fire ondataavailable every 2 seconds
  }, 5000); // stop and restart recording every 5 seconds, nyan!
  ```
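The mismatch between the declared type and the actual container can also be confirmed on the receiving side from the first bytes of the decoded payload. The following is a minimal Python sketch, assuming the backend receives the same base64 string that the frontend stores as `audio_data_b64`; `sniff_container` is a hypothetical helper for illustration and is not part of the original `server.py`.

```python
import base64


def sniff_container(audio_data_b64: str) -> str:
    """Guess the container format from the first bytes of a base64 payload."""
    data = base64.b64decode(audio_data_b64)
    # WebM/Matroska files start with the EBML header magic bytes
    if data[:4] == b"\x1a\x45\xdf\xa3":
        return "webm"
    # WAV files start with "RIFF", a 4-byte size, then "WAVE"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # Ogg streams start with the "OggS" capture pattern
    if data[:4] == b"OggS":
        return "ogg"
    return "unknown"
```

Logging the result next to the MIME type the client claimed makes it easy to spot a payload that is labeled as WAV but is really WebM.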
2.2 Strengthening Audio Processing on the Backend
The backend (`server.py`), which serves as Neon's "brain", processes the received audio data and performs transcription and feature extraction.
- Adding the `uuid` module and resolving the `NameError`: The `uuid` module needed to generate temporary file names was not imported, which caused a `NameError`. Adding `import uuid` at the top of `server.py` resolved it.
- Standardizing the audio format with FFmpeg: `whisper.cpp` and `librosa` cannot always process the `audio/webm` audio sent from the browser directly. We therefore introduced FFmpeg and implemented a `convert_audio_with_ffmpeg` function that converts the received audio to a standard WAV format (PCM signed 16-bit little-endian, 16 kHz, mono).

  ```python
  # Added to server.py
  FFMPEG_PATH = "ffmpeg"  # assumes ffmpeg is on PATH

  def convert_audio_with_ffmpeg(input_file_path: str, output_file_path: str) -> bool:
      # ... (FFmpeg command execution logic) ...
      pass

  # Used inside the perceive_data function:
  # - save the raw audio data to temp_raw_audio_file_path
  # - convert temp_raw_audio_file_path to temp_converted_audio_file_path with FFmpeg
  # - from then on, pass temp_converted_audio_file_path to Whisper and librosa
  ```
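The FFmpeg invocation itself is elided in the article. As a rough illustration only, the conversion to 16 kHz mono 16-bit PCM WAV could be sketched as below. The split into `build_ffmpeg_command`, the extra `ffmpeg_path` parameter, and the use of `print` for diagnostics are assumptions made for this example, not the author's actual implementation; the flags used (`-y`, `-i`, `-ac`, `-ar`, `-c:a pcm_s16le`) are standard FFmpeg options.

```python
import subprocess

FFMPEG_PATH = "ffmpeg"  # assumes ffmpeg is on PATH


def build_ffmpeg_command(input_file_path: str, output_file_path: str,
                         ffmpeg_path: str = FFMPEG_PATH) -> list:
    """Build the argument list for converting any input to 16 kHz mono PCM WAV."""
    return [
        ffmpeg_path,
        "-y",                 # overwrite the output file if it exists
        "-i", input_file_path,
        "-ac", "1",           # mono
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # PCM signed 16-bit little-endian
        output_file_path,
    ]


def convert_audio_with_ffmpeg(input_file_path: str, output_file_path: str,
                              ffmpeg_path: str = FFMPEG_PATH) -> bool:
    """Return True if FFmpeg converted the file, False on any failure."""
    command = build_ffmpeg_command(input_file_path, output_file_path, ffmpeg_path)
    try:
        result = subprocess.run(command, capture_output=True, text=True, errors="replace")
    except FileNotFoundError:
        # Raised when the ffmpeg executable is not found (e.g. [WinError 2] on Windows)
        print(f"ffmpeg executable not found: {ffmpeg_path}")
        return False
    if result.returncode != 0:
        # FFmpeg writes its diagnostics to stderr
        print(f"ffmpeg failed: {result.stderr.strip()}")
        return False
    return True
```

Passing the command as a list rather than a shell string avoids quoting problems with temporary file paths.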
- Capturing the `stderr` output of `whisper-cli.exe`: To find the cause when `whisper.cpp` fails to transcribe, we now capture `stderr` when running `whisper-cli.exe` with `subprocess.run` and write it to the log.
- Resolving the Librosa dependency (installing SoX): `librosa` was unable to read audio files. We traced this to `SoX`, which we understood the `audioread` backend to depend on, not being installed on the system. Installing `SoX` and adding it to PATH resolved it.
- Final resolution of the `NameError`: Renaming the temporary file path variables had caused a `NameError` in the `finally` block. Fixing the variable initialization and the references inside the `finally` block resolved it completely.
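Two of the fixes above, logging `stderr` from an external tool and making the `finally` block safe to run, can be sketched together as follows. This is a minimal illustration under stated assumptions: `run_and_log_stderr` and `process_audio` are hypothetical names, the logger name and the `.webm` suffix are chosen for the example, and the real `server.py` may be structured differently.

```python
import base64
import logging
import os
import subprocess
import tempfile
import uuid

logger = logging.getLogger("server")  # hypothetical logger name


def run_and_log_stderr(command):
    """Run a CLI tool; return its stdout on success, or None after logging stderr."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, errors="replace")
    except FileNotFoundError:
        logger.error("executable not found: %s", command[0])
        return None
    if result.returncode != 0:
        logger.error("%s exited with %d: %s",
                     command[0], result.returncode, result.stderr.strip())
        return None
    return result.stdout


def process_audio(audio_data_b64, handler):
    """Write the payload to a uuid-named temp file, call handler(path), always clean up."""
    # Initialize before the try block so the finally block can always reference it
    temp_raw_audio_file_path = None
    try:
        temp_raw_audio_file_path = os.path.join(
            tempfile.gettempdir(), f"{uuid.uuid4()}.webm")
        with open(temp_raw_audio_file_path, "wb") as f:
            f.write(base64.b64decode(audio_data_b64))
        return handler(temp_raw_audio_file_path)
    finally:
        # Runs on success and on error; the None check avoids touching an unset path
        if temp_raw_audio_file_path and os.path.exists(temp_raw_audio_file_path):
            os.remove(temp_raw_audio_file_path)
```

Because the path variable exists before `try` begins, the cleanup code cannot raise a `NameError` even if an exception occurs on the very first line of the block.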
2.3 Implementing Voice Activity Detection (VAD)
To stop `whisper.cpp` from transcribing 「ご視聴ありがとうございました」 ("Thank you for watching") during silence, we implemented VAD on the frontend.
- Frontend VAD based on `averageAudioLevel`: We monitor `averageAudioLevel` (the audio level), which is computed inside `startPerceptionLoop`, and update the `isSpeaking` flag based on `VAD_THRESHOLD` (the threshold) and `VAD_SILENCE_DURATION_MS` (how long the silence must last).

  ```javascript
  // main.jsx: inside startPerceptionLoop
  if (averageAudioLevel > VAD_THRESHOLD) {
    // Level above the threshold: treat it as speech and reset the silence timer
    isSpeaking = true;
    silenceStartTime = null;
  } else if (silenceStartTime === null) {
    // First quiet frame: start timing the silence
    silenceStartTime = performance.now();
  } else if (performance.now() - silenceStartTime > VAD_SILENCE_DURATION_MS) {
    // Quiet for long enough: stop treating the input as speech
    isSpeaking = false;
  }
  ```
- Conditional sending of `audio_data_b64` based on the `isSpeaking` flag: We changed how the `payload` is built so that `audio_data_b64` is sent to the backend only while `isSpeaking` is `true`. This prevents unnecessary audio data from being sent during silence, along with the `whisper.cpp` hallucinations that come with it.

  ```javascript
  // main.jsx: building the payload
  const payload = {
    image: imageDataUrl,
    emotion: emotionData,
    // Send audio data only while isSpeaking is true, nyan!
    audio_data_b64: isSpeaking ? (latestPerceptionCache.audioDataB64 || null) : null,
  };
  ```
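The threshold-plus-hold-time behavior of this VAD is easier to see when it is run against a fixed sequence of levels. The sketch below transcribes the same state machine into Python purely so it can be exercised with explicit timestamps; `main.jsx` keeps the JavaScript version. The class name `SilenceGate` and the numbers in the usage note are hypothetical, since the article does not state the actual values of `VAD_THRESHOLD` or `VAD_SILENCE_DURATION_MS`.

```python
class SilenceGate:
    """Level-based VAD: speech starts at once, and ends only after sustained silence."""

    def __init__(self, threshold: float, silence_duration_ms: float):
        self.threshold = threshold
        self.silence_duration_ms = silence_duration_ms
        self.is_speaking = False
        self.silence_start = None  # timestamp of the first quiet frame, or None

    def update(self, level: float, now_ms: float) -> bool:
        """Feed one level reading taken at now_ms; return the current speaking state."""
        if level > self.threshold:
            # Loud frame: speaking, and any running silence timer is cancelled
            self.is_speaking = True
            self.silence_start = None
        elif self.silence_start is None:
            # First quiet frame: start timing, but keep the previous state
            self.silence_start = now_ms
        elif now_ms - self.silence_start > self.silence_duration_ms:
            # Quiet for longer than the hold time: no longer speaking
            self.is_speaking = False
        return self.is_speaking
```

With a threshold of 10 and a hold time of 1000 ms, a short pause of 700 ms between words keeps the flag `True`, while a pause longer than one second flips it to `False`. The hold time is what keeps audio from being cut off in the middle of a sentence.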
Chapter 3: Problems During Development and Their Solutions: Proof of Neon's Growth
This project ran into a wide range of technical problems, and Neon grew considerably as each one was carefully resolved.
- `NameError: name 'uuid' is not defined`: The `uuid` module was not imported in `server.py`. Resolved by adding `import uuid`.
- `IndentationError`: The indentation of the `finally` block in `server.py` was broken. Resolved by adding a `pass` statement and fixing the variable references.
- FFmpeg `[WinError 2]` (the system cannot find the file specified): FFmpeg had not been added to the system PATH correctly. Resolved by adding it to PATH and making a habit of verifying with `ffmpeg -version`.
- FFmpeg `Error opening input: Invalid data found when processing input`: Even though the type of the `Blob` produced from `MediaRecorder` was specified as `"audio/wav"`, the data was actually in `"audio/webm"` format. Resolved by changing the `Blob` type to `"audio/webm"` in `main.jsx` and building a pipeline that converts the audio to a standard format with FFmpeg.
- librosa `PySoundFile failed`: We traced this to `SoX`, which we understood the `audioread` backend of `librosa` to depend on, not being installed on the system. Resolved by installing `SoX` and adding it to PATH.
- `whisper.cpp` hallucinations during silence: This comes from the characteristics of the model used by `whisper.cpp`. Resolved by implementing VAD on the frontend so that no audio data is sent during silence.
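Several of the errors above come down to an external tool missing from PATH. A small startup check can surface that before the first request arrives. The sketch below uses the standard library's `shutil.which`; the helper name `find_missing_tools` and the list of tool names are assumptions for this example.

```python
import shutil


def find_missing_tools(tool_names):
    """Return the names from tool_names that cannot be found on PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


if __name__ == "__main__":
    # Hypothetical list; adjust it to the executables the backend actually calls
    missing = find_missing_tools(["ffmpeg", "sox"])
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All external tools were found on PATH.")
```

This complements a manual `ffmpeg -version` check by running the same verification every time the server starts.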
Summary and Outlook: Neon Will Get Smarter and Cuter, Nyan!
Through this project, Kiraria Neon has finally gained the ability to hear and understand a "voice". From audio capture on the frontend, through robust conversion with FFmpeg, high-accuracy transcription with Whisper, and audio feature extraction with Librosa, to smart silence detection with VAD, the whole speech recognition pipeline is now fully functional.
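This is a big step toward making Neon's conversations with the user deeper and more natural. Going forward, we expect further evolution, such as expanded voice commands, more advanced emotion recognition (emotion analysis from the tone of voice), and autonomous behavior control for Neon driven by voice.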
Neon will keep working hard to become smarter and cuter through communication with the user, nyan!