I want to try transcribing microphone audio with gpt-4o-transcribe

https://platform.openai.com/docs/api-reference/audio/createTranscription?lang=javascript
The example here uses an audio file, but what I want to try is input from a microphone.
https://zenn.dev/kun432/scraps/314beaa08f2ad5
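Note that the regular Transcriptions API basically expects a file as input. For real-time audio such as microphone input, you use the Realtime API.
I have never properly used the Realtime API, so I am skipping it here. That said, the Realtime API apparently offers VAD and turn-taking, so I would like to look into it again later.
I see. So this can't be done with the plain Transcriptions API alone, and it has to be combined with the Realtime API?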
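
To make "expects a file as input" concrete, here is a minimal sketch of what it would take to hand microphone samples to a file-based API: the raw PCM has to be wrapped in a container such as WAV first. This only covers the packaging step with Python's standard `wave` module. The function name `pcm16_to_wav_bytes` and the 16 kHz mono format are assumptions for illustration, not anything from the linked pages, and how the audio is captured or uploaded is left out.

```python
import io
import wave


def pcm16_to_wav_bytes(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM samples in an in-memory WAV container.

    Hypothetical helper: the sample rate and mono layout are assumptions.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        wav_file.setnchannels(1)       # mono
        wav_file.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buf.getvalue()


if __name__ == "__main__":
    silence = b"\x00\x00" * 16000      # one second of silence at 16 kHz
    wav_bytes = pcm16_to_wav_bytes(silence)
    print(len(wav_bytes), "bytes of WAV data")
```

Packaging like this only works on audio that has already been recorded, which is why it is not a real substitute for streaming the microphone live.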

Motoki Watanabe

https://dev.classmethod.jp/articles/openai-transcribe/
Three models are available for transcription with OpenAI: whisper-1, gpt-4o-mini-transcribe, and gpt-4o-transcribe.

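All of them can handle streaming input through the Realtime API, and the connection method is WebSocket.
I see, so it really does look like it has to be combined with the Realtime API.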

I have never touched the Realtime API itself, so I don't really understand it yet.

https://platform.openai.com/docs/guides/speech-to-text#streaming-the-transcription-of-an-ongoing-audio-recording

Looks like this is done over WebSocket.
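
A rough sketch of the JSON messages that would go over that WebSocket, as far as I understand the guide. The event names `transcription_session.update` and `input_audio_buffer.append`, the `session` field layout, and the `pcm16` / `server_vad` values are my reading of the beta docs and should be treated as assumptions to verify against the current reference. The helper function names are made up for illustration. Only the message building is shown, not the socket connection or microphone capture.

```python
import base64
import json


def transcription_session_update(model: str = "gpt-4o-transcribe") -> str:
    """Build the event that configures a transcription session.

    Event name and field layout are assumptions based on the beta docs.
    """
    event = {
        "type": "transcription_session.update",
        "session": {
            "input_audio_format": "pcm16",
            "input_audio_transcription": {"model": model},
            "turn_detection": {"type": "server_vad"},  # let the server detect speech turns
        },
    }
    return json.dumps(event)


def audio_append_event(pcm_chunk: bytes) -> str:
    """Build the event that appends one chunk of microphone audio.

    Audio bytes are base64-encoded because the payload is JSON text.
    """
    event = {
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_chunk).decode("ascii"),
    }
    return json.dumps(event)


if __name__ == "__main__":
    print(transcription_session_update())
    print(audio_append_event(b"\x00\x00" * 4))
```

The idea would be to send the session update once after connecting, then send one append event per captured chunk and read transcription events coming back on the same socket.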

Is OpenAI's Realtime API still in beta?

https://platform.openai.com/docs/guides/realtime
Without this, I can't do speech-to-text with gpt-4o-transcribe from microphone input, right?
Looks like it's in preview on Azure OpenAI as well?

https://learn.microsoft.com/ja-jp/azure/ai-services/openai/realtime-audio-reference

Whoa, now I get it. The endpoint wss://api.openai.com/v1/realtime and the Realtime API as a model (GPT-4o Realtime preview) are two separate things. I had them confused this whole time.
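
To pin down that distinction for myself, a small sketch: the same WebSocket endpoint, with the query string deciding whether the session is transcription-only or a conversation with a realtime model. The `intent=transcription` and `model=...` query parameters and the model name `gpt-4o-realtime-preview` are what I believe the docs describe, so treat them as assumptions. The function names are hypothetical.

```python
from urllib.parse import urlencode

# The shared WebSocket endpoint mentioned in the note above.
REALTIME_ENDPOINT = "wss://api.openai.com/v1/realtime"


def transcription_url() -> str:
    """URL for a transcription-only session (assumed query parameter)."""
    return f"{REALTIME_ENDPOINT}?{urlencode({'intent': 'transcription'})}"


def conversation_url(model: str = "gpt-4o-realtime-preview") -> str:
    """URL for a speech-to-speech session with a realtime model (assumed)."""
    return f"{REALTIME_ENDPOINT}?{urlencode({'model': model})}"


if __name__ == "__main__":
    print(transcription_url())
    print(conversation_url())
```

Both URLs share the endpoint, and only the second one involves a GPT-4o Realtime model at all.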