Investigating the OpenAI Realtime API
OpenAI's own API has not been released yet (checked by grepping the model list for "realtime")
curl https://api.openai.com/v1/models -H "Authorization: Bearer ${OPENAI_API_KEY}" | grep "realtime"
On Azure OpenAI it has been released in the East US 2 region
I deployed a model and was able to obtain an endpoint and key
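A minimal sketch of how the endpoint and deployment name might be turned into a WebSocket URL. `buildRealtimeUrl` is a hypothetical helper, not part of the SDK, and the `/openai/realtime` path and the `2024-10-01-preview` API version are assumptions based on what the Azure sample README described at the time; check the README for the current values.

```typescript
// Hypothetical helper: derive the realtime WebSocket URL from an Azure OpenAI endpoint.
// The path and default api-version are assumptions, not confirmed by these notes.
function buildRealtimeUrl(
  endpoint: string,
  deployment: string,
  apiVersion: string = "2024-10-01-preview",
): string {
  // Keep only the host part of e.g. "https://my-resource.openai.azure.com/".
  const host = endpoint.replace(/^https?:\/\//, "").replace(/\/.*$/, "");
  const query =
    `api-version=${encodeURIComponent(apiVersion)}` +
    `&deployment=${encodeURIComponent(deployment)}`;
  return `wss://${host}/openai/realtime?${query}`;
}

// Placeholder resource and deployment names.
console.log(buildRealtimeUrl("https://my-resource.openai.azure.com/", "my-deployment"));
```

The key is left out of the URL on purpose here; how it is passed (header or query parameter) depends on the client and is better handled by a backend.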
Sample code
The README documents the API specification well, so read it carefully
Run the JavaScript sample code
git clone https://github.com/Azure-Samples/aoai-realtime-audio-sdk
cd aoai-realtime-audio-sdk/javascript/samples/
sh download-pkg.sh
cd web
npm install
#npm install -g vite
npm run dev
vite was not installed, so I installed it; after that the dev server started without trouble
It talked!
The log looks like this
console.log
session.created
input_audio_buffer.speech_started
{
"type": "input_audio_buffer.speech_stopped",
"event_id": "event_AE7AFsr4Ls4kdX058ckLB",
"audio_end_ms": 2080,
"item_id": "item_AE7AEZvFcBygwZFQ8o4Yp"
}
{
"type": "input_audio_buffer.committed",
"event_id": "event_AE7AFwlWaTDfJpOtdyPtX",
"previous_item_id": null,
"item_id": "item_AE7AEZvFcBygwZFQ8o4Yp"
}
{
"type": "conversation.item.created",
"event_id": "event_AE7AFb26nQzlPrjeNg1oc",
"previous_item_id": null,
"item": {
"id": "item_AE7AEZvFcBygwZFQ8o4Yp",
"object": "realtime.item",
"type": "message",
"status": "completed",
"role": "user",
"content": [
{
"type": "input_audio",
"transcript": null
}
]
}
}
{
"type": "response.created",
"event_id": "event_AE7AF57xBshOaI3qZ3xSC",
"response": {
"object": "realtime.response",
"id": "resp_AE7AFASlfCpqbTe6KcVtn",
"status": "in_progress",
"status_details": null,
"output": [],
"usage": null
}
}
{
"type": "response.output_item.added",
"event_id": "event_AE7AFnWqCABzxHM9na8ue",
"response_id": "resp_AE7AFASlfCpqbTe6KcVtn",
"output_index": 0,
"item": {
"id": "item_AE7AFkYdCGa3lhYlLg3sN",
"object": "realtime.item",
"type": "message",
"status": "in_progress",
"role": "assistant",
"content": []
}
}
{
"type": "conversation.item.created",
"event_id": "event_AE7AFCmHfZC2ZSQPSOV1t",
"previous_item_id": "item_AE7AEZvFcBygwZFQ8o4Yp",
"item": {
"id": "item_AE7AFkYdCGa3lhYlLg3sN",
"object": "realtime.item",
"type": "message",
"status": "in_progress",
"role": "assistant",
"content": []
}
}
{
"type": "response.content_part.added",
"event_id": "event_AE7AFGtVIAXh06TgOvn49",
"response_id": "resp_AE7AFASlfCpqbTe6KcVtn",
"item_id": "item_AE7AFkYdCGa3lhYlLg3sN",
"output_index": 0,
"content_index": 0,
"content": {
"type": "audio",
"transcript": ""
},
"part": {
"type": "audio",
"transcript": ""
}
}
response.audio_transcript.delta (x2)
conversation.item.input_audio_transcription.completed
response.audio.delta (x2)
response.audio_transcript.delta
response.audio.delta
response.audio_transcript.delta
response.audio.delta (x2)
response.audio_transcript.delta (x5)
response.audio.delta (x2)
response.audio_transcript.delta
response.audio.delta
response.audio_transcript.delta (x4)
response.audio.delta (x4)
{
"type": "response.audio.done",
"event_id": "event_AE7AG9Ksd628cdmW6Bju2",
"response_id": "resp_AE7AFASlfCpqbTe6KcVtn",
"item_id": "item_AE7AFkYdCGa3lhYlLg3sN",
"output_index": 0,
"content_index": 0
}
{
"type": "response.audio_transcript.done",
"event_id": "event_AE7AGTnbc2z0iHlgxIupb",
"response_id": "resp_AE7AFASlfCpqbTe6KcVtn",
"item_id": "item_AE7AFkYdCGa3lhYlLg3sN",
"output_index": 0,
"content_index": 0,
"transcript": "こんにちは!今日はどんなお手伝いが必要ですか?"
}
{
"type": "response.content_part.done",
"event_id": "event_AE7AGLjBa3USXa0f60vVU",
"response_id": "resp_AE7AFASlfCpqbTe6KcVtn",
"item_id": "item_AE7AFkYdCGa3lhYlLg3sN",
"output_index": 0,
"content_index": 0,
"content": {
"type": "audio",
"transcript": "こんにちは!今日はどんなお手伝いが必要ですか?"
},
"part": {
"type": "audio",
"transcript": "こんにちは!今日はどんなお手伝いが必要ですか?"
}
}
{
"type": "response.output_item.done",
"event_id": "event_AE7AGrxONmrXi3W7mntUy",
"response_id": "resp_AE7AFASlfCpqbTe6KcVtn",
"output_index": 0,
"item": {
"id": "item_AE7AFkYdCGa3lhYlLg3sN",
"object": "realtime.item",
"type": "message",
"status": "completed",
"role": "assistant",
"content": [
{
"type": "audio",
"transcript": "こんにちは!今日はどんなお手伝いが必要ですか?"
}
]
}
}
A message response arrives as a series of events
- response.created
- response.output_item.added
- conversation.item.created
- response.content_part.added
- response.text.delta...
- response.text.done
- response.content_part.done
- response.output_item.done
- response.done
A conversation.item seems to be one turn of the conversation, and a response delivers the content tied to that item
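A small sketch of that relationship, using IDs from the log above. `linkItemsToResponses` and the `ServerEvent` shape are hypothetical names for illustration; the only assumption about the API is what the log shows, namely that `response.output_item.added` carries both a `response_id` and an `item.id`.

```typescript
// Minimal event shape, limited to the fields visible in the log above.
interface ServerEvent {
  type: string;
  response_id?: string;
  item?: { id: string; type: string };
}

// Map each conversation item to the response that produced it.
function linkItemsToResponses(events: ServerEvent[]): Map<string, string> {
  const itemToResponse = new Map<string, string>();
  for (const event of events) {
    if (event.type === "response.output_item.added" && event.response_id && event.item) {
      itemToResponse.set(event.item.id, event.response_id);
    }
  }
  return itemToResponse;
}

const links = linkItemsToResponses([
  { type: "response.created" },
  {
    type: "response.output_item.added",
    response_id: "resp_AE7AFASlfCpqbTe6KcVtn",
    item: { id: "item_AE7AFkYdCGa3lhYlLg3sN", type: "message" },
  },
  { type: "conversation.item.created", item: { id: "item_AE7AFkYdCGa3lhYlLg3sN", type: "message" } },
]);
console.log(links.get("item_AE7AFkYdCGa3lhYlLg3sN"));
```

The user's own item (created from the audio buffer) has no entry in the map, since no response produced it.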
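- In VAD (Voice Activity Detection) mode, the end of the user's utterance is detected automatically and the AI's reply is returned
- If VAD does not detect the right boundary, the reply starts in the middle of the user's utterance, so some control is probably needed, such as a prompt that makes the AI confirm what was said and wait
- Because it runs over WebSocket, speech recognition keeps running while the AI is answering, so you can talk over it to stop it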
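The client-side half of stopping the AI mid-answer could look roughly like this. `PlaybackQueue` and `handleForPlayback` are hypothetical names, and treating the `delta` of `response.audio.delta` as a base64 audio chunk is an assumption, since the log above only shows the event type. Whether the server also cancels the in-progress response on its own is not covered here.

```typescript
// Hypothetical queue of audio chunks waiting to be played.
class PlaybackQueue {
  private chunks: string[] = [];

  enqueue(base64Audio: string): void {
    this.chunks.push(base64Audio);
  }

  // Drop everything that has not been played yet; returns how many chunks were dropped.
  clear(): number {
    const dropped = this.chunks.length;
    this.chunks = [];
    return dropped;
  }

  get length(): number {
    return this.chunks.length;
  }
}

// Queue assistant audio as it streams in, and flush it as soon as the user starts talking.
function handleForPlayback(event: { type: string; delta?: string }, queue: PlaybackQueue): void {
  if (event.type === "response.audio.delta" && event.delta !== undefined) {
    queue.enqueue(event.delta);
  } else if (event.type === "input_audio_buffer.speech_started") {
    queue.clear();
  }
}

const queue = new PlaybackQueue();
handleForPlayback({ type: "response.audio.delta", delta: "AAAA" }, queue);
handleForPlayback({ type: "response.audio.delta", delta: "BBBB" }, queue);
handleForPlayback({ type: "input_audio_buffer.speech_started" }, queue);
console.log(queue.length);
```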
Trying tool calls
Specify the tools in the session; type: "function" is required
While at it, specify modalities: ["text"] so that no audio is returned
main.ts
let configMessage: SessionUpdateMessage = {
  type: "session.update",
  session: {
    // Text only: the model does not return audio.
    modalities: ["text"],
    turn_detection: {
      type: "server_vad",
    },
    input_audio_transcription: {
      model: "whisper-1"
    },
    tool_choice: "auto",
    tools: [
      {
        name: "get_weather",
        description: "Get the weather at a given location",
        type: "function",
        parameters: {
          type: "object",
          properties: {
            location: {
              type: "string",
              description: "Location to get the weather from",
            },
            scale: {
              type: "string",
              enum: ['celsius', 'fahrenheit']
            },
          },
          required: ["location", "scale"],
        },
      },
    ]
  }
};
Add a handler on the client that returns the tool execution result
main.ts
async function handleRealtimeMessages() {
  for await (const message of realtimeStreaming.messages()) {
    let consoleLog = "" + message.type;
    switch (message.type) {
      ...
      case "response.text.delta":
        appendToTextBlock(message.delta);
        break;
      case "response.function_call_arguments.done":
        // The model has finished streaming the arguments for the tool call.
        console.log("Function call arguments received: " + message.arguments);
        // Return the tool result (hardcoded here) tied to the same call_id.
        realtimeStreaming.send({
          type: "conversation.item.create",
          item: {
            type: "function_call_output",
            call_id: message.call_id,
            output: "Rainy, 20degrees celsius",
          },
        });
        // Ask the model to generate a response that uses the tool output.
        realtimeStreaming.send({
          event_id: "evt_reYb9LWwV1EmL4wz2",
          type: "response.create"
        });
        break;
      ...
    }
  }
}
It answered based on the tool execution result
Running tools in the web client probably limits what they can do
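The handler above hardcodes the weather. A sketch of dispatching by function name instead, assuming the call is picked up from the `item` of `response.output_item.done` (the event in the log that carries `name`, `call_id` and `arguments` together). `buildToolReply`, `ToolHandler` and the stub `get_weather` implementation are hypothetical.

```typescript
// Shape of the function_call item as it appears in response.output_item.done.
interface FunctionCallItem {
  type: "function_call";
  name: string;
  call_id: string;
  arguments: string; // JSON-encoded arguments
}

interface ItemCreate {
  type: "conversation.item.create";
  item: { type: "function_call_output"; call_id: string; output: string };
}
interface ResponseCreate {
  type: "response.create";
}

type ToolHandler = (args: Record<string, unknown>) => string;

// Stub tool registry; a real one would call a backend service.
const toolRegistry: Record<string, ToolHandler> = {
  get_weather: (args) => `Rainy in ${String(args.location)}, 20 degrees ${String(args.scale)}`,
};

// Build the two client events to send back: the tool output, then a request for a new response.
function buildToolReply(
  item: FunctionCallItem,
  registry: Record<string, ToolHandler>,
): [ItemCreate, ResponseCreate] {
  let output: string;
  try {
    const handler = registry[item.name];
    output = handler ? handler(JSON.parse(item.arguments)) : `Unknown tool: ${item.name}`;
  } catch {
    output = "Invalid tool arguments";
  }
  return [
    {
      type: "conversation.item.create",
      item: { type: "function_call_output", call_id: item.call_id, output },
    },
    { type: "response.create" },
  ];
}

const reply = buildToolReply(
  {
    type: "function_call",
    name: "get_weather",
    call_id: "call_oLhGAc0A0wmXA8En",
    arguments: "{\n \"location\": \"Tokyo\",\n \"scale\": \"celsius\"\n}",
  },
  toolRegistry,
);
console.log(reply[0].item.output);
```

Unknown tool names and malformed arguments are reported back to the model as plain text output rather than thrown, so the conversation can continue.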
The log looks like this
console.log
session.created
input_audio_buffer.speech_started
{
"type": "input_audio_buffer.speech_stopped",
"event_id": "event_AE7U4cITJOENtUWhu06OJ",
"audio_end_ms": 2816,
"item_id": "item_AE7U36239vwconWGzqv8E"
}
{
"type": "input_audio_buffer.committed",
"event_id": "event_AE7U4vkuZiAGmlzFtZqUi",
"previous_item_id": null,
"item_id": "item_AE7U36239vwconWGzqv8E"
}
{
"type": "conversation.item.created",
"event_id": "event_AE7U4Djagn03y4RATHnKM",
"previous_item_id": null,
"item": {
"id": "item_AE7U36239vwconWGzqv8E",
"object": "realtime.item",
"type": "message",
"status": "completed",
"role": "user",
"content": [
{
"type": "input_audio",
"transcript": null
}
]
}
}
{
"type": "response.created",
"event_id": "event_AE7U4lcX8QVqO8310zgBl",
"response": {
"object": "realtime.response",
"id": "resp_AE7U4kDbllR0YbhcrLhRY",
"status": "in_progress",
"status_details": null,
"output": [],
"usage": null
}
}
{
"type": "response.output_item.added",
"event_id": "event_AE7U46fz9maxyZAqfXQ7F",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"output_index": 0,
"item": {
"id": "item_AE7U4e1aHlvzDRzEGBMRX",
"object": "realtime.item",
"type": "function_call",
"status": "in_progress",
"name": "get_weather",
"call_id": "call_oLhGAc0A0wmXA8En",
"arguments": ""
}
}
{
"type": "conversation.item.created",
"event_id": "event_AE7U45MZM3CVFOdHM8Qlw",
"previous_item_id": "item_AE7U36239vwconWGzqv8E",
"item": {
"id": "item_AE7U4e1aHlvzDRzEGBMRX",
"object": "realtime.item",
"type": "function_call",
"status": "in_progress",
"name": "get_weather",
"call_id": "call_oLhGAc0A0wmXA8En",
"arguments": ""
}
}
{
"type": "response.function_call_arguments.delta",
"event_id": "event_AE7U4LQmcP9xX9XprHoYA",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"output_index": 0,
"call_id": "call_oLhGAc0A0wmXA8En",
"delta": "{\n"
}
{
"type": "response.function_call_arguments.delta",
"event_id": "event_AE7U4ijAKw3gsNkSgiKow",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"output_index": 0,
"call_id": "call_oLhGAc0A0wmXA8En",
"delta": " "
}
{
"type": "response.function_call_arguments.delta",
"event_id": "event_AE7U4j62F6MyIYdWig54d",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"output_index": 0,
"call_id": "call_oLhGAc0A0wmXA8En",
"delta": " \""
}
{
"type": "response.function_call_arguments.delta",
"event_id": "event_AE7U4W72ueJ7LnZ9y2074",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"output_index": 0,
"call_id": "call_oLhGAc0A0wmXA8En",
"delta": "location"
}
{
"type": "response.function_call_arguments.delta",
"event_id": "event_AE7U4MVUQToeciuc3LTke",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"output_index": 0,
"call_id": "call_oLhGAc0A0wmXA8En",
"delta": "\":"
}
{
"type": "response.function_call_arguments.delta",
"event_id": "event_AE7U4TRFYG3woPpFRHeYk",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"output_index": 0,
"call_id": "call_oLhGAc0A0wmXA8En",
"delta": " \""
}
{
"type": "response.function_call_arguments.delta",
"event_id": "event_AE7U4SpuqAPv4ZDJjjfR8",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"output_index": 0,
"call_id": "call_oLhGAc0A0wmXA8En",
"delta": "Tokyo"
}
{
"type": "response.function_call_arguments.delta",
"event_id": "event_AE7U4TBCB0cERS5U26TW6",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"output_index": 0,
"call_id": "call_oLhGAc0A0wmXA8En",
"delta": "\",\n"
}
{
"type": "response.function_call_arguments.delta",
"event_id": "event_AE7U4Ztg0HUiN72eYd4eJ",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"output_index": 0,
"call_id": "call_oLhGAc0A0wmXA8En",
"delta": " "
}
conversation.item.input_audio_transcription.completed
{
"type": "response.function_call_arguments.delta",
"event_id": "event_AE7U4Xs91THYsVhYtZM1E",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"output_index": 0,
"call_id": "call_oLhGAc0A0wmXA8En",
"delta": " \""
}
{
"type": "response.function_call_arguments.delta",
"event_id": "event_AE7U40Oe5Hoy85DXqr7gW",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"output_index": 0,
"call_id": "call_oLhGAc0A0wmXA8En",
"delta": "scale"
}
{
"type": "response.function_call_arguments.delta",
"event_id": "event_AE7U4m9RD53JkI1w05z61",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"output_index": 0,
"call_id": "call_oLhGAc0A0wmXA8En",
"delta": "\":"
}
{
"type": "response.function_call_arguments.delta",
"event_id": "event_AE7U4Jc1L93WBVOYXycPF",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"output_index": 0,
"call_id": "call_oLhGAc0A0wmXA8En",
"delta": " \""
}
{
"type": "response.function_call_arguments.delta",
"event_id": "event_AE7U4W321YLcia3iku6Yg",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"output_index": 0,
"call_id": "call_oLhGAc0A0wmXA8En",
"delta": "c"
}
{
"type": "response.function_call_arguments.delta",
"event_id": "event_AE7U4mPIvreecLKGzhKih",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"output_index": 0,
"call_id": "call_oLhGAc0A0wmXA8En",
"delta": "elsius"
}
{
"type": "response.function_call_arguments.delta",
"event_id": "event_AE7U4CXbwNo7d7bhMMct6",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"output_index": 0,
"call_id": "call_oLhGAc0A0wmXA8En",
"delta": "\"\n"
}
{
"type": "response.function_call_arguments.delta",
"event_id": "event_AE7U4GdaEqDIZxIqczEZ7",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"output_index": 0,
"call_id": "call_oLhGAc0A0wmXA8En",
"delta": "}"
}
Function call arguments received: {
"location": "Tokyo",
"scale": "celsius"
}
response.function_call_arguments.done
{
"type": "response.output_item.done",
"event_id": "event_AE7U4dQkD00VBU5JAU5zN",
"response_id": "resp_AE7U4kDbllR0YbhcrLhRY",
"output_index": 0,
"item": {
"id": "item_AE7U4e1aHlvzDRzEGBMRX",
"object": "realtime.item",
"type": "function_call",
"status": "completed",
"name": "get_weather",
"call_id": "call_oLhGAc0A0wmXA8En",
"arguments": "{\n \"location\": \"Tokyo\",\n \"scale\": \"celsius\"\n}"
}
}
response.done
{
"type": "conversation.item.created",
"event_id": "event_AE7U5jKNqruEAQWtfFojp",
"previous_item_id": "item_AE7U4e1aHlvzDRzEGBMRX",
"item": {
"id": "item_AE7U5hkbe9W0sKcTwyQbV",
"object": "realtime.item",
"type": "function_call_output",
"status": "completed",
"call_id": "call_oLhGAc0A0wmXA8En",
"output": "Rainy, 20degrees celsius"
}
}
{
"type": "response.created",
"event_id": "event_AE7U5yfADQZ6mwr421pc6",
"response": {
"object": "realtime.response",
"id": "resp_AE7U5ZWqCEBxOw4zQRc0T",
"status": "in_progress",
"status_details": null,
"output": [],
"usage": null
}
}
{
"type": "response.output_item.added",
"event_id": "event_AE7U5PIFyLCv3B1x8Dd3Y",
"response_id": "resp_AE7U5ZWqCEBxOw4zQRc0T",
"output_index": 0,
"item": {
"id": "item_AE7U5IaYNwtRnNDZHP0ig",
"object": "realtime.item",
"type": "message",
"status": "in_progress",
"role": "assistant",
"content": []
}
}
{
"type": "conversation.item.created",
"event_id": "event_AE7U5aL9eFQ4H86ZUvv75",
"previous_item_id": "item_AE7U5hkbe9W0sKcTwyQbV",
"item": {
"id": "item_AE7U5IaYNwtRnNDZHP0ig",
"object": "realtime.item",
"type": "message",
"status": "in_progress",
"role": "assistant",
"content": []
}
}
{
"type": "response.content_part.added",
"event_id": "event_AE7U5t6L4NsnUtFaZ5vqJ",
"response_id": "resp_AE7U5ZWqCEBxOw4zQRc0T",
"item_id": "item_AE7U5IaYNwtRnNDZHP0ig",
"output_index": 0,
"content_index": 0,
"content": {
"type": "text",
"text": ""
},
"part": {
"type": "text",
"text": ""
}
}
response.text.delta (x35)
{
"type": "response.text.done",
"event_id": "event_AE7U5q537gzp3yttYPej7",
"response_id": "resp_AE7U5ZWqCEBxOw4zQRc0T",
"item_id": "item_AE7U5IaYNwtRnNDZHP0ig",
"output_index": 0,
"content_index": 0,
"text": "東京の天気は雨で、気温は20度です。雨が続いているので、傘を忘れずにお出かけくださいね。"
}
{
"type": "response.content_part.done",
"event_id": "event_AE7U5Y1uf1r2R05apsoE0",
"response_id": "resp_AE7U5ZWqCEBxOw4zQRc0T",
"item_id": "item_AE7U5IaYNwtRnNDZHP0ig",
"output_index": 0,
"content_index": 0,
"content": {
"type": "text",
"text": "東京の天気は雨で、気温は20度です。雨が続いているので、傘を忘れずにお出かけくださいね。"
},
"part": {
"type": "text",
"text": "東京の天気は雨で、気温は20度です。雨が続いているので、傘を忘れずにお出かけくださいね。"
}
}
{
"type": "response.output_item.done",
"event_id": "event_AE7U5LRy8trnKf0myOdQZ",
"response_id": "resp_AE7U5ZWqCEBxOw4zQRc0T",
"output_index": 0,
"item": {
"id": "item_AE7U5IaYNwtRnNDZHP0ig",
"object": "realtime.item",
"type": "message",
"status": "completed",
"role": "assistant",
"content": [
{
"type": "text",
"text": "東京の天気は雨で、気温は20度です。雨が続いているので、傘を忘れずにお出かけくださいね。"
}
]
}
}
response.done
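In the log above the tool arguments arrive as many small `response.function_call_arguments.delta` fragments. A sketch of reassembling them per `call_id` on the client; `assembleArguments` is a hypothetical helper, and the fragments are the ones from the log.

```typescript
interface ArgumentsDelta {
  type: "response.function_call_arguments.delta";
  call_id: string;
  delta: string;
}

// Concatenate delta fragments in arrival order, keyed by call_id.
function assembleArguments(deltas: ArgumentsDelta[]): Map<string, string> {
  const buffers = new Map<string, string>();
  for (const event of deltas) {
    buffers.set(event.call_id, (buffers.get(event.call_id) ?? "") + event.delta);
  }
  return buffers;
}

// The delta values from the log, in order.
const fragments = [
  "{\n", " ", " \"", "location", "\":", " \"", "Tokyo", "\",\n",
  " ", " \"", "scale", "\":", " \"", "c", "elsius", "\"\n", "}",
];
const deltaEvents: ArgumentsDelta[] = fragments.map((delta) => ({
  type: "response.function_call_arguments.delta",
  call_id: "call_oLhGAc0A0wmXA8En",
  delta,
}));

const assembled = assembleArguments(deltaEvents).get("call_oLhGAc0A0wmXA8En") ?? "";
const parsedArgs = JSON.parse(assembled) as { location: string; scale: string };
console.log(parsedArgs.location, parsedArgs.scale);
```

In practice the `done` event already carries the full string, so buffering like this is only needed when the arguments should be shown or validated while they stream.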
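Summary
- Azure's OpenAI Realtime API is publicly available, but its authentication differs from OpenAI's API, so clients written for the OpenAI side cannot be used as-is
- Spoken Japanese is understood, but the displayed transcription sometimes comes out in Korean or Russian
- Is tuning of this kind the reason the OpenAI API release is delayed?
- The session itself is stateful but is not stored long-term, so to resume a past conversation you have to save the history yourself and pass it back in
- A backend needs to be built out, including hiding the API key and handling tool calls
- Testing is difficult, so prompt engineering is considerably harder
- Conversely, services that store the incoming audio for testing purposes are bound to appear, so read the terms of service before using one
- If tokens are only charged for the speech that was recognized, the input cost is probably modest
- Cases where the AI keeps talking and the user only reacts occasionally look quite expensive
- How well it is integrated with tools will likely be the key to using the Realtime API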
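For the point about resuming a past conversation, a rough sketch of turning saved history into `conversation.item.create` events for a new session. `StoredTurn` and `historyToEvents` are hypothetical, and the content type names `input_text` (user) and `text` (assistant) are assumptions not confirmed by the logs above, which only show `input_audio`, `audio` and `text`.

```typescript
// Hypothetical shape of a turn saved by the application.
interface StoredTurn {
  role: "user" | "assistant";
  text: string;
}

// Convert saved turns into client events that recreate them as conversation items.
// The "input_text" / "text" content types are assumptions; check the API reference.
function historyToEvents(history: StoredTurn[]) {
  return history.map((turn) => ({
    type: "conversation.item.create" as const,
    item: {
      type: "message" as const,
      role: turn.role,
      content: [
        { type: turn.role === "user" ? "input_text" : "text", text: turn.text },
      ],
    },
  }));
}

const replayEvents = historyToEvents([
  { role: "user", text: "What is the weather in Tokyo?" },
  { role: "assistant", text: "It is rainy, 20 degrees celsius." },
]);
console.log(JSON.stringify(replayEvents[0]));
```

Each event would be sent over the socket in order right after the session is configured, before any new audio is streamed.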