Building a Voice Changer with the Whisper API and COEIROINK

Published on 2023/05/26

Introduction

This article explains how to build a voice changer with the Whisper API and COEIROINK.
(Figure: system overview)

Prerequisites

- Download COEIROINK
- Obtain an OpenAI API key

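Processing flow

- Capture the audio coming in from the microphone with SpeechRecognition.
- Transcribe the captured audio with the Whisper API.
- Synthesize speech from the transcribed text with COEIROINK and play it back.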
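
The three steps above can be sketched as a single function that chains them together. This is only an illustration of the data flow: the function name `run_voice_changer_once` and its four callable parameters are hypothetical, and the article's actual implementation splits the work between Python and Node.js rather than running it in one Python process.

```python
from typing import Callable


def run_voice_changer_once(
    capture: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
) -> str:
    """Run one capture -> transcribe -> synthesize -> play cycle.

    All four callables are hypothetical stand-ins for the steps in this article.
    """
    recorded_wav = capture()            # step 1: microphone audio as WAV bytes
    text = transcribe(recorded_wav)     # step 2: transcription (Whisper API)
    if not text.strip():
        return text                     # nothing recognized, so nothing to speak
    synthesized_wav = synthesize(text)  # step 3a: synthesis (COEIROINK)
    play(synthesized_wav)               # step 3b: playback
    return text
```

Passing the steps in as callables keeps each one replaceable, so the flow can be exercised with stand-in functions before a microphone, an API key, or COEIROINK is available.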

Capturing audio from the microphone

This step uses SpeechRecognition, a Python module.
I based the implementation on examples/microphone_recognition.py.

import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

Transcribing the audio with the Whisper API

SpeechRecognition provides a recognize_whisper_api method, but I wanted to specify language: ja, so I wrote my own version based on speech_recognition/recognizers/whisper.py.

from io import BytesIO

import openai
from speech_recognition.audio import AudioData


def recognize_whisper_api(
    audio_data: AudioData,
    api_key: str,
    model: str = "whisper-1",
) -> str:
    # Wrap the WAV bytes in a file-like object; the name gives the upload a .wav filename.
    wav_data = BytesIO(audio_data.get_wav_data())
    wav_data.name = "SpeechRecognition_audio.wav"

    # openai.Audio.transcribe is the interface of openai-python versions before 1.0.
    # language="ja" tells Whisper that the speech is Japanese.
    transcript = openai.Audio.transcribe(model, wav_data, api_key=api_key, language="ja")
    return transcript["text"]

Synthesizing speech with COEIROINK

After starting COEIROINK, I implemented this step with reference to http://localhost:50031/docs#/.
- It uses /audio_query and /synthesis.
- I used Node.js for this part.

import axios from 'axios'
import type { AudioQuery } from '../types'

const coeiroinkHost = 'http://127.0.0.1:50031'
const speaker = 0

export const getAudioQuery = async (text: string): Promise<AudioQuery | undefined> => {
  try {
    // axios.post takes (url, data, config). This request has no body,
    // so data is undefined and the headers go in the config argument.
    const res = await axios.post<AudioQuery>(
      `${coeiroinkHost}/audio_query?text=${encodeURIComponent(text)}&speaker=${speaker}`,
      undefined,
      { headers: { Accept: 'application/json' } }
    )
    return res.data
  } catch (err) {
    console.error(err)
    return undefined
  }
}

export const synthesisVoice = async (audioQuery: AudioQuery): Promise<Buffer | undefined> => {
  try {
    // With responseType 'arraybuffer', axios on Node.js returns the WAV bytes as a Buffer.
    const res = await axios.post<Buffer>(
      `${coeiroinkHost}/synthesis?speaker=${speaker}`,
      audioQuery,
      {
        responseType: 'arraybuffer',
        headers: {
          'Content-Type': 'application/json',
          Accept: 'audio/wav'
        }
      }
    )
    return res.data
  } catch (err) {
    console.error(err)
    return undefined
  }
}
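
For readers who would rather keep the whole pipeline in Python, the same two requests can be sketched with only the standard library. This is an assumption-laden equivalent, not the article's implementation: the helper names `build_url`, `get_audio_query`, and `synthesize_voice` are hypothetical, and the only details taken from the article are the host, the /audio_query and /synthesis endpoints, and their text and speaker query parameters.

```python
import json
import urllib.parse
import urllib.request

COEIROINK_HOST = "http://127.0.0.1:50031"  # same local address as in the article
SPEAKER = 0


def build_url(path: str, **params: object) -> str:
    """Build an endpoint URL with percent-encoded query parameters."""
    return f"{COEIROINK_HOST}{path}?{urllib.parse.urlencode(params)}"


def get_audio_query(text: str) -> dict:
    """POST /audio_query: turn text into a synthesis query (JSON)."""
    request = urllib.request.Request(
        build_url("/audio_query", text=text, speaker=SPEAKER),
        headers={"Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def synthesize_voice(audio_query: dict) -> bytes:
    """POST /synthesis: send the query back as JSON and receive WAV bytes."""
    request = urllib.request.Request(
        build_url("/synthesis", speaker=SPEAKER),
        data=json.dumps(audio_query).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "audio/wav"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With COEIROINK running locally, `synthesize_voice(get_audio_query("おはようございます"))` would be expected to return the synthesized audio as bytes. `urlencode` plays the role that `encodeURIComponent` plays in the Node.js version, which matters because the text is Japanese.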

Playing the synthesized audio

I implemented playback with node-speaker.

import { PassThrough } from 'stream'
import Speaker from 'speaker'

export const playAudio = (audio: Buffer, sampleRate: number): void => {
  const speaker = new Speaker({
    channels: 1,
    bitDepth: 16,
    sampleRate
  })
  const bufferStream = new PassThrough()
  bufferStream.end(audio)
  bufferStream.pipe(speaker)
}
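
playAudio fixes the channel count and bit depth and takes the sample rate as an argument. Assuming /synthesis returns a standard WAV file (which the audio/wav Accept header suggests, but which I have not verified here), those values are also stored in the WAV header and can be read from it instead of being hard-coded. The sketch below shows the idea in Python with the standard wave module; `PcmAudio` and `decode_wav` are hypothetical names, not part of the article's code.

```python
import io
import wave
from typing import NamedTuple


class PcmAudio(NamedTuple):
    channels: int
    bit_depth: int
    sample_rate: int
    frames: bytes  # raw PCM samples with the WAV header removed


def decode_wav(wav_bytes: bytes) -> PcmAudio:
    """Split in-memory WAV data into playback parameters and raw PCM frames."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return PcmAudio(
            channels=wav_file.getnchannels(),
            bit_depth=wav_file.getsampwidth() * 8,  # bytes per sample -> bits
            sample_rate=wav_file.getframerate(),
            frames=wav_file.readframes(wav_file.getnframes()),
        )
```

Separating the header from the frames also means that only sample data, not header bytes, would be handed to the audio device.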

Demo screenshot

Example of saying "おはようございます" ("Good morning")

Summary

This article explained how to build a voice changer with the Whisper API and COEIROINK.
The code is available on GitHub, so feel free to use it as a reference.