# Trying Speech Recognition with the Web Speech API (with a React Hooks Implementation)

## 🗣️ What is the Web Speech API?

The Web Speech API is a browser API for speech processing. It consists of two modules:

- **SpeechRecognition** (speech recognition / speech-to-text): converts speech to text in real time.
- **SpeechSynthesis** (speech synthesis / text-to-speech): reads text aloud.

This article implements transcription using **SpeechRecognition**.
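Support detection is the first step: Chrome exposes the constructor under the vendor-prefixed name `webkitSpeechRecognition`, so a typical check tries both names. A minimal sketch (run it in a browser console; outside a browser it simply reports no support):

```typescript
// Minimal support check. Chrome exposes the constructor as
// webkitSpeechRecognition, so fall back to the prefixed name.
const w = globalThis as any;
const SpeechRecognitionCtor = w.SpeechRecognition ?? w.webkitSpeechRecognition;

if (SpeechRecognitionCtor) {
  console.log("SpeechRecognition is supported");
} else {
  console.log("SpeechRecognition is not supported");
}
```

The hook below performs the same prefixed fallback before constructing a recognizer.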
## 1. Type Definitions

- First, define the TypeScript types.
`src/types/SpeechRecognition.ts`

```ts
export interface SpeechRecognitionEvent extends Event {
  results: SpeechRecognitionResultList;
}

export interface SpeechRecognitionResultList {
  length: number;
  item(index: number): SpeechRecognitionResult;
  [index: number]: SpeechRecognitionResult;
}

export interface SpeechRecognitionResult {
  length: number;
  item(index: number): SpeechRecognitionAlternative;
  [index: number]: SpeechRecognitionAlternative;
  isFinal: boolean;
}

export interface SpeechRecognitionAlternative {
  transcript: string;
  confidence: number;
}

export interface SpeechRecognitionErrorEvent extends Event {
  error: string;
  message: string;
}

export interface SpeechRecognition extends EventTarget {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
  start(): void;
  stop(): void;
  abort(): void;
  onstart: ((this: SpeechRecognition, ev: Event) => void) | null;
  onresult: ((this: SpeechRecognition, ev: SpeechRecognitionEvent) => void) | null;
  onerror: ((this: SpeechRecognition, ev: SpeechRecognitionErrorEvent) => void) | null;
  onend: ((this: SpeechRecognition, ev: Event) => void) | null;
}

/**
 * Configuration options for the useSpeechRecognition Hook
 */
export interface SpeechRecognitionOptions {
  /** Whether to recognize speech continuously (default: true) */
  continuous?: boolean;
  /** Whether to receive interim results (default: true) */
  interimResults?: boolean;
  /** Recognition language (default: "ja-JP") */
  lang?: string;
  /** Maximum listening time (ms): stops automatically once elapsed */
  maxListeningMs?: number;
  /** How long silence may last before stopping (ms) */
  silenceTimeoutMs?: number;
}
```
- Also add a global declaration that extends the `window` object. Importing the interface makes the file a module, which `declare global` requires (and resolves the `SpeechRecognition` name).

`src/types/global.d.ts`

```ts
import type { SpeechRecognition } from "./SpeechRecognition";

declare global {
  interface Window {
    SpeechRecognition: new () => SpeechRecognition;
    webkitSpeechRecognition: new () => SpeechRecognition;
  }
}
```
## 2. Implementing the useSpeechRecognition Hook

- Create a custom hook that makes the Web Speech API easy to use.
`src/hooks/useSpeechRecognition.ts`

```ts
"use client";

import { useState, useEffect, useCallback, useRef } from "react";
import type {
  SpeechRecognition,
  SpeechRecognitionEvent,
  SpeechRecognitionErrorEvent,
  SpeechRecognitionOptions,
} from "@/types/SpeechRecognition"; // ✅ import the shared types

export const useSpeechRecognition = (options?: SpeechRecognitionOptions) => {
  const [transcript, setTranscript] = useState("");
  const [listening, setListening] = useState(false);
  const [browserSupportsSpeechRecognition, setBrowserSupport] = useState(false);
  const [status, setStatus] = useState<"idle" | "connecting" | "connected" | "error">("idle");
  const [errorMessage, setErrorMessage] = useState<string>("");
  const recognitionRef = useRef<SpeechRecognition | null>(null);
  const maxTimerRef = useRef<number | null>(null);
  const silenceTimerRef = useRef<number | null>(null);

  // Serialize the options so the effect only re-runs when their values change
  const stableOptions = JSON.stringify(options ?? {});

  useEffect(() => {
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    if (!SpeechRecognition) {
      setBrowserSupport(false);
      return;
    }
    setBrowserSupport(true);
    recognitionRef.current = new SpeechRecognition();

    const continuous = options?.continuous ?? true;
    const interimResults = options?.interimResults ?? true;
    const lang = options?.lang ?? "ja-JP";
    const silenceTimeoutMs = options?.silenceTimeoutMs ?? 10 * 1000;
    const maxListeningMs = options?.maxListeningMs ?? 60 * 1000;

    recognitionRef.current.lang = lang;
    recognitionRef.current.continuous = continuous;
    recognitionRef.current.interimResults = interimResults;

    const startSilenceTimer = () => {
      if (!silenceTimeoutMs) return;
      if (silenceTimerRef.current !== null) clearTimeout(silenceTimerRef.current);
      silenceTimerRef.current = window.setTimeout(() => recognitionRef.current?.stop(), silenceTimeoutMs);
    };

    const startMaxTimer = () => {
      if (!maxListeningMs) return;
      if (maxTimerRef.current !== null) clearTimeout(maxTimerRef.current);
      maxTimerRef.current = window.setTimeout(() => recognitionRef.current?.stop(), maxListeningMs);
    };

    recognitionRef.current.onstart = () => {
      startMaxTimer();
      startSilenceTimer();
      setListening(true);
      setStatus("connected");
      setErrorMessage("");
    };

    recognitionRef.current.onresult = (event: SpeechRecognitionEvent) => {
      let fullTranscript = "";
      for (let i = 0; i < event.results.length; i++) {
        fullTranscript += event.results[i][0].transcript;
      }
      setTranscript(fullTranscript);
      // Reset the silence timer every time a result arrives
      startSilenceTimer();
    };

    recognitionRef.current.onend = () => {
      clearTimeout(maxTimerRef.current ?? undefined);
      clearTimeout(silenceTimerRef.current ?? undefined);
      setListening(false);
      setStatus("idle");
    };

    recognitionRef.current.onerror = (event: SpeechRecognitionErrorEvent) => {
      console.error("Speech recognition error:", event.error);
      setErrorMessage(event.message || event.error || "Speech recognition error");
      setStatus("error");
      setListening(false);
    };

    return () => recognitionRef.current?.abort();
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [stableOptions]);

  const startListening = useCallback(() => {
    if (recognitionRef.current && !listening) {
      setTranscript("");
      setStatus("connecting");
      setErrorMessage("");
      try {
        recognitionRef.current.start();
      } catch {
        setStatus("error");
        setErrorMessage("Failed to start speech recognition");
      }
    }
  }, [listening]);

  const stopListening = useCallback(() => recognitionRef.current?.stop(), []);
  const abortListening = useCallback(() => recognitionRef.current?.abort(), []);
  const resetTranscript = useCallback(() => setTranscript(""), []);

  return {
    transcript,
    listening,
    browserSupportsSpeechRecognition,
    startListening,
    stopListening,
    abortListening,
    resetTranscript,
    status,
    errorMessage,
  };
};
```
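One design choice worth noting: the effect depends on `JSON.stringify(options)` rather than `options` itself. An inline options object is a new reference on every render, so depending on it directly would tear down and recreate the recognizer each render; the serialized form only changes when the option values change. A quick illustration:

```typescript
// Two structurally identical option objects, as produced on two renders.
const a = { lang: "ja-JP", continuous: true, interimResults: true };
const b = { lang: "ja-JP", continuous: true, interimResults: true };

// Reference equality fails, so `options` as a dependency would re-run the effect.
console.log(a === b); // false

// The serialized form is stable across renders with the same values.
console.log(JSON.stringify(a) === JSON.stringify(b)); // true
```

The demo page below takes the complementary approach on the caller's side, wrapping its options in `useMemo` so the reference itself stays stable.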
## 3. Building the UI

- Build a simple UI for testing speechRecognition.
`src/app/speechRecognition/page.tsx`

```tsx
"use client";

import { useState, useMemo } from "react";
import { useSpeechRecognition } from "@/hooks/useSpeechRecognition";
import type { SpeechRecognitionOptions } from "@/types/SpeechRecognition";

export default function SpeechDemo() {
  // Options passed in from the UI (bare minimum)
  const [lang, setLang] = useState<SpeechRecognitionOptions["lang"]>("ja-JP");
  const [silenceTimeoutMs, setSilenceTimeoutMs] = useState<number>(3000);
  const [maxListeningMs, setMaxListeningMs] = useState<number>(60000);

  const options = useMemo<SpeechRecognitionOptions>(() => ({
    lang,
    silenceTimeoutMs,
    maxListeningMs,
    continuous: true,
    interimResults: true,
  }), [lang, silenceTimeoutMs, maxListeningMs]);

  const {
    transcript,
    listening,
    browserSupportsSpeechRecognition,
    startListening,
    stopListening,
    abortListening,
    resetTranscript,
    status,
    errorMessage,
  } = useSpeechRecognition(options);

  const copy = async () => {
    try {
      await navigator.clipboard.writeText(transcript);
    } catch {
      // no-op
    }
  };

  return (
    <div className="mx-auto max-w-3xl space-y-6 p-4">
      <header className="space-y-2">
        <h1 className="text-2xl font-bold">🎙️ Web Speech API Demo</h1>
        <p className="text-sm text-gray-500">
          Transcribes speech to text in the browser alone (experimental API / Chrome recommended)
        </p>
        {!browserSupportsSpeechRecognition && (
          <p className="rounded-md bg-red-50 p-3 text-sm text-red-700">
            Your browser does not support the Web Speech API (SpeechRecognition).
            Please try a Chromium-based browser.
          </p>
        )}
      </header>

      {/* Settings */}
      <section className="rounded-xl border p-4">
        <h2 className="mb-3 font-semibold">Settings</h2>
        <div className="grid gap-4 sm:grid-cols-3">
          <label className="block text-sm">
            <span className="mb-1 block text-gray-600">Language</span>
            <select
              className="w-full rounded-lg border px-3 py-2"
              value={lang}
              onChange={(e) => setLang(e.target.value)}
            >
              <option value="ja-JP">日本語 (ja-JP)</option>
              <option value="en-US">English (en-US)</option>
              <option value="zh-CN">中文 (zh-CN)</option>
              <option value="ko-KR">한국어 (ko-KR)</option>
            </select>
          </label>
          <label className="block text-sm">
            <span className="mb-1 block text-gray-600">Silence timeout (ms)</span>
            <input
              type="number"
              min={0}
              step={500}
              className="w-full rounded-lg border px-3 py-2"
              value={silenceTimeoutMs}
              onChange={(e) => setSilenceTimeoutMs(Number(e.target.value) || 0)}
            />
          </label>
          <label className="block text-sm">
            <span className="mb-1 block text-gray-600">Max listening time (ms)</span>
            <input
              type="number"
              min={0}
              step={1000}
              className="w-full rounded-lg border px-3 py-2"
              value={maxListeningMs}
              onChange={(e) => setMaxListeningMs(Number(e.target.value) || 0)}
            />
          </label>
        </div>
      </section>

      {/* Controls */}
      <section className="rounded-xl border p-4">
        <h2 className="mb-3 font-semibold">Controls</h2>
        <div className="flex flex-wrap items-center gap-2">
          <button
            onClick={startListening}
            disabled={!browserSupportsSpeechRecognition || listening}
            className="rounded-lg bg-black px-4 py-2 text-white disabled:opacity-50"
          >
            ▶️ Start
          </button>
          <button
            onClick={stopListening}
            disabled={!browserSupportsSpeechRecognition || !listening}
            className="rounded-lg bg-gray-800 px-4 py-2 text-white disabled:opacity-50"
          >
            ⏹ Stop
          </button>
          <button
            onClick={abortListening}
            disabled={!browserSupportsSpeechRecognition || !listening}
            className="rounded-lg bg-gray-600 px-4 py-2 text-white disabled:opacity-50"
          >
            🛑 Abort
          </button>
          <button
            onClick={resetTranscript}
            className="rounded-lg border px-4 py-2"
          >
            🧹 Clear
          </button>
          <button
            onClick={copy}
            disabled={!transcript}
            className="rounded-lg border px-4 py-2 disabled:opacity-50"
          >
            📋 Copy
          </button>
        </div>

        {/* Status display */}
        <div className="mt-3 flex flex-wrap items-center gap-2 text-sm">
          <span className="rounded-full border px-3 py-1">
            Status: <strong>{status}</strong>
          </span>
          <span
            className={
              "rounded-full px-3 py-1 " +
              (listening ? "bg-green-100 text-green-700" : "bg-gray-100 text-gray-600")
            }
          >
            {listening ? "🎤 Listening" : "⏸ Idle"}
          </span>
          <span className="text-gray-500">Lang: {lang}</span>
        </div>
        {errorMessage && (
          <p className="mt-2 rounded-md bg-red-50 p-3 text-sm text-red-700">
            {errorMessage}
          </p>
        )}
      </section>

      {/* Result */}
      <section className="rounded-xl border p-4">
        <h2 className="mb-3 font-semibold">Result (Transcript)</h2>
        <textarea
          className="h-40 w-full resize-none rounded-lg border p-3 font-mono text-sm"
          value={transcript}
          readOnly
          placeholder="Transcription results will appear here…"
        />
      </section>

      <footer className="text-xs text-gray-500">
        ※ Microphone permission is required. HTTPS is recommended (localhost also works).
      </footer>
    </div>
  );
}
```
## 4. Testing

- Press "Start" to begin microphone input.
- Confirm that the microphone activates and speech is transcribed in real time.
- Press "Stop" to end input.
## Summary

- With the Web Speech API, speech recognition is possible in the browser alone.
- A silence timeout and a maximum listening time improve the user experience.
- At present, Chromium-based browsers are the most stable, making them the best choice for development and testing.