
[Next.js][TypeScript] Building an App That Converts Text to Speech (Getting It Running)

Published on 2023/08/22

Next.js

Preface

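Over several articles, I will build an app that converts text to speech using Google Cloud's Text-to-Speech AI.

This time, I create a new Next.js project and get the sample code from the documentation running.

In fact, since September 2021 I have been running my YouTube channel with a Python version of this, built with the Jupyter extension for VS Code.

My ultimate goal is to reproduce in Next.js the functionality I built in Python.
You can see the Python code below.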

Python code

# %% [markdown]
# After reading every text file and splitting it into a list of lines,
#
# first create one audio file per line,
#
# then concatenate all the audio files.

# %%
# !pip3 install --upgrade pip
# !pip3 install pydub
# !pip3 install --upgrade google-cloud-texttospeech

# %%
import glob, os
from pydub import AudioSegment
from google.cloud import texttospeech
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'secret.json'

# %%
start_tag = '<speak>'
end_tag = '</speak>'
break_time = '<break time="0.5s"/>'
all_text_file = glob.glob('*.txt')
# all_text_file = glob.glob('genkou/step2-open.txt')
all_text_file.sort()

# %%
def createvoice(text, filename, name, lang="japanese", gender="default"):
	language_code={
		"japanese":"ja-JP",
		"english":"en-US"
	}
	ssml_gender={
		"default":texttospeech.SsmlVoiceGender.SSML_VOICE_GENDER_UNSPECIFIED,
		"male":texttospeech.SsmlVoiceGender.MALE,
		"female":texttospeech.SsmlVoiceGender.FEMALE,
		"neutral":texttospeech.SsmlVoiceGender.NEUTRAL
	}
	wavenet_en = {
		"maleA":"en-US-Wavenet-A",
		"maleB":"en-US-Wavenet-B",
		"femaleC":"en-US-Wavenet-C",
		"maleD":"en-US-Wavenet-D",
	}
	wavenet_ja = {
		"femaleA":"ja-JP-Wavenet-A",
		"femaleB":"ja-JP-Wavenet-B",
		"maleC":"ja-JP-Wavenet-C",
		"maleD":"ja-JP-Wavenet-D"
	}
	neural2 = {
		"femaleB":"ja-JP-Neural2-B",
		"maleC":"ja-JP-Neural2-C",
		"maleD":"ja-JP-Neural2-D"
	}
	client = texttospeech.TextToSpeechClient()
	synthesis_input = texttospeech.SynthesisInput(ssml=text)
	voice = texttospeech.VoiceSelectionParams(
			language_code=language_code[lang],
			# ssml_gender=ssml_gender[gender],
			# name=wavenet_ja[name] # wavenet
			name=neural2[name] # Neural2
	)
	audio_config = texttospeech.AudioConfig(
			audio_encoding=texttospeech.AudioEncoding.MP3
	)
	response = client.synthesize_speech(
			input=synthesis_input, voice=voice, audio_config=audio_config
	)
	with open(filename, "wb") as out:
			out.write(response.audio_content)
			print(f'audio content written to file {filename}')

# %%
for text_file in all_text_file:
	f = open(text_file, 'r')
	file_name = text_file[:-4]
	# print(f'{file_name}')

	sentences = f.readlines()
	for i, sentence in enumerate(sentences, 1):
		# Skip blank lines
		if sentence.strip():
			print(sentence.rstrip('\n'))
			tag_sentence = start_tag + break_time + sentence + end_tag
			# print(tag_sentence.rstrip('\n'))
			createvoice(text=tag_sentence, filename=f'{file_name}s{str(i).zfill(4)}.mp3', name="femaleB")

	f.close()

	all_sound_file = glob.glob(f"{file_name}s*.mp3")
	all_sound_file.sort()

	all_sound = AudioSegment.empty()

	for sound_file in all_sound_file:
		print(sound_file)

		sound = AudioSegment.from_file(sound_file, "mp3")
		all_sound += sound
	all_sound.export(f"ALL_{file_name}.mp3", format="mp3")
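Since the goal is to move this script to Next.js, here is a rough sketch of how its per-line step might look in TypeScript. It only builds the SSML strings and the zero-padded file names (the `s0001` suffix produced by `zfill(4)`); the API call and the pydub concatenation are left out. The function names are my own, and unlike the Python code this sketch trims the trailing newline and escapes `&`, `<` and `>` so they cannot break the SSML.

```typescript
// Hypothetical TypeScript port of the per-line SSML step in the Python script.
const START_TAG = "<speak>";
const END_TAG = "</speak>";
const BREAK_TIME = '<break time="0.5s"/>';

// Escape the characters that would otherwise break the SSML markup.
// The Python script does not do this; it is an extra safety step.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Mirrors start_tag + break_time + sentence + end_tag (trailing newline dropped).
export function buildSsml(sentence: string): string {
  return START_TAG + BREAK_TIME + escapeXml(sentence.trimEnd()) + END_TAG;
}

// Mirrors f'{file_name}s{str(i).zfill(4)}.mp3'.
export function segmentFileName(baseName: string, index: number): string {
  return `${baseName}s${String(index).padStart(4, "0")}.mp3`;
}

export interface Segment {
  fileName: string;
  ssml: string;
}

// Mirrors the loop: number lines from 1 (like enumerate(..., 1)) and skip blank lines.
export function planSegments(baseName: string, fileText: string): Segment[] {
  const segments: Segment[] = [];
  fileText.split("\n").forEach((line, i) => {
    if (line.trim()) {
      segments.push({ fileName: segmentFileName(baseName, i + 1), ssml: buildSsml(line) });
    }
  });
  return segments;
}
```

Because blank lines keep their line number, the file names have gaps (s0001, s0003, ...) exactly as in the Python version, and sorting them still gives the original order.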

Development environment

Create a new project

First, create a new project.

Following the Next.js installation guide, run the command below.
Here the project is named text-recording.

npx create-next-app@latest text-recording

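When the command asks its setup questions, accept the default answers.
Once the project has been created, move into its directory.
Then start the development server once to check it.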

cd text-recording
npm run dev

Once the server is running, open http://localhost:3000.

Clean up the project code

First, remove unneeded code from app/page.tsx.
Nothing inside the main tag is needed, so delete all of it and leave just a heading.

app/page.tsx

export default function Home() {
  return (
    <main>
      <h1 className="text-3xl">Text Recording App</h1>
    </main>
  );
}

Also delete everything in app/globals.css except the Tailwind CSS directives.

app/globals.css

@tailwind base;
@tailwind components;
@tailwind utilities;

Edit `app/layout.tsx`

Change the metadata and font settings in app/layout.tsx.

Change the metadata

Change the title and description in metadata.

app/layout.tsx

  export const metadata = {
-   title: "Create Next App",
-   description: "Generated by create next app",
+   title: "Text Recording App",
+   description: "Google Cloud Text-to-Speech App",
  };

Change the font to Noto Sans Japanese

Use Noto Sans Japanese from Google Fonts.

Setting up the font takes three steps.

Import the font

First, import the font you want to use from next/font/google.

app/layout.tsx

- import { Inter } from "next/font/google";
+ import { Noto_Sans_JP } from "next/font/google";

Specify `subsets`

Next, specify subsets.

app/layout.tsx

- const inter = Inter({ subsets: ["latin"] });
+ const notoSansJP = Noto_Sans_JP({ subsets: ["latin"] });

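Besides subsets, you can also specify options such as style and weight.
See the next/font section of the Next.js documentation for the full list of arguments.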
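As an illustration, a fragment like the following passes weight and display in addition to subsets. The chosen weights are only an example, not something this project requires; check the next/font documentation for the exact options Noto_Sans_JP accepts.

```typescript
// app/layout.tsx (fragment): example options, not required by this article
import { Noto_Sans_JP } from "next/font/google";

const notoSansJP = Noto_Sans_JP({
  subsets: ["latin"],
  weight: ["400", "700"], // example: load only regular and bold
  display: "swap", // show fallback text until the web font has loaded
});
```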

Update the JSX

Finally, change the JSX as follows: set lang to "ja" and move the font class to the html element.

app/layout.tsx

  export default function RootLayout({
    children,
  }: {
    children: React.ReactNode;
  }) {
    return (
-     <html lang="en">
-       <body className={inter.className}>{children}</body>
+     <html lang="ja" className={notoSansJP.className}>
+       <body>{children}</body>
      </html>
    );
  }

For more on font configuration, see the official Next.js documentation.

Verify it works with the documentation's sample code

Install the client library

First, install the client library.
Run the following command.

npm install --save @google-cloud/text-to-speech

Review the sample code

First, let's look at what the sample code does.

utils/textRecording.ts

const textToSpeech = require("@google-cloud/text-to-speech");

const fs = require("fs");
const util = require("util");

const client = new textToSpeech.TextToSpeechClient();

async function quickStart() {
  const text = "hello, world!";

  const request = {
    input: {text: text},
    voice: {languageCode: "en-US", ssmlGender: "NEUTRAL"},
    audioConfig: {audioEncoding: "MP3"},
  };

  const [response] = await client.synthesizeSpeech(request);
  const writeFile = util.promisify(fs.writeFile);
  await writeFile("output.mp3", response.audioContent, "binary");
  console.log("Audio content written to file: output.mp3");
}
quickStart();

The sample code needs four pieces of information to convert text to speech.
They are all in the object assigned to the variable request in the middle of the code.

utils/textRecording.ts

const request = {
  input: {text: text},
  voice: {languageCode: "en-US", ssmlGender: "NEUTRAL"},
  audioConfig: {audioEncoding: "MP3"},
};

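Each one is described below. In the editor, hovering the cursor over a property also shows a hint.

The `text` variable

text holds the sentence to be converted to speech.

The `languageCode` property

languageCode holds the language of the voice as a BCP-47 language tag such as "en-US" or "ja-JP" (a language plus a region, not just a country code).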
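The Python script above also picks a specific voice with `name`, and Google's voice names begin with the same language tag (for example ja-JP-Neural2-B). A minimal sketch of a hypothetical helper that derives languageCode from the voice name so the two cannot disagree; the voice list is copied from the script's neural2 dictionary, and the helper name is my own.

```typescript
// Japanese Neural2 voices, copied from the neural2 dict in the Python script.
const neural2Voices = {
  femaleB: "ja-JP-Neural2-B",
  maleC: "ja-JP-Neural2-C",
  maleD: "ja-JP-Neural2-D",
} as const;

type Neural2Voice = keyof typeof neural2Voices;

// Same shape as the `voice` field of the request: a language tag plus a voice name.
export interface VoiceSelection {
  languageCode: string;
  name: string;
}

// Hypothetical helper: take the first two dash-separated parts of the voice
// name as the language tag ("ja-JP-Neural2-B" -> "ja-JP").
export function selectVoice(voice: Neural2Voice): VoiceSelection {
  const name = neural2Voices[voice];
  const languageCode = name.split("-").slice(0, 2).join("-");
  return { languageCode, name };
}
```

The result could then be used as the voice field, e.g. `voice: selectVoice("femaleB")`.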

The `ssmlGender` property

ssmlGender holds the gender of the voice.
It accepts the following six values:

"SSML_VOICE_GENDER_UNSPECIFIED"
"MALE"
"FEMALE"
"NEUTRAL"
null
undefined

Anything else results in an error.
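Once an input screen is added, the gender will arrive as an arbitrary string from a form. A minimal sketch of narrowing such a string to the four string values above; `toSsmlGender`, the uppercasing, and the fallback to SSML_VOICE_GENDER_UNSPECIFIED are my own assumptions, not part of the library.

```typescript
// The four string values listed above (null/undefined are handled by the fallback).
const SSML_GENDERS = [
  "SSML_VOICE_GENDER_UNSPECIFIED",
  "MALE",
  "FEMALE",
  "NEUTRAL",
] as const;

type SsmlGender = (typeof SSML_GENDERS)[number];

// Type guard: narrows an arbitrary string to one of the allowed values.
function isSsmlGender(value: string): value is SsmlGender {
  return (SSML_GENDERS as readonly string[]).includes(value);
}

// Hypothetical helper: accept any casing, fall back to UNSPECIFIED otherwise.
export function toSsmlGender(value: string | null | undefined): SsmlGender {
  const upper = (value ?? "").toUpperCase();
  return isSsmlGender(upper) ? upper : "SSML_VOICE_GENDER_UNSPECIFIED";
}
```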

The `audioEncoding` property

audioEncoding holds the format of the audio file.
It accepts the following eight values:

"AUDIO_ENCODING_UNSPECIFIED"
"LINEAR16"
"MP3"
"OGG_OPUS"
"MULAW"
"ALAW"
null
undefined

Anything else results in an error.
For my purposes I basically only use "MP3", so I leave this unchanged.
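The sample hard-codes the file name output.mp3, so if you ever switch encodings the extension should follow. A sketch of a hypothetical helper, based on Google's documentation that LINEAR16, MULAW and ALAW audio is returned with a WAV header; AUDIO_ENCODING_UNSPECIFIED is deliberately left out because it does not name a concrete format.

```typescript
type AudioEncoding = "LINEAR16" | "MP3" | "OGG_OPUS" | "MULAW" | "ALAW";

// LINEAR16, MULAW and ALAW output comes with a WAV header, so all three use .wav.
const EXTENSIONS: Record<AudioEncoding, string> = {
  LINEAR16: "wav",
  MP3: "mp3",
  OGG_OPUS: "ogg",
  MULAW: "wav",
  ALAW: "wav",
};

// Hypothetical helper: build the output file name for a given encoding.
export function outputFileName(baseName: string, encoding: AudioEncoding): string {
  return `${baseName}.${EXTENSIONS[encoding]}`;
}
```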

Create the sample code file

Create a utils directory and a new file textRecording.ts inside it.
Then copy and paste the sample code into textRecording.ts.

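Modify the sample code

Next, modify the sample code.

Imports

textToSpeech, fs, and util are loaded with require().
In TypeScript, import is the usual form and lets the editor pick up each package's type definitions, so import them as follows.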

utils/textRecording.ts

- const textToSpeech = require("@google-cloud/text-to-speech");
+ import * as textToSpeech from "@google-cloud/text-to-speech";

- const fs = require("fs");
- const util = require("util");
+ import fs from "fs";
+ import util from "util";
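As an aside, Node.js also ships a promise-based fs API, so util.promisify is not strictly required. A sketch of the same kind of write using `fs/promises` (assuming Node.js 14 or later, where that import path exists); `saveAudio` and `demo` are hypothetical names.

```typescript
import { readFile, writeFile } from "fs/promises";
import * as os from "os";
import * as path from "path";

// Promise-based equivalent of util.promisify(fs.writeFile) for this use.
// A Uint8Array (Buffer included) is written as raw bytes.
export async function saveAudio(filePath: string, data: Uint8Array | string): Promise<void> {
  await writeFile(filePath, data);
}

// Usage sketch: write three bytes to a temp file and read them back.
export async function demo(): Promise<number> {
  const file = path.join(os.tmpdir(), "tts-save-audio-demo.mp3");
  await saveAudio(file, new Uint8Array([0x49, 0x44, 0x33]));
  const back = await readFile(file);
  return back.length;
}
```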

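Add credentials

Next, add the credentials.
This time I use a JSON credentials file that I created earlier in Google Cloud.
The file is named secret.json.
First, assign to the variable option an object whose keyFilename is "secret.json".
Add the following code right after the imports.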

utils/textRecording.ts

+ const option = {
+   keyFilename: "secret.json",
+ };

Pass option to the constructor of the TextToSpeechClient class.
Change the code as follows.

utils/textRecording.ts

- const client = new textToSpeech.TextToSpeechClient();
+ const client = new textToSpeech.TextToSpeechClient(option);

For details, see the authentication page of the Google Cloud documentation.
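Note that Google's client libraries can also pick up the GOOGLE_APPLICATION_CREDENTIALS environment variable (the one the Python script sets) on their own through Application Default Credentials. If you prefer to keep the keyFilename option but not depend on which directory the server is started from, a sketch like this resolves the path explicitly. `clientOptions` is a hypothetical helper, and preferring the environment variable over secret.json is my own choice.

```typescript
import * as path from "path";

// Hypothetical helper: use GOOGLE_APPLICATION_CREDENTIALS if it is set,
// otherwise secret.json, and turn either into an absolute path based on cwd.
export function clientOptions(
  env: Record<string, string | undefined> = process.env,
  cwd: string = process.cwd(),
): { keyFilename: string } {
  const keyFile = env.GOOGLE_APPLICATION_CREDENTIALS || "secret.json";
  return { keyFilename: path.resolve(cwd, keyFile) };
}
```

With this, `new textToSpeech.TextToSpeechClient(clientOptions())` would replace the fixed option object.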

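Add type annotations

Errors are reported on the variables response and request.
The error on response is most likely caused by the one on request, so let's look at the error message for request first.

The message contains "is not assignable to parameter of type 'ISynthesizeSpeechRequest'".
This means the type of request does not match the type that the synthesizeSpeech method expects. Without an annotation, TypeScript widens the string literals "NEUTRAL" and "MP3" in the object to plain string, which is not one of the values the parameter type allows.
So let's check the parameters of synthesizeSpeech and their types.
That is easy: just hover the cursor over synthesizeSpeech.
Its description then appears.

As expected, the type of request is shown as well.
It is textToSpeech.protos.google.cloud.texttospeech.v1.ISynthesizeSpeechRequest.
Now that we know the type, give it to request.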

utils/textRecording.ts

- const request = {
+ const request: textToSpeech.protos.google.cloud.texttospeech.v1.ISynthesizeSpeechRequest = {
    input: { text: text },
    // Select the language and SSML voice gender (optional)
    voice: { languageCode: "en-US", ssmlGender: "NEUTRAL" },
    // select the type of audio encoding
    audioConfig: { audioEncoding: "MP3" },
  };
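That type name is long. Since TypeScript 4.9 there is also the `satisfies` operator, which checks an object against a type while keeping its literal types, so "NEUTRAL" and "MP3" are not widened to string. The sketch below shows the idea with simplified stand-in types of my own, not the library's; I assume, but have not verified here, that `satisfies textToSpeech.protos.google.cloud.texttospeech.v1.ISynthesizeSpeechRequest` behaves the same way with the real type.

```typescript
// Simplified stand-ins for the library's types (assumption: the real
// ISynthesizeSpeechRequest has more fields and also accepts enum numbers).
type SsmlVoiceGender = "SSML_VOICE_GENDER_UNSPECIFIED" | "MALE" | "FEMALE" | "NEUTRAL";
type AudioEncoding = "AUDIO_ENCODING_UNSPECIFIED" | "LINEAR16" | "MP3" | "OGG_OPUS" | "MULAW" | "ALAW";

interface SimpleSynthesizeRequest {
  input: { text: string };
  voice: { languageCode: string; ssmlGender?: SsmlVoiceGender | null };
  audioConfig: { audioEncoding: AudioEncoding };
}

// Stand-in for client.synthesizeSpeech: it only reads the fields.
function describeRequest(req: SimpleSynthesizeRequest): string {
  return [req.voice.languageCode, req.voice.ssmlGender ?? "UNSPECIFIED", req.audioConfig.audioEncoding].join("/");
}

// Without `satisfies`, "NEUTRAL" and "MP3" would be inferred as string and the
// call below would not compile. With it, the literals are checked and kept.
const request = {
  input: { text: "hello, world!" },
  voice: { languageCode: "en-US", ssmlGender: "NEUTRAL" },
  audioConfig: { audioEncoding: "MP3" },
} satisfies SimpleSynthesizeRequest;

export const summary = describeRequest(request);
```

Writing `as const` after the object literal would also keep the literals, at the cost of making every property readonly.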

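Add `Buffer`

An error is also reported on response.audioContent.

It means the type of response.audioContent does not match the type that the writeFile method expects.
Hover over writeFile to read its description.

ArrayBufferView in the description looked like a big hint, but I could not get any further on my own.
So I asked ChatGPT, and the answer came right away.
At runtime in Node.js, response.audioContent holds the audio data as a Buffer, so use a type assertion to tell TypeScript it is a Buffer. (As far as I can tell from the type definitions, audioContent is declared as Uint8Array | string | null | undefined, and it is the null/undefined part that writeFile rejects.)
Concretely, just append as Buffer to response.audioContent.
That clears the error.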

utils/textRecording.ts

- await writeFile("output.mp3", response.audioContent, "binary");
+ await writeFile("output.mp3", response.audioContent as Buffer, "binary");
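Keep in mind that `as Buffer` only silences the compiler: if audioContent were ever null, writeFile would still fail at runtime. An alternative is a small guard that narrows the value explicitly. This sketch assumes audioContent is typed `Uint8Array | string | null | undefined`; `requireAudio` is a hypothetical name, and decoding a string as base64 is an assumption based on how protobuf bytes fields are represented in JSON.

```typescript
type AudioContent = Uint8Array | string | null | undefined;

// Hypothetical guard: fail loudly on missing audio instead of asserting it away.
export function requireAudio(content: AudioContent): Uint8Array {
  if (content == null) {
    throw new Error("The response contained no audioContent");
  }
  // Assumption: a string here would be base64, so decode it to bytes.
  return typeof content === "string" ? Buffer.from(content, "base64") : content;
}
```

In quickStart() the write would then become `await writeFile("output.mp3", requireAudio(response.audioContent), "binary");`.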

Here is textRecording.ts so far.

utils/textRecording.ts

import * as textToSpeech from "@google-cloud/text-to-speech";

import fs from "fs";
import util from "util";

const option = {
  keyFilename: "secret.json",
};

export async function quickStart() {
  const client = new textToSpeech.TextToSpeechClient(option);
  const text = "hello, world!";

  const request: textToSpeech.protos.google.cloud.texttospeech.v1.ISynthesizeSpeechRequest =
    {
      input: { text: text },
      voice: { languageCode: "en-US", ssmlGender: "NEUTRAL" },
      audioConfig: { audioEncoding: "MP3" },
    };

  const [response] = await client.synthesizeSpeech(request);
  const writeFile = util.promisify(fs.writeFile);
  await writeFile("output.mp3", response.audioContent as Buffer, "binary");
  console.log("Audio content written to file: output.mp3");
}

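Verify it works

Add the function to `app/page.tsx`

Call quickStart() from app/page.tsx, importing it at the top of the file.
If you forget this step, the code never runs. Components in the App Router are Server Components by default, so quickStart() runs on the server (in Node.js), which is why it can use fs.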

app/page.tsx

+ import { quickStart } from "@/utils/textRecording"; // assumes the default "@/*" import alias
+
  export default function Home() {
+   quickStart();
    return (
      <main>
        <h1 className="text-3xl">Text Recording App</h1>
      </main>
    );
  }

Also delete the quickStart(); call at the end of utils/textRecording.ts.

utils/textRecording.ts

- quickStart();

Check the result

Start the development server.

npm run dev

Once the server is running, open http://localhost:3000.
The check is complete when output.mp3 appears in the project directory.
When you play it, you should hear "hello, world!".
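Because quickStart() is called in the body of Home, as far as I understand it runs again every time the page is rendered, so in dev mode each reload sends another request to the API and overwrites output.mp3. A minimal sketch of a guard that skips the request when the file already exists; `synthesizeOnce` and the `synthesize` callback (standing in for the client call) are hypothetical.

```typescript
import { access, writeFile } from "fs/promises";
import * as os from "os";
import * as path from "path";

// True if the file can be accessed (i.e. it already exists).
async function exists(filePath: string): Promise<boolean> {
  try {
    await access(filePath);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical helper: call `synthesize` only when filePath does not exist yet.
// Returns true if the file was written, false if the call was skipped.
export async function synthesizeOnce(
  filePath: string,
  synthesize: () => Promise<Uint8Array>,
): Promise<boolean> {
  if (await exists(filePath)) return false;
  await writeFile(filePath, await synthesize());
  return true;
}

// Usage sketch with a fake synthesizer: the second call is skipped.
export async function demo(): Promise<[boolean, boolean]> {
  const file = path.join(os.tmpdir(), `tts-once-${Date.now()}.mp3`);
  const fake = async () => new Uint8Array([1, 2, 3]);
  const first = await synthesizeOnce(file, fake);
  const second = await synthesizeOnce(file, fake);
  return [first, second];
}
```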

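Next time

Next time, I will build the input screen (UI) and add parameters to quickStart() so that values can be passed to it.
Then I will take the app all the way to completion.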

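Released the 「ひとこと投資メモ」 (One-Line Investment Memo) smartphone app series

This is unrelated to the article, but one last announcement.
As part of putting my Flutter learning into practice, I have released 「日本株ひとこと投資メモ」 (for Japanese stocks) and 「米国株ひとこと投資メモ」 (for US stocks).

They are light, easy-to-use investment memo apps.
Both iPhone and Android are supported.
I hope they are of some use in your investing life.
Please visit each site from the links below to download them.
https://jpstockminimemo.arafipro.com/
https://usstockminimemo.arafipro.com/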
