🧰

Reading Mercari's Japanese/English Vocabulary List for Engineers Aloud in English

Published on 2020/10/10

Background

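While looking through Engineer Vocabulary List in Japanese/English (Mercari's Japanese/English vocabulary list for engineers), I saw that it comes with Quizlet sets. Since that already exists, I thought I might as well go one step further and shadow along to a recording of the sentences, like the review CD for Duo.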

What I did, roughly

I used Speech Services in Azure Cognitive Services, a service that converts text to speech, to turn the English Sentences column of the csv files in the GitHub repository above into a wav file.

Steps

0. Prerequisites

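An environment that can run python3
- The pandas library is installed in that python3 environment
- It is used as the CSV parser
- The script in step 4 also imports requests, so that library needs to be installed too
An Azure account
- Nothing in this walkthrough should incur charges, probably...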

1. Create an Azure Speech Services resource from the Azure Portal

Create it by following the Microsoft documentation below.

2. Get the subscription key and location of Azure Speech Services

The name "subscription key" is poorly chosen; it is simply an access key.
The location matters more than it looks, because the Azure Cognitive Services API endpoint is different for each location.
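
Because the location ends up inside the hostname, it can help to build every URL from a single variable. The following is a minimal sketch, assuming the URL patterns used by the script in step 4; the function name `build_endpoints` and the dictionary keys are made up for illustration, and "japaneast" is only an example value.

```python
def build_endpoints(region):
    """Return the three URLs used in this article for a given Azure location."""
    # The URL patterns are copied from the script in step 4;
    # only the location part (e.g. "eastus") changes per resource.
    tts_base = "https://{}.tts.speech.microsoft.com/".format(region)
    return {
        "token": "https://{}.api.cognitive.microsoft.com/sts/v1.0/issuetoken".format(region),
        "tts": tts_base + "cognitiveservices/v1",
        "voices": tts_base + "cognitiveservices/voices/list",
    }


# "japaneast" is just an example; use the Location shown for your own resource.
endpoints = build_endpoints("japaneast")
print(endpoints["token"])
```

With a helper like this, the location would only need to be edited in one place instead of three.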

3. Get the csv files from Mercari's repository

Download list_1.csv through list_5.csv from https://github.com/mercari/engineer-vocabulary-list/tree/master/csv and put them together in one folder in an environment where python can run.
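
Once the files are in place, a quick check that they are all there and that the sentence column parses can save a failed run later. This is a rough sketch using the standard csv module instead of pandas. It assumes the layout implied by the script in step 4 (`skiprows=[0], usecols=[5]`), that is, two non-data rows at the top and the English sentences in the sixth column; `read_sentences` and its parameters are hypothetical names.

```python
import csv
import os


def read_sentences(path, column=5, skip_rows=2):
    """Return the non-empty cells of one column from a vocabulary CSV."""
    sentences = []
    with open(path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.reader(f)):
            # Assumption: the first two rows are a title row and a header row,
            # which is what skiprows=[0] plus the pandas header row amount to.
            if i < skip_rows:
                continue
            # Skip short rows and empty cells instead of failing on them.
            if len(row) > column and row[column].strip():
                sentences.append(row[column].strip())
    return sentences


if __name__ == "__main__":
    for n in range(1, 6):
        name = "list_{}.csv".format(n)
        if os.path.exists(name):
            print(name, len(read_sentences(name)), "sentences")
        else:
            print(name, "is missing")
```

If the counts look wrong, the column index or the number of skipped rows probably does not match the actual files and should be adjusted.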

4. Run the python script to create an audio file from the csv files

Save the source code below as TTSSample.py in the same folder as the csv files.
It is based on https://github.com/Azure-Samples/Cognitive-Speech-TTS/tree/master/Samples-Http/Python with some modifications.
There are a few places you need to edit:
- Set subscription_key to the Azure Speech Service key you obtained earlier
- To read a different csv file, change df1 in self.tts = df1.to_string(index=False) to the dfX you want
- The script calls the Azure API in three places that start with https://eastus. Replace eastus with the Location value you obtained earlier. If you forget, the request fails with an error.
- Changing the value in prosody.set('rate', '0.9') adjusts the reading speed
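
To make the speed adjustment easier to see in isolation, here is a minimal sketch that pulls the SSML construction out into a function with the rate as a parameter. It mirrors the ElementTree calls in the script; `build_ssml` and its default arguments are illustrative names, not part of the Azure sample. The Azure SSML documentation describes a bare number as a multiplier of the default rate, so 1.0 should mean unchanged speed, but treat that reading as an assumption and check it by ear.

```python
from xml.etree import ElementTree

# ElementTree serializes this namespace with the reserved "xml:" prefix.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def build_ssml(text, rate="0.9", voice_name="en-US-Guy24kRUS"):
    """Build the SSML request body with a configurable speaking rate."""
    speak = ElementTree.Element("speak", version="1.0")
    speak.set(XML_NS + "lang", "en-us")
    voice = ElementTree.SubElement(speak, "voice")
    voice.set(XML_NS + "lang", "en-US")
    voice.set("name", voice_name)
    prosody = ElementTree.SubElement(voice, "prosody")
    prosody.set("rate", rate)  # same attribute the script sets to '0.9'
    prosody.text = text
    return ElementTree.tostring(speak)  # bytes, ready to use as the POST body


print(build_ssml("Please review my pull request.", rate="1.0").decode("utf-8"))
```

Printing the body before sending it is also a cheap way to confirm which text and rate are actually going to the service.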

'''
After you've set your subscription key, run this application from your working
directory with this command: python TTSSample.py
'''
import os, requests, time
import pandas as pd
from xml.etree import ElementTree

# This code is required for Python 2.7
try: input = raw_input
except NameError: pass

'''
If you prefer, you can hardcode your subscription key as a string and remove
the provided conditional statement. However, we do recommend using environment
variables to secure your subscription keys. The environment variable is
set to SPEECH_SERVICE_KEY in our sample.

For example:
subscription_key = "Your-Key-Goes-Here"
'''

# Despite the name, this has almost nothing to do with an Azure subscription
subscription_key = "Set this to the access key shown under Keys and Endpoint of your Azure Speech resource"

'''
if 'SPEECH_SERVICE_KEY' in os.environ:
    subscription_key = os.environ['SPEECH_SERVICE_KEY']
else:
    print('Environment variable for your subscription key is not set.')
    exit()
'''

class TextToSpeech(object):
    def __init__(self, subscription_key):
        df1 = pd.read_csv('list_1.csv', skiprows=[0], usecols=[5])
        df2 = pd.read_csv('list_2.csv', skiprows=[0], usecols=[5])
        df3 = pd.read_csv('list_3.csv', skiprows=[0], usecols=[5])
        df4 = pd.read_csv('list_4.csv', skiprows=[0], usecols=[5])
        df5 = pd.read_csv('list_5.csv', skiprows=[0], usecols=[5])
        self.subscription_key = subscription_key
        # Load whichever file(s) you like
        self.tts = df1.to_string(index=False) # + df2.to_string(index=False) + df3.to_string(index=False) + df4.to_string(index=False) + df5.to_string(index=False)
        self.timestr = time.strftime("%Y%m%d-%H%M")
        self.access_token = None

    '''
    The TTS endpoint requires an access token. This method exchanges your
    subscription key for an access token that is valid for ten minutes.
    '''
    def get_token(self):
        fetch_token_url = "https://eastus.api.cognitive.microsoft.com/sts/v1.0/issuetoken"
        headers = {
            'Ocp-Apim-Subscription-Key': self.subscription_key
        }
        response = requests.post(fetch_token_url, headers=headers)
        self.access_token = str(response.text)

    def save_audio(self):
        base_url = 'https://eastus.tts.speech.microsoft.com/'
        path = 'cognitiveservices/v1'
        constructed_url = base_url + path
        # Change the 'X-Microsoft-OutputFormat' value to 'audio-24khz-160kbitrate-mono-mp3' to save as mp3.
        # If you do, remember to change the file extension in the open() call below.
        headers = {
            'Authorization': 'Bearer ' + self.access_token,
            'Content-Type': 'application/ssml+xml',
            'X-Microsoft-OutputFormat': 'riff-24khz-16bit-mono-pcm',
            'User-Agent': 'YOUR_RESOURCE_NAME'
        }
        xml_body = ElementTree.Element('speak', version='1.0')
        xml_body.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-us')
        voice = ElementTree.SubElement(xml_body, 'voice')
        voice.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-US')
        voice.set('name', 'en-US-Guy24kRUS') # Short name for 'Microsoft Server Speech Text to Speech Voice (en-US, Guy24KRUS)'
        prosody = ElementTree.SubElement(voice, 'prosody')
        # The speed is adjusted here; if it feels slow, increase the value
        prosody.set('rate', '0.9')
        prosody.text = self.tts
        body = ElementTree.tostring(xml_body)

        response = requests.post(constructed_url, headers=headers, data=body)
        '''
        If a success response is returned, then the binary audio is written
        to file in your working directory. It is prefaced by sample and
        includes the date.
        '''
        if response.status_code == 200:
            with open('sample-' + self.timestr + '.wav', 'wb') as audio:
                audio.write(response.content)
                print("\nStatus code: " + str(response.status_code) + "\nYour TTS is ready for playback.\n")
        else:
            print("\nStatus code: " + str(response.status_code) + "\nSomething went wrong. Check your subscription key and headers.\n")
            print("Reason: " + str(response.reason) + "\n")

    def get_voices_list(self):
        base_url = 'https://eastus.tts.speech.microsoft.com/'
        path = 'cognitiveservices/voices/list'
        constructed_url = base_url + path
        headers = {
            'Authorization': 'Bearer ' + self.access_token,
        }
        response = requests.get(constructed_url, headers=headers)
        if response.status_code == 200:
            print("\nAvailable voices: \n" + response.text)
        else:
            print("\nStatus code: " + str(response.status_code) + "\nSomething went wrong. Check your subscription key and headers.\n")

if __name__ == "__main__":
    app = TextToSpeech(subscription_key)
    app.get_token()
    app.save_audio()
    # Get a list of voices https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech#get-a-list-of-voices
    # app.get_voices_list()

Running the command python3 TTSSample.py creates the audio file sample-yyyyMMdd-HHmm.wav in the same folder.
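
The file name comes from a timestamp plus a fixed extension, and the extension has to match the X-Microsoft-OutputFormat header. The sketch below shows one way to keep the two in sync. It only covers the two output formats mentioned in this article, and `EXTENSIONS` and `output_filename` are hypothetical names.

```python
import time

# The two X-Microsoft-OutputFormat values mentioned in this article,
# mapped to a matching file extension.
EXTENSIONS = {
    "riff-24khz-16bit-mono-pcm": ".wav",
    "audio-24khz-160kbitrate-mono-mp3": ".mp3",
}


def output_filename(output_format, now=None):
    """Return a name like 'sample-<yyyyMMdd-HHmm><ext>' for the given format."""
    # Same timestamp pattern as self.timestr in the script.
    stamp = time.strftime("%Y%m%d-%H%M", now or time.localtime())
    return "sample-" + stamp + EXTENSIONS[output_format]


print(output_filename("riff-24khz-16bit-mono-pcm"))
```

Because the timestamp only goes down to the minute, two runs within the same minute would write to the same file name.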

That completes the work.

Nice work getting through it.