
A Quick Try of AssemblyAI

Published on 2023/12/16

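What I want to do

1. Transcribe a YouTube video and write the result to a file.
2. Ask questions about the transcript file and get answers back.

For this walkthrough I use the following video, a conversation between Masayoshi Son and David Rubenstein (co-founder of The Carlyle Group).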
https://www.youtube.com/watch?v=yjbFXmSBh5E

Technologies used

AssemblyAI

An AI service whose models convert speech to text.
https://www.assemblyai.com/

LlamaIndex

A data framework that helps you build applications on top of large language models (LLMs).
https://www.llamaindex.ai/

Replicate

A platform for hosting, deploying, and running machine learning models.
Here I use llama-2-7b-chat through the API that Replicate provides.
https://replicate.com/meta/llama-2-7b-chat

Preparation

Getting API keys

This walkthrough needs API keys for both AssemblyAI and Replicate, so sign up on each site and obtain them.
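
The snippets below paste the keys directly into the source. As a minimal sketch of an alternative, the keys could be read from environment variables instead. The helper name `require_env` and the variable name `ASSEMBLYAI_API_KEY` are assumptions made for this example; `REPLICATE_API_TOKEN` is the variable this article sets later.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return value


# Possible usage (ASSEMBLYAI_API_KEY is a name chosen for this sketch):
# aai.settings.api_key = require_env("ASSEMBLYAI_API_KEY")
# require_env("REPLICATE_API_TOKEN")  # only checks that the token is present
```

Failing early with a readable message is usually easier to debug than an authentication error raised in the middle of a transcription request.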

Writing the code

1. Transcribe a YouTube video and write the result to a file.

Import the required libraries and set the target YouTube URL.

import yt_dlp
import assemblyai as aai

URL = 'https://www.youtube.com/watch?v=yjbFXmSBh5E'

Get only the audio track of the YouTube video (the best-quality m4a stream).

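with yt_dlp.YoutubeDL() as ydl:
    # Fetch metadata only; nothing is downloaded to disk here
    info = ydl.extract_info(URL, download=False)

# Scan the format list from the end and take the first audio-only m4a entry
for fmt in info["formats"][::-1]:
    if fmt.get("resolution") == "audio only" and fmt.get("ext") == "m4a":
        url = fmt["url"]
        break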
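
If no format matches, the loop above leaves `url` undefined and the script fails later with a NameError. A minimal sketch of the same selection as a function that returns None in that case follows. The function name `pick_audio_url` and the sample entries are made up for illustration; real entries in `info["formats"]` carry many more keys.

```python
from typing import Optional


def pick_audio_url(formats: list, ext: str = "m4a") -> Optional[str]:
    """Return the URL of the last audio-only format with the given extension.

    Scans from the end of the list, like the loop above, and returns None
    when nothing matches.
    """
    for fmt in reversed(formats):
        if fmt.get("resolution") == "audio only" and fmt.get("ext") == ext:
            return fmt.get("url")
    return None


# Hand-written stand-in for info["formats"] (hypothetical values)
sample_formats = [
    {"format_id": "139", "resolution": "audio only", "ext": "m4a", "url": "https://example.com/low.m4a"},
    {"format_id": "140", "resolution": "audio only", "ext": "m4a", "url": "https://example.com/high.m4a"},
    {"format_id": "251", "resolution": "audio only", "ext": "webm", "url": "https://example.com/high.webm"},
    {"format_id": "22", "resolution": "1280x720", "ext": "mp4", "url": "https://example.com/video.mp4"},
]

print(pick_audio_url(sample_formats))  # https://example.com/high.m4a
```

In the real script this would be called as `pick_audio_url(info["formats"])`, with a check for None before handing the URL to AssemblyAI.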

Transcribe the audio with AssemblyAI and write the result to a file.

To split the transcript by speaker, set speaker_labels=True.
Since we know there are two speakers, set speakers_expected=2.

aai.settings.api_key = "Your API KEY"

config = aai.TranscriptionConfig(speaker_labels=True, speakers_expected=2)
transcript = aai.Transcriber().transcribe(url, config)

with open('transcript.txt', 'w') as f:
    for utterance in transcript.utterances:
        speaker = utterance.speaker
        text = utterance.text
        f.write(f"Speaker{speaker}: {text}\n")
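
The formatting step can be separated from the file I/O, which makes it easy to try without calling the API. The sketch below assumes only that each utterance has `.speaker` and `.text` attributes, as in the code above. The function name `format_utterances` and the stand-in objects are hypothetical, and the guard for an empty or missing utterance list is a precaution I added, not something the original code does.

```python
from types import SimpleNamespace


def format_utterances(utterances) -> list:
    """Turn objects with .speaker and .text into 'SpeakerX: text' lines."""
    if not utterances:
        # Covers both an empty list and None (e.g. if transcription failed)
        return []
    return [f"Speaker{u.speaker}: {u.text}" for u in utterances]


# Stand-in objects; in the real script these come from transcript.utterances
fake_utterances = [
    SimpleNamespace(speaker="A", text="Hello."),
    SimpleNamespace(speaker="B", text="Yes."),
]

lines = format_utterances(fake_utterances)
print("\n".join(lines))
# SpeakerA: Hello.
# SpeakerB: Yes.
```

Writing the file then becomes a single `f.write("\n".join(lines) + "\n")`.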

The file that was actually produced:

transcript.txt
SpeakerA: You are clearly one of the world's most successful technology investors and one of the world's most successful businessmen. Let me start by asking you about a fund that you are now raising, the vision fund. It's supposed to be a fund of $100 billion.
SpeakerB: Yes.
SpeakerA: Now that would be the biggest fund ever raised. So when you told people you were going to raise a hundred billion dollar fund, did they tell you you were a little crazy?
SpeakerB: Well, some people said you had a.
SpeakerA: Meeting with a man who was the deputy crown prince of Saudi Arabia, who's now the crown prince of Saudi Arabia. And as I understand the story, you went in and in 1 hour, you convinced him to invest $45 billion.
SpeakerB: No, it's not true. Okay, 45 minutes. $45 billion.
SpeakerA: Okay, sorry. Okay, I apologize. In other words, if you had had.
SpeakerB: $1 billion per minute, what could you.
SpeakerA: Have said that was that persuasive to get 45 billion in one meeting?
SpeakerB: Well, actually, I said, you came to Tokyo as the first time, I want to give you a gift. I want to give you mass a gift, Tokyo gift, a trillion dollar gift. And he opened up his eyes and said, okay, now it's interesting.
SpeakerA: All right.
SpeakerB: So I woke up him and said, here is how I can give you a trillion dollar gift. You invest $100 billion to my fund, I give you a trillion dollars.
SpeakerA: But what is it that you told people? What was the vision that you actually gave them?
SpeakerB: So one vision, which is singularity. Singularity is the concept that the computing power, computers, artificial intelligence, surpass mankind's brains.
SpeakerA: The singularity is the concept. The word means. That is the point at which a computer becomes smarter than a human brain.
SpeakerB: Yes. Today already, computer is smarter than mankind. For chess or go or weather forecast. To some expert systems, computer is already smarter. But in 30 years, most of the subject that we are thinking they will be smarter than us, that's my belief.

2. Ask questions about the transcript file and get answers back.

Import the required libraries and set the API key.

import os
from llama_index.llms import Replicate
from llama_index import VectorStoreIndex, ServiceContext, SimpleDirectoryReader
os.environ["REPLICATE_API_TOKEN"] = "YOUR API KEY"

llama2_7b_chat = "meta/llama-2-7b-chat:8e6975e5ed6174911a6ff3d60540dfd4844201974602551e10e9e87ab143d81e"

Set up the Llama 2 model through the Replicate API.

llm = Replicate(
    model=llama2_7b_chat,
    temperature=0.9,
    additional_kwargs={"top_p": 1, "max_new_tokens": 200},
)

Load transcript.txt with SimpleDirectoryReader.

loader = SimpleDirectoryReader(input_files=["./transcript.txt"])
documents = loader.load_data()

Build an index from the documents and create a query engine from it. The service context carries the Llama 2 model configured above.

service_context = ServiceContext.from_defaults(llm=llm,)
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(service_context=service_context)
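
To give a rough feel for what a vector index does at query time, here is a toy sketch that ranks text chunks by similarity to a question. It is a conceptual illustration only: it uses simple word counts and cosine similarity, whereas LlamaIndex relies on an embedding model, and none of the function names below belong to LlamaIndex.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words vector: counts of lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = bow(query)
    return sorted(chunks, key=lambda c: cosine(q, bow(c)), reverse=True)[:k]


# Shortened lines taken from the transcript above
chunks = [
    "SpeakerB: Yes.",
    "SpeakerA: Let me start by asking you about a fund that you are now raising, the vision fund.",
    "SpeakerB: So one vision, which is singularity.",
]

print(top_k("the vision fund", chunks, k=1))
```

The query engine follows the same broad idea: it retrieves the most relevant chunks and passes them to the LLM as context for the answer.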

Run a query and print the answer.

response = query_engine.query("What is the relationships between SpeakerA and SpeakerB?")
print(response)

The answer I actually got is below. It looks roughly right.
Based on the provided context, SpeakerA and SpeakerB have a professional relationship. They are both involved in the technology industry, with SpeakerB being a successful businessman and investor, and SpeakerA being a journalist or interviewer who is conducting an interview with SpeakerB. The conversation suggests that SpeakerB has convinced SpeakerA of his investment ideas, particularly regarding a fund he is raising called the Vision Fund.
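
To ask several questions in one run, a small helper along these lines may be convenient. It assumes only that the engine exposes a `query()` method whose result can be converted with `str()`, as in the code above. `ask_all` and `EchoEngine` are names invented for this sketch, and `EchoEngine` is a stand-in so the example runs without any API calls.

```python
def ask_all(query_engine, questions):
    """Run each question through query_engine.query() and collect answers as strings."""
    return {q: str(query_engine.query(q)) for q in questions}


class EchoEngine:
    """Stand-in for a real query engine; not part of LlamaIndex."""

    def query(self, question):
        return f"(answer to: {question})"


answers = ask_all(EchoEngine(), ["Who is being interviewed?", "What is the Vision Fund?"])
for question, answer in answers.items():
    print(f"Q: {question}\nA: {answer}\n")
```

With the real setup, `EchoEngine()` would be replaced by the `query_engine` built earlier.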

References

https://www.assemblyai.com/blog/built-with-assemblyai-youtube-transcripts/
https://docs.llamaindex.ai/en/stable/examples/data_connectors/simple_directory_reader.html

Bonus (complete code)

1. Transcribe a YouTube video and write the result to a file.

# Extract the audio track from a YouTube video and transcribe it with AssemblyAI

# Import the libraries
import yt_dlp
import assemblyai as aai

# URL of the YouTube video to transcribe
URL = 'https://www.youtube.com/watch?v=yjbFXmSBh5E'

# Use yt_dlp to find the audio track
with yt_dlp.YoutubeDL() as ydl:
    info = ydl.extract_info(URL, download=False)
    for fmt in info["formats"][::-1]:
        if fmt.get("resolution") == "audio only" and fmt.get("ext") == "m4a":
            url = fmt["url"]
            break

# Configure AssemblyAI and run the transcription
aai.settings.api_key = "YOUR API KEY"
config = aai.TranscriptionConfig(speaker_labels=True, speakers_expected=2)
transcript = aai.Transcriber().transcribe(url, config)

# Save the transcription to a file
with open('transcript.txt', 'w') as f:
    for utterance in transcript.utterances:
        speaker = utterance.speaker
        text = utterance.text
        f.write(f"Speaker{speaker}: {text}\n")

2. Ask questions about the transcript file and get answers back.

# Build a question-answering setup over the transcript with Llama 2

# Import the libraries
import os
from llama_index.llms import Replicate
from llama_index import VectorStoreIndex, ServiceContext, SimpleDirectoryReader

# Configure Llama 2 and wrap it for LlamaIndex
os.environ["REPLICATE_API_TOKEN"] = "YOUR API KEY"  # API token
llama2_7b_chat = "meta/llama-2-7b-chat:8e6975e5ed6174911a6ff3d60540dfd4844201974602551e10e9e87ab143d81e"
llm = Replicate(model=llama2_7b_chat, temperature=0.9, additional_kwargs={"top_p": 1, "max_new_tokens": 200})

# Load the document
loader = SimpleDirectoryReader(input_files=["./transcript.txt"])
documents = loader.load_data()

# Build the index and the query engine
service_context = ServiceContext.from_defaults(llm=llm)  # service context carrying the Llama 2 model
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(service_context=service_context)

# Ask a question and print the answer
response = query_engine.query("What is the relationships between SpeakerA and SpeakerB?")
print(response)
