🔉

Let's build an app that takes a Voicy or YouTube link and summarizes its content with the OpenAI API!

Published on 2023/07/03

Introduction

Nice to meet you.
I'm cardene (かるでね), an engineer at CryptoGames, a blockchain game company!
I write smart contracts and work broadly across frontend, backend, and infrastructure.

https://cryptogames.co.jp/

Our flagship title is a blockchain game called CryptoSpells.

https://cryptospells.jp/

In this article, I'll walk through building an app with the OpenAI API that summarizes the content behind a link from Voicy, YouTube, and similar services.

Here is the app I actually built (select "Audio Summary" from the side menu to use it!).

https://ai-app.streamlit.app/

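Usage instructions and the source code are available in the project's GitHub repository.

You can run the hosted app as-is, but for security I recommend deleting the OpenAI API key you generated once you are done using it.

This article uses Streamlit, a Python framework.

Let's get started!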

https://chaldene.net/audio-summary

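Getting the audio file from a link

First, let's get the audio file from a Voicy or YouTube link!

Install the module. The `FFmpegExtractAudio` postprocessor used below also needs `ffmpeg` installed on your system.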

terminal

$ pip install yt_dlp

Create a file and paste in the following code.

download_audio.py

from yt_dlp import YoutubeDL


def get_audio_file(url: str):
    ydl_opts = {
        'outtmpl': './audio.%(ext)s',
        'format': 'bestaudio/best',
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192'
        }],
    }
    ydl = YoutubeDL(ydl_opts)
    ydl.download([url])


def main():
    get_audio_file("https://voicy.jp/channel/2627/559190")


if __name__ == '__main__':
    main()

This time we'll use a broadcast by Satoshi Nakajima (中島聡), a Voicy personality.
Let's run it.

terminal

$ python download_audio.py

[voicy] Extracting URL: https://voicy.jp/channel/2627/559190
[voicy] 559190: Downloading JSON metadata
[download] Downloading multi_video: SlashGPTをオープンソース化しました
[voicy] Playlist SlashGPTをオープンソース化しました: Downloading 2 items of 2
[download] Downloading item 1 of 2
[info] 1934043: Downloading 1 format(s): hls
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 2
[download] Destination: ./audio.m4a
[download] 100% of  103.18KiB in 00:00:00 at 1.08MiB/s
[FixupM3u8] Fixing MPEG-TS in MP4 container of "./audio.m4a"
[ExtractAudio] Destination: ./audio.mp3
Deleting original file ./audio.m4a (pass -k to keep)
[download] Downloading item 2 of 2
[info] 1314556: Downloading 1 format(s): hls
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 45
[download] Destination: ./audio.m4a
[download] 100% of    3.92MiB in 00:00:02 at 1.67MiB/s
[FixupM3u8] Fixing MPEG-TS in MP4 container of "./audio.m4a"
[ExtractAudio] Destination: ./audio.mp3
Deleting original file ./audio.m4a (pass -k to keep)
[download] Finished downloading playlist: SlashGPTをオープンソース化しました

If you see output like the above, it worked!
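
One thing worth noticing in the log: both playlist items are written to `./audio.m4a` and then `./audio.mp3`, so the later item presumably replaces the earlier one. yt-dlp output templates use Python-style `%(field)s` placeholders, and the sketch below imitates that substitution with plain string formatting to show how adding the `id` field would give each item its own file. `render_outtmpl` is a hypothetical helper for illustration only, not part of `yt_dlp`; the IDs come from the log above.

```python
def render_outtmpl(template: str, info: dict) -> str:
    """Imitate yt-dlp's %(field)s substitution with plain % formatting."""
    return template % info


# Item IDs and the intermediate extension are taken from the log above.
items = [
    {"id": "1934043", "ext": "m4a"},
    {"id": "1314556", "ext": "m4a"},
]

# Template from the article: every item maps to the same file name.
same_name = {render_outtmpl("./audio.%(ext)s", item) for item in items}

# Assumed alternative: include the item id so each item gets its own file.
per_item = [render_outtmpl("./audio_%(id)s.%(ext)s", item) for item in items]

print(same_name)
print(per_item)
```

The rest of the article expects a single `./audio.mp3`, so treat this as an illustration of the template syntax rather than a drop-in change.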

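Converting audio to text

Next, we'll convert the audio file we just downloaded into text.

The "converting audio to text" step in this chapter and the later "summarizing text" step were implemented with reference to an existing article.

Run the following command to create a file named .env.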

terminal

$ touch .env

Once created, open the file and paste in the following.

.env

OPENAI_API_KEY="<Open AI API Key>"

Replace `<Open AI API Key>` with your OpenAI API key.
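
`os.getenv` quietly returns `None` when the variable is missing, so a forgotten or unedited `.env` only shows up later as an API error. As a minimal sketch, a check like the following could fail early with a clearer message. `require_api_key` is a hypothetical helper, not something from the original code, and treating a value that starts with `<` as the unedited placeholder is an assumption.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None,
                    name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from env, or raise a clear error if it is unset."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    # Assumption: the "<Open AI API Key>" placeholder also counts as "not set".
    if not key or key.startswith("<"):
        raise RuntimeError(f"{name} is not set; add it to .env first")
    return key
```

In the scripts below, it could be called right after `load_dotenv()` in place of the bare `os.getenv('OPENAI_API_KEY')`.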

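Next, install the following modules. Note that the code below calls `openai.Audio.transcribe`, which belongs to `openai` releases before 1.0.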

terminal

$ pip install openai
$ pip install python-dotenv

Create a file and paste in the following.

audio_to_text.py

import os
import openai
from dotenv import load_dotenv

load_dotenv()
openai.api_key = os.getenv('OPENAI_API_KEY')


def get_audio_text(file_path: str) -> str:
    # Open in binary mode; the context manager closes the file after the upload.
    with open(file_path, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)
    return transcript["text"]


def main():
    file_path = "./audio.mp3"
    audio_text = get_audio_text(file_path)
    print(audio_text)


if __name__ == "__main__":
    main()

Run it.

terminal

$ python audio_to_text.py
こんにちは中島です 日本に出張したりしててちょっと...

If the text is printed as above, it worked!
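
Longer broadcasts can produce large files, and OpenAI documents a size cap for audio uploads (25 MB at the time of writing, which is treated as an assumption here). A minimal sketch of a pre-flight check, using a hypothetical helper `fits_upload_limit` that is not part of the original code:

```python
import os

# Assumed limit: OpenAI documents a 25 MB cap for audio uploads.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def fits_upload_limit(file_path: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file exists and is no larger than the limit."""
    return os.path.isfile(file_path) and os.path.getsize(file_path) <= limit
```

Calling `fits_upload_limit("./audio.mp3")` before `get_audio_text` would let the script report an oversized file instead of failing inside the API call.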

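Summarizing the text

This chapter also draws on the article mentioned in the previous chapter.

First, let's add the required module. The imports below were written against the llama-index release available in mid-2023; newer releases have reorganized them.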

terminal

$ pip install llama-index

Create a file and paste in the following.

summary_text.py

import os
import openai
from llama_index import StorageContext, StringIterableReader, GPTVectorStoreIndex, load_index_from_storage
from dotenv import load_dotenv

load_dotenv()
openai.api_key = os.getenv('OPENAI_API_KEY')
STORAGE_PATH = "./storage"


def get_summary_text(audio_text: str, summary_prompt: str) -> str:
    if not os.path.exists(STORAGE_PATH):
        os.makedirs(STORAGE_PATH)
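    try:
        # Reuse a previously persisted index if one exists in STORAGE_PATH.
        # Note: the cached index is loaded even if audio_text has changed.
        storage_context = StorageContext.from_defaults(persist_dir=STORAGE_PATH)
        vector_index = load_index_from_storage(storage_context)
    except Exception:
        # No usable index on disk: build one from the text and persist it.
        documents = StringIterableReader().load_data(texts=[audio_text])
        vector_index = GPTVectorStoreIndex.from_documents(documents)
        vector_index.storage_context.persist(persist_dir=STORAGE_PATH)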

    query_engine = vector_index.as_query_engine()
    response = query_engine.query(summary_prompt)

    return response.response


def main():
    text = """
    こんにちは中島です 日本に出張したりしててちょっと...
    """
    prompt = "テキストの内容を5~10の項目に分けて要約してください。"
    summary_text = get_summary_text(text, prompt)
    print(f"summary_text: {summary_text}")


if __name__ == "__main__":
    main()

Run it.

terminal

$ python summary_text.py
1. 自然言語のインターフェースを使うことで、複雑なシステムでも文脈を理解してメニューを探さなくてもやってくれるシステムができる。
2. ユーザーインターフェースの進化の意味で、一気に進んだという感じである。
...

If the output is split into items like the above, it worked!
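
Because `get_summary_text` loads whatever index is already stored in `./storage`, a second run with a different transcript would be answered from the old index unless that directory is deleted first. One possible way around this, sketched here under the assumption that one index per transcript is acceptable, is to derive the persist directory from a hash of the text. `storage_path_for` and `STORAGE_ROOT` are hypothetical names for illustration.

```python
import hashlib
import os

STORAGE_ROOT = "./storage"  # same directory as STORAGE_PATH in the article


def storage_path_for(audio_text: str, root: str = STORAGE_ROOT) -> str:
    """Return a per-text persist directory derived from a hash of the text."""
    digest = hashlib.sha256(audio_text.encode("utf-8")).hexdigest()[:16]
    return os.path.join(root, digest)
```

The returned path could then be passed as `persist_dir` in place of the fixed `STORAGE_PATH`, so the same transcript reuses its index while a new transcript gets a fresh one.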

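Building the app with Streamlit

Next, let's turn everything so far into an app using Streamlit!

First, install the module. The command below uses Poetry; if you have been following along with pip, `pip install streamlit` installs the same package.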

terminal

$ poetry add streamlit

Next, create two files.

lib.py

lib.py

from yt_dlp import YoutubeDL
import openai
import os
from llama_index import StorageContext, StringIterableReader, GPTVectorStoreIndex, load_index_from_storage

STORAGE_PATH = "./storage"


def get_audio_file(url: str):

    ydl_opts = {
        'outtmpl': './audio.%(ext)s',
        'format': 'bestaudio/best',
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192'
        }],
    }

    ydl = YoutubeDL(ydl_opts)
    ydl.download([url])


def get_audio_text(file_path: str) -> str:
    # Open in binary mode; the context manager closes the file after the upload.
    with open(file_path, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)
    return transcript["text"]


def get_summary_text(audio_text: str, summary_prompt: str) -> str:
    if not os.path.exists(STORAGE_PATH):
        os.makedirs(STORAGE_PATH)
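    try:
        # Reuse a previously persisted index if one exists in STORAGE_PATH.
        # Note: the cached index is loaded even if audio_text has changed.
        storage_context = StorageContext.from_defaults(persist_dir=STORAGE_PATH)
        vector_index = load_index_from_storage(storage_context)
    except Exception:
        # No usable index on disk: build one from the text and persist it.
        documents = StringIterableReader().load_data(texts=[audio_text])
        vector_index = GPTVectorStoreIndex.from_documents(documents)
        vector_index.storage_context.persist(persist_dir=STORAGE_PATH)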

    query_engine = vector_index.as_query_engine()
    response = query_engine.query(summary_prompt)

    return response.response

app.py

app.py

import streamlit as st
import openai
import webbrowser
import os
import shutil

import lib

AUDIO_FILE_PATH = "./audio.mp3"
STORAGE_PATH = "./storage"


def main():
    summary_prompt = ''
    audio_text = ''
    api_flag = False

    st.title('Audio Summary')

    if st.button("Reset", key="reset"):
        if os.path.exists(STORAGE_PATH):
            shutil.rmtree(STORAGE_PATH)
        if os.path.exists(AUDIO_FILE_PATH):
            os.remove(AUDIO_FILE_PATH)

    open_ai_api_key: str = st.text_input(label="Open AI API Key (required)", type="password")

    if st.button("Register Open AI API Key", key="api_key") and open_ai_api_key:
        st.success("Open AI API Key is set.", icon="✅")
        if (api_flag is False):
            os.environ['OPENAI_API_KEY'] = open_ai_api_key
            openai.api_key = open_ai_api_key
        api_flag = True

    audio_url: str = st.text_input(label="Audio URL (required)")

    if audio_url:
        st.success("Audio URL is set.", icon="✅")
        if st.button("Get Audio File", key="get_audio_file"):
            with st.spinner(text="Get Audio Data..."):
                lib.get_audio_file(audio_url)

    summary_prompt = st.selectbox(
        label="Choice Prompt (required)",
        options=[
            "テキストの内容を300文字程度で要約してください。",
            "テキストの内容を5~10の項目に分けて要約してください",
            "テキストの内容を500文字程度にまとめてください。",
            "テキストの内容を1000文字程度にまとめてください。",
        ]
    )

    if audio_url and os.path.exists(AUDIO_FILE_PATH):
        st.success("Get Audio Data.", icon="✅")
        # Read the audio into memory; the context manager closes the file.
        with open(AUDIO_FILE_PATH, 'rb') as audio_file:
            audio_bytes = audio_file.read()

        st.audio(audio_bytes, format='audio/mp3')
        st.download_button(label="Download Audio File", data=audio_bytes, file_name="audio.mp3", mime="audio/mp3")

        col1, col2 = st.columns(2)

        with col1:
            st.write("Summary Text Using OpenAI")
            summary_btn = st.button("Get Summary Text", key="get_summary_text")

        with col2:
            st.write("Open ChatGPT")
            check_open_chatgpt_btn = st.button("Open ChatGPT Button", key="check_open_chatgpt")

        if summary_btn:
            with st.spinner(text="Convert Audio Text..."):
                audio_text = lib.get_audio_text(AUDIO_FILE_PATH)
            if audio_text:
                st.success("Convert Audio Text.", icon="✅")
                with st.expander("Audio Text"):
                    st.write(audio_text)
                with st.spinner(text="Get Summary Text..."):
                    summary_text = lib.get_summary_text(audio_text, summary_prompt)
                if summary_text:
                    st.success("Get Summary Text.", icon="✅")
                    st.write(summary_text)

        if check_open_chatgpt_btn:
            with st.spinner(text="Convert Audio Text..."):
                audio_text = lib.get_audio_text(AUDIO_FILE_PATH)
            if audio_text:
                st.success("Convert Audio Text.", icon="✅")
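                st.info("Copy the text below, click the button below, and paste it into the input box on the ChatGPT page that opens.")
                st.write(f"{summary_prompt}\n{audio_text}")
            # A button nested under another button never fires (the outer button
            # is False again on the rerun), and webbrowser.open() would open the
            # page on the server, so render a plain link instead.
            # st.link_button requires Streamlit 1.27 or later.
            st.link_button("Open ChatGPT", 'https://chat.openai.com/')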


if __name__ == "__main__":
    main()

Now let's run it.

terminal

$ streamlit run app.py
  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://192.168.1.2:8501

Running the command should open the page in your browser automatically; if it doesn't, visit one of the URLs printed in the output.
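
Streamlit reruns the script from top to bottom on every interaction, so a local variable such as `api_flag` in `main()` starts over at `False` each time. Values that should survive reruns are usually kept in `st.session_state`, which behaves like a dictionary. The sketch below uses a plain `dict` as a stand-in so it runs without Streamlit; `remember_api_key` and the `"openai_api_key"` entry are hypothetical names, not part of the original app.

```python
from typing import MutableMapping


def remember_api_key(state: MutableMapping, typed_key: str) -> bool:
    """Store a newly typed key in state; report whether any key is stored."""
    if typed_key:
        state["openai_api_key"] = typed_key
    return bool(state.get("openai_api_key"))


# Simulate two reruns with a plain dict standing in for st.session_state.
state = {}
first_run = remember_api_key(state, "sk-example")  # user typed a key
second_run = remember_api_key(state, "")           # rerun without retyping
print(first_run, second_run)
```

In the app itself, the assumption is that one would pass `st.session_state` as `state`, so the key is still available on reruns triggered by other widgets.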

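Conclusion

In this article, I explained how to build an app that summarizes the content behind a link from Voicy, YouTube, and similar services.

If you'd like a more detailed explanation, or want to deploy the app you built, see the following article.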

https://chaldene.net/audio-summary
