A Walkthrough of the open-notebooklm Source Code

Published on 2024/10/06
Gradio
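open-notebooklm is an open-source project that generates podcast-style audio files from the text of PDF files and web pages. It uses open-source AI models (Llama 3.1 405B, MeloTTS, Bark) to turn user-supplied content into engaging audio.
This article walks through the open-notebooklm source code file by file, with plenty of code excerpts to keep it readable.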

 Directory structure
├─ examples/
│  ├─ 1310.4546v1.pdf
├─ examples_cached/
├─ app.py
├─ constants.py
├─ prompts.py
├─ README.md
├─ requirements.txt
├─ schema.py
├─ utils.py

 File-by-file walkthrough
 1. app.py
The main application file. It builds the web interface with Gradio.
import gradio as gr
from constants import (
    APP_TITLE,
    UI_DESCRIPTION,
    UI_INPUTS,
    UI_OUTPUTS,
    UI_SHOW_API,  # used by demo.launch() below
    # other constants
)
from utils import generate_podcast

# Configure the Gradio interface
demo = gr.Interface(
    title=APP_TITLE,
    description=UI_DESCRIPTION,
    fn=generate_podcast,
    inputs=[
        gr.File(label=UI_INPUTS["file_upload"]["label"]),  # remaining arguments omitted
        gr.Textbox(label=UI_INPUTS["url"]["label"]),  # remaining arguments omitted
        # other inputs
    ],
    outputs=[
        gr.Audio(label=UI_OUTPUTS["audio"]["label"], format=UI_OUTPUTS["audio"]["format"]),
        gr.Markdown(label=UI_OUTPUTS["transcript"]["label"]),
    ],
    # other settings
)

if __name__ == "__main__":
    demo.launch(show_api=UI_SHOW_API)
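Key features:
Accepts the user's input (PDF files, URL, question, tone, length, language, and audio generation options).

Calls the generate_podcast function to produce the podcast audio and its transcript.
Launches the web application with Gradio.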
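
To make the flow above concrete, here is a minimal, dependency-free sketch of what a generate_podcast-style function might do. The names generate_podcast_sketch, write_script, synthesize, fake_script and fake_tts, as well as the transcript format, are assumptions made for illustration. The project's real function calls the LLM and the text-to-speech models and may be structured differently.

```python
from typing import Callable, List, Tuple

# Hypothetical sketch: the script writer and the speech synthesizer are injected,
# so the flow can be read (and run) without the real LLM or TTS models.
Line = Tuple[str, str]  # (speaker, text)


def generate_podcast_sketch(
    source_text: str,
    write_script: Callable[[str], List[Line]],
    synthesize: Callable[[str, str], bytes],
) -> Tuple[bytes, str]:
    """Turn source text into (audio bytes, Markdown transcript)."""
    if not source_text.strip():
        raise ValueError("No input text was provided.")

    audio = b""
    transcript_lines = []
    for speaker, text in write_script(source_text):
        # One audio segment per dialogue line, appended in order.
        audio += synthesize(speaker, text)
        transcript_lines.append(f"**{speaker}**: {text}")

    return audio, "\n\n".join(transcript_lines)


# Stand-ins for the LLM and the TTS engine (assumptions, for demonstration only).
def fake_script(text: str) -> List[Line]:
    return [
        ("Host (Jane)", f"Today we discuss: {text}"),
        ("Guest", "Happy to be here."),
    ]


def fake_tts(speaker: str, text: str) -> bytes:
    return text.encode("utf-8")


audio, transcript = generate_podcast_sketch("word embeddings", fake_script, fake_tts)
print(transcript)
```

Passing the script writer and the synthesizer in as arguments is only a convenience for this sketch: it keeps the two return values that the Gradio outputs expect (audio and transcript) visible without loading any model.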

 2. constants.py
Defines the constants used throughout the application.
APP_TITLE = "Open NotebookLM"
CHARACTER_LIMIT = 100_000

# Error messages
ERROR_MESSAGE_NO_INPUT = "Please provide at least one PDF file or a URL."
ERROR_MESSAGE_NOT_PDF = "The provided file is not a PDF. Please upload only PDF files."
# other constants

# Gradio-related constants
UI_DESCRIPTION = """
<table style="border-collapse: collapse; border: none; padding: 20px;">
  <!-- HTML content -->
</table>
"""

UI_INPUTS = {
    "file_upload": {
        "label": "1. 📄 Upload your PDF(s)",
        "file_types": [".pdf"],
        "file_count": "multiple",
    },
    # other inputs
}

UI_OUTPUTS = {
    "audio": {"label": "🔊 Podcast", "format": "mp3"},
    "transcript": {"label": "📜 Transcript"},
}

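# other settings
Key features:
Defines string constants such as the application title, description, and error messages.
Defines the Gradio UI settings (inputs, outputs, theme, examples, and so on).
Defines constants and mappings for the external APIs.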
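
As a rough illustration of how constants like these tend to be consumed, the sketch below validates the user's input against them. The three constant values are copied from the excerpt above. The functions validate_inputs and within_limit are hypothetical helpers written for this article, not functions from the repository, and the project may handle over-long text differently (for example by raising an error).

```python
from pathlib import Path
from typing import List, Optional

# Values copied from the constants.py excerpt.
CHARACTER_LIMIT = 100_000
ERROR_MESSAGE_NO_INPUT = "Please provide at least one PDF file or a URL."
ERROR_MESSAGE_NOT_PDF = "The provided file is not a PDF. Please upload only PDF files."


def validate_inputs(files: Optional[List[str]], url: Optional[str]) -> None:
    """Hypothetical helper: raise ValueError with the matching message."""
    if not files and not url:
        raise ValueError(ERROR_MESSAGE_NO_INPUT)
    for name in files or []:
        # Compare the extension case-insensitively, so "PAPER.PDF" is accepted.
        if Path(name).suffix.lower() != ".pdf":
            raise ValueError(ERROR_MESSAGE_NOT_PDF)


def within_limit(text: str) -> bool:
    """Hypothetical helper: True when the extracted text fits CHARACTER_LIMIT."""
    return len(text) <= CHARACTER_LIMIT


validate_inputs(["examples/1310.4546v1.pdf"], None)  # passes silently
print(within_limit("a short document"))
```

Keeping the messages in one module means the UI layer and the processing layer report the same wording for the same failure.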

 3. prompts.py
Defines the prompts sent to the LLM (large language model).
SYSTEM_PROMPT = """
You are a world-class podcast producer tasked with transforming the provided input text into an engaging and informative podcast script. ...
"""

QUESTION_MODIFIER = "PLEASE ANSWER THE FOLLOWING QN:"
TONE_MODIFIER = "TONE: The tone of the podcast should be"
LANGUAGE_MODIFIER = "OUTPUT LANGUAGE <IMPORTANT>: The the podcast should be"

LENGTH_MODIFIERS = {
    "Short (1-2 min)": "Keep the podcast brief, around 1-2 minutes long.",
    "Medium (3-5 min)": "Aim for a moderate length, about 3-5 minutes.",
}
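Key features:
Defines the prompt template passed to the LLM, which shapes the quality of the generated podcast.
Defines modifiers that adjust the prompt dynamically according to the user's input.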
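
The modifiers are easiest to understand when you see them being appended to the base prompt. The following is a minimal sketch under stated assumptions: build_system_prompt is a hypothetical function, SYSTEM_PROMPT is shortened to a placeholder, and the exact joining format (blank lines, trailing period) is a guess rather than the project's actual behavior.

```python
from typing import Optional

# Shortened stand-ins for the strings defined in prompts.py.
SYSTEM_PROMPT = "You are a world-class podcast producer ..."
QUESTION_MODIFIER = "PLEASE ANSWER THE FOLLOWING QN:"
TONE_MODIFIER = "TONE: The tone of the podcast should be"
LENGTH_MODIFIERS = {
    "Short (1-2 min)": "Keep the podcast brief, around 1-2 minutes long.",
    "Medium (3-5 min)": "Aim for a moderate length, about 3-5 minutes.",
}


def build_system_prompt(
    question: Optional[str] = None,
    tone: Optional[str] = None,
    length: Optional[str] = None,
) -> str:
    """Hypothetical helper: append one modifier per option the user set."""
    parts = [SYSTEM_PROMPT]
    if question:
        parts.append(f"{QUESTION_MODIFIER} {question}")
    if tone:
        parts.append(f"{TONE_MODIFIER} {tone}.")
    if length in LENGTH_MODIFIERS:
        parts.append(LENGTH_MODIFIERS[length])
    return "\n\n".join(parts)


prompt = build_system_prompt(tone="Fun", length="Short (1-2 min)")
print(prompt)
```

Options the user leaves empty add nothing, so the base prompt is sent unchanged when no option is set.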

 4. README.md
Describes the project overview, installation steps, and usage.

 5. requirements.txt
Lists the Python packages the project requires.
gradio==4.44.0
pypdf==4.1.0
pydub==0.25.1
# other packages
Key features:

Running pip install -r requirements.txt installs all required packages at once.

 6. schema.py
Defines the data models used to validate the LLM's output.
from pydantic import BaseModel, Field
from typing import Literal, List

class DialogueItem(BaseModel):
    """A single dialogue item."""
    speaker: Literal["Host (Jane)", "Guest"]
    text: str

class ShortDialogue(BaseModel):
    """A short dialogue."""
    scratchpad: str
    name_of_guest: str
    dialogue: List[DialogueItem] = Field(
        ..., description="A list of dialogue items, typically between 11 to 17 items"
    )

class MediumDialogue(BaseModel):
    """A medium-length dialogue."""
    scratchpad: str
    name_of_guest: str
    dialogue: List[DialogueItem] = Field(
        ..., description="A list of dialogue items, typically between 19 to 29 items"
    )
Key features:
Validates the LLM's JSON output against Pydantic models to keep the data consistent.
Clearly defines the structure of a dialogue (speaker and text).
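
To show what this validation buys you, here is a small sketch that parses two JSON payloads against trimmed-down copies of the models. The function parse_dialogue and both sample payloads are made up for illustration, and the Field descriptions are omitted for brevity. It assumes pydantic is installed and only uses the constructor-based parsing that works in both Pydantic v1 and v2.

```python
import json
from typing import List, Literal

from pydantic import BaseModel, ValidationError


class DialogueItem(BaseModel):
    speaker: Literal["Host (Jane)", "Guest"]
    text: str


class ShortDialogue(BaseModel):
    scratchpad: str
    name_of_guest: str
    dialogue: List[DialogueItem]


def parse_dialogue(raw_json: str) -> ShortDialogue:
    """Hypothetical helper: raises ValidationError if the JSON breaks the schema."""
    return ShortDialogue(**json.loads(raw_json))


# Invented sample payloads standing in for LLM output.
good = json.dumps({
    "scratchpad": "outline",
    "name_of_guest": "Dr. Smith",
    "dialogue": [{"speaker": "Host (Jane)", "text": "Welcome!"}],
})
bad = json.dumps({
    "scratchpad": "outline",
    "name_of_guest": "Dr. Smith",
    "dialogue": [{"speaker": "Narrator", "text": "Unknown speaker"}],
})

print(parse_dialogue(good).dialogue[0].speaker)
try:
    parse_dialogue(bad)
except ValidationError:
    print("rejected: speaker must be 'Host (Jane)' or 'Guest'")
```

Because speaker is a Literal, an unexpected speaker name fails at parse time instead of surfacing later during audio generation.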

 7. utils.py
Defines the utility functions.
from typing import Any, Union

from bark import generate_audio, preload_models
from gradio_client import Client
from openai import OpenAI

from schema import MediumDialogue, ShortDialogue
# other imports

# Preload the Bark models
preload_models()

def generate_script(
    system_prompt: str,
    input_text: str,
    output_model: Union[ShortDialogue, MediumDialogue],
) -> Union[ShortDialogue, MediumDialogue]:
    """Generate a dialogue script with the LLM."""
    # implementation omitted

def call_llm(system_prompt: str, text: str, dialogue_format: Any) -> Any:
    """Send the prompt to the LLM and return its response."""
    # implementation omitted

def parse_url(url: str) -> str:
    """Fetch the text content of the given URL."""
    # implementation omitted

def generate_podcast_audio(
    text: str, speaker: str, language: str, use_advanced_audio: bool, random_voice_number: int
) -> str:
    """Convert text to speech."""
    # implementation omitted
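Key features:

LLM interaction:

generate_script: Generates a dialogue script from the system prompt and the input text.

call_llm: Calls the LLM through the OpenAI client library.


Text extraction:

parse_url: Extracts text from the given URL, using the Jina Reader API.


Audio generation:

generate_podcast_audio: Converts text into an audio file, using Bark or MeloTTS.

preload_models: Loads the Bark models ahead of time so that audio generation runs more efficiently.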
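
For the text-extraction step, the sketch below shows the general idea behind reading a page through Jina Reader: the target URL is appended to the https://r.jina.ai/ prefix and the response body is the page text. The names build_reader_url and parse_url_sketch, the timeout value, and the User-Agent string are assumptions for illustration. The project's own parse_url may use a different HTTP client, headers, or error handling.

```python
from urllib.request import Request, urlopen

# Jina Reader serves a text rendering of a page when the target URL is
# appended to this prefix.
JINA_READER_PREFIX = "https://r.jina.ai/"


def build_reader_url(url: str) -> str:
    """Hypothetical helper: return the Jina Reader address for a page."""
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"Not an absolute http(s) URL: {url!r}")
    return JINA_READER_PREFIX + url


def parse_url_sketch(url: str, timeout: float = 30.0) -> str:
    """Hypothetical helper: fetch page text via Jina Reader (network request)."""
    request = Request(
        build_reader_url(url),
        headers={"User-Agent": "open-notebooklm-sketch"},  # assumed value
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Only the URL is built here; calling parse_url_sketch would hit the network.
print(build_reader_url("https://example.com/article"))
```

Delegating extraction to a reader service keeps HTML parsing out of the application, at the cost of depending on an external API being reachable.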


 Summary
open-notebooklm is a tool that uses AI models to generate engaging podcasts from the text of user-supplied PDF files and web pages. The source code is modular, and each file has a clear role.

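app.py: The application entry point. Configures the Gradio interface.

constants.py: Defines the constants used throughout the application.

prompts.py: Defines the prompt templates sent to the LLM.

README.md: Describes the project overview and setup steps.

requirements.txt: Lists the required packages.

schema.py: Defines the data models and validates the LLM's output.

utils.py: Provides utility functions for LLM interaction, audio generation, and more.
Studying this project is a practical way to learn how to build an application on top of open-source AI models.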

 Space
https://huggingface.co/spaces/gabrielchua/open-notebooklm