



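(Note: the evaluation here was performed by ChatGPT, so it contains no subjective human judgment. The prompts were also not finely tuned, so treat this as a rough verification overall.)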

📝 What was tested



① Input: the URL under test


② Text Splitters: text processing for the test



# Text used in this test (after processing)
- Total number of characters: 18076
- Number of chunks after splitting: 7
- Characters per chunk after splitting: [2746, 2006, 2527, 2603, 2806, 2760, 2628]
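
As a quick sanity check on the figures above, the following sketch recomputes the total and the chunk count from the listed chunk lengths, and confirms that every chunk fits within the 3000-character `chunk_size` used by the splitter. The variable names are illustrative and are not taken from the original code.

```python
# Chunk lengths as reported in the article (characters per chunk).
chunk_lengths = [2746, 2006, 2527, 2603, 2806, 2760, 2628]
CHUNK_SIZE = 3000  # chunk_size passed to RecursiveCharacterTextSplitter

total_chars = sum(chunk_lengths)  # sum of all chunk lengths
num_chunks = len(chunk_lengths)   # number of chunks
largest = max(chunk_lengths)      # longest single chunk

print(f"total characters: {total_chars}")
print(f"number of chunks: {num_chunks}")
print(f"largest chunk: {largest} (limit {CHUNK_SIZE})")
```

The reported total appears to be the sum of the chunk lengths, which would mean that any text shared between adjacent chunks is counted once per chunk it appears in.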


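# Assumes the langchain-text-splitters package; older LangChain releases
# expose the same classes from langchain.text_splitter.
from langchain_text_splitters import HTMLHeaderTextSplitter, RecursiveCharacterTextSplitter


def split_by_html_header_url(url):
    print("--- HTMLHeaderTextSplitter ---")

    # --- How to split: header tags to split on ---
    headers_to_split_on = []  # no tags specified

    # --- Splitter 1: create the HTML splitter ---
    html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)

    # --- Output 1: fetch the page and split it ---
    html_header_splits = html_splitter.split_text_from_url(url)

    # --- Splitter 2: create the character-based splitter ---
    text_splitter = RecursiveCharacterTextSplitter(
        separators=["\n\n", "\n", " ", ""],  # chunk separators, tried in order
        chunk_size=3000,    # maximum characters per chunk
        chunk_overlap=300,  # characters shared between adjacent chunks
    )

    # --- Output 2: split the documents into chunks ---
    result = text_splitter.split_documents(html_header_splits)
    return result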
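
To illustrate what `chunk_size` and `chunk_overlap` mean, here is a minimal sketch of a fixed-window splitter. It is a deliberate simplification and not the algorithm `RecursiveCharacterTextSplitter` uses, which prefers to cut at the listed separators; the function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Cut text into windows of chunk_size characters, where each window
    repeats the last chunk_overlap characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


demo_chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo_chunks)
```

In the demo, each chunk starts with the last character of the previous chunk, which is the role the 300-character overlap plays in the real splitter settings above.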


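③ Combines: the summarization methods tested

  • MapReduce
    • Summarize each chunk, then combine those summaries and summarize them again
  • MapRerank
    • Generate an answer and a confidence score for each chunk, then adopt the answer with the highest confidence
  • Refine
    • Summarize the first chunk, then summarize each later chunk together with the previous summary ("previous summary + chunk"), ending up with a progressively refined summary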
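
The three strategies listed above can be sketched in plain Python as follows. This is only an outline of the control flow, assuming the LLM calls are passed in as ordinary functions; the toy stand-ins (`first_words`, `upper_with_length`, `append_first_word`) are hypothetical and call no model.

```python
from typing import Callable, Sequence, Tuple


def map_reduce(chunks: Sequence[str], summarize: Callable[[str], str]) -> str:
    """Summarize every chunk (map), then summarize the joined summaries (reduce)."""
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partial_summaries))


def map_rerank(chunks: Sequence[str],
               answer_with_score: Callable[[str], Tuple[str, int]]) -> str:
    """Get an (answer, score) pair per chunk and keep the highest-scoring answer."""
    candidates = [answer_with_score(chunk) for chunk in chunks]
    best_answer, _best_score = max(candidates, key=lambda pair: pair[1])
    return best_answer


def refine(chunks: Sequence[str],
           summarize_first: Callable[[str], str],
           refine_step: Callable[[str, str], str]) -> str:
    """Summarize the first chunk, then fold each later chunk into the summary."""
    if not chunks:
        raise ValueError("refine needs at least one chunk")
    summary = summarize_first(chunks[0])
    for chunk in chunks[1:]:
        summary = refine_step(summary, chunk)
    return summary


# Toy stand-ins for LLM calls (hypothetical; no model is involved).
def first_words(text: str) -> str:
    """Keep the first word of every non-empty line."""
    return " ".join(line.split()[0] for line in text.splitlines() if line.strip())


def upper_with_length(chunk: str) -> Tuple[str, int]:
    """Return the chunk in upper case, scored by its length."""
    return chunk.upper(), len(chunk)


def append_first_word(previous: str, chunk: str) -> str:
    """Extend the previous summary with the first word of the new chunk."""
    return previous + " " + chunk.split()[0]


chunks = ["alpha beta gamma", "delta epsilon zeta", "eta theta iota"]
print(map_reduce(chunks, first_words))
print(map_rerank(chunks, upper_with_length))
print(refine(chunks, first_words, append_first_word))
```

Note the structural difference: MapReduce and MapRerank process chunks independently, whereas Refine is inherently sequential because each step depends on the previous summary.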

🏁 Results

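| Method (model) | Best suited for |
| --- | --- |
| MapReduce (GPT-4) | Balance |
| MapReduce (GPT-3.5) | Cost-performance |
| MapRerank (GPT-4) | - |
| MapRerank (GPT-3.5) | Low cost and speed |
| Refine (GPT-4) | Accuracy |
| Refine (GPT-3.5) | - |

| Method (model) | Accuracy 1 (※) | Accuracy 2 (※) | Cost | Generation time | Output length |
| --- | --- | --- | --- | --- | --- |
| MapReduce (GPT-4) | 75 /100 | 85 /100 | approx. ¥14.1 ($0.09424) | 73.48 s | 1685 characters |
| MapReduce (GPT-3.5) | 80 /100 | 90 /100 | approx. ¥1.4 ($0.0093335) | 13.33 s | 767 characters |
| MapRerank (GPT-4) | 70 /100 | 75 /100 | approx. ¥12.6 ($0.08393) | 80.34 s | 673 characters |
| MapRerank (GPT-3.5) | 65 /100😢 | 70 /100😢 | approx. ¥1.3 ($0.0088355) | 12.57 s | 438 characters |
| Refine (GPT-4) | 85 /100👑 | 95 /100👑 | approx. ¥20.0 ($0.13348) | 154.17 s | 2375 characters |
| Refine (GPT-3.5) | 60 /100 | 80 /100 | approx. ¥1.6 ($0.010415) | 16.96 s | 759 characters |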
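
To make the cost and speed trade-off in the results easier to read, the following sketch computes how many times more expensive and slower GPT-4 was than GPT-3.5 for each method. The dollar costs and generation times are copied from the results; the dictionary layout and names are illustrative only.

```python
# (cost in USD, generation time in seconds), copied from the results.
results = {
    "MapReduce": {"gpt-4": (0.09424, 73.48), "gpt-3.5": (0.0093335, 13.33)},
    "MapRerank": {"gpt-4": (0.08393, 80.34), "gpt-3.5": (0.0088355, 12.57)},
    "Refine": {"gpt-4": (0.13348, 154.17), "gpt-3.5": (0.010415, 16.96)},
}

# For each method: (cost ratio, time ratio) of GPT-4 relative to GPT-3.5.
ratios = {}
for method, by_model in results.items():
    cost4, time4 = by_model["gpt-4"]
    cost35, time35 = by_model["gpt-3.5"]
    ratios[method] = (cost4 / cost35, time4 / time35)

for method, (cost_ratio, time_ratio) in ratios.items():
    print(f"{method}: GPT-4 cost x{cost_ratio:.1f}, time x{time_ratio:.1f}")
```

On these single-run numbers, GPT-4 came out roughly 9 to 13 times more expensive and 5 to 9 times slower, with Refine showing the largest gap on both measures.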


  1. First, have ChatGPT read the article.


# Title: Aligning language models to follow instructions
# Text:
  2. Then have it evaluate the summary.





1. MapReduce

# ==================================================
# MapReduceDocumentsChain
# ==================================================
# NOTE: several arguments were cut off in the source. The lines marked
# "reconstructed" follow LangChain's documented usage and are assumptions.
from langchain.chains import (
    LLMChain,
    MapReduceDocumentsChain,
    ReduceDocumentsChain,
    StuffDocumentsChain,
)
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI, ChatOpenAI
import requests
from bs4 import BeautifulSoup
from textsplitter import split_by_html_header_url

def map_reduce_documents(url):

    # --- Get the page title ---
    res = requests.get(url)
    soup = BeautifulSoup(res.text, "html.parser")
    title = soup.title.text.strip()

    # --- Models ---
    chat_gpt3 = ChatOpenAI(model='gpt-3.5-turbo-0125')
    chat_gpt4 = ChatOpenAI(model='gpt-4-0125-preview')  # swap in below for the GPT-4 run

    # --- Build chain: summarizes each chunk ---
    prompt = PromptTemplate.from_template(
        "The TEXT below is part of a page with the following title."
        "I would like to extract and summarise the content parts of this article, extract and summarise the parts relevant to the title of the TEXT."
        "If there is 'no content associated with the title' or 'no specific detail or content exists for the summary', only the Title is output.\n\n"

        "Title: " + title + "\n"
        "Text: {context}"  # reconstructed: the chunk placeholder was missing
    )
    llm_chain = LLMChain(llm=chat_gpt3, prompt=prompt)

    # --- Build chain: combines the summaries ---
    reduce_prompt = PromptTemplate.from_template(
        "Summarise the entire summary list of the following titles in plain language.\n\n"

        "Title: " + title + "\n"
        "Summaries: {context}"  # reconstructed: the summaries placeholder was missing
    )
    reduce_llm_chain = LLMChain(llm=chat_gpt3, prompt=reduce_prompt)  # LLM chain (LLM + prompt)

    # How each document is rendered before being combined (reconstructed)
    document_prompt = PromptTemplate(
        input_variables=["page_content"],
        template="{page_content}",
    )
    # Chain that combines the documents (arguments reconstructed)
    combine_documents_chain = StuffDocumentsChain(
        llm_chain=reduce_llm_chain,
        document_prompt=document_prompt,
        document_variable_name="context",
    )

    # Combining chain (argument reconstructed)
    reduce_documents_chain = ReduceDocumentsChain(
        combine_documents_chain=combine_documents_chain,
    )

    # --- Build chain: MapReduce ---
    # Chain that maps over the split documents and then reduces the results
    chain = MapReduceDocumentsChain(
        llm_chain=llm_chain,                           # summarization chain
        reduce_documents_chain=reduce_documents_chain, # combining chain
    )

    result_documents = split_by_html_header_url(url)   # split the document
    result_map_reduce = chain.invoke(result_documents) # run the chain
    return result_map_reduce

result = map_reduce_documents("https://openai.com/research/instruction-following")
Output: gpt-4-0125-preview
This collection of articles talks about how a new version of AI, called InstructGPT, has been developed to follow instructions better than its predecessor, GPT-3. The key to InstructGPT’s success is that it has been trained with input from people to be more accurate, less likely to make stuff up, and less likely to say harmful things. It does a better job of understanding and doing what users ask, from simple tasks to explaining complex ideas in simple ways. Even though InstructGPT is less complex than GPT-3 in terms of its inner workings, people generally prefer its responses. It has been tested and shown to be safer and more in line with what people want, using techniques that involve learning from human feedback.

The articles also touch on the tricky balance of making these AI models do what users want without losing their ability to perform well on standardized tests for language understanding. A method that mixes in some of the original training data during a fine-tuning process is suggested to keep the AI performing well across the board while still keeping it safe and aligned with human preferences.

Despite these advancements, there are still challenges. The newer models sometimes produce content that's not appropriate, and the articles stress that how these AI tools are put into use also matters for keeping them safe. There's ongoing work to make sure the AI doesn't just cater to the majority but also respects the values and preferences of diverse groups of people. This effort represents the start of applying research to make AI tools that better understand and align with what humans want, underscoring a team effort in addressing these complex issues.




Output: gpt-3.5-turbo-0125
The article discusses the development and implementation of InstructGPT models, which are trained to follow user instructions, be more truthful, and less toxic than previous models like GPT-3. These models have shown superior performance in following English instructions and generating preferred outputs, leading to their deployment as the default language models on their API. The alignment research aims to improve safety, reliability, and overall helpfulness of language models, with a focus on mitigating biases and harms. The text also highlights the importance of aligning language models to prevent misuse and ensure safe outputs, emphasizing the need to consider the values of specific populations and continue developing alignment techniques for AI systems.



2. MapRerank

# ==================================================
# MapRerankDocumentsChain
# ==================================================
# NOTE: several arguments were cut off in the source. The lines marked
# "reconstructed" follow LangChain's documented usage and are assumptions.
from langchain_openai import OpenAI, ChatOpenAI
from langchain.chains import LLMChain, MapRerankDocumentsChain
from langchain_core.prompts import PromptTemplate
from langchain.output_parsers.regex import RegexParser
import requests
from bs4 import BeautifulSoup
from textsplitter import split_by_html_header_url

def map_rerank_documents(url):

    # --- Get the page title ---
    res = requests.get(url)
    soup = BeautifulSoup(res.text, "html.parser")
    title = soup.title.text.strip()

    # --- Models ---
    chat_gpt3 = ChatOpenAI(model='gpt-3.5-turbo-0125')
    chat_gpt4 = ChatOpenAI(model='gpt-4-0125-preview')  # swap in below for the GPT-4 run

    # --- Build chain: summarizes each chunk ---
    # Prompt text applied to each document
    prompt_template = (
        "The TEXT below is part of a page with the following title."
        "I would like to extract and summarise the content part of this article. If the text part of the article exists, please extract and summarise its content."
        "For the summary, please identify its main points and express your level of confidence in the summary as a Score."
        "If there is 'No content to match the title' or 'No specific details or content present', output `Answer: None\n"
        "Score: 0`.\n\n"

        "# Output Format:\n"
        "Answer: xxx(str)\n"
        "Score: xxx(num: 0-100)\n\n"

        "# Title: " + title + "\n"
        "# Text:\n"
        "{context}\n\n"  # reconstructed: the chunk placeholder was missing
        "# Summary:\n"
    )

    # Output parsing: regex that extracts the answer and the score
    output_parser = RegexParser(
        regex=r"(.*?)\nScore: (.*)",
        output_keys=["answer", "score"],
    )

    # Prompt object applied to each document (arguments reconstructed)
    prompt = PromptTemplate(
        template=prompt_template,
        input_variables=["context"],
        output_parser=output_parser,
    )
    llm_chain = LLMChain(llm=chat_gpt3, prompt=prompt)

    # --- Build chain: MapRerank (arguments reconstructed) ---
    chain = MapRerankDocumentsChain(
        llm_chain=llm_chain,
        document_variable_name="context",
        rank_key="score",
        answer_key="answer",
    )

    result_documents = split_by_html_header_url(url)   # split the document
    result_map_rerank = chain.invoke(result_documents) # run the chain
    return result_map_rerank

result = map_rerank_documents("https://openai.com/research/instruction-following")
Output: gpt-4-0125-preview
Answer: The article discusses the development of InstructGPT, an improvement over GPT-3, highlighting its enhanced ability to follow user instructions more accurately. This advancement was achieved through alignment research, incorporating human feedback into the training process, resulting in models that are not only better at understanding and executing tasks as per user intentions but also exhibit increased truthfulness and reduced toxicity. InstructGPT models are now the default on the API. The text includes examples comparing responses of GPT-3 and InstructGPT to various prompts, demonstrating InstructGPT's superior comprehension and execution of instructions.


Output: gpt-3.5-turbo-0125
The article discusses the development of InstructGPT models, which are trained to better follow user intentions compared to GPT-3. These models are also designed to be more truthful and less toxic. The InstructGPT models are trained with humans in the loop and have been deployed as the default language models on their API. The examples provided show that InstructGPT performs better at following English instructions compared to GPT-3. 
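
The regular expression given to `RegexParser` in the MapRerank code can be tried on its own with Python's `re` module. The sketch below applies the same pattern to made-up model replies (the reply texts are hypothetical) and then keeps the highest-scoring answer, which is the reranking idea in miniature. It does not reproduce LangChain's internals.

```python
import re

# Same pattern as the one passed to RegexParser in the MapRerank code.
PATTERN = re.compile(r"(.*?)\nScore: (.*)")


def parse_reply(reply: str) -> dict:
    """Extract the answer line and the score from one model reply."""
    match = PATTERN.search(reply)
    if match is None:
        raise ValueError(f"reply does not match the expected format: {reply!r}")
    return {"answer": match.group(1), "score": match.group(2)}


# Hypothetical replies, one per chunk, in the format the prompt asks for.
replies = [
    "Answer: InstructGPT follows instructions better than GPT-3.\nScore: 85",
    "Answer: None\nScore: 0",
    "Answer: The models are trained with human feedback.\nScore: 70",
]

parsed = [parse_reply(reply) for reply in replies]
best = max(parsed, key=lambda item: int(item["score"]))
print(best["answer"])
```

Because the first group captures the whole line before `Score:`, the `Answer: ` prefix stays in the extracted answer, which appears consistent with the GPT-4 output above starting with "Answer:".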



3. Refine

# ==================================================
# RefineDocumentsChain
# ==================================================
# NOTE: several arguments were cut off in the source. The lines marked
# "reconstructed" follow LangChain's documented usage and are assumptions.
from langchain_openai import OpenAI, ChatOpenAI
from langchain.chains import RefineDocumentsChain, LLMChain
from langchain_core.prompts import PromptTemplate
import requests
from bs4 import BeautifulSoup
from textsplitter import split_by_html_header_url

def refine_documents(url):

    # --- Get the page title ---
    res = requests.get(url)
    soup = BeautifulSoup(res.text, "html.parser")
    title = soup.title.text.strip()

    # --- Models ---
    chat_gpt3 = ChatOpenAI(model='gpt-3.5-turbo-0125')
    chat_gpt4 = ChatOpenAI(model='gpt-4-0125-preview')  # swap in below for the GPT-4 run

    # --- Build chain: summarizes the first chunk ---
    # How each document is rendered into the prompt (arguments reconstructed)
    document_prompt = PromptTemplate(
        input_variables=["page_content"],
        template="{page_content}",
    )
    prompt = PromptTemplate.from_template(
        "The TEXT below is part of a page with the following title."
        "I would like to extract and summarise the content parts of this article, extract and summarise the parts relevant to the title of the TEXT."
        "If there is 'no content associated with the title' or 'no specific detail or content exists for the summary', output Title.\n"

        "# Title: " + title + "\n"
        "# Text:\n"
        "{context}\n\n"  # reconstructed: the chunk placeholder was missing
        "# Summary:\n"
    )

    initial_llm_chain = LLMChain(llm=chat_gpt3, prompt=prompt)

    # --- Build chain: summarizes the second and later chunks ---
    prompt_refine = PromptTemplate.from_template(
        "The TEXT below is part of a page with the following title."
        "The Previous Summary also summarises the context immediately preceding this TEXT."
        "Build on the Previous Summary and refine it with additional insights and details found in the TEXT."
        "Focus on strengthening the summary by integrating relevant information that supplements or clarifies the initial main points."
        "Try to maintain consistency and add value to the initial summary without repeating information already mentioned.\n\n"

        "# Title: " + title + "\n"
        "# Previous Summary:\n{prev_summary}\n\n"
        "# Text:\n{context}\n\n"
        "# Summary:\n"
    )
    refine_llm_chain = LLMChain(llm=chat_gpt3, prompt=prompt_refine)

    # --- Build chain: Refine (arguments reconstructed) ---
    chain = RefineDocumentsChain(
        initial_llm_chain=initial_llm_chain,
        refine_llm_chain=refine_llm_chain,
        document_prompt=document_prompt,
        document_variable_name="context",
        initial_response_name="prev_summary",
    )

    result_documents = split_by_html_header_url(url) # split the document
    result_refine = chain.invoke(result_documents)   # run the chain
    return result_refine

result = refine_documents("https://openai.com/research/instruction-following")
Output: gpt-4-0125-preview
Building upon the initial summary, the additional insights from the text highlight OpenAI's commitment to addressing the complex challenge of ensuring that InstructGPT models refuse to execute potentially harmful instructions. This highlights the ongoing struggle to balance user instruction compliance with the imperative to prevent misuse. Furthermore, the text reveals an awareness of the model's current limitations in representing diverse cultural values, particularly those of non-English speaking communities. This underscores a broader challenge in AI alignment: the ethical and societal implications of whose values are prioritized in model outputs. OpenAI acknowledges the necessity of developing more inclusive, responsible processes for determining how models should align with the diverse values of global users.

Moreover, the text reiterates the effectiveness of the strategies employed to mitigate the alignment tax, ensuring models remain useful for a broad range of academic and practical applications. It also emphasizes the initial success in generalizing the model to align with a wider array of human preferences beyond the training labelers, an essential step towards creating universally acceptable AI systems.

However, these advancements come with the acknowledgment of the ongoing challenges in completely safeguarding against the generation of toxic, biased, or otherwise harmful content. This necessitates continuous efforts in refining content filters, monitoring for misuse, and exploring more nuanced approaches to model training that consider the varied societal impacts of AI-generated content.

OpenAI positions these efforts as just the beginning of a long-term commitment to enhance the safety, utility, and alignment of AI with human values. The call for collaboration and further research in this area indicates an open acknowledgment of the complexities involved in AI alignment and the need for a collective effort to tackle these challenges.

This refined summary not only elaborates on the strategies and challenges in aligning InstructGPT with human instructions and ethical standards but also underscores the broader societal implications of these efforts. It highlights the importance of ongoing research, inclusivity, and responsible decision-making in the development of AI models that are aligned with a diverse range of human values.






Output: gpt-3.5-turbo-0125
The InstructGPT models, designed to enhance adherence to user instructions, face challenges in aligning with diverse user groups and avoiding misuse. To address this, models need to be trained to refuse unsafe instructions, prioritize the preferences of specific populations, and align outputs with diverse cultural values. Research is ongoing to understand and address differences in user preferences and ensure responsible decision-making in aligning model outputs. The application of alignment techniques has shown promise in improving the alignment of AI systems with human intentions, but further advancements are needed to create safe and beneficial language tools. Efforts are underway to refine these techniques and enhance alignment with user values.







