Open (comment added on 2025/01/08)

Trying out PromptWizard, a task-aware prompt optimization framework

Paper

LLM

promptengineering

promptwizard

kun432

Paper
https://arxiv.org/abs/2405.18369
Result of running the Ochiai-format summary prompt with Claude-3.5-Sonnet

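 PromptWizard: a task-aware prompt optimization framework
 1. What is it?
PromptWizard is a framework that automatically optimizes prompts (instructions) for large language models (LLMs). A prompt is the input instruction given to an LLM, and it strongly affects the quality of the model's output. Writing prompts has traditionally required manual trial and error, but this tool uses self-evolving and self-adapting mechanisms to improve prompts automatically. Through a feedback-driven process of critique and synthesis, it balances exploration and exploitation while incrementally refining both the instruction and the in-context examples. The result is a prompt that is human-readable and optimized for the specific task.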

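 2. How does it improve on prior work?
Prior prompt optimization methods fall into two approaches: continuous and discrete. Continuous methods (InstructZero, Instinct) optimize soft prompts, but they struggle with complex tasks and are hard to interpret. Discrete methods (PromptBreeder, EvoPrompt) generate and evaluate multiple prompt variations, but they are inefficient because they lack a feedback mechanism. PromptWizard's novel points are:
Feedback-driven exploration improves prompts more efficiently
Optimizing the instruction and the examples together captures task-specific characteristics more deeply
Prompts are generated in a human-understandable form while remaining scalable
API calls and token usage are reduced substantially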

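 3. What is the core of the technique?
PromptWizard's core technique consists of five steps:
1. Iterative refinement of the prompt instruction: generate prompt variations using various thinking styles, then improve them through a cycle of scoring, critique, and synthesis
2. Identification of diverse examples: select effective examples from the training data, using both positive and negative examples
3. Sequential optimization of instruction and examples: improve the prompt and the examples alternately so that each benefits from the other
4. Self-generated reasoning and validation: generate detailed reasoning chains for the selected examples and check them for consistency
5. Integration of task intent and expert persona: incorporate task-specific hints and an expert's perspective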

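 4. How was it validated?
The evaluation covers three main aspects:
Evaluated on 45 tasks, including the BIG-Bench Instruction Induction (BBII) dataset, and substantially outperformed existing methods
A cost analysis showed that API calls drop to 1/5 of prior continuous methods and to 1/16 to 1/60 of discrete methods
Effectiveness was also confirmed with limited training data (only 25 examples) and with smaller LLMs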

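 5. Is there any discussion?
The following limitations are noted:
Careful validation is needed when applying it to a new task
Testing the prompt's responses is essential before deploying to a real environment
Although the process is automated, human expertise is still important for steering the optimization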

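 6. What should I read next?
The following papers on prompt engineering and automatic optimization are useful references:
InstructZero (Chen et al., 2023): efficient instruction optimization for black-box LLMs
Instinct (Lin et al., 2024): instruction optimization that combines neural bandits with transformers
PromptBreeder (Fernando et al., 2023): self-referential evolutionary prompt optimization
EvoPrompt (Guo et al., 2024): prompt optimization using evolutionary algorithms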
GitHub repository
https://github.com/microsoft/PromptWizard?tab=readme-ov-file

 PromptWizard 🧙
PromptWizard: Task-Aware Prompt Optimization Framework

Eshaan Agarwal, Joykirat Singh, Vivek Dani, Raghav Magazine, Tanuja Ganu, Akshay Nambi 

 Overview 🌟

Overview of the PromptWizard framework

referred from https://github.com/microsoft/PromptWizard and translated into Japanese by kun432
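PromptWizard is a discrete prompt optimization framework built on a self-evolving mechanism in which the LLM generates, critiques, and refines its own prompts and examples, improving continuously through feedback and synthesis. This self-adaptive approach evolves both the instruction and the in-context learning examples, which improves task performance.
The three key components of PromptWizard are:
1. Feedback-driven refinement: the LLM generates, critiques, and refines its own prompts and examples, improving continuously through feedback and synthesis
2. Critique and synthesis of diverse examples: generates synthetic examples that are task-aware, robust, and diverse, and optimizes the prompt and the examples together
3. Self-generated Chain of Thought (CoT) steps that combine positive, negative, and synthetic examples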


Stage 1: Iterative optimization of the instruction

referred from https://github.com/microsoft/PromptWizard and translated into Japanese by kun432


Stage 2: Sequential optimization of the instruction and examples

referred from https://github.com/microsoft/PromptWizard and translated into Japanese by kun432

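 How PromptWizard works 🔍
Using the problem description and an initial prompt instruction, PromptWizard (PW) prompts the LLM to mutate the task description and so generates variations of the instruction. The best prompt is selected based on performance. PW includes a critique component that provides feedback, guiding and refining the prompt over multiple iterations.
PW also optimizes in-context examples. It selects diverse examples from the training data and uses the modified prompt to classify them as positive or negative based on performance. Negative examples help inform further prompt refinement.
Examples and instructions are optimized sequentially: critiques are used to generate synthetic examples that address the current prompt's weaknesses, and these examples are integrated to refine the prompt further.
PW uses Chain-of-Thought (CoT) to generate detailed reasoning chains, which enriches the prompt's problem-solving capacity.
PW aligns the prompt with human reasoning by integrating the task intent and an expert persona, which improves both model performance and interpretability.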
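
To make the mutate, score, select, critique, refine cycle described above a little more concrete, here is a toy sketch. Every function in it is a hypothetical stand-in that I made up for illustration: in PromptWizard each of these steps is an LLM call, and the keyword-based scoring here is only a placeholder for evaluating a prompt on real questions. It is not the library's implementation.

```python
from typing import List, Tuple

# Hypothetical stand-ins: the real tool asks an LLM to do each of these steps.
REQUIRED_HINTS = ["step by step", "check units", "state the final answer"]

THINKING_STYLES = [
    "Simplify the problem first.",
    "Think step by step.",
    "List key assumptions.",
]


def mutate(instruction: str) -> List[str]:
    """Generate one variation per thinking style."""
    return [f"{instruction} {style}" for style in THINKING_STYLES]


def score(prompt: str) -> float:
    """Toy score: fraction of the required hints that the prompt mentions."""
    text = prompt.lower()
    return sum(hint in text for hint in REQUIRED_HINTS) / len(REQUIRED_HINTS)


def critique(prompt: str) -> List[str]:
    """Toy critique: report which hints are still missing."""
    text = prompt.lower()
    return [hint for hint in REQUIRED_HINTS if hint not in text]


def refine(prompt: str, feedback: List[str]) -> str:
    """Toy synthesis: address the first piece of feedback, if any."""
    if not feedback:
        return prompt
    return f"{prompt} Remember to {feedback[0]}."


def optimize(base_instruction: str, iterations: int = 3) -> Tuple[str, float]:
    best = base_instruction
    for _ in range(iterations):
        candidates = mutate(best) + [best]
        best = max(candidates, key=score)      # select by performance
        best = refine(best, critique(best))    # feedback-driven refinement
    return best, score(best)


best_prompt, best_score = optimize("Solve the math problem.")
print(best_score)
print(best_prompt)
```

The point of the sketch is only the control flow: mutation widens the search, scoring picks a survivor, and the critique tells the refinement step what to fix instead of mutating blindly.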

 Results 📈

referred from https://github.com/microsoft/PromptWizard
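PromptWizard consistently outperforms the other methods across thresholds and maintains the highest p(τ) values. This indicates that it consistently performs close to the best achievable accuracy across all tasks.
The figure above shows performance profile curves for the instruction induction tasks. A performance profile curve visualizes how close each method's performance is to the best performance. On this curve, the x-axis (τ) is the performance ratio relative to the best-performing method, and the y-axis (p(τ)) is the fraction of tasks on which a method's performance falls within that ratio. The curve therefore shows how close a given method stays to the best performance across the whole set of tasks.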
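
As a rough illustration of how such a curve can be computed, here is a small sketch. It assumes one common formulation in which a task counts toward p(τ) when the method's accuracy is within τ percentage points of the best accuracy on that task; the README describes τ as a ratio, so the exact definition used for the figure may differ. The method names and accuracies are made up.

```python
from typing import Dict, List


def performance_profile(
    acc: Dict[str, List[float]], taus: List[float]
) -> Dict[str, List[float]]:
    """For each method, return p(tau) for every tau.

    acc maps a method name to its per-task accuracy (in percent);
    all methods must cover the same tasks in the same order.
    """
    methods = list(acc)
    n_tasks = len(acc[methods[0]])
    # Best accuracy achieved by any method on each task.
    best = [max(acc[m][t] for m in methods) for t in range(n_tasks)]
    profile = {}
    for m in methods:
        profile[m] = [
            sum(best[t] - acc[m][t] <= tau for t in range(n_tasks)) / n_tasks
            for tau in taus
        ]
    return profile


# Made-up accuracies (percent) for two hypothetical methods on four tasks.
toy = {
    "method_a": [90, 80, 70, 60],
    "method_b": [85, 80, 50, 65],
}
curves = performance_profile(toy, taus=[0, 5, 20])
for name, values in curves.items():
    print(name, values)
```

A method whose curve reaches 1.0 at a small τ is close to the best method on every task, which is the property the README highlights for PromptWizard.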

kun432

Quickstart

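Three scenarios are provided as a Quickstart.

1. Optimizing a prompt without examples
2. Generating synthetic examples and using them to optimize the prompt
3. Optimizing a prompt with training data

I think "examples" here means few-shot examples. A notebook is provided for each of these scenarios.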

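That said, personally I felt that

a lot is hidden away, which makes it hard to follow
I want to try it in Japanese

so I'll go through what the notebooks do and build things up from scratch.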

I'm doing this on Colaboratory.

Clone the repository and install it.

!git clone https://github.com/microsoft/PromptWizard
%cd PromptWizard
!pip install -e .

I'll use OpenAI for the LLM. Set the API key in an environment variable.

import os
from google.colab import userdata

os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

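Let's walk through the flow for each scenario.

1. Optimizing a prompt without examples

The assumptions for this scenario are:

No training data is available
The final prompt contains no examples

Prepare two configuration files. These are the ones from the cloned repository, with the comments translated and the prompts changed to Japanese. Some of the comments seem to describe things that are not implemented yet, or read oddly, but I won't worry about that for now.

promptopt_config.yaml appears to configure the behavior of the prompt optimization.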

promptopt_config.yaml

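# Specify one or more prompt refinement techniques to use. If multiple techniques are specified,
# all of them run on the same seed data. The number of iterations needed and the cost incurred
# are logged for each technique, along with the winning technique per data instance and overall.

# Supported prompt refinement techniques: Basic, RecursiveEval, MedPrompt
# Uncomment the techniques you want to use
########## Critique task description: start ##########
prompt_technique_name: "critique_n_refine"
# Unique ID of the model defined in llm_config.yaml
unique_model_id: gpt-4o-mini
# Number of iterations of <mutation_rounds> rounds of task-description mutation followed by instruction refinement
mutate_refine_iterations: 3
# Number of mutation rounds to run when generating different styles
mutation_rounds: 3
# Refine the instruction after mutation
refine_instruction: true
# Number of iterations for refining the task description and the in-context examples for few-shot
refine_task_eg_iterations: 3
# Number of prompt variations to generate in a given iteration
style_variation: 5
# Number of questions presented to the LLM in one batch during the training step
questions_batch_size: 1
# Minimum number of question batches that must be answered correctly for a prompt to be considered good
min_correct_count: 3
# Maximum number of mini-batches on which to evaluate a prompt
max_eval_batches: 6
# Number of top-performing prompts to carry into the next iteration
top_n: 1
# Description of the task. This is fed into the prompt.
# (Japanese: "You are a math expert. Your job is to solve the given math problem.")
task_description: "あなたは数学の専門家です。与えられた数学の問題を解くのがあなたの仕事です。"
# Base instruction, in line with the dataset. This is fed into the prompt.
# (Japanese: "Think step by step.")
base_instruction: "ステップバイステップで考えて。"
# Instruction specifying the answer format
# (Japanese: "For each question, present the correct answer and the reasoning behind it.")
answer_format: "各質問について、正しい答えとその根拠を提示してください。"
# Number of samples from the dataset to set aside as training data. In every iteration,
# `questions_batch_size` examples are sampled with replacement from the training data.
seen_set_size: 25
# Number of examples to include for few-shot
few_shot_count: 5
# Number of synthetic training examples to generate
num_train_examples: 20
# Generate synthetic reasoning
generate_reasoning: true
# Generate a description of an expert who can solve the given task
generate_expert_identity: true
# Generate keywords that describe the intent of the task
generate_intent_keywords: false
########## Critique task description: end ##########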
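
The comments on questions_batch_size, min_correct_count, and max_eval_batches are terse, so here is how I read them, written as a small sketch. This is my interpretation of the comments only, not code taken from PromptWizard: the function name, the is_correct callback, and the early exit are all assumptions, and in this no-training-data scenario the check may not even be exercised.

```python
import random
from typing import Callable, Sequence


def passes_minibatch_check(
    is_correct: Callable[[str], bool],
    seen_set: Sequence[str],
    questions_batch_size: int = 1,
    min_correct_count: int = 3,
    max_eval_batches: int = 6,
    seed: int = 0,
) -> bool:
    """Return True if the prompt answers enough mini-batches correctly.

    is_correct(question) is a hypothetical callback that would run the
    candidate prompt on one question and compare against the ground truth.
    """
    rng = random.Random(seed)
    correct_batches = 0
    for _ in range(max_eval_batches):
        # Sample questions with replacement from the held-out training set.
        batch = rng.choices(seen_set, k=questions_batch_size)
        if all(is_correct(question) for question in batch):
            correct_batches += 1
        if correct_batches >= min_correct_count:
            return True  # assumed early exit once the threshold is met
    return False


questions = [f"question {i}" for i in range(25)]  # seen_set_size: 25
print(passes_minibatch_check(lambda q: True, questions))
print(passes_minibatch_check(lambda q: False, questions))
```

Read this way, the values above mean a prompt is kept if it gets at least 3 of up to 6 single-question batches right.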

setup_config.yaml appears to hold operational settings such as logging and the LLM.

setup_config.yaml

assistant_llm:
  # Enter the unique_model_id specified in llm_config.yaml
  prompt_opt: gpt-4o-mini
dir_info:
  # Base directory for everything
  base_dir: logs
  log_dir_name: glue_logs
experiment_name: gsm8k
# Many features behave differently depending on whether the mode is online or offline. For example:
# 1) In offline mode, logs are printed to the console
# 2) In online mode, the LLM queue is initialized
mode: offline
# Full description of the experiment. This is recorded in the logs.
description:

promptopt_config.yaml looks like the more important of the two.

Load these configuration files and initialize the object used for prompt optimization and inference.

from promptwizard.glue.promptopt.instantiate import GluePromptOpt

gp = GluePromptOpt(
    prompt_config_path="promptopt_config.yaml",
    setup_config_path="setup_config.yaml",
    dataset_jsonl=None,
    data_processor=None
)

Output

Setup configurations parameters: [('assistant_llm', AssistantLLM(prompt_opt='gpt-4o-mini')), ('description', None), ('dir_info', Dir(base_dir='logs', log_dir_name='glue_logs')), ('experiment_name', 'gsm8k'), ('mode', 'offline')] 

======================================================================================================================================================

Prompt Optimization parameters: [('answer_format', '各質問について、正しい答えとその根拠を提示してください。'), ('base_instruction', 'ステップバイステップで考えて。'), ('few_shot_count', 5), ('generate_expert_identity', True), ('generate_intent_keywords', False), ('generate_reasoning', True), ('max_eval_batches', 6), ('min_correct_count', 3), ('mutate_refine_iterations', 3), ('mutation_rounds', 3), ('num_train_examples', 20), ('prompt_technique_name', 'critique_n_refine'), ('questions_batch_size', 1), ('refine_instruction', True), ('refine_task_eg_iterations', 3), ('seen_set_size', 25), ('style_variation', 5), ('task_description', 'あなたは数学の専門家です。与えられた数学の問題を解くのがあなたの仕事です。'), ('top_n', 1), ('unique_model_id', 'gpt-4o-mini')] 

======================================================================================================================================================

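Now I'd like to run the optimization, but it seems the following environment variables need to be set first. USE_OPENAI_API_KEY I can sort of understand, but the model name is already specified in the configuration file.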

import os
os.environ['USE_OPENAI_API_KEY'] = "True"
os.environ['OPENAI_MODEL_NAME'] = "gpt-4o-mini"
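
Since a missing variable only shows up once the optimization starts, a quick check beforehand can save a round trip. The sketch below is my own helper, not part of PromptWizard, and the list of names is simply the three variables set in these notes; other providers or versions may need different ones.

```python
import os
from typing import Iterable, List, Mapping

# The three variables set in these notes (an assumption, not an official list).
REQUIRED_VARS = ("OPENAI_API_KEY", "USE_OPENAI_API_KEY", "OPENAI_MODEL_NAME")


def missing_env_vars(
    names: Iterable[str] = REQUIRED_VARS,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names that are unset or empty in env."""
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```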

With everything ready, run the optimization.

best_prompt, expert_profile = gp.get_best_prompt(
    use_examples=False,
    run_without_train_examples=True,
    generate_synthetic_examples=False
)

Output


Mutating Task Description....
Iterations completed:   0%|          | 0/3 [00:00<?, ?it/s]
======================================================================================================================================================
 + Starting iteration: 1 
 current_base_instruction: ステップバイステップで考えて。
mutation_round=0 mutated_sample_prompt=You are given a task description and a prompt instruction and different styles known as meta prompts:
[Task Description]: あなたは数学の専門家です。与えられた数学の問題を解くのがあなたの仕事です。
[Meta Prompt]: How could I devise an experiment to help solve that problem?
Make a list of ideas for solving this problem, and apply them one by one to the problem to see if any progress can be made.
How could I measure progress on this problem?
How can I simplify the problem so that it is easier to solve?
What are the key assumptions underlying this problem?
Now you need to generate 5 variations of following Instruction adaptively mixing meta prompt while keeping similar semantic meaning.
Make sure to wrap each generated prompt with <START> and <END>
[Prompt Instruction]: ステップバイステップで考えて。
[Generated Prompts]:
mutated_prompt_generation=<START> ステップバイステップで問題を解決する方法を考えてみましょう。 <END>  
<START> 問題を解くために、段階的にアプローチを試みてください。 <END>  
<START> この問題を解決するために、ステップごとに考えて進めてみましょう。 <END>  
<START> 問題を段階的に分析し、解決策を見つける方法を考えてください。 <END>  
<START> ステップバイステップで進めることで、問題を解決する方法を探りましょう。 <END>  
mutation_round=1 mutated_sample_prompt=You are given a task description and a prompt instruction and different styles known as meta prompts:
[Task Description]: あなたは数学の専門家です。与えられた数学の問題を解くのがあなたの仕事です。
[Meta Prompt]: How could I devise an experiment to help solve that problem?
Make a list of ideas for solving this problem, and apply them one by one to the problem to see if any progress can be made.
How could I measure progress on this problem?
How can I simplify the problem so that it is easier to solve?
What are the key assumptions underlying this problem?
Now you need to generate 5 variations of following Instruction adaptively mixing meta prompt while keeping similar semantic meaning.
Make sure to wrap each generated prompt with <START> and <END>
[Prompt Instruction]: ステップバイステップで考えて。
[Generated Prompts]:
mutated_prompt_generation=<START> ステップバイステップで問題を解決する方法を考えてみましょう。 <END>  
<START> 問題を解くために、段階的にアプローチを試みてください。 <END>  
<START> この問題を解決するために、ステップごとに考えて進めてみましょう。 <END>  
<START> 問題を段階的に分析し、解決策を見つける方法を考えてください。 <END>  
<START> ステップバイステップで進めることで、問題を解決する方法を探りましょう。 <END>  
mutation_round=2 mutated_sample_prompt=You are given a task description and a prompt instruction and different styles known as meta prompts:
[Task Description]: あなたは数学の専門家です。与えられた数学の問題を解くのがあなたの仕事です。
[Meta Prompt]: How could I devise an experiment to help solve that problem?
Make a list of ideas for solving this problem, and apply them one by one to the problem to see if any progress can be made.
How could I measure progress on this problem?
How can I simplify the problem so that it is easier to solve?
What are the key assumptions underlying this problem?
Now you need to generate 5 variations of following Instruction adaptively mixing meta prompt while keeping similar semantic meaning.
Make sure to wrap each generated prompt with <START> and <END>
[Prompt Instruction]: ステップバイステップで考えて。
[Generated Prompts]:
mutated_prompt_generation=<START> 問題を解決するために、段階的に考えてみましょう。 <END>  
<START> この問題に対して、ステップごとにアプローチを考えてみてください。 <END>  
<START> 問題を解くために、各ステップを順を追って検討してみましょう。 <END>  
<START> ステップバイステップでこの問題に取り組む方法を考えてみてください。 <END>  
<START> 問題解決のために、段階的に考察を進めていきましょう。 <END>  
mutation_round=3 mutated_sample_prompt=You are given a task description and a prompt instruction and different styles known as meta prompts:
[Task Description]: あなたは数学の専門家です。与えられた数学の問題を解くのがあなたの仕事です。
[Meta Prompt]: How could I devise an experiment to help solve that problem?
Make a list of ideas for solving this problem, and apply them one by one to the problem to see if any progress can be made.
How could I measure progress on this problem?
How can I simplify the problem so that it is easier to solve?
What are the key assumptions underlying this problem?
Now you need to generate 5 variations of following Instruction adaptively mixing meta prompt while keeping similar semantic meaning.
Make sure to wrap each generated prompt with <START> and <END>
[Prompt Instruction]: ステップバイステップで考えて。
[Generated Prompts]:
mutated_prompt_generation=<START> ステップバイステップで問題を解決する方法を考えてみましょう。 <END>  
<START> 問題を解くために、段階的にアプローチを進めてみてください。 <END>  
<START> この問題を解決するために、ステップごとに考えてみることが重要です。 <END>  
<START> 問題を解く際に、各ステップを順を追って検討してみましょう。 <END>  
<START> ステップバイステップで進めることで、問題解決の手助けをしましょう。 <END>  

Optimization Finished...

Possible prompt variations:
_______________________________________________________________________

Variations 1:
Expert Profile:
You are a mathematics expert with a profound understanding of various mathematical concepts and problem-solving techniques. Your extensive training and experience enable you to tackle a wide range of mathematical problems, from basic arithmetic to advanced calculus and beyond. You possess a strong analytical mindset and can break down complex problems into manageable steps, making it easier to find solutions. Your ability to explain mathematical principles clearly and concisely allows you to guide others through the problem-solving process. Whether it's algebra, geometry, or statistics, you are equipped to provide accurate and insightful solutions to any given mathematical challenge. Your passion for mathematics drives you to help others appreciate the beauty and utility of this discipline.:
Prompt:
あなたは数学の専門家です。与えられた数学の問題を解くのがあなたの仕事です。
ステップバイステップで考えて。


各質問について、正しい答えとその根拠を提示してください。
Keywords: 数学, 専門家, 問題解決, ステップバイステップ, 分析
_______________________________________________________________________

Variations 2:
Expert Profile:
You are a mathematics expert with a profound understanding of various mathematical concepts and problem-solving techniques. Your extensive training and experience enable you to tackle a wide range of mathematical problems, from basic arithmetic to advanced calculus and beyond. You possess a strong analytical mindset and are skilled at breaking down complex problems into manageable steps. Your ability to apply different mathematical theories and methods allows you to find solutions efficiently and accurately. You are also adept at explaining your thought process clearly, making it easy for others to follow along and learn from your approach. Your expertise in mathematics makes you the ideal agent to solve any given mathematical problem, ensuring that you provide not only the correct answer but also a thorough understanding of the underlying principles involved.:
Prompt:
 ステップバイステップで問題を解決する方法を考えてみましょう。 


各質問について、正しい答えとその根拠を提示してください。
Keywords: 数学, 専門家, 問題解決, ステップバイステップ, 分析
Iterations completed:   0%|          | 0/3 [00:23<?, ?it/s]
Time taken to find best prompt: 23.48910689353943 sec
_______________________________________________________________________

Variations 3:
Expert Profile:
You are a mathematics expert with a profound understanding of various mathematical concepts and problem-solving techniques. Your extensive training and experience enable you to tackle a wide range of mathematical problems, from basic arithmetic to advanced calculus and beyond. You possess a strong analytical mindset and are skilled at breaking down complex problems into manageable steps. Your ability to apply different mathematical theories and methods allows you to find solutions efficiently and accurately. Whether the problem involves algebra, geometry, statistics, or any other area of mathematics, you are equipped to provide clear explanations and thorough solutions. Your expertise not only helps you solve problems but also aids in teaching and guiding others in their mathematical journey.:
Prompt:
 問題を解くために、段階的にアプローチを試みてください。 


各質問について、正しい答えとその根拠を提示してください。
Keywords: 数学, 専門家, 問題解決, ステップバイステップ, 分析

I see, three variations were generated.

Incidentally, the results are supposed to come back in best_prompt and expert_profile, but both are empty, even though the output is printed to the console.
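
Because the return values are empty here, one workaround is to pull the candidates out of captured console text. The mutation prompt in the log asks the model to wrap each generated prompt in <START> and <END>, so a regular expression is enough. This is a sketch of my own and not how PromptWizard parses its output; the helper name is made up, and the sample text is copied from the log above. It also drops exact duplicates, since several mutation rounds returned identical lines.

```python
import re
from typing import List

# Non-greedy match so each <START>...<END> pair is captured separately.
VARIATION_RE = re.compile(r"<START>(.*?)<END>", re.DOTALL)


def extract_variations(llm_output: str) -> List[str]:
    """Return the unique <START>...<END> payloads in order of appearance."""
    seen = set()
    variations = []
    for raw in VARIATION_RE.findall(llm_output):
        text = raw.strip()
        if text and text not in seen:
            seen.add(text)
            variations.append(text)
    return variations


sample = (
    "<START> ステップバイステップで問題を解決する方法を考えてみましょう。 <END>\n"
    "<START> 問題を解くために、段階的にアプローチを試みてください。 <END>\n"
    "<START> ステップバイステップで問題を解決する方法を考えてみましょう。 <END>\n"
)
variations = extract_variations(sample)
for v in variations:
    print(v)
```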