Trying out PrOwl, a prompt engineering support tool

GitHub repository
https://github.com/lks-ai/prowl

PrOwl v0.1: Give your prompts wings!

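Backstory

After months of wrestling with LangChain's linear, string-dependent approach to prompt composition, I decided I had had enough. I wanted prompts to be as intuitive as HTML 1.0: simple, declarative, and powerful. With PrOwl, I generated more prompt completions on the first day than in over seven months with LangChain. Prompts are now at the center, and Python coding exists only to support prompt engineering. PrOwl treats prompt engineering as the first-class citizen it deserves to be.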

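PrOwl's advantages
Designed for local LLMs: While many frameworks prioritize OpenAI compatibility, PrOwl is designed with vLLM and Mistral Instruct 7B in mind, and support for further LLM clients can be added.

Simplicity and accessibility: .prowl scripts are lightweight and designed to integrate easily into Python code through ProwlStack. No extra boilerplate, just direct, asynchronous prompt magic.

Minimalism: A small interpreter focused on the bare-minimum language features keeps the codebase clean and its functionality clearly defined.

Smart problem solving:

Separation of concerns: Code and prompt composition are kept clearly apart.

Multi-step composition: Build complex, multi-step prompts in a natural flow.

Variable-first generation: Generate data and text at the same time, without spending extra time extracting variables.

Chaining and stacking: Every generated variable conditions the output that follows, taking conventional chaining a step further.

Built-in tools: Creating and using custom tools is seamless.

Hallucination control: Guide the LLM through a systematic process to minimize off-track output.

Multi-generation support: Declare a variable multiple times and track exactly how it evolves.

Augmented prompt engineering: Design scripts alongside an LLM, integrating human standards into machine-generated output. This is not "alignment"; it is conditioning.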
So, roughly speaking, it is a prompt-templating helper.

kun432

Installation

I'll do this on Ubuntu 22.04.

Create a working directory and a Python virtual environment.

mkdir prowl && cd prowl
uv venv -p 3.12.9

PrOwl uses vLLM by default. It apparently also works with Ollama, OpenRouter, OpenAI, and others, but for now I'll stick with the default. As for the model, PrOwl seems to have been validated against Mistral, so I'll use mistralai/Mistral-7B-Instruct-v0.3.

Install the packages.

uv pip install prompt-owl vllm

Output

(snip)
 + prompt-owl==0.1.17
(snip)
 + vllm==0.7.3
(snip)

Start vLLM (downloading the model takes quite a while).

uv run vllm serve mistralai/Mistral-7B-Instruct-v0.3 \
    --max-model-len 4096 \
    --host 0.0.0.0 \
    --port 8888

Open another terminal and set the environment variables.

export PROWL_VLLM_ENDPOINT=http://localhost:8888
export PROWL_MODEL=mistralai/Mistral-7B-Instruct-v0.3

The rest of the work is done in this terminal.
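Before involving PrOwl at all, it can help to confirm that the server answers. The following is a minimal sketch, independent of PrOwl, that builds a request for vLLM's OpenAI-compatible /v1/completions route from the two environment variables set above. The helper names build_request and complete are my own, and the default values are only fallbacks for illustration.

```python
import json
import os
import urllib.request


def build_request(prompt, max_tokens=48, temperature=0.0, env=os.environ):
    """Build a POST request for the OpenAI-compatible completions route."""
    # Fallback values are assumptions that mirror the setup in this scrap.
    base = env.get("PROWL_VLLM_ENDPOINT", "http://localhost:8888").rstrip("/")
    model = env.get("PROWL_MODEL", "mistralai/Mistral-7B-Instruct-v0.3")
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        base + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, **kwargs), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server running, complete("The capital of France is") should return a short continuation. Nothing is sent until complete is called.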


Using PrOwl

Now for a sample. The sample code in the README did not run as-is, so I based this on the code from the demo notebook.

sample.py
from prowl import prowl
import asyncio
import json


template = """
# User Request
{user_request}

# User Intent
What is the user's intended meaning? Disambiguate the request and write one sentence about it's pragmatic intention. Be concise and accurate:
- The user is requesting I {user_intent(48, 0.0)}

# Fulfill the User Request
{output_text(1024, 0.2)}
"""

user_request = "How high is mount everest?"

async def main():
    result = await prowl.fill(
        template,
        variables={
            "user_request": prowl.Variable(value=user_request)
        }
    )
    print("===== 質問 =====")
    print(user_request)
    print("===== 全ての生成 =====")
    print(result.completion)
    print("===== 変数 =====")
    print(json.dumps(result.get(), indent=2, ensure_ascii=False))
    print("===== 回答 =====")
    print(result.val("output_text"))

if __name__ == "__main__":
    asyncio.run(main())

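template is the PrOwl prompt template. It contains two kinds of variables:

Variables passed into the prompt template.
Variables defined inside the prompt template.

For simplicity, from here on I'll call the former "external variables" and the latter "PrOwl template variables". (Note that these are names I made up, not official terminology.)

External variables are supplied from outside the template. In the example above, user_request is one. Just wrap the name in {}.

PrOwl template variables are defined inside the PrOwl prompt template, and their values are generated by the LLM. They use the following format:

{variable_name(max_output_tokens, temperature)}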
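To make the two kinds of variables concrete, here is a small sketch that pulls both out of a template with regular expressions. The patterns are my own guess based only on the examples in this scrap; PrOwl's actual interpreter may accept more forms.

```python
import re
from typing import NamedTuple


class GenVar(NamedTuple):
    name: str
    max_tokens: int
    temperature: float


# Hypothetical patterns for illustration, not PrOwl's real grammar.
GEN_VAR = re.compile(r"\{(\w+)\((\d+),\s*([0-9.]+)\)\}")
EXTERNAL_VAR = re.compile(r"\{(\w+)\}")


def find_variables(template):
    """Return (external variable names, generated variable specs)."""
    generated = [
        GenVar(m.group(1), int(m.group(2)), float(m.group(3)))
        for m in GEN_VAR.finditer(template)
    ]
    external = EXTERNAL_VAR.findall(template)
    return external, generated


template = """# User Request
{user_request}

# User Intent
- The user is requesting I {user_intent(48, 0.0)}

# Fulfill the User Request
{output_text(1024, 0.2)}
"""

external, generated = find_variables(template)
print(external)
print(generated)
```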

Now let's actually run it.

uv run sample.py

Output

[INFO] Endpoint URL: http://localhost:8888/v1/completions, model: mistralai/Mistral-7B-Instruct-v0.3

<< user_intent(48, 0.0) >> multiline: False
provide the height of Mount Everest

<< output_text(1024, 0.2) >> multiline: True
Mount Everest is 8,848 meters (29,029 feet) high.
===== 質問 =====
How high is mount everest?
===== 全ての生成 =====

# User Request
How high is mount everest?

# User Intent
What is the user's intended meaning? Disambiguate the request and write one sentence about it's pragmatic intention. Be concise and accurate:
- The user is requesting I provide the height of Mount Everest

# Fulfill the User Request
Mount Everest is 8,848 meters (29,029 feet) high.


===== 変数 =====
{
  "user_request": "How high is mount everest?",
  "user_intent": "provide the height of Mount Everest",
  "output_text": "Mount Everest is 8,848 meters (29,029 feet) high."
}
===== 回答 =====
Mount Everest is 8,848 meters (29,029 feet) high.

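For the question "How high is Mount Everest?", the answer "8,848 meters (29,029 feet)" comes back. The response exposes the following properties and methods:

result.completion: the complete output with every variable filled in
result.get(): the values of all variables as a dictionary
result.val("variable_name"): the value of a single variable

The prompts in the requests that were actually sent are as follows.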

First request

# User Request
How high is mount everest?

# User Intent
What is the user's intended meaning? Disambiguate the request and write one sentence about it's pragmatic intention. Be concise and accurate:
- The user is requesting I

Second request

# User Request
How high is mount everest?

# User Intent
What is the user's intended meaning? Disambiguate the request and write one sentence about it's pragmatic intention. Be concise and accurate:
- The user is requesting I provide the height of Mount Everest

# Fulfill the User Request

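The first request goes to the LLM to generate the PrOwl template variable user_intent. The second inserts that result into the template and generates the PrOwl template variable output_text. In other words, multi-step inference can be specified in a single template flow.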
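The two-request flow can be sketched as a loop: cut the template at the first unfilled variable, send everything before it as the prompt, splice the answer in, and repeat. This is only my reading of the behavior observed above, not PrOwl's implementation. fake_complete is a stand-in that returns canned text so the sketch runs without a server.

```python
import re

# Hypothetical pattern for {name(max_tokens, temperature)}.
GEN_VAR = re.compile(r"\{(\w+)\((\d+),\s*([0-9.]+)\)\}")


def fill(template, inputs, complete):
    """Fill generated variables one by one, left to right."""
    text = template
    for name, value in inputs.items():
        text = text.replace("{" + name + "}", value)
    values = dict(inputs)
    prompts = []
    while True:
        m = GEN_VAR.search(text)
        if m is None:
            break
        prompt = text[: m.start()]  # everything before the variable
        prompts.append(prompt)
        value = complete(prompt, int(m.group(2)), float(m.group(3)))
        values[m.group(1)] = value
        text = prompt + value + text[m.end():]
    return text, values, prompts


def fake_complete(prompt, max_tokens, temperature):
    # Stand-in for a real completion call; returns canned text.
    if prompt.rstrip().endswith("requesting I"):
        return "provide the height of Mount Everest"
    return "Mount Everest is 8,848 meters (29,029 feet) high."


template = (
    "# User Request\n{user_request}\n\n"
    "# User Intent\n- The user is requesting I {user_intent(48, 0.0)}\n\n"
    "# Fulfill the User Request\n{output_text(1024, 0.2)}\n"
)
completion, values, prompts = fill(
    template, {"user_request": "How high is mount everest?"}, fake_complete
)
for i, p in enumerate(prompts, 1):
    print(f"--- request {i} ---")
    print(p)
```

Each later prompt contains every earlier generated value, which is why the intent step can steer the final answer.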

This template can be split out into separate files. Create a directory named prompts and save each prompt in its own file, as follows.

prompts/input.prowl

# User Request
{user_request}

prompts/intent.prowl

# User Intent
What is the user's intended meaning? Disambiguate the request and write one sentence about it's pragmatic intention. Be concise and accurate:
- The user is requesting I {user_intent(48, 0.0)}

prompts/output.prowl

# Fulfill the User Request
{output_text(1024, 0.2)}

To load the externalized prompts, use ProwlStack.

sample_stack.py
from prowl import ProwlStack
import asyncio
import json

# Define the directory that holds the prompts as a Stack
stack = ProwlStack(folder=['./prompts/'])

# Inputs are passed as a dictionary
inputs = {'user_request': 'How high is mount everest?'}

async def main():
    # Run the Stack
    result = await stack.run(
        ['input', 'intent', 'output'],  # prompts to call, in order
        inputs=inputs     # the inputs
    )
    )

    print("===== 質問 =====")
    print(inputs['user_request'])
    print("===== 全ての生成 =====")
    print(result.completion)
    print("===== 変数 =====")
    print(json.dumps(result.get(), indent=2, ensure_ascii=False))
    print("===== 回答 =====")
    print(result.val("output_text"))

if __name__ == "__main__":
    asyncio.run(main())

Run it.

uv run sample_stack.py

Output

Checking ./prompts/ for prowl files...
Checking prompts/ for prowl files...
Loaded `.prowl` Scripts: ['output', 'intent', 'input']
✅ Requirements check passed for input -> intent -> output
[INFO] Endpoint URL: http://localhost:8888/v1/completions, model: mistralai/Mistral-7B-Instruct-v0.3
[INFO] Endpoint URL: http://localhost:8888/v1/completions, model: mistralai/Mistral-7B-Instruct-v0.3

<< user_intent(48, 0.0) >> multiline: False
provide the height of Mount Everest
[INFO] Endpoint URL: http://localhost:8888/v1/completions, model: mistralai/Mistral-7B-Instruct-v0.3

<< output_text(1024, 0.2) >> multiline: False
Mount Everest is 8,848.86 meters high
===== 質問 =====
How high is mount everest?
===== 全ての生成 =====
# User Request
How high is mount everest?
# User Intent
What is the user's intended meaning? Disambiguate the request and write one sentence about it's pragmatic intention. Be concise and accurate:
- The user is requesting I provide the height of Mount Everest
# Fulfill the User Request
Mount Everest is 8,848.86 meters high

===== 変数 =====
{
  "user_request": "How high is mount everest?",
  "user_intent": "provide the height of Mount Everest",
  "output_text": "Mount Everest is 8,848.86 meters high"
}
===== 回答 =====
Mount Everest is 8,848.86 meters high

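The result is almost the same as the first example, but the prompts and the code are now completely separated, and code-like elements in the prompts are kept to a minimum.

Next, rewrite the following part and run it again.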

(snip)
    result = await stack.run(
        ['input', 'output'],  # do not use the intent prompt
        inputs=inputs
    )
(snip)

Result

Output

Checking ./prompts/ for prowl files...
Checking prompts/ for prowl files...
Loaded `.prowl` Scripts: ['output', 'intent', 'input']
✅ Requirements check passed for input -> output
[INFO] Endpoint URL: http://localhost:8888/v1/completions, model: mistralai/Mistral-7B-Instruct-v0.3
[INFO] Endpoint URL: http://localhost:8888/v1/completions, model: mistralai/Mistral-7B-Instruct-v0.3

<< output_text(1024, 0.2) >> multiline: False
Mount Everest is approximately 8,848.86 meters (29,029 feet) high. This measurement was made by a Chinese team in 2005 using a new method that combined satellite measurements with ground surveys. The previous measurement, made by a British team in 1955, put the height at 8,848 meters (29,028 feet). The difference in measurements is due to the movement of the Earth's crust and the melting and freezing of snow and ice on the mountain
===== 質問 =====
How high is mount everest?
===== 全ての生成 =====
# User Request
How high is mount everest?
# Fulfill the User Request
Mount Everest is approximately 8,848.86 meters (29,029 feet) high. This measurement was made by a Chinese team in 2005 using a new method that combined satellite measurements with ground surveys. The previous measurement, made by a British team in 1955, put the height at 8,848 meters (29,028 feet). The difference in measurements is due to the movement of the Earth's crust and the melting and freezing of snow and ice on the mountain

===== 変数 =====
{
  "user_request": "How high is mount everest?",
  "output_text": "Mount Everest is approximately 8,848.86 meters (29,029 feet) high. This measurement was made by a Chinese team in 2005 using a new method that combined satellite measurements with ground surveys. The previous measurement, made by a British team in 1955, put the height at 8,848 meters (29,028 feet). The difference in measurements is due to the movement of the Earth's crust and the melting and freezing of snow and ice on the mountain"
}
===== 回答 =====
Mount Everest is approximately 8,848.86 meters (29,029 feet) high. This measurement was made by a Chinese team in 2005 using a new method that combined satellite measurements with ground surveys. The previous measurement, made by a British team in 1955, put the height at 8,848 meters (29,028 feet). The difference in measurements is due to the movement of the Earth's crust and the melting and freezing of snow and ice on the mountain

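Unlike the previous run, the answer now includes information beyond the height, such as details about when it was measured. This shows that the prompt chain in the first example was set up to keep the answer focused.

In other words, a Stack makes the trial and error of prompt engineering simple: adding, removing, swapping, and reordering prompts.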
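As a rough mental model, running a Stack looks like joining the named .prowl files in the given order before filling the variables. The sketch below only does the joining step. compose is a hypothetical helper of mine, and the newline handling is a guess from the completions printed above.

```python
import tempfile
from pathlib import Path


def compose(folder, names):
    """Join the named .prowl files in the given order (illustrative only)."""
    parts = []
    for name in names:
        text = (Path(folder) / f"{name}.prowl").read_text(encoding="utf-8")
        parts.append(text.strip("\n"))
    return "\n".join(parts) + "\n"


# Recreate the three prompt files in a throwaway directory.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "input.prowl").write_text(
    "# User Request\n{user_request}\n", encoding="utf-8"
)
(demo_dir / "intent.prowl").write_text(
    "# User Intent\n- The user is requesting I {user_intent(48, 0.0)}\n",
    encoding="utf-8",
)
(demo_dir / "output.prowl").write_text(
    "# Fulfill the User Request\n{output_text(1024, 0.2)}\n", encoding="utf-8"
)

full = compose(demo_dir, ["input", "intent", "output"])
short = compose(demo_dir, ["input", "output"])
print(full)
print(short)
```

Dropping or reordering a step is then just a change to the list of names, which matches how the stack.run call was edited above.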


Using it in Japanese

Next, let's see whether it works in Japanese, including the model and the prompts.

This time the model is microsoft/Phi-4-mini-instruct.

Start it with vLLM.

uv run vllm serve microsoft/Phi-4-mini-instruct \
    --host 0.0.0.0 \
    --port 8888 \
    --max-model-len 8192

Open another terminal and set the environment variables.

export PROWL_VLLM_ENDPOINT=http://localhost:8888
export PROWL_MODEL=microsoft/Phi-4-mini-instruct

Prepare Japanese prompts under the prompts_ja directory.

prompts_ja/input.prowl

# ユーザのリクエスト
{user_request}

prompts_ja/intent.prowl

# ユーザの意図
ユーザのリクエストの背景にある意図は何ですか？リクエストの内容を吟味して、要求されている意図について1文で簡潔かつ正確に説明して。 
- ユーザが私に求めているのは {user_intent(100, 0.0)}

prompts_ja/output.prowl

# ユーザのリクエストに答える
{output_text(1024, 0.2)}

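Note that the prompts directory appears to be loaded by default, in addition to the directories given to the Stack. If prompt files share a name, you may not get the prompt you intended, so be careful. For now I'll just rename the directory.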

mv prompts prompts_
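Instead of renaming, a quick check for clashing script names across folders can flag the problem up front. This is a small sketch with a hypothetical helper, find_collisions. Which file PrOwl actually picks on a clash is something I have not verified.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def find_collisions(folders):
    """Map each script name found in more than one folder to its paths."""
    by_name = defaultdict(list)
    for folder in folders:
        for path in sorted(Path(folder).glob("*.prowl")):
            by_name[path.stem].append(path)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


# Throwaway layout: "input" exists in both folders, the others do not.
root = Path(tempfile.mkdtemp())
layout = {"prompts": ["input", "intent"], "prompts_ja": ["input", "output"]}
for folder, names in layout.items():
    (root / folder).mkdir()
    for name in names:
        (root / folder / f"{name}.prowl").write_text("# placeholder\n", encoding="utf-8")

collisions = find_collisions([root / "prompts", root / "prompts_ja"])
for name, paths in collisions.items():
    print(name, [p.parent.name for p in paths])
```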

Create a Stack that uses these Japanese prompts and run inference. First, without the intent prompt.

sample_stack_ja.py
from prowl import ProwlStack
import asyncio
import json

stack = ProwlStack(folder=['./prompts_ja/'])

inputs = {'user_request': '富士山って高いよね。どれぐらいなんだろう？'}

async def main():
    result = await stack.run(
        ['input', 'output'],
        inputs=inputs
    )

    print("===== 質問 =====")
    print(inputs['user_request'])
    print("===== 全ての生成 =====")
    print(result.completion)
    print("===== 変数 =====")
    print(json.dumps(result.get(), indent=2, ensure_ascii=False))
    print("===== 回答 =====")
    print(result.val("output_text"))

if __name__ == "__main__":
    asyncio.run(main())

Result

Output

Checking ./prompts_ja/ for prowl files...
Checking prompts/ for prowl files...
Loaded `.prowl` Scripts: ['output', 'intent', 'input']
✅ Requirements check passed for input -> output
[INFO] Endpoint URL: http://localhost:8888/v1/completions, model: microsoft/Phi-4-mini-instruct
[INFO] Endpoint URL: http://localhost:8888/v1/completions, model: microsoft/Phi-4-mini-instruct

<< output_text(1024, 0.2) >> multiline: False
富士山は日本で最も高い山で、標高は約3,776メートル（12,389フィート）です。
===== 質問 =====
富士山って高いよね。どれぐらいなんだろう？
===== 全ての生成 =====
# ユーザのリクエスト
富士山って高いよね。どれぐらいなんだろう？
# ユーザのリクエストに答える
富士山は日本で最も高い山で、標高は約3,776メートル（12,389フィート）です。

===== 変数 =====
{
  "user_request": "富士山って高いよね。どれぐらいなんだろう？",
  "output_text": "富士山は日本で最も高い山で、標高は約3,776メートル（12,389フィート）です。"
}
===== 回答 =====
富士山は日本で最も高い山で、標高は約3,776メートル（12,389フィート）です。

Now include the intent prompt as well and run it again.

(snip)
async def main():
    result = await stack.run(
        ['input', 'intent', 'output'],
        inputs=inputs
    )
(snip)

Result

Output

Checking ./prompts_ja/ for prowl files...
Checking prompts/ for prowl files...
Loaded `.prowl` Scripts: ['output', 'intent', 'input']
✅ Requirements check passed for input -> intent -> output
[INFO] Endpoint URL: http://localhost:8888/v1/completions, model: microsoft/Phi-4-mini-instruct
[INFO] Endpoint URL: http://localhost:8888/v1/completions, model: microsoft/Phi-4-mini-instruct

<< user_intent(100, 0.0) >> multiline: False
、富士山の高さを教えてくれることです。
[INFO] Endpoint URL: http://localhost:8888/v1/completions, model: microsoft/Phi-4-mini-instruct

<< output_text(1024, 0.2) >> multiline: False
富士山は約3,776メートル（12,389フィート）の高さがあります。
===== 質問 =====
富士山って高いよね。どれぐらいなんだろう？
===== 全ての生成 =====
# ユーザのリクエスト
富士山って高いよね。どれぐらいなんだろう？
# ユーザの意図
ユーザのリクエストの背景にある意図は何ですか？リクエストの内容を吟味して、要求されている意図について1文で簡潔かつ正確に説明して。
- ユーザが私に求めているのは 、富士山の高さを教えてくれることです。
# ユーザのリクエストに答える
富士山は約3,776メートル（12,389フィート）の高さがあります。

===== 変数 =====
{
  "user_request": "富士山って高いよね。どれぐらいなんだろう？",
  "user_intent": "、富士山の高さを教えてくれることです。",
  "output_text": "富士山は約3,776メートル（12,389フィート）の高さがあります。"
}
===== 回答 =====
富士山は約3,776メートル（12,389フィート）の高さがあります。

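Not hugely different, but the extra information is gone and the answer is more focused.

From what I tried, Japanese seems to work well enough. However, even with trivial prompts the answer broke with some models, and the problem seems to lie in how the output is parsed rather than in the models themselves.

At a glance, nothing like Structured Output seems to be taken into account, so that kind of thing is bound to happen. It is probably not specific to Japanese. Maybe writing the prompt so that it outputs JSON would help?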
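If the prompt does ask for JSON, the generated text still needs a tolerant parser, since models often wrap the object in extra prose. Here is a minimal sketch using only the standard library. extract_json is a hypothetical helper that would be applied to a value such as result.val("output_text"), and it is not something PrOwl provides as far as I can tell.

```python
import json


def extract_json(text):
    """Return the first JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one value starting at `start` and ignores the rest.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    return None


reply = 'Sure! {"name": "富士山", "height_m": 3776} Hope that helps.'
print(extract_json(reply))
```

This does not make generation itself reliable. It only keeps a malformed reply from propagating silently into later steps.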


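A few other things that caught my eye, though I have not looked at them in depth.

Tools

There is apparently support for "tools", which are invoked with a decorator-like @ syntax as shown below. As an example, an image generation tool that uses ComfyUI is provided. It looks like you generate a prompt and then generate an image from it.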

## Prompt Composition
Compose a comma-delimited set of key phrases summarizing the above:
{prompt(520, 0.0)}

{@comfy(prompt)}

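This is all the documentation covers, but the source shows there are various other tools, and it looks like you can write your own.

As for usage, there is a separate repository of sample prompts, so digging through that is probably the way to go.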

CLI

It also works from the CLI. However, in my testing I had to install the following additional package.

uv pip install pytz

Run it like this.

uv run prowl -folder=prompts_ja/ input intent output

It runs interactively, so just type your input and try it.

Output

prompts_ja/
NO,
{'folder': 'prompts_ja/'}
Prompt Owl (PrOwl) version 0.1.17
---------------------------------
Working From /data/repository/prowl
Folder: ['prompts/', 'prompts_ja/']
Scripts: ['input', 'intent', 'output']
---------------------------------
Checking prompts/ for prowl files...
Checking prompts_ja/ for prowl files...
Loaded `.prowl` Scripts: ['output', 'intent', 'input']
@Tools> ['out', 'file', 'include', 'script', 'comfy', 'time', 'list', 'each']

You included an `input` block. Enter a value for `{user_request}`...
@>> User Request>

Input

@>> User Request> 富士山の高さは？

Output

✅ Requirements check passed for input -> intent -> output
[INFO] Endpoint URL: http://localhost:8888/v1/completions, model: microsoft/Phi-4-mini-instruct
[INFO] Endpoint URL: http://localhost:8888/v1/completions, model: microsoft/Phi-4-mini-instruct

<< user_intent(100, 0.0) >> multiline: False
、富士山の高さを知ることです。
[INFO] Endpoint URL: http://localhost:8888/v1/completions, model: microsoft/Phi-4-mini-instruct

<< output_text(1024, 0.2) >> multiline: False
富士山の高さは約3,776メートルです。

[[OUTPUT VARS]]

{'completion': '# ユーザのリクエスト\n富士山の高さは？\n# ユーザの意図\nユーザのリクエストの背景にある意図は何ですか？リクエストの内容を吟味して、要求されている意図について1文で簡潔かつ正確に説明して。 \n- ユーザが私に求めているのは 、富士山の高さを知ることです。\n# ユーザのリクエストに答える\n富士山の高さは約3,776メートルです。\n', 'variables': {'user_request': {'value': '富士山の高さは？', 'history': [{'value': '富士山の高さは？', 'usage': {'prompt_tokens': 0, 'total_tokens': 0, 'completion_tokens': 0, 'elapsed': 0}}], 'usage': {'prompt_tokens': 0, 'total_tokens': 0, 'completion_tokens': 0, 'elapsed': 0}}, 'user_intent': {'value': '、富士山の高さを知ることです。', 'history': [{'value': '、富士山の高さを知ることです。', 'usage': {'prompt_tokens': 91, 'total_tokens': 103, 'completion_tokens': 12, 'elapsed': 0.1582028865814209}}], 'usage': {'prompt_tokens': 91, 'total_tokens': 103, 'completion_tokens': 12, 'elapsed': 0.1582028865814209}}, 'output_text': {'value': '富士山の高さは約3,776メートルです。', 'history': [{'value': '富士山の高さは約3,776メートルです。', 'usage': {'prompt_tokens': 117, 'total_tokens': 134, 'completion_tokens': 17, 'elapsed': 0.18645286560058594}}], 'usage': {'prompt_tokens': 117, 'total_tokens': 134, 'completion_tokens': 17, 'elapsed': 0.18645286560058594}}}, 'usage': {'prompt_tokens': 208, 'total_tokens': 237, 'completion_tokens': 29, 'elapsed': 0.34465575218200684}, 'output': []}

[[COMPLETION PROMPT]]

# ユーザのリクエスト
富士山の高さは？
# ユーザの意図
ユーザのリクエストの背景にある意図は何ですか？リクエストの内容を吟味して、要求されている意図について1文で簡潔かつ正確に説明して。
- ユーザが私に求めているのは 、富士山の高さを知ることです。
# ユーザのリクエストに答える
富士山の高さは約3,776メートルです。


[[USAGE]]

{'prompt_tokens': 208, 'total_tokens': 237, 'completion_tokens': 29, 'elapsed': 0.34465575218200684}

[[OUTPUT FROM TEMPLATES]]

Running it like this, with the input piped in, worked in batch mode.

echo "富士山の高さは" | uv run prowl -folder=prompts_ja/ input intent output

Used well, this could enable something like unit tests.
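One way to build such a test is to capture the CLI's stdout and split it on the [[...]] section headers seen in the output above. The sketch below assumes that layout (observed with version 0.1.17 and possibly different in other versions) and uses a shortened sample string instead of actually invoking the CLI. Note that the usage section is a Python dict repr, so it is read with ast.literal_eval rather than json.

```python
import ast
import re

# Matches header lines such as [[USAGE]] on a line of their own.
SECTION = re.compile(r"^\[\[(.+?)\]\][ \t]*$", re.MULTILINE)


def parse_sections(stdout):
    """Split CLI output into {section name: section body}."""
    sections = {}
    matches = list(SECTION.finditer(stdout))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(stdout)
        sections[m.group(1)] = stdout[m.end():end].strip()
    return sections


# Shortened sample of the output shown above.
sample = """[[COMPLETION PROMPT]]

# ユーザのリクエスト
富士山の高さは？

[[USAGE]]

{'prompt_tokens': 208, 'total_tokens': 237, 'completion_tokens': 29, 'elapsed': 0.34465575218200684}

[[OUTPUT FROM TEMPLATES]]
"""

sections = parse_sections(sample)
usage = ast.literal_eval(sections["USAGE"])
print(usage["total_tokens"])
```

In a real test, sample would come from subprocess.run with the question passed on stdin, and the assertions would check the completion text or a token budget.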


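Summary

I probably came across this on Reddit or thereabouts, and it looked interesting enough to try. It feels quite promising for trial-and-error work, but the following are drawbacks:

Documentation is lacking. You will likely need to dig through the repository and read the code.
Some of the sample code and notebooks do not run as-is.
As mentioned above, there seems to be no Structured Output, so prompt chains may fail in some cases.
Updates are infrequent. Changes seem to land at a pace of roughly last spring and then the start of this year. Perhaps the author simply has no problems with it.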
