Reading the GUIDES on the page below. Skim through first, then summarize

GPT

OpenAI's GPT (generative pre-trained transformer) models are trained to understand natural language and code, and they provide text output in response to their input
The input to a GPT is also called a "prompt"
Use cases
- Draft documents
- Write computer code
- Answer questions about a knowledge base
- Analyze texts
- Create conversational agents
- Give software a natural language interface
- Tutor in a range of subjects
- Translate languages
- Simulate characters for games

Chat completions API

Takes a list of messages as input and returns a model-generated message as output. Designed to support multi-turn conversations
Each message object has a role ("system", "user", or "assistant") and content

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The 2020 World Series was played in Texas at Globe Life Field in Arlington.",
        "role": "assistant"
      }
    }
  ],
  "created": 1677664795,
  "id": "chatcmpl-7QyqpwdfhqwajicIEznoc6Q47XAyW",
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 17,
    "prompt_tokens": 57,
    "total_tokens": 74
  }
}

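System message
- Helps set the behavior of the assistant
- Can change the assistant's personality or give specific instructions about how it should behave throughout the conversation
User message
- Provides requests or comments for the assistant to respond to
Assistant message
- Stores previous assistant responses, but can also be written by the user to give examples of desired behavior
The model has no memory of past requests, so all relevant information must be supplied as part of the conversation history in each request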
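Because the model keeps no memory between requests, the caller has to resend the history every time. A minimal sketch of that bookkeeping, using only the standard library. The `Conversation` class and its method names are hypothetical helpers for illustration, not part of the OpenAI library:

```python
class Conversation:
    """Accumulates chat messages so every request carries the full history."""

    def __init__(self, system_prompt):
        # The system message goes first and sets the assistant's behavior.
        self.messages = [{"role": "system", "content": system_prompt}]

    def add_user(self, content):
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content):
        # Store the model's reply so later requests can refer back to it.
        self.messages.append({"role": "assistant", "content": content})

    def as_request_messages(self):
        # Return a copy so callers cannot mutate the stored history by accident.
        return list(self.messages)


conversation = Conversation("You are a helpful assistant.")
conversation.add_user("Who won the world series in 2020?")
conversation.add_assistant("The Los Angeles Dodgers won the World Series in 2020.")
conversation.add_user("Where was it played?")
request_messages = conversation.as_request_messages()
```

The resulting `request_messages` list is what would be passed as `messages` in the next request, so "it" in the last question can be resolved from the earlier turns.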
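finish_reason
- Included in every response
- stop
  - The API returned a complete message, or a message terminated by one of the stop sequences given in the stop parameter
- length
  - The model output is incomplete because of the max_tokens parameter or the token limit
- function_call
  - The model decided to call a function
- content_filter
  - Content was omitted because of a flag from the content filter
- null
  - The API response is still in progress or incomplete
Depending on the input parameters (such as providing functions, as shown below), the model response may include different information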
API reference documentation
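The finish_reason values listed above can be checked before the message content is used. A small sketch that works on a response shaped like the JSON sample earlier in these notes. The function name `describe_finish_reason` is hypothetical:

```python
def describe_finish_reason(response):
    """Map the first choice's finish_reason to a short human-readable note."""
    reason = response["choices"][0]["finish_reason"]
    notes = {
        "stop": "complete message, or a stop sequence was hit",
        "length": "output truncated by max_tokens or the token limit",
        "function_call": "model decided to call a function",
        "content_filter": "content omitted by the content filter",
        None: "response still in progress or incomplete",
    }
    # Unknown values are reported rather than silently ignored.
    return notes.get(reason, f"unrecognized finish_reason: {reason!r}")


sample_response = {
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"role": "assistant", "content": "..."},
        }
    ]
}
print(describe_finish_reason(sample_response))
```

In practice a "length" result is the one that most often needs handling, for example by raising max_tokens or shortening the prompt.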

Function calling

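Available with gpt-3.5-turbo-0613 and gpt-4-0613
These models are fine-tuned to detect when a function should be called and to respond with JSON that adheres to the function signature
Build a user confirmation flow before taking actions such as sending an email, posting online, or making a purchase
Functions are injected into the system message in a syntax the model has been trained on
- This means functions count against the model's context limit and are billed as input tokens
- If you run into the context limit, limit the number of functions or the length of the documentation provided for function parameters
Use cases
- Create chatbots that answer questions by calling external APIs (e.g. like ChatGPT Plugins)
  - e.g. define functions like send_email(to: string, body: string), or get_current_weather(location: string, unit: 'celsius' | 'fahrenheit')
- Convert natural language into API calls
  - e.g. convert "Who are my top customers?" to get_customers(min_revenue: int, created_before: string, limit: int) and call your internal API
- Extract structured data from text
  - e.g. define a function called extract_data(name: string, birthday: string), or sql_query(query: string)
Hallucinated function calls can sometimes be mitigated with a system message
- e.g. a system message such as "Only use the functions you have been provided with."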
OpenAI cookbook

import openai
import json

# Example dummy function hard coded to return the same weather
# In production, this could be your backend API or an external API
def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    weather_info = {
        "location": location,
        "temperature": "72",
        "unit": unit,
        "forecast": ["sunny", "windy"],
    }
    return json.dumps(weather_info)

def run_conversation():
    # Step 1: send the conversation and available functions to GPT
    messages = [{"role": "user", "content": "What's the weather like in Boston?"}]
    functions = [
        {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        }
    ]
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=functions,
        function_call="auto",  # auto is default, but we'll be explicit
    )
    response_message = response["choices"][0]["message"]

    # Step 2: check if GPT wanted to call a function
    if response_message.get("function_call"):
        # Step 3: call the function
        # Note: the JSON response may not always be valid; be sure to handle errors
        available_functions = {
            "get_current_weather": get_current_weather,
        }  # only one function in this example, but you can have multiple
        function_name = response_message["function_call"]["name"]
        function_to_call = available_functions[function_name]
        function_args = json.loads(response_message["function_call"]["arguments"])
        function_response = function_to_call(
            location=function_args.get("location"),
            # Fall back to the function's default when the model omits the optional unit
            unit=function_args.get("unit") or "fahrenheit",
        )

        # Step 4: send the info on the function call and function response to GPT
        messages.append(response_message)  # extend conversation with assistant's reply
        messages.append(
            {
                "role": "function",
                "name": function_name,
                "content": function_response,
            }
        )  # extend conversation with function response
        second_response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613",
            messages=messages,
        )  # get a new response from GPT where it can see the function response
        return second_response

    # No function call was requested, so return the first response as-is
    return response

print(run_conversation())
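The note in Step 3 says the JSON arguments may not always be valid, and the model can also name a function that was never provided. A hedged sketch of a more defensive dispatch step, using only the standard library. `dispatch_function_call` and `fake_weather` are hypothetical names introduced here, and returning errors as JSON strings is one possible design, not something the guide prescribes:

```python
import json


def dispatch_function_call(function_call, available_functions):
    """Run the requested function, returning an error string instead of raising."""
    name = function_call.get("name")
    if name not in available_functions:
        # The model may hallucinate a function that was never provided.
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        arguments = json.loads(function_call.get("arguments") or "{}")
    except json.JSONDecodeError as exc:
        # The arguments string is model-generated and may not be valid JSON.
        return json.dumps({"error": f"invalid JSON arguments: {exc.msg}"})
    if not isinstance(arguments, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    try:
        return available_functions[name](**arguments)
    except TypeError as exc:
        # Typically raised when required arguments are missing or unexpected ones appear.
        return json.dumps({"error": f"bad arguments: {exc}"})


def fake_weather(location, unit="fahrenheit"):
    # Stand-in for get_current_weather so the sketch runs on its own.
    return json.dumps({"location": location, "unit": unit})


functions = {"get_current_weather": fake_weather}
ok = dispatch_function_call(
    {"name": "get_current_weather", "arguments": '{"location": "Boston, MA"}'},
    functions,
)
broken = dispatch_function_call(
    {"name": "get_current_weather", "arguments": '{"location": '},
    functions,
)
```

Returning the error as the function result lets it be appended as a "function" role message, so the model can see what went wrong and retry.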

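Which model should I use?

Using either gpt-4 or gpt-3.5-turbo is recommended
gpt-4
- Better at carefully following complex instructions
- Less likely than gpt-3.5-turbo to make up information (hallucination)
- Larger context window
gpt-3.5-turbo
- Lower latency
- Lower cost per token
Common design pattern
- Use several distinct query types and dispatch each one to the model best suited to handle it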
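The dispatch pattern above can be as simple as a lookup table. A minimal sketch in which the query type names ("complex_reasoning", "simple_chat") and the choice of default are assumptions made up for illustration:

```python
# Hypothetical routing table: which query types go to which model.
MODEL_BY_QUERY_TYPE = {
    "complex_reasoning": "gpt-4",
    "simple_chat": "gpt-3.5-turbo",
}
DEFAULT_MODEL = "gpt-3.5-turbo"


def choose_model(query_type):
    """Pick the model for a query type, falling back to the cheaper default."""
    return MODEL_BY_QUERY_TYPE.get(query_type, DEFAULT_MODEL)


model_for_hard_query = choose_model("complex_reasoning")
model_for_unknown_query = choose_model("something_else")
```

How the query type is determined (rules, a classifier, or a cheap model call) is left open here.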

GPT best practices

See GPT best practices for details
OpenAI Cookbook

Managing tokens

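Language models read and write text in chunks called tokens
"ChatGPT is great!" becomes six tokens: ["Chat", "G", "PT", " is", " great", "!"]
What the token count affects
- How much the API call costs
- How long the API call takes, since writing more tokens takes more time
- Whether the API call works at all
  - The total number of tokens must not exceed the model's maximum limit (4097 tokens for gpt-3.5-turbo)
Counting tokens
- tiktoken Python library
  - How to count tokens with tiktoken
A gpt-3.5-turbo conversation that is 4090 tokens long will have its reply cut off after just 6 tokens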
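The arithmetic behind the truncation example is just the limit minus the prompt size. A small sketch with a hypothetical helper, `remaining_reply_tokens`. Note that the 6-token figure appears to assume a 4096-token limit, while the 4097 figure quoted in these notes would leave 7, so the limit is kept as a parameter rather than hard-coded:

```python
def remaining_reply_tokens(prompt_tokens, context_limit):
    """Tokens left for the reply once the prompt has used part of the limit."""
    return max(context_limit - prompt_tokens, 0)


# Usage numbers from the sample response earlier in these notes.
usage = {"prompt_tokens": 57, "completion_tokens": 17}
total_tokens = usage["prompt_tokens"] + usage["completion_tokens"]  # 74

# A 4090-token conversation leaves almost no room for a reply.
room_with_4096 = remaining_reply_tokens(4090, context_limit=4096)  # 6
room_with_4097 = remaining_reply_tokens(4090, context_limit=4097)  # 7
```

Both input and output tokens count toward the total, which is why `total_tokens` in the sample response is the sum of the two.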

Frequency and presence penalties

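Reasonable values for the penalty coefficients are around 0.1 to 1 if the aim is only to reduce repetitive samples somewhat
If the aim is to strongly suppress repetition, the coefficients can be increased up to 2, but this can noticeably degrade the quality of samples
Negative values can be used to increase the likelihood of repetition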
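As far as the API reference describes it, both penalties are subtracted directly from a token's logit: the frequency penalty scales with how often the token has already appeared, and the presence penalty applies once if it has appeared at all. A sketch under that assumption, with made-up logits and counts. `apply_penalties` is a hypothetical name, not a library function:

```python
def apply_penalties(logits, counts, frequency_penalty, presence_penalty):
    """Lower the logit of tokens that already appeared in the text so far."""
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        adjusted[token] = (
            logit
            - count * frequency_penalty  # grows with each repetition
            - (1.0 if count > 0 else 0.0) * presence_penalty  # flat, once seen
        )
    return adjusted


logits = {"cat": 2.0, "dog": 2.0}
counts = {"cat": 3}  # "cat" was already sampled three times, "dog" never
adjusted = apply_penalties(logits, counts, frequency_penalty=0.5, presence_penalty=0.5)
```

With these numbers "cat" drops from 2.0 to 0.0 while "dog" stays at 2.0. With negative coefficients the sign flips and repeated tokens become more likely.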

FAQ

Why are model outputs inconsistent?
- The API is non-deterministic by default
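- Setting temperature to 0 makes the outputs mostly deterministic
How should I set the temperature parameter?
- Lower values give more consistent outputs, while higher values give more diverse and creative results
Is fine-tuning available for the latest models?
- Only for gpt-3.5-turbo and the updated base models (babbage-002 and davinci-002)
- fine-tuning guide
Do you store the data that is passed into the API?
- Customer API data is retained for 30 days. It is not used for training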
- Usage policies
- Default usage policies by endpoint
How can I make my application more safe?
- moderation guide
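The temperature answers above can be illustrated with the usual softmax-with-temperature formula. This is a general illustration of how temperature reshapes a sampling distribution, not a description of OpenAI's actual implementation, and the logits are made-up numbers:

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [logit / temperature for logit in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, temperature=0.2)
flat = softmax_with_temperature(logits, temperature=2.0)
```

At the low temperature almost all probability mass sits on the top token, which is why outputs become more consistent. At the high temperature the mass spreads out, which gives more varied samples.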

r.kagaya

GPT best practices

Best practices for getting better results from GPTs
Some of the techniques currently work only with gpt-4

Six strategies for getting better results

Write clear instructions

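State clearly what you want and give explicit instructions
- e.g. If outputs are too long, ask for brief replies. If outputs are too simple, ask for expert-level writing. Show the format you want
Tactics
- Include details in your query to get more relevant answers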
  - before: How do I add numbers in Excel?
  - after: How do I add up a row of dollar amounts in Excel? I want to do this automatically for a whole sheet of rows with all the totals ending up on the right in a column called "Total".
  - https://platform.openai.com/docs/guides/gpt-best-practices/strategy-write-clear-instructions
- Ask the model to adopt a persona
  - SYSTEM
    - When I ask for help to write something, you will reply with a document that contains at least one joke or playful comment in every paragraph.
  - USER
    - Write a thank you note to my steel bolt vendor for getting the delivery in on time and in short notice. This made it possible for us to deliver an important order.
  - https://platform.openai.com/docs/guides/gpt-best-practices/tactic-ask-the-model-to-adopt-a-persona
- Use delimiters to clearly indicate distinct parts of the input
  - Delimiters such as triple quotation marks, XML tags, and section titles help demarcate sections of text
  - """insert text here"""
  - <article> insert first article here </article>
  - Abstract: insert abstract here
  - https://platform.openai.com/docs/guides/gpt-best-practices/tactic-use-delimiters-to-clearly-indicate-distinct-parts-of-the-input
- Specify the steps required to complete a task
  - Write the steps out explicitly
  - https://platform.openai.com/docs/guides/gpt-best-practices/tactic-specify-the-steps-required-to-complete-a-task
- Provide examples
  - few shot prompt
  - https://platform.openai.com/docs/guides/gpt-best-practices/tactic-provide-examples
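- Specify the desired length of the output
  - Ask the model to produce output of a given length, counted in words, sentences, paragraphs, bullet points, and so on
  - Asking for a specific number of words does not work with high precision. The model more reliably produces output with a specific number of paragraphs or bullet points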
  - Summarize the text delimited by triple quotes in about 50 words.
  - Summarize the text delimited by triple quotes in 2 paragraphs.
  - Summarize the text delimited by triple quotes in 3 bullet points.
  - https://platform.openai.com/docs/guides/gpt-best-practices/tactic-specify-the-desired-length-of-the-output
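The delimiter and output-length tactics above combine naturally when a prompt is built in code. A minimal sketch in which `build_summary_prompt` is a hypothetical helper and swapping out embedded triple quotes is just one possible way to keep the delimiter unambiguous:

```python
TRIPLE_QUOTE = '"""'


def build_summary_prompt(text, length_instruction="in 3 bullet points"):
    """Wrap the text in triple quotes and state the desired output length."""
    if TRIPLE_QUOTE in text:
        # Avoid ambiguity if the text itself contains the delimiter.
        text = text.replace(TRIPLE_QUOTE, "'''")
    return (
        f"Summarize the text delimited by triple quotes {length_instruction}.\n\n"
        f"{TRIPLE_QUOTE}{text}{TRIPLE_QUOTE}"
    )


prompt = build_summary_prompt("GPT models read and write text in tokens.")
paragraph_prompt = build_summary_prompt("Some long article.", "in 2 paragraphs")
```

Keeping the instruction and the delimited text separate also makes it harder for instructions hidden inside the text to be mistaken for the task itself.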

Provide reference text

Providing reference text helps the model answer with fewer fabrications
Tactics:
- Instruct the model to answer using a reference text
- Instruct the model to answer with citations from a reference text

SYSTEM
Use the provided articles delimited by triple quotes to answer questions. If the answer cannot be found in the articles, write "I could not find an answer."
USER
<insert articles, each delimited by triple quotes>

Question: <insert question here>

SYSTEM
You will be provided with a document delimited by triple quotes and a question. Your task is to answer the question using only the provided document and to cite the passage(s) of the document used to answer the question. If the document does not contain the information needed to answer this question then simply write: "Insufficient information." If an answer to the question is provided, it must be annotated with a citation. Use the following format to cite relevant passages ({"citation": …}).
USER
"""<insert document here>"""

Question: <insert question here>
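One benefit of asking for citations is that they can be checked mechanically by string matching against the provided document. A sketch of such a check, assuming the cited passages have already been extracted from the answer into a list of strings. `unsupported_citations` is a hypothetical helper, and the extraction step is not shown:

```python
def unsupported_citations(document, citations):
    """Return the cited passages that do not appear verbatim in the document."""
    normalized_document = " ".join(document.split())
    missing = []
    for passage in citations:
        # Collapse whitespace so line wrapping does not cause false misses.
        if " ".join(passage.split()) not in normalized_document:
            missing.append(passage)
    return missing


document = "The model has no memory of past requests.\nAll context must be resent."
citations = ["no memory of past requests", "context is stored forever"]
missing = unsupported_citations(document, citations)
```

Here the second citation is flagged because that passage never occurs in the document.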

Split complex tasks into simpler subtasks

Break complex tasks into smaller ones
Tactics:
- Use intent classification to identify the most relevant instructions for a user query
- For dialogue applications that require very long conversations, summarize or filter previous dialogue
  - Summarize the previous turns of the conversation
  - Use embeddings-based search
- Summarize long documents piecewise and construct a full summary recursively
  - To summarize a very long document, summarize each section separately
  - Summaries of earlier sections can be used to make sense of later sections
  - Summarizing books with human feedback
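The piecewise, recursive summarization described above can be sketched as follows. The `summarize` argument is a caller-supplied function (hypothetical here, for example a wrapper around a chat completion call), and splitting by a fixed number of characters is a simplification, since a real version would more likely split on sections or token counts:

```python
def split_into_chunks(text, chunk_size):
    """Split text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def summarize_recursively(text, summarize, chunk_size=2000):
    """Summarize chunk by chunk, then summarize the summaries until one chunk remains."""
    if len(text) <= chunk_size:
        return summarize(text)
    partial = [summarize(chunk) for chunk in split_into_chunks(text, chunk_size)]
    combined = "\n".join(partial)
    if len(combined) >= len(text):
        # Guard against a summarizer that fails to shorten its input.
        raise ValueError("summaries are not shorter than the input")
    return summarize_recursively(combined, summarize, chunk_size)


def fake_summarize(text):
    # Stand-in for a model call: keeps only the first 10 characters.
    return text[:10]


summary = summarize_recursively("a" * 5000, fake_summarize, chunk_size=2000)
```

This sketch summarizes each chunk independently. Passing a running summary of earlier sections into each call, as the notes suggest, would be a natural extension.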

Give GPTs time to "think"

Tactics:

Use external tools

Tactic: Use embeddings-based search to implement efficient knowledge retrieval

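Dynamically adding relevant information from external sources to the prompt context lets the model give more up-to-date, higher-quality answers
- Embeddings
e.g. If a user asks about a specific movie, add high-quality information about that movie (actors, director, etc.) to the model's input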
Vector Databases

Tactic: Use code execution to perform more accurate calculations or call external APIs

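Instead of having the model do calculations itself, you can instruct it to write and run code
Instruct the model to put the code to be run into a designated format such as triple backticks, then extract the code from the output and run it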

SYSTEM
You can write and execute Python code by enclosing it in triple backticks. Also note that you have access to the following module to help users send messages to their friends:

```python
import message
message.write(to="John", message="Hey, want to meetup after work?")
```

Executing code produced by a model is not inherently safe. A sandboxed code execution environment is needed
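The "extract the code from the output" step can be sketched with a regular expression. This only extracts the code and deliberately does not run it, since running it belongs in a sandbox. The function name and the sample output are made up for illustration:

```python
import re

FENCE = "`" * 3  # built this way so the sketch can itself sit inside a fenced block
CODE_BLOCK = re.compile(FENCE + r"(?:python)?\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(model_output):
    """Return the source of every triple-backtick block in the model output."""
    return [match.strip() for match in CODE_BLOCK.findall(model_output)]


sample_output = (
    "Here is the calculation:\n"
    + FENCE + "python\nprint(2 ** 10)\n" + FENCE
    + "\nThe result is printed above."
)
blocks = extract_code_blocks(sample_output)
```

The pattern is non-greedy, so several blocks in one reply are returned as separate items.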

Tactic: Give the model access to specific functions

Use function calling

Test changes systematically

Good evals
- Representative of real-world usage (or at least diverse)
- Contain many test cases for greater statistical power (see table below for guidelines)
- Easy to automate or repeat
https://github.com/openai/evals
- Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Tactic: Evaluate model outputs with reference to gold-standard answers
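A very simple form of evaluating against gold-standard answers is exact-match accuracy. The guide's tactic is broader than this (it also covers having a model judge the answers), so the following is only a minimal sketch with hypothetical helper names and made-up test cases:

```python
def normalize_answer(text):
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(text.lower().split())


def exact_match_accuracy(model_answers, gold_answers):
    """Fraction of model answers that match the gold-standard answer exactly."""
    if len(model_answers) != len(gold_answers):
        raise ValueError("each model answer needs a gold-standard answer")
    if not gold_answers:
        raise ValueError("at least one test case is required")
    matches = sum(
        normalize_answer(model) == normalize_answer(gold)
        for model, gold in zip(model_answers, gold_answers)
    )
    return matches / len(gold_answers)


accuracy = exact_match_accuracy(
    ["Paris", "william shakespeare", "400,000 km"],
    ["Paris", "William Shakespeare", "384,400 km"],
)
```

Because the whole check is a pure function over two lists, it is easy to automate and rerun after every prompt change, which is one of the properties of good evals listed above.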


Image generation


Fine-tuning

Fine-tuning provides:
- Higher quality results than prompting
  - Ability to train on more examples than can fit in a prompt
  - Token savings due to shorter prompts
  - Lower latency requests
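- By training on many more examples than can fit in a prompt, fine-tuning can achieve better results than few-shot prompting on a wide range of tasks
- Once a model has been fine-tuned, you no longer need to provide as many examples in the prompt, which reduces cost and enables lower-latency requests
- Steps
  - Prepare and upload training data
  - Train a new fine-tuned model
  - Use the fine-tuned model
- Models that can be fine-tuned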
  - gpt-3.5-turbo-0613
    - recommended
  - babbage-002
  - davinci-002

When to use fine-tuning

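First try prompt engineering, prompt chaining (breaking a complex task into multiple prompts), and function calling
- There are many tasks where the results improve with the right prompt
- Iterating on prompts gives a much faster feedback loop than iterating on fine-tuning, which requires creating datasets and running training jobs
- Even when fine-tuning turns out to be necessary, the initial prompt engineering work is not wasted
  - Using a good prompt in the fine-tuning data (or combining prompt chaining / tool use with fine-tuning) generally gives better results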

Common use cases

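Setting the style, tone, format, or other qualitative aspects
Improving reliability at producing a desired output
Correcting failures to follow complex prompts
Handling many edge cases in specific ways
Performing a new skill or task that is hard to articulate in a prompt
Reducing cost or latency without sacrificing quality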

Preparing your dataset

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}
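Each line of the training file is one JSON object with a "messages" list, as in the examples above. A hedged sketch of a per-line sanity check. The rules it enforces (the three roles listed in these notes, string content, at least one assistant message) are assumptions for illustration and are looser than the official data-format checks in the cookbook:

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_training_line(line):
    """Return a list of problems found in one JSONL training example."""
    try:
        example = json.loads(line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    messages = example.get("messages") if isinstance(example, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    problems = []
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {index}: not an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {index}: unexpected role")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {index}: content must be a string")
    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("no assistant message to learn from")
    return problems


good_line = json.dumps({"messages": [
    {"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."},
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."},
]})
bad_line = json.dumps({"messages": [{"role": "user", "content": "Hi"}]})
```

Running the check over every line before uploading catches formatting mistakes without spending a training job on them.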

Tips

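Provide at least 10 examples
- Clear improvements are typically seen from fine-tuning gpt-3.5-turbo on 50 to 100 training examples
- The right number depends on the exact use case
Starting with 50 examples and checking whether the model shows signs of improvement after fine-tuning is recommended
- If there is no improvement, this suggests rethinking how the task is set up for the model or restructuring the data before scaling up the example set
Splitting the dataset into a training portion and a test portion is recommended
Training on 100,000 tokens over 3 epochs has an expected cost of about $2.40
Checking the data format
- https://cookbook.openai.com/examples/chat_finetuning_data_prep
If the fine-tuning results are not good
- https://platform.openai.com/docs/guides/fine-tuning/use-a-fine-tuned-model
- https://platform.openai.com/docs/guides/fine-tuning/iterating-on-data-quantity
- https://platform.openai.com/docs/guides/fine-tuning/iterating-on-hyperparameters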
example
- https://platform.openai.com/docs/guides/fine-tuning/fine-tuning-examples
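The cost estimate in the tips above follows from tokens in the file times epochs times a per-token training price. The per-1K-token rate of $0.008 used below is not stated in these notes. It is inferred from the $2.40 figure and should be treated as an assumption, since pricing changes over time:

```python
def estimate_training_cost(tokens_in_file, epochs, usd_per_1k_tokens):
    """Estimated fine-tuning cost: every epoch trains on every token once."""
    return tokens_in_file / 1000 * epochs * usd_per_1k_tokens


# 100,000 tokens for 3 epochs at the assumed rate gives roughly 2.40 USD.
cost = estimate_training_cost(100_000, epochs=3, usd_per_1k_tokens=0.008)
```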

When should I use fine-tuning vs embeddings with retrieval?

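Embeddings with retrieval is best suited to cases where you need a large database of documents with relevant context and information
It is not an alternative to fine-tuning; the two complement each other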

How do I know if my fine-tuned model is actually better than the base model?

Compare samples side by side
For a more comprehensive evaluation, see https://github.com/openai/evals

How do rate limits work on fine-tuned models?

In terms of total throughput, having fine-tuned models does not increase the TPM available per model


Embeddings

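What are embeddings?

A vector (list) of floating point numbers
- The distance between two vectors measures their relatedness
- A small distance suggests high relatedness and a large distance suggests low relatedness
Uses
- Search (results are ranked by relevance to a query string)
- Clustering (text strings are grouped by similarity)
- Recommendations (items with related text strings are recommended)
- Anomaly detection (outliers with little relatedness are identified)
- Diversity measurement (similarity distributions are analyzed)
- Classification (text strings are classified by their most similar label)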
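The relationship between distance and relatedness can be made concrete with cosine similarity, written here in plain Python. The three-dimensional vectors are toy values invented for illustration, since real embeddings come from the API and have far more dimensions:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Smaller distance means the two texts are more related."""
    return 1.0 - cosine_similarity(a, b)


# Toy vectors standing in for the embeddings of three words.
coffee = [0.9, 0.1, 0.0]
espresso = [0.8, 0.2, 0.1]
bicycle = [0.0, 0.2, 0.9]
```

With these values `cosine_distance(coffee, espresso)` is much smaller than `cosine_distance(coffee, bicycle)`, matching the intuition that the first pair is more related.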

How to get embeddings

Embedding models

Using text-embedding-ada-002 is recommended for nearly all use cases
- https://openai.com/blog/new-and-improved-embedding-model

Use cases

Text search using embeddings

from openai.embeddings_utils import get_embedding, cosine_similarity

def search_reviews(df, product_description, n=3, pprint=True):
   embedding = get_embedding(product_description, model='text-embedding-ada-002')
   df['similarities'] = df.ada_embedding.apply(lambda x: cosine_similarity(x, embedding))
   res = df.sort_values('similarities', ascending=False).head(n)
   return res

res = search_reviews(df, 'delicious beans', n=3)

Limitations & risks

Social bias
Blindness to recent events
- Lacks knowledge of events that occurred after August 2020

FAQ

How can I retrieve K nearest embedding vectors quickly?

https://cookbook.openai.com/examples/vector_databases/readme
Vector database options
- Chroma, an open-source embeddings store
- Elasticsearch, a popular search/analytics engine and vector database
- Milvus, a vector database built for scalable similarity search
- Pinecone, a fully managed vector database
- Qdrant, a vector search engine
- Redis as a vector database
- Typesense, fast open source vector search
- Weaviate, an open-source vector search engine
- Zilliz, data infrastructure, powered by Milvus

Which distance function should I use?

Cosine similarity is recommended. The choice of distance function usually does not matter much
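One reason the choice matters little: OpenAI embeddings are normalized to length 1, and for unit-length vectors ranking by dot product (which then equals cosine similarity) gives the same order as ranking by Euclidean distance. A brute-force sketch with toy vectors that checks this. The helper names are hypothetical, and for large collections one of the vector databases listed above would replace the linear scan:

```python
import heapq
import math


def normalize(vector):
    """Scale a vector to length 1."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nearest_by_dot(query, vectors, k):
    """Names of the k vectors with the largest dot product with the query."""
    return heapq.nlargest(k, vectors, key=lambda name: dot(query, vectors[name]))


def nearest_by_euclidean(query, vectors, k):
    """Names of the k vectors with the smallest Euclidean distance to the query."""
    return heapq.nsmallest(k, vectors, key=lambda name: math.dist(query, vectors[name]))


vectors = {
    "a": normalize([1.0, 0.1]),
    "b": normalize([0.2, 1.0]),
    "c": normalize([0.9, 0.5]),
}
query = normalize([1.0, 0.0])
by_dot = nearest_by_dot(query, vectors, k=3)
by_distance = nearest_by_euclidean(query, vectors, k=3)
```

Both rankings come out as a, c, b for these vectors.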


Speech to text


Moderation

moderations endpoint
https://platform.openai.com/docs/api-reference/moderations/object