A quick look at Context Precision

Context Precision
https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/context_precision/

Rough first impression

A metric that measures the proportion of relevant chunks in the retrieved context.

It uses an LLM to decide whether each retrieved context is relevant.
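The Ragas docs define Context Precision@K as the mean of Precision@k taken over the ranks that hold a relevant chunk, so the ordering of the chunks matters as well as how many are relevant. The following is a minimal sketch of that calculation from a list of 0/1 relevance verdicts. The function name and the `eps` parameter are my own, not part of the Ragas API.

```python
def context_precision(verdicts, eps=0.0):
    """verdicts: 0/1 relevance flags for the retrieved chunks, in rank order."""
    relevant = sum(verdicts)
    if relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for k, verdict in enumerate(verdicts, start=1):
        if verdict:
            hits += 1
            total += hits / k  # Precision@k, counted only at relevant ranks
    return total / (relevant + eps)


print(context_precision([1, 0, 1]))  # relevant chunks at ranks 1 and 3
print(context_precision([0, 1, 1]))  # same number of relevant chunks, ranked lower
```

Passing `eps=1e-10` turns a perfect score into roughly 0.9999999999, which matches the value shown in the docs. That Ragas guards the division this way is an assumption on my part.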


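LLMContextPrecisionWithoutReference

Evaluates the precision of the retrieved contexts (retrieved_contexts).

In other words, it measures how relevant the retrieved contexts (retrieved_contexts) are to the response (response) generated for the user's question (user_input).

Trying it out

Following the docs as written raises KeyError: 'reference', so I added reference to the sample and tested again.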
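    from ragas import SingleTurnSample
    from ragas.metrics import LLMContextPrecisionWithReference

    # evaluator_llm is assumed to be an evaluator LLM wrapper configured beforehand
    sample = SingleTurnSample(
        user_input="Where is the Eiffel Tower located?",  # the question
        response="The Eiffel Tower is located in Paris.",  # the AI's response
        retrieved_contexts=["The Eiffel Tower is located in Paris."],  # list of contexts retrieved by the RAG system
        reference=""  # the expected answer; it should not be needed here
    )

    context_precision = LLMContextPrecisionWithReference(llm=evaluator_llm)

    res = await context_precision.single_turn_ascore(sample)
    print(res)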
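With that change it prints 0.9999999999, as in the docs.

Cause

Apparently this is fixed in ragas v0.2.3 and later.

However, even after running pip install --upgrade the installed version stays at v0.2.14, so I'm carrying on as is for now.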

https://github.com/explodinggradients/ragas/issues/1594
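To check which ragas version is actually installed and how it compares with v0.2.3, something like the following could be used. It is only a sketch: the helper names are hypothetical, and the parser keeps just the first three numeric components of the version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Keep the first three numeric components; suffixes like rc1 are ignored (a simplification)."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def is_at_least(installed, required):
    """Compare release numbers numerically rather than as strings."""
    return parse_version(installed) >= parse_version(required)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("ragas")
if found is None:
    print("ragas is not installed in this environment")
else:
    print(found, "is at least v0.2.3:", is_at_least(found, "0.2.3"))
```

Compared as plain strings, "0.2.14" sorts before "0.2.3", which makes the version easy to misread. Compared numerically, 0.2.14 is the later release.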


LLMContextPrecisionWithReference

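It also evaluates retrieved_contexts, but unlike LLMContextPrecisionWithoutReference it compares them against the expected answer (reference) instead of the generated response (response).
In other words, it evaluates how useful or relevant the retrieved contexts (retrieved_contexts) are for producing the expected answer (reference).

Trying it out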

from ragas import SingleTurnSample
from ragas.metrics import LLMContextPrecisionWithReference

context_precision = LLMContextPrecisionWithReference(llm=evaluator_llm)

sample = SingleTurnSample(
    user_input="Where is the Eiffel Tower located?",
    reference="The Eiffel Tower is located in Paris.",
    retrieved_contexts=["The Eiffel Tower is located in Paris."], 
)

await context_precision.single_turn_ascore(sample)

As in the docs, it prints 0.9999999999.
Since it compares reference against retrieved_contexts, I tried swapping retrieved_contexts for something completely unrelated.

    sample = SingleTurnSample(
        user_input="Where is the Eiffel Tower located?",
        reference="The Eiffel Tower is located in Paris.",
        retrieved_contexts=["これはサンプルです。", "これはテストデータです。"],  # unrelated contexts
    )

That gave a score of 0.0.


NonLLMContextPrecisionWithReference

Evaluates without using an LLM.
It compares the retrieved contexts (retrieved_contexts) against the reference contexts (reference_contexts).
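The three variants differ mainly in which sample fields they compare, and a missing field is what surfaced as KeyError: 'reference' earlier. As a rough summary, the sketch below lists the fields per metric as understood in these notes and checks a plain dict against them. The mapping and the `missing_fields` helper are hypothetical and not taken from the Ragas source.

```python
# Fields each metric reads, as described in these notes (not taken from the Ragas source).
FIELDS_USED = {
    "LLMContextPrecisionWithoutReference": ("user_input", "response", "retrieved_contexts"),
    "LLMContextPrecisionWithReference": ("user_input", "reference", "retrieved_contexts"),
    "NonLLMContextPrecisionWithReference": ("retrieved_contexts", "reference_contexts"),
}


def missing_fields(metric_name, sample):
    """Hypothetical helper: list the fields a plain-dict sample lacks for a metric."""
    return [name for name in FIELDS_USED[metric_name] if name not in sample]


sample = {
    "user_input": "Where is the Eiffel Tower located?",
    "response": "The Eiffel Tower is located in Paris.",
    "retrieved_contexts": ["The Eiffel Tower is located in Paris."],
}

for metric_name in FIELDS_USED:
    print(metric_name, "-> missing:", missing_fields(metric_name, sample))
```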

Setup

pip install rapidfuzz

Trying it out

    from ragas import SingleTurnSample
    from ragas.metrics import NonLLMContextPrecisionWithReference

    context_precision = NonLLMContextPrecisionWithReference()

    sample = SingleTurnSample(
        retrieved_contexts=["The Eiffel Tower is located in Paris."],
        reference_contexts=["Paris is the capital of France.", "The Eiffel Tower is one of the most famous landmarks in Paris."]
    )
    await context_precision.single_turn_ascore(sample)

As in the docs, it prints 0.9999999999.
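To get a feel for how a score can be produced without an LLM, here is a rough sketch of the idea: each retrieved context counts as relevant when it is similar enough to at least one reference context, and the verdicts are then averaged in rank order. This is not the Ragas implementation. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for rapidfuzz, and the 0.5 threshold and the function names are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in string similarity in [0, 1]; the real metric relies on rapidfuzz."""
    return SequenceMatcher(None, a, b).ratio()


def non_llm_context_precision(retrieved, references, threshold=0.5):
    """Sketch: flag each retrieved context as relevant if it is close enough to
    any reference context, then average Precision@k over the relevant ranks."""
    verdicts = [
        1 if max(similarity(ctx, ref) for ref in references) >= threshold else 0
        for ctx in retrieved
    ]
    relevant = sum(verdicts)
    if relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for k, verdict in enumerate(verdicts, start=1):
        if verdict:
            hits += 1
            total += hits / k  # Precision@k at each relevant rank
    return total / relevant


reference = ["The Eiffel Tower is located in Paris."]
print(non_llm_context_precision(["The Eiffel Tower is located in Paris."], reference))
print(non_llm_context_precision(["qqqq", "The Eiffel Tower is located in Paris."], reference))
```

In this sketch, putting an unrelated string ahead of the matching context lowers the score, because the only relevant chunk then sits at rank 2.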
