
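What Is RAGEval, a Framework for Creating Datasets?

Published on 2024/11/04

What is RAGEval

To evaluate a RAG system, you need question-answer pairs (an evaluation dataset). In general, the more pairs you have, the more reliable the evaluation, but once you need hundreds or thousands of them, building the dataset by hand becomes impractical.
RAGEval is a framework that generates evaluation datasets automatically.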
https://arxiv.org/abs/2408.01262

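What RAGEval cannot do

It cannot create a dataset fully automatically

RAGEval does not automate the whole process. The seed documents described later must be written by hand, and the schema derived from them must be reviewed to confirm that it is actually correct.

The paper does not say whether datasets are generated correctly for Japanese

The paper benchmarks the quality of the generated evaluation datasets only for English and Chinese. Other languages are not mentioned, so it is unclear how accurate the generated datasets would be for them.

What RAGEval can create

  • Documents
  • Questions
  • Answers
  • Reference excerpts
  • Key points

The following sections look at each of these in detail.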
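
As a rough mental model, the five outputs listed above can be pictured as one record per question. The following is a minimal sketch; the class name `EvalRecord` and its field names are assumptions made for illustration and are not taken from the RAGEval repository.

```python
from dataclasses import dataclass, field


@dataclass
class EvalRecord:
    """One evaluation example (hypothetical layout, not RAGEval's own format)."""
    document: str   # generated document the question is about
    question: str
    answer: str
    references: list[str] = field(default_factory=list)  # excerpts from the document
    key_points: list[str] = field(default_factory=list)  # facts the answer must cover


record = EvalRecord(
    document="Date of Judgment: 15th May 2023",
    question="What was the judgment date?",
    answer="15th May 2023.",
    references=["Date of Judgment: 15th May 2023"],
    key_points=["The judgment date was 15th May 2023."],
)
```

Keeping the references and key points next to each question and answer makes it easy to check later that every answer is backed by text from the document.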

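Dataset creation flow in RAGEval

As an example, consider building a dataset to evaluate a RAG system for a legal domain, such as one that handles contracts.

① Creating the seed documents and schema

Creating the seed documents

First, create seed documents: documents that contain the important knowledge of the legal domain.
Writing the seed documents is the most important step, because every later step builds on them. For example, to build a dataset for evaluating a RAG system over internal company knowledge, you need high-quality seed documents written from that internal knowledge.

Creating the schema

Next, an LLM (large language model) automatically generates a schema from the seed documents. The generated schema is not final: it is reviewed by experts who know both AI and law.

Schema example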

{
  "courtAndProcuratorate": {
    "court": "",
    "procuratorate": ""
  },
  "chiefJudge": "",
  "judge": "",
  "clerk": "",
  "defendant": {
    "name": "",
    "gender": "",
    "birthdate": "",
    "residence": "",
    "ethnicity": "",
    "occupation": ""
  },
  "defenseLawyer": {
    "name": "",
    "lawFirm": ""
  },
  "caseProcess": [
    {
      "event": "Case Filing and Investigation",
      "date": ""
    },
    {
      "event": "Detention Measures Taken",
      "date": ""
    },
    {
      "event": "Criminal Detention",
      "date": ""
    },
    {
      "event": "Arrest",
      "date": ""
    }
  ],
  "criminalFacts": [
    {
      "crimeName": "",
      "details": [
        {
          "timePeriod": "",
          "behavior": "",
          "evidence": ""
        }
      ]
    }
  ],
  "legalProcedure": {
    "judgmentDate": "",
    "judgmentResult": [
      {
        "crimeName": "",
        "sentence": "",
        "sentencingConsiderations": ""
      }
    ]
  }
}
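
Because the schema has to be reviewed by domain experts, it can help to present it as a flat list of field paths rather than as nested JSON. The sketch below shows one way to do that; the helper `flatten_schema` is hypothetical and is not part of RAGEval.

```python
def flatten_schema(node, prefix=""):
    """Return dotted paths of all leaf fields in a nested dict/list schema."""
    paths = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else key
            paths.extend(flatten_schema(value, child))
    elif isinstance(node, list):
        # Lists hold repeated items; describe the shape of the first one only.
        if node:
            paths.extend(flatten_schema(node[0], prefix + "[]"))
        else:
            paths.append(prefix + "[]")
    else:
        paths.append(prefix)
    return paths


# A cut-down version of the schema above, used only for illustration.
schema = {
    "defendant": {"name": "", "gender": ""},
    "criminalFacts": [{"crimeName": "", "details": [{"timePeriod": ""}]}],
}
for path in flatten_schema(schema):
    print(path)
# defendant.name
# defendant.gender
# criminalFacts[].crimeName
# criminalFacts[].details[].timePeriod
```

A reviewer can then tick off each path without having to read brackets and braces.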

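② Creating the configuration file

Create a configuration file (config) that serves as the reference and the constraints for text generation. RAGEval provides a script that generates it automatically.
The config keeps the documents and questions generated in later steps consistent with each other.

Configuration file example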

{
  "courtAndProcuratorate": {
    "court": "Ashton, Clarksville, Court",
    "procuratorate": "Ashton, Clarksville, Procuratorate"
  },
  "chiefJudge": "M. Gray",
  "judge": "H. Torres",
  "clerk": "A. Brown",
  "defendant": {
    "name": "J. Gonzalez",
    "gender": "female",
    "birthdate": "15th, June, 1999",
    "residence": "53, Bayside street, Clarksville",
    "ethnicity": "Hispanic",
    "occupation": "Senior Inspector, Clarksville Tax Department"
  },
  "defenseLawyer": {
    "name": "M. Smith",
    "lawFirm": "Clarksville Legal Associates"
  },
  "caseProcess": [
    {
      "event": "Case Filing and Investigation",
      "date": "1st March 2023"
    },
    {
      "event": "Detention Measures Taken",
      "date": "5th March 2023"
    },
    {
      "event": "Criminal Detention",
      "date": "10th March 2023"
    },
    {
      "event": "Arrest",
      "date": "12th March 2023"
    }
  ],
  "criminalFacts": [
    {
      "crimeName": "Crime of Bending the Law for Personal Gain",
      "details": [
        {
          "timePeriod": "January 2022 - December 2022",
          "behavior": "J. Gonzalez utilized her position as Senior Inspector in ...",
          "evidence": "Email correspondences between J. Gonzalez and ..."
        }
      ]
    }
  ],
  "legalProcedure": {
    "judgmentDate": "15th May 2023",
    "judgmentResult": [
      {
        "crimeName": "Crime of Bending the Law for Personal Gain",
        "sentence": "5 years of fixed-term imprisonment",
        "sentencingConsiderations": "The defendant’s position of trust ..."
      }
    ]
  }
}
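
Since the config is the schema with concrete values filled in, one simple sanity check is to confirm that every schema field is present and non-empty in the config. The following sketch assumes both are loaded as Python dicts; `missing_fields` is a hypothetical helper and not a RAGEval function.

```python
def missing_fields(schema, config, prefix=""):
    """List schema fields that are absent or empty in the config."""
    problems = []
    if isinstance(schema, dict):
        if not isinstance(config, dict):
            return [prefix or "<root>"]
        for key, sub_schema in schema.items():
            child = f"{prefix}.{key}" if prefix else key
            if key not in config:
                problems.append(child)
            else:
                problems.extend(missing_fields(sub_schema, config[key], child))
    elif isinstance(schema, list):
        if not isinstance(config, list) or not config:
            return [prefix + "[]"]
        if schema:
            # Every config item must follow the shape of the first schema item.
            for i, item in enumerate(config):
                problems.extend(missing_fields(schema[0], item, f"{prefix}[{i}]"))
    else:
        # Leaf field: the config must supply a non-blank string.
        if not isinstance(config, str) or not config.strip():
            problems.append(prefix)
    return problems


schema = {"judge": "", "defendant": {"name": "", "occupation": ""}}
config = {"judge": "H. Torres", "defendant": {"name": "J. Gonzalez", "occupation": ""}}
print(missing_fields(schema, config))  # ['defendant.occupation']
```

An empty result only means the structure is complete; whether the values are plausible still needs a human look.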

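③ Generating the documents

An LLM (the GPT-4o model) generates a document by transforming the config to fit a specific scenario. This document ultimately becomes the reference information from which the questions and answers are produced.

Document example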

{
"content":"Ashton, Clarksville Court
Case No. XYZ12345
**IN THE COURT OF ASHTON, CLARKSVILLE**
**CRIMINAL JUDGMENT**
**COURT AND PROSECUTOR INFORMATION**
**Court:**
Ashton, Clarksville Court
**Prosecutorate:**
Ashton, Clarksville Procuratorate
**Chief Judge:**
M. Gray
**Judge:**
H. Torres
**Clerk:**
A. Brown
**DEFENDANT AND DEFENSE LAWYER INFORMATION**
**Defendant:**
Name: J. Gonzalez
Gender: Female
Birthdate: 15th June 1999
Residence: 53 Bayside Street, Clarksville
Ethnicity: Hispanic
Occupation: Senior Inspector, Clarksville Tax Department
**Defense Lawyer:**
Name: M. Smith
Law Firm: Clarksville Legal Associates
**CASE PROCEDURES**
The case against J. Gonzalez commenced with an investigation following a suspicious tip received by
the Ashton, Clarksville Procuratorate on 1st March 2023. The investigation revealed substantial
evidence implicating the defendant in the Crime of Bending the Law for Personal Gain.
Consequently, J. Gonzalez was taken into detention on 5th March 2023. Criminal detention was
applied on 10th March 2023, and the defendant was formally arrested on 12th March 2023.
**CASE STATEMENT**
The Crime of Bending the Law for Personal Gain by the defendant, J. Gonzalez, occurred over a span
of one year, from January 2022 to December 2022. During this period, J. Gonzalez exploited her
position as a Senior Inspector within the Clarksville Tax Department to manipulate tax audits,
reports, and reduce penalty fees for several conspiring local businesses in exchange for
substantial financial bribes. This court will detail the pertinent events chronologically to
provide a comprehensive understanding of the criminal activities committed.
**Charge:**
Crime of Bending the Law for Personal Gain as per Article 397 of the applicable law.
**EVIDENCE DESCRIPTION**
**1. January 2022 - December 2022: Manipulation of Tax Audits in Exchange for Bribes**
During the year of 2022, J. Gonzalez engaged in illicit activities using her privileged position.
Emails confirmed numerous correspondences between J. Gonzalez and various local business owners.
These emails explicitly outlined her agreement to manipulate tax audits and financial reports
for monetary compensation. Bank statements revealed a series of significant transactions
amounting to $125,000 deposited into an account owned by J. Gonzalez from suspicious sources.
Testimonies from several business owners corroborated these findings, revealing a consistent
pattern of bribery and exploitation.
...
**Date of Judgment:**
15th May 2023
**___**
M. Gray, Chief Judge
**___**
H. Torres, Judge
**___**
A. Brown, Clerk"
}
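
Because the document is generated from the config, a quick way to spot dropped details is to check that each config value still appears in the generated text. The sketch below is only a heuristic under stated assumptions: it compares text after stripping punctuation and case, so a value that the LLM rephrased will be flagged even when nothing is wrong. The function names are hypothetical.

```python
import re


def leaf_values(node):
    """Yield every non-empty string value in a nested dict/list config."""
    if isinstance(node, dict):
        for value in node.values():
            yield from leaf_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from leaf_values(item)
    elif isinstance(node, str) and node.strip():
        yield node


def normalize(text):
    """Lowercase and drop punctuation and whitespace."""
    return re.sub(r"[\W_]+", "", text.lower())


def values_not_in_document(config, document):
    """Return config values whose normalized form is missing from the document."""
    doc = normalize(document)
    return [v for v in leaf_values(config) if normalize(v) not in doc]


config = {
    "judge": "H. Torres",
    "defendant": {"name": "J. Gonzalez", "birthdate": "15th, June, 1999"},
    "clerk": "A. Brown",
}
document = "Judge: H. Torres\nDefendant: J. Gonzalez\nBirthdate: 15th June 1999"
print(values_not_in_document(config, document))  # ['A. Brown']
```

Flagged values are candidates for manual review, not proof of an error.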

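④ Creating the evaluation dataset (sets of question, answer, and references)

Creating the questions and answers

GPT-4o creates the questions and answers from the config. This can also be automated with a script in the RAGEval repository. The references have not been used at this point, so the following steps adjust the answers to match them.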

{
  "qa_fact_based": [
    {
      "Question Type": "Factual Question",
      "Question": "According to the court judgment of Ashton, Clarksville, Court, what was the\njudgment date?",
      "ref": [
        "Date of Judgment: 15th May 2023"
      ],
      "Answer": "15th May 2023."
    }
  ],
  "qa_multi_hop": [
    {
      "Question Type": "Multi-hop Reasoning Question",
      "Question": "According to the judgment of Ashton, Clarksville, Court, how many instances\nof bending the law for personal gain did J. Gonzalez commit?",
      "ref": [
        "The Crime of Bending the Law for Personal Gain by the defendant, J. Gonzalez,\noccurred over a span of one year, from January 2022 to December 2022.",
        "During this period, J. Gonzalez exploited her position as a Senior Inspector within\nthe Clarksville Tax Department to manipulate tax audits, reports, and reduce penalty fees for\nseveral conspiring local businesses in exchange for substantial financial bribes.",
        "In March 2022, J. Gonzalez revised the tax records for Sunrise Construction Inc.,\ndrastically reducing their tax liability after receiving a bribe of $50,000.",
        "In exchange for $30,000, J. Gonzalez facilitated the undue reduction of penalty\nfees levied on Downtown Boutique Ltd. for late tax submissions.",
        "The most egregious of the offenses occurred in November 2022, when J. Gonzalez\ndisclosed sensitive and confidential information about ongoing tax investigations to executives\nat Riven Pharmaceuticals, securing a bribe of $45,000."
      ],
      "Answer": "According to the judgment, J. Gonzalez committed four instances of bending\nthe law for personal gain: manipulating tax audits and reports, altering tax records, reducing\npenalty fees, and providing confidential information."
    }
  ],
  "qa_summary": [
    {
      "Question Type": "Summary Question",
      "Summary Content": "Facts of the crime",
      "Question": "According to the judgment of Ashton, Clarksville, Court, summarize the\nfacts of J. Gonzalez’s crimes.",
      "ref": [
        "The Crime of Bending the Law for Personal Gain by the defendant, J. Gonzalez,\noccurred over a span of one year, from January 2022 to December 2022.",
        "During this period, J. Gonzalez exploited her position as a Senior Inspector within\nthe Clarksville Tax Department to manipulate tax audits, reports, and reduce penalty fees for\nseveral conspiring local businesses in exchange for substantial financial bribes.",
        "In March 2022, J. Gonzalez revised the tax records for Sunrise Construction Inc.,\ndrastically reducing their tax liability after receiving a bribe of $50,000.",
        "In exchange for $30,000, J. Gonzalez facilitated the undue reduction of penalty\nfees levied on Downtown Boutique Ltd. for late tax submissions.",
        "The most egregious of the offenses occurred in November 2022, when J. Gonzalez\ndisclosed sensitive and confidential information about ongoing tax investigations to executives\nat Riven Pharmaceuticals, securing a bribe of $45,000."
      ],
      "Answer": "J. Gonzalez, a Senior Inspector at the Clarksville Tax Department, committed\nthe crime of bending the law for personal gain. From January 2022 to December 2022, she\nmanipulated tax audits and reports in exchange for bribes from multiple local businesses. In\nMarch 2022, she altered tax records to reduce the tax liability for Sunrise Construction Inc.\nafter receiving $50,000. In August 2022, she reduced penalty fees for late tax submission of\nDowntown Boutique Ltd. in exchange for $30,000. In November 2022, she provided confidential\ninformation about ongoing tax investigations to Riven Pharmaceuticals in exchange for $45,000."
    }
  ]
}

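Extracting the references

Based on the generated questions and answers, the relevant reference passages are extracted from the document generated in the previous step.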
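
Since references are meant to be excerpts of the generated document, a simple check is whether each one actually occurs in it. The sketch below assumes plain substring matching after collapsing whitespace; `unmatched_references` is a hypothetical helper, not part of RAGEval.

```python
import re


def squash(text):
    """Collapse runs of whitespace so line wraps do not break matching."""
    return re.sub(r"\s+", " ", text).strip()


def unmatched_references(refs, document):
    """Return the references that do not occur verbatim in the document."""
    doc = squash(document)
    return [ref for ref in refs if squash(ref) not in doc]


document = """**Date of Judgment:**
15th May 2023
M. Gray, Chief Judge"""
refs = ["15th May 2023", "Date of Judgment: 15th May 2023"]
print(unmatched_references(refs, document))
# ['Date of Judgment: 15th May 2023']
```

The second reference is flagged only because the document contains Markdown markers between the label and the date, which shows that a strict verbatim check tends to over-report and that its output is best treated as a review list.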

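Refining the answers and references

The answer is revised based on the references. If the references contain relevant content that the answer omits, it is added. Conversely, if the answer contains information that the references do not support, either a supporting reference is located or the unsupported part is removed.

This step reduces hallucinations and helps ensure that the answers are accurate.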
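
To illustrate the idea of finding unsupported parts of an answer, here is a deliberately crude sketch that flags answer sentences sharing few words with the references. This word-overlap heuristic is a stand-in chosen for illustration, not the method used by RAGEval; the function names and the 0.6 threshold are assumptions.

```python
import re


def words(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def unsupported_sentences(answer, refs, threshold=0.6):
    """Flag answer sentences whose words are mostly absent from the references."""
    ref_words = set()
    for ref in refs:
        ref_words |= words(ref)
    flagged = []
    # Naive sentence split; abbreviations such as "J. Gonzalez" would be split too.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        tokens = words(sentence)
        if not tokens:
            continue
        coverage = len(tokens & ref_words) / len(tokens)
        if coverage < threshold:
            flagged.append(sentence)
    return flagged


refs = ["In March 2022, J. Gonzalez revised the tax records for Sunrise Construction Inc."]
answer = "In March 2022 she revised the tax records. She later fled the country by boat."
print(unsupported_sentences(answer, refs))
# ['She later fled the country by boat.']
```

A flagged sentence then needs either a supporting reference or removal, which matches the two options described above.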

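Generating the key points

Key points that summarize the essential information are extracted from the answer to each question. A predefined, context-based prompt is used so that the extraction works across different kinds of questions and situations. Typically 3 to 5 key points are produced per answer, covering important facts, relevant inferences, and conclusions.

Extracting these key points makes it possible to evaluate generated content accurately and reliably.

Example of the key point extraction prompt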

{
"prompt":"In this task, you will be given a question and a standard answer. Based on the standard
answer, you need to summarize the key points necessary to answer the question. List them as
follows:
1. ...
2. ...
and so on, as needed.
Example:
Question: What are the significant changes in the newly amended Company Law?
Standard Answer: The 2023 amendment to the Company Law introduced several significant changes.
Firstly, the amendment strengthens the regulation of corporate governance, specifically
detailing the responsibilities of the board of directors and the supervisory board [1]. Secondly,
it introduces mandatory disclosure requirements for Environmental, Social, and Governance (ESG)
reports [2]. Additionally, the amendment adjusts the corporate capital system, lowering the
minimum registered capital requirements [3]. Finally, the amendment introduces special support
measures for small and medium-sized enterprises to promote their development [4].
Key Points:
1. The amendment strengthens the regulation of corporate governance, detailing the responsibilities
of the board of directors and the supervisory board.
2. It introduces mandatory disclosure requirements for ESG reports.
3. It adjusts the corporate capital system, lowering the minimum registered capital requirements.
4. It introduces special support measures for small and medium-sized enterprises.
Question: Comparing the major asset acquisitions of Huaxia Entertainment Co., Ltd. in 2017 and Top
Shopping Mall in 2018, which company’s acquisition amount was larger?
Standard Answer: Huaxia Entertainment Co., Ltd.’s asset acquisition amount in 2017 was larger [1],
amounting to 120 million yuan [2], whereas Top Shopping Mall’s asset acquisition amount in 2018
was 50 million yuan [3].
Key Points:
1. Huaxia Entertainment Co., Ltd.’s asset acquisition amount in 2017 was larger.
2. Huaxia Entertainment Co., Ltd.’s asset acquisition amount was 120 million yuan in 2017.
3. Top Shopping Mall’s asset acquisition amount was 50 million yuan in 2018.
Question: Comparing the timing of sustainability and social responsibility initiatives by Meihome
Housekeeping Services Co., Ltd. and Cultural Media Co., Ltd., which company initiated these
efforts earlier?
Standard Answer: Meihome Housekeeping Services Co., Ltd. initiated its sustainability and social
responsibility efforts earlier [1], in December 2018 [2], whereas Cultural Media Co., Ltd.
initiated its efforts in December 2019 [3].
Key Points:
1. Meihome Housekeeping Services Co., Ltd. initiated its sustainability and social responsibility
efforts earlier.
2. Meihome Housekeeping Services Co., Ltd. initiated its efforts in December 2018.
3. Cultural Media Co., Ltd. initiated its efforts in December 2019.

Question: Based on the 2017 Environmental and Social Responsibility Report of Green Source
Environmental Protection Co., Ltd., how did the company improve community relations through
participation in charitable activities, community support and development projects, and public
service projects?
Standard Answer: Green Source Environmental Protection Co., Ltd. improved community relations
through several social responsibility activities. Firstly, in March 2017, the company
participated in or funded charitable activities and institutions to support education, health,
and poverty alleviation, enhancing the company’s social image and brand recognition [1].
Secondly, in June 2017, the company invested in the local community, supporting education,
health, and social development projects, deepening its connection with the community and
promoting overall community well-being and development [2]. Finally, in August 2017, the company
participated in public service projects such as urban greening and public health improvement
projects, enhancing the quality of life in the community and promoting sustainable development
[3]. These measures enhanced public perception of the company and improved community relations
[4].
Key Points:
1. In March 2017, the company participated in or funded charitable activities and institutions to
support education, health, and poverty alleviation, enhancing the company’s social image and
brand recognition.
2. In June 2017, the company invested in the local community, supporting education, health, and
social development projects, deepening its connection with the community and promoting overall
community well-being and development.
3. In August 2017, the company participated in public service projects such as urban greening and
public health improvement projects, enhancing the quality of life in the community and promoting
sustainable development.
4. These measures enhanced public perception of the company and improved community relations.
Test Case:
Question: {question}
Standard Answer: {ground_truth}
Key Points:"
}
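
The prompt ends with placeholders for the question and the standard answer, and it asks the model to reply with a numbered list. The sketch below shows how the placeholders might be filled and how such a reply could be split into individual key points. The names `PROMPT_TAIL`, `build_prompt`, and `parse_key_points` are hypothetical, the LLM call itself is left out, and only the last lines of the prompt are reproduced.

```python
import re

# Final lines of the prompt; the instructions and few-shot examples would precede them.
PROMPT_TAIL = (
    "Test Case:\n"
    "Question: {question}\n"
    "Standard Answer: {ground_truth}\n"
    "Key Points:"
)


def build_prompt(question, ground_truth, instructions=""):
    """Fill the placeholders for one question/answer pair."""
    return instructions + PROMPT_TAIL.format(question=question, ground_truth=ground_truth)


def parse_key_points(response):
    """Split a '1. ... 2. ...' style reply into a list of key points."""
    points = []
    for line in response.splitlines():
        match = re.match(r"\s*(\d+)\.\s+(.*)", line)
        if match:
            points.append(match.group(2).strip())
        elif points and line.strip():
            # Treat any other non-empty line as a wrapped continuation.
            points[-1] += " " + line.strip()
    return points


reply = "1. The judgment date was\n15th May 2023.\n2. The court was Ashton."
print(parse_key_points(reply))
# ['The judgment date was 15th May 2023.', 'The court was Ashton.']
```

A continuation line that happens to start with a number followed by a period and a space would be misread as a new item, so real replies may need a stricter parser.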

Evaluating the evaluation dataset

Question types

RAGEval evaluates the dataset produced by its method using questions classified into the following types.

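Question types and their definitions

Single-document QA
  • Factual: Questions that target specific details within a reference (for example, a company's profit in a report, the verdict in a legal case, or symptoms in a medical record), testing the retrieval accuracy of the RAG system.
  • Summarization: Questions that require a comprehensive answer covering all relevant information, intended to evaluate the recall of RAG retrieval.
  • Multi-hop Reasoning: Questions involving logical relationships among events and details within a document, forming a chain of reasoning that evaluates the RAG system's logical reasoning ability.

Multi-document QA
  • Information Integration: Questions that require combining information from two documents, typically containing different information fragments, testing cross-document retrieval accuracy.
  • Numerical Comparison: Questions that require finding and comparing data fragments to draw a conclusion, focusing on the model's summarization ability.
  • Temporal Sequence: Questions that require determining the chronological order of events from information fragments, testing the model's temporal reasoning skills.

Unanswerable Questions
  • Unanswerable: Questions that arise when information is lost in the generation process from schema to article, so that the corresponding information fragment does not exist or is insufficient to answer.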

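Comparison with datasets generated by other methods

The comparison results show that the RAGEval dataset scores higher on every metric than datasets created with zero-shot or one-shot prompts.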

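Overall assessment

The disappointing part is that humans must write the seed documents and review the schema file. Full automation is therefore difficult, and human review is required in the end. Because the schema file is JSON, the reviewer also needs some knowledge of AI and software systems. People with that combination of skills are rare, so I felt the approach is not practical.

In addition, neither the paper nor the source code in the repository includes an example of a seed document for generating a schema, so it is unclear what kind of document one should actually prepare.

If this area were well documented, so that a third party could implement the workflow just from the repository and its documentation, I would like to try it.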
