
Trying the Haystack tutorial: Utilizing Existing FAQs for Question Answering



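  • Advantages:
    • Very fast at inference time.
    • Reuses existing FAQ data.
    • Gives a high degree of control over the answers.
  • Disadvantages:
    • Generalizability: it can only answer questions that are similar to questions already in the FAQ.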
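
To make the trade-off above concrete, here is a minimal sketch of FAQ-style matching in plain Python. It does not use Haystack: the bag-of-words `embed` function, the toy `FAQ` entries, and the `min_score` threshold are all assumptions made for illustration, standing in for a real sentence-embedding model.

```python
import math
from collections import Counter

# Toy FAQ: hypothetical entries, not taken from the tutorial dataset.
FAQ = [
    {"question": "How is the virus spreading?", "answer": "Mainly from person to person."},
    {"question": "What are the symptoms?", "answer": "Fever, cough and shortness of breath."},
]


def embed(text):
    """Stand-in embedding: a bag-of-words count vector (assumption, not a real model)."""
    words = "".join(c.lower() if c.isalnum() else " " for c in text).split()
    return Counter(words)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def answer(query, min_score=0.5):
    """Return the stored answer of the most similar FAQ question, or None if nothing is close."""
    q = embed(query)
    best = max(FAQ, key=lambda item: cosine(q, embed(item["question"])))
    score = cosine(q, embed(best["question"]))
    return best["answer"] if score >= min_score else None
```

A rephrased question such as `answer("How does the virus spread?")` still reaches the stored answer, while an unrelated question such as `answer("Where can I buy a bicycle?")` returns `None`. That is the generalizability limit in miniature: the system only ever returns answers that were written in advance.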




A GPU needs to be enabled, so select "T4 GPU" under "Notebook settings".



pip install --upgrade pip
pip install farm-haystack[colab,inference]


from haystack.telemetry import tutorial_running



import logging

logging.basicConfig(format="%(levelname)s - %(name)s -  %(message)s", level=logging.WARNING)


from haystack.document_stores import InMemoryDocumentStore

document_store = InMemoryDocumentStore()


from haystack.nodes import EmbeddingRetriever

retriever = EmbeddingRetriever(


import pandas as pd
from haystack.utils import fetch_archive_from_http

# Download the data
doc_dir = "data/tutorial4"
s3_url = ""
fetch_archive_from_http(url=s3_url, output_dir=doc_dir)

# Get a dataframe with the columns "question", "answer", and some custom metadata
df = pd.read_csv(f"{doc_dir}/small_faq_covid.csv")
# Minimal cleaning
df.fillna(value="", inplace=True)
df["question"] = df["question"].apply(lambda x: x.strip())

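# Create embeddings for the FAQ questions
# In contrast to most other search use cases, the embeddings here are not created from the content of the documents.
# They are created from the additional text field "question", because we want to match "incoming question" <-> "stored question".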
questions = list(df["question"].values)
df["embedding"] = retriever.embed_queries(queries=questions).tolist()
df = df.rename(columns={"question": "content"})

# Convert the DataFrame to a list of dicts and index them in the DocumentStore
docs_to_index = df.to_dict(orient="records")
document_store.write_documents(docs_to_index)

  • Download the COVID-19 FAQ CSV and create a Pandas DataFrame from it.
  • Create embeddings for the "question" column of the DataFrame. EmbeddingRetriever has a method called embed_queries, which can be used to create the embeddings.
  • Add the DataFrame to the DocumentStore.
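
The three steps above can be sketched without Haystack or pandas to show the shape of the records that end up in the DocumentStore. The inline CSV text and the `toy_embed` function are hypothetical stand-ins (for the downloaded file and for `retriever.embed_queries`, respectively). Only the field names `content`, `answer`, and `embedding` follow the code above.

```python
import csv
import io

# Hypothetical two-row stand-in for the downloaded FAQ CSV.
CSV_TEXT = """question,answer
 How is the virus spreading? ,Mainly from person to person.
What are the symptoms?,
"""


def toy_embed(texts):
    """Placeholder for retriever.embed_queries: one tiny vector per text (assumption)."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]


def build_records(csv_text):
    """Turn FAQ rows into dicts with 'content', 'embedding', and the remaining columns."""
    # Minimal cleaning: missing values become "", questions are stripped.
    rows = [{k: (v or "") for k, v in row.items()} for row in csv.DictReader(io.StringIO(csv_text))]
    for row in rows:
        row["question"] = row["question"].strip()
    # Embed the questions, not the answers, so incoming questions match stored ones.
    embeddings = toy_embed([row["question"] for row in rows])
    records = []
    for row, emb in zip(rows, embeddings):
        record = {k: v for k, v in row.items() if k != "question"}
        record["content"] = row["question"]  # the question becomes the document content
        record["embedding"] = emb
        records.append(record)
    return records


docs = build_records(CSV_TEXT)
```

Each record keeps the answer as an ordinary field next to `content`, which is presumably why the pipeline can return the stored answer once the question has been matched.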


from haystack.pipelines import FAQPipeline
from haystack.utils import print_answers

pipe = FAQPipeline(retriever=retriever)

prediction = pipe.run(query="How is the virus spreading?", params={"Retriever": {"top_k": 3}})

print_answers(prediction, details="medium")



'Query: How is the virus spreading?'
[   {   'answer': 'This virus was first detected in Wuhan City, Hubei '
                  'Province, China. The first infections were linked to a live '
                  'animal market, but the virus is now spreading from '
                  'person-to-person. It’s important to note that '
                  'person-to-person spread can happen on a continuum. Some '
                  'viruses are highly contagious (like measles), while other '
                  'viruses are less so.\n'
                  'The virus that causes COVID-19 seems to be spreading easily '
                  'and sustainably in the community (“community spread”) in '
                  'some affected geographic areas. Community spread means '
                  'people have been infected with the virus in an area, '
                  'including some who are not sure how or where they became '
                  'infected.\n'
                  'Learn what is known about the spread of newly emerged '
        'context': 'This virus was first detected in Wuhan City, Hubei '
                   'Province, China. The first infections were linked to a '
                   'live animal market, but the virus is now spreading from '
                   'person-to-person. It’s important to note that '
                   'person-to-person spread can happen on a continuum. Some '
                   'viruses are highly contagious (like measles), while other '
                   'viruses are less so.\n'
                   'The virus that causes COVID-19 seems to be spreading '
                   'easily and sustainably in the community (“community '
                   'spread”) in some affected geographic areas. Community '
                   'spread means people have been infected with the virus in '
                   'an area, including some who are not sure how or where they '
                   'became infected.\n'
                   'Learn what is known about the spread of newly emerged '
        'score': 0.9358832836151123},
    {   'answer': 'The novel coronavirus SARS-CoV-2 spreads from person to '
                  'person. Droplet infection is the main mode of transmission. '
                  'Transmission can take place directly, from '
                  'person-to-person, or indirectly through contact between '
                  'hands and the mucous membranes of the mouth, the nose or '
                  'the conjunctiva of the eyes. There have been reports of '
                  'persons who were infected by individuals who had only shown '
                  'slight or non-specific symptoms of disease. The percentage '
                  'of asymptomatic cases is unclear; according to data from '
                  'WHO and China, however, such cases do not play a '
                  'significant role in the spread of SARS-CoV-2.',
        'context': 'The novel coronavirus SARS-CoV-2 spreads from person to '
                   'person. Droplet infection is the main mode of '
                   'transmission. Transmission can take place directly, from '
                   'person-to-person, or indirectly through contact between '
                   'hands and the mucous membranes of the mouth, the nose or '
                   'the conjunctiva of the eyes. There have been reports of '
                   'persons who were infected by individuals who had only '
                   'shown slight or non-specific symptoms of disease. The '
                   'percentage of asymptomatic cases is unclear; according to '
                   'data from WHO and China, however, such cases do not play a '
                   'significant role in the spread of SARS-CoV-2.',
        'score': 0.8732742071151733},
    {   'answer': 'Coronaviruses are a large family of viruses. Some cause '
                  'illness in people, and others, such as canine and feline '
                  'coronaviruses, only infect animals. Rarely, animal '
                  'coronaviruses that infect animals have emerged to infect '
                  'people and can spread between people. This is suspected to '
                  'have occurred for the virus that causes COVID-19. Middle '
                  'East Respiratory Syndrome (MERS) and Severe Acute '
                  'Respiratory Syndrome (SARS) are two other examples of '
                  'coronaviruses that originated from animals and then spread '
                  'to people. More information about the source and spread of '
                  'COVID-19 is available on the Situation Summary: Source and '
                  'Spread of the Virus.',
        'context': 'Coronaviruses are a large family of viruses. Some cause '
                   'illness in people, and others, such as canine and feline '
                   'coronaviruses, only infect animals. Rarely, animal '
                   'coronaviruses that infect animals have emerged to infect '
                   'people and can spread between people. This is suspected to '
                   'have occurred for the virus that causes COVID-19. Middle '
                   'East Respiratory Syndrome (MERS) and Severe Acute '
                   'Respiratory Syndrome (SARS) are two other examples of '
                   'coronaviruses that originated from animals and then spread '
                   'to people. More information about the source and spread of '
                   'COVID-19 is available on the Situation Summary: Source and '
                   'Spread of the Virus.',
        'score': 0.6935744285583496}]
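
The scores in the output above drop from about 0.94 to 0.69, so it may be useful to discard weak matches before showing them to a user. A minimal sketch, assuming the answers have already been converted to plain dictionaries with `answer` and `score` keys as printed above. The `filter_by_score` helper and the 0.8 cutoff are hypothetical and not part of Haystack, and the answer texts are abbreviated.

```python
# Scores copied from the printed output above; answer texts abbreviated.
answers = [
    {"answer": "This virus was first detected in Wuhan City, ...", "score": 0.9358832836151123},
    {"answer": "The novel coronavirus SARS-CoV-2 spreads from ...", "score": 0.8732742071151733},
    {"answer": "Coronaviruses are a large family of viruses. ...", "score": 0.6935744285583496},
]


def filter_by_score(items, min_score):
    """Keep only the answers whose similarity score reaches the cutoff (hypothetical helper)."""
    return [item for item in items if item["score"] >= min_score]


confident = filter_by_score(answers, min_score=0.8)
```

With this cutoff the third answer, which is about the origin of coronaviruses rather than how the virus spreads, is dropped. A suitable threshold depends on the embedding model and the data, so it would need to be tuned rather than copied from this sketch.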

