Searching PDF files with Momento Vector Index
This is the Day 24 post for Momento Advent Calendar 2023.
Prerequisites
Module versions
langchain 0.0.352
momento 1.16.
python-dotenv 1.0.0
Walkthrough of the PDF search code
First, import the required modules and create the text splitter.
from pathlib import Path
from langchain.document_loaders import UnstructuredURLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain.vectorstores import FAISS
from langchain.embeddings import VertexAIEmbeddings
from langchain.vectorstores import MomentoVectorIndex
import os
from dotenv import load_dotenv
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000,chunk_overlap=0)
Load the MOMENTO_API_KEY environment variable from a .env file.
load_dotenv()
The MOMENTO_API_KEY value is created in the Momento Console.
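Calling load_dotenv() does not by itself guarantee that the variable ends up set, so a small check before building the index can surface a missing key early. The following is a minimal sketch using only the standard library; the helper name `require_env` is hypothetical and is not part of python-dotenv or the Momento SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Fail early with a readable message instead of a later authentication error.
        raise RuntimeError(f"{name} is not set; add it to .env or export it in the shell")
    return value
```

One way to use it would be to call `require_env("MOMENTO_API_KEY")` right after `load_dotenv()`.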
Load PDF files from the internet with a LangChain loader.
loader = UnstructuredURLLoader(urls=['https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf','https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf'])
document = loader.load()
Split the loaded documents into chunks according to chunk_size (1000 characters here, with no overlap).
doc_splits = text_splitter.split_documents(document)
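To get a feel for what chunk_size and chunk_overlap control, here is a deliberately simplified sketch in plain Python. It is not how RecursiveCharacterTextSplitter works internally (that class prefers to break on separators such as paragraphs and lines and treats chunk_size as an upper bound); the function `split_fixed` is a hypothetical illustration that simply cuts fixed-size windows.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of chunk_size characters.

    Each window starts (chunk_size - chunk_overlap) characters after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij"
print(split_fixed(sample, chunk_size=4))                   # windows do not share characters
print(split_fixed(sample, chunk_size=4, chunk_overlap=2))  # neighbouring windows share 2 characters
```

With chunk_overlap=0, as in this article, no text is repeated between chunks, which keeps the index small at the cost of possibly cutting a sentence across two chunks.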
To register the chunks in FAISS, you can do the following.
db = FAISS.from_documents(doc_splits, VertexAIEmbeddings())
For MomentoVectorIndex, replace that call with the following to register the chunks in a vector index.
vector_db = MomentoVectorIndex.from_documents(
    doc_splits, VertexAIEmbeddings(), index_name="gemini_doc_index"
)
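Once the index has been registered, you can also check it from the Momento Console.
Even if the index has not been created yet, running the code above creates it.
The loaded files are the Gemini and AlphaCode reports, so let's search for something written in the PDFs.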
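Conceptually, a similarity search embeds the query, compares that vector with the stored chunk vectors, and returns the k closest chunks. The sketch below illustrates the idea in plain Python with made-up three-dimensional vectors. Everything in it is an assumption for illustration: real embeddings come from VertexAIEmbeddings and have many more dimensions, the similarity metric Momento uses depends on how the index is configured, and `cosine`, `top_k`, and `toy_index` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, items, k=3):
    """items is a list of (text, vector) pairs; return the k best (score, text) pairs."""
    scored = [(cosine(query_vec, vec), text) for text, vec in items]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Toy data standing in for embedded PDF chunks.
toy_index = [
    ("chunk about AlphaCode 2", [0.9, 0.1, 0.0]),
    ("chunk about Gemini", [0.1, 0.9, 0.0]),
    ("chunk about benchmarks", [0.5, 0.5, 0.1]),
]
toy_query = [1.0, 0.0, 0.0]

for score, text in top_k(toy_query, toy_index, k=2):
    print(f"{score:.3f} {text}")
```

With that picture in mind, the actual query against the Momento index looks like this.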
query = "Please explain AlphaCode 2 feature ."
docs = vector_db.similarity_search(query, 3)
for index, doc in enumerate(docs):
    print(f"No{index+1}. {doc.page_content}")
The search returned the following results.
No1. AlphaCode 2 Technical Report
AlphaCode Team, Google DeepMind
AlphaCode (Li et al., 2022) was the first AI system to perform at the level of the median competitor in competitive programming, a difficult reasoning task involving advanced maths, logic and computer science. This paper introduces AlphaCode 2, a new and enhanced system with massively improved performance, powered by Gemini (Gemini Team, Google, 2023). AlphaCode 2 relies on the combination of powerful language models and a bespoke search and reranking mechanism. When evaluated on the same platform as the original AlphaCode, we found that AlphaCode 2 solved 1.7× more problems, and performed better than 85% of competition participants.
Introduction
No2. AlphaCode 2 Technical Report
AlphaCode Team, Google DeepMind
AlphaCode (Li et al., 2022) was the first AI system to perform at the level of the median competitor in competitive programming, a difficult reasoning task involving advanced maths, logic and computer science. This paper introduces AlphaCode 2, a new and enhanced system with massively improved performance, powered by Gemini (Gemini Team, Google, 2023). AlphaCode 2 relies on the combination of powerful language models and a bespoke search and reranking mechanism. When evaluated on the same platform as the original AlphaCode, we found that AlphaCode 2 solved 1.7× more problems, and performed better than 85% of competition participants.
Introduction
No3. AlphaCode 2 Technical Report
AlphaCode Team, Google DeepMind
AlphaCode (Li et al., 2022) was the first AI system to perform at the level of the median competitor in competitive programming, a difficult reasoning task involving advanced maths, logic and computer science. This paper introduces AlphaCode 2, a new and enhanced system with massively improved performance, powered by Gemini (Gemini Team, Google, 2023). AlphaCode 2 relies on the combination of powerful language models and a bespoke search and reranking mechanism. When evaluated on the same platform as the original AlphaCode, we found that AlphaCode 2 solved 1.7× more problems, and performed better than 85% of competition participants.
Introduction
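The three hits printed above carry identical text. If repeated chunks are not wanted in the output, one option is to drop duplicates by page_content before printing. This is a minimal sketch under stated assumptions: `Hit` is a hypothetical stand-in for LangChain's Document so that the snippet runs without any dependencies, and `unique_by_content` is an illustrative helper, not a LangChain or Momento API.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Stand-in for a search result; only the page_content attribute is used."""
    page_content: str


def unique_by_content(hits):
    """Keep the first occurrence of each distinct page_content, preserving order."""
    seen = set()
    unique = []
    for hit in hits:
        key = hit.page_content.strip()
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique


hits = [
    Hit("AlphaCode 2 Technical Report"),
    Hit("AlphaCode 2 Technical Report"),
    Hit("Gemini report"),
]
for index, hit in enumerate(unique_by_content(hits)):
    print(f"No{index+1}. {hit.page_content}")
```

With real results, the same helper could be applied to the list returned by `vector_db.similarity_search(query, k)`, asking for a larger k than needed and keeping the first few unique hits.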
Summary
As this shows, Momento Vector Index makes it easy to use a vector index without having to set up a VPC or similar infrastructure.