
Trying out the new LlamaIndex Getting Started


I'll run everything in Jupyter Lab.

$ pip install jupyterlab ipywidgets
$ jupyter-lab --ip=''


!pip install llama-index


The official repository includes sample data, so clone it. It's the familiar Paul Graham essay.

!git clone
%cd llama_index/examples/paul_graham_essay
!ls


DavinciComparison.ipynb       TestEssay.ipynb	index_tree_insert.json
GPT4Comparison.ipynb	      data		index_with_query.json
InsertDemo.ipynb	      index.json	splitting_1.txt
KeywordTableComparison.ipynb  index_list.json	splitting_2.txt
SentenceSplittingDemo.ipynb   index_table.json


!ls data


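import os

# Embeddings and completions go through the OpenAI API; replace the placeholder with your own key
os.environ["OPENAI_API_KEY"] = "sk-..."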


from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)


query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)


The author grew up writing essays, learning Italian, exploring Florence, painting people, working with computers, attending RISD, living in a rent-stabilized apartment, building an online store builder, editing Lisp expressions, publishing essays online, writing essays, painting still life, working on spam filters, cooking for groups, and buying a building in Cambridge.


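import logging
import sys

# Print debug logs (including the OpenAI API requests) to the notebook output
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

response = query_engine.query("What did the author do growing up?")
print(response)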


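Re-running the query with logging enabled shows that it:
  • passes the user's input to the embeddings API to turn it into a vector,
  • searches the vector index built from the Paul Graham essay for similar chunks, and
  • includes those chunks as context in a prompt sent to the completion API, which generates the answer.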
DEBUG:openai:message='Request to OpenAI API' method=post path=
DEBUG:openai:api_version=None data='{"input": ["What did the author do growing up?"], "model": "text-embedding-ada-002", "encoding_format": "base64"}' message='Post details'
DEBUG:llama_index.indices.utils:> Top 2 nodes:
> [Node ef2bceb9-2f7d-4881-919e-a67cb843f194] [Similarity score:             0.81405] I could write essays again, I wrote a bunch about topics I'd had stacked up. I kept writing essay...
> [Node 5eb083c4-4845-4f7e-b838-6a26e4921113] [Similarity score:             0.811131] page views. What on earth had happened? The referring urls showed that someone had posted it on S...
DEBUG:openai:message='Request to OpenAI API' method=post path=
DEBUG:openai:api_version=None data='{"prompt": ["Context information is below. \\n---------------------\\nI could write essays again, I wrote a bunch about topics I\'d had stacked up. I kept writing essays through 2020, but I also started to think about other things I could work on. How should I choose what to do? Well, how had I chosen what to work on in the past? I wrote an essay for myself to answer that question, and I was surprised how long and messy 
Jessica was in charge of marketing at a Boston investment
Given the context information and not prior knowledge, answer the question: What did the author do growing up?

DEBUG:llama_index.indices.response.base_builder:> Initial response: 
The author grew up writing essays, learning Italian, exploring Florence, painting people, working with computers, attending RISD, living in a rent-stabilized apartment, building an online store builder, editing Lisp expressions, publishing essays online, writing essays, painting still life, working on spam filters, cooking for groups, and buying a building in Cambridge.
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 1880 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
The author grew up writing essays, learning Italian, exploring Florence, painting people, working with computers, attending RISD, living in a rent-stabilized apartment, building an online store builder, editing Lisp expressions, publishing essays online, writing essays, painting still life, working on spam filters, cooking for groups, and buying a building in Cambridge.
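The three steps listed above (embed the query, retrieve similar chunks, stuff them into a prompt) can be sketched in plain Python. This is a toy illustration, not LlamaIndex's implementation: the 3-dimensional vectors and chunk texts are made up (text-embedding-ada-002 returns much longer vectors), cosine similarity is assumed as the scoring function, `top_k` and `build_prompt` are hypothetical names, and joining chunks with blank lines is my own choice. The template text mirrors the prompt visible in the debug log, and k=2 matches the "Top 2 nodes" line.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, nodes, k=2):
    """Step 2: return the k (score, text) pairs closest to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in nodes]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Template modeled on the prompt shown in the debug log
QA_TEMPLATE = (
    "Context information is below. \n"
    "---------------------\n"
    "{context}"
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query}\n"
)


def build_prompt(query, retrieved):
    """Step 3: put the retrieved chunks into the prompt as context."""
    context = "\n\n".join(text for _, text in retrieved)
    return QA_TEMPLATE.format(context=context, query=query)


# Hypothetical pre-embedded chunks: (text, toy 3-d vector)
SAMPLE_NODES = [
    ("Toy chunk about writing essays.", [0.9, 0.1, 0.0]),
    ("Toy chunk about an essay getting many page views.", [0.7, 0.3, 0.1]),
    ("Toy chunk about something unrelated.", [0.0, 0.2, 0.9]),
]

if __name__ == "__main__":
    # Step 1 would call the embeddings API; here the query vector is hard-coded
    query_vec = [1.0, 0.0, 0.0]
    retrieved = top_k(query_vec, SAMPLE_NODES, k=2)
    for score, text in retrieved:
        print(f"{score:.4f} {text}")
    print(build_prompt("What did the author do growing up?", retrieved))
```

The resulting prompt string is what would be sent to the completion API in the last step.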




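# Save the index to disk; with no argument it is written to ./storage
index.storage_context.persist()

!ls -l
!ls -l storage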


total 952
-rw-rw-r-- 1 kun432 kun432   7773 Jun  7 18:14 DavinciComparison.ipynb
-rw-rw-r-- 1 kun432 kun432  24692 Jun  7 18:14 GPT4Comparison.ipynb
-rw-rw-r-- 1 kun432 kun432   8402 Jun  7 18:14 InsertDemo.ipynb
-rw-rw-r-- 1 kun432 kun432  19039 Jun  7 18:14 KeywordTableComparison.ipynb
-rw-rw-r-- 1 kun432 kun432   6987 Jun  7 18:14 SentenceSplittingDemo.ipynb
-rw-rw-r-- 1 kun432 kun432  24866 Jun  7 18:14 TestEssay.ipynb
drwxrwxr-x 2 kun432 kun432   4096 Jun  7 18:59 data
-rw-rw-r-- 1 kun432 kun432 172219 Jun  7 18:14 index.json
-rw-rw-r-- 1 kun432 kun432 156103 Jun  7 18:14 index_list.json
-rw-rw-r-- 1 kun432 kun432 159574 Jun  7 18:14 index_table.json
-rw-rw-r-- 1 kun432 kun432  36104 Jun  7 18:14 index_tree_insert.json
-rw-rw-r-- 1 kun432 kun432 166103 Jun  7 18:14 index_with_query.json
-rw-rw-r-- 1 kun432 kun432  78252 Jun  7 18:14 splitting_1.txt
-rw-rw-r-- 1 kun432 kun432  75176 Jun  7 18:14 splitting_2.txt
drwxrwxr-x 2 kun432 kun432   4096 Jun  7 18:58 storage
total 772
-rw-rw-r-- 1 kun432 kun432  90833 Jun  7 18:58 docstore.json
-rw-rw-r-- 1 kun432 kun432   1927 Jun  7 18:58 index_store.json
-rw-rw-r-- 1 kun432 kun432 691051 Jun  7 18:58 vector_store.json


from llama_index import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)

query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
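Reloading from ./storage avoids rebuilding the index, which means re-embedding every chunk through the API. A common way to combine building and reloading is "load if already saved, otherwise build and save". The sketch below shows that control flow in plain Python under stated assumptions: `load_or_build` and the `toy_*` functions are hypothetical, and the toy index is just a dict written to JSON. In the notebook, `build` would wrap `SimpleDirectoryReader('data').load_data()` plus `VectorStoreIndex.from_documents`, `persist` would call `index.storage_context.persist(persist_dir=...)`, and `load` would wrap `StorageContext.from_defaults(persist_dir=...)` plus `load_index_from_storage`.

```python
import json
import os
import tempfile


def load_or_build(persist_dir, build, persist, load):
    """Hypothetical helper: reuse a saved index, or build and save a new one."""
    # Reuse the saved index if persist_dir already has content
    if os.path.isdir(persist_dir) and os.listdir(persist_dir):
        return load(persist_dir)
    # Otherwise do the expensive build (embedding calls) once and save it
    index = build()
    persist(index, persist_dir)
    return index


# Toy stand-ins so the sketch runs without llama_index or an API key
build_calls = []


def toy_build():
    build_calls.append(1)
    return {"chunks": ["a", "b"]}


def toy_persist(index, persist_dir):
    os.makedirs(persist_dir, exist_ok=True)
    with open(os.path.join(persist_dir, "index.json"), "w") as f:
        json.dump(index, f)


def toy_load(persist_dir):
    with open(os.path.join(persist_dir, "index.json")) as f:
        return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        storage = os.path.join(tmp, "storage")
        load_or_build(storage, toy_build, toy_persist, toy_load)  # builds and saves
        load_or_build(storage, toy_build, toy_persist, toy_load)  # loads from disk
        print("build() called", len(build_calls), "time(s)")
```

Checking for a non-empty directory rather than mere existence is a design choice here, so that an empty, half-created storage directory still triggers a rebuild.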