
Running the LlamaIndex 🦙 0.8.7 Starter Tutorial on Google Colaboratory

Tatsuya Otake

Published 2023/10/19

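To get started with LlamaIndex, I first worked through the Starter Tutorial from the official documentation on Google Colaboratory. Following the documentation as written mostly works, but I stumbled in a few places, so here are the steps I took.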

1. Installation

!pip install llama-index
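
Because the command above installs whatever release is current, the installed version may not match the 0.8.7 named in the title, and later releases have reorganized some import paths. The following is a minimal sketch for checking what was actually installed; the helper name installed_version is only illustrative and is not part of LlamaIndex.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed llama-index version, or None if the install step was skipped.
print(installed_version("llama-index"))
```

If the printed version differs from the one this walkthrough was written against, pinning the version in the pip command is one option.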

2. Download

The sample lives in the official repository, so clone it.

!git clone https://github.com/jerryjliu/llama_index.git

3. Move to the paul_graham_essay folder

This folder contains the LlamaIndex example for Paul Graham's essay "What I Worked On".

%cd llama_index/examples/paul_graham_essay

4. Set the OpenAI API key as an environment variable

Note: this code does not appear in the tutorial.

import os
os.environ["OPENAI_API_KEY"] = "XXXXXXXXXX"
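
Hardcoding the key in a notebook cell makes it easy to leak when the notebook is shared. As one possible alternative, the sketch below prompts for the key with the standard library's getpass and sets it only when it is not already present. The helper name ensure_api_key and its parameters are assumptions made for illustration.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping


def ensure_api_key(
    env: MutableMapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
    name: str = "OPENAI_API_KEY",
) -> None:
    """Ask for the key only if the environment does not already hold one."""
    if not env.get(name):
        env[name] = prompt(f"{name}: ")


# In a notebook cell, uncomment the next line to be prompted without echoing the key:
# ensure_api_key()
```

Passing env and prompt as parameters keeps the helper easy to try out without touching the real environment.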

5. Creating the index

Run the tutorial code as is.
It reads the files (the essay) under the data folder and builds an index from them.

from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)
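
If the working directory is wrong, for instance because the repository layout differs from the path used in step 3, the reader has nothing to load. A small pre-flight check along the following lines can make that failure easier to diagnose. This is a sketch using only the standard library; list_input_files is a hypothetical helper, not a LlamaIndex function.

```python
from pathlib import Path
from typing import List


def list_input_files(directory: str = "data") -> List[str]:
    """Return the sorted names of regular files in the directory, or raise if there are none."""
    folder = Path(directory)
    if not folder.is_dir():
        raise FileNotFoundError(f"{folder.resolve()} is not a directory; check the %cd step")
    names = sorted(p.name for p in folder.iterdir() if p.is_file())
    if not names:
        raise FileNotFoundError(f"{folder.resolve()} contains no files")
    return names


# Run this before SimpleDirectoryReader('data').load_data():
# print(list_input_files("data"))
```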

6. Querying the index

Again, run the tutorial code as is.

query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)

The result is as follows.

The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer. They also built a microcomputer kit and started programming on it, writing simple games and a word processor.

7. Enabling logging

Note: the code in the tutorial does not include force=True.

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, force=True)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
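
A likely reason force=True matters here is that the notebook runtime has already attached a handler to the root logger, and logging.basicConfig does nothing when the root logger already has handlers unless force=True is passed (available since Python 3.8). The sketch below simulates that situation with a handler added by hand; the pre-installed handler is an assumption standing in for whatever the runtime sets up.

```python
import logging
import sys

root = logging.getLogger()

# Simulate a runtime that has already installed a handler on the root logger.
for handler in list(root.handlers):
    root.removeHandler(handler)
preinstalled = logging.StreamHandler(stream=sys.stderr)
root.addHandler(preinstalled)
root.setLevel(logging.WARNING)

# Without force=True, basicConfig is a no-op because a handler already exists.
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
level_without_force = root.level

# With force=True, existing handlers are removed and the new configuration is applied.
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, force=True)
level_with_force = root.level
handlers_after_force = list(root.handlers)

print(logging.getLevelName(level_without_force), logging.getLevelName(level_with_force))
```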

Run the index query code again.

query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")

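The result is as follows.
The log shows what is happening behind the scenes. Each record appears twice because the root logger ends up with two stdout handlers: the one installed by basicConfig, which prints the DEBUG:name: prefix, and the one added with addHandler, which prints the bare message.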

DEBUG:openai:message='Request to OpenAI API' method=post path=https://api.openai.com/v1/embeddings
message='Request to OpenAI API' method=post path=https://api.openai.com/v1/embeddings
DEBUG:openai:api_version=None data='{"input": ["What did the author do growing up?"], "model": "text-embedding-ada-002", "encoding_format": "base64"}' message='Post details'
api_version=None data='{"input": ["What did the author do growing up?"], "model": "text-embedding-ada-002", "encoding_format": "base64"}' message='Post details'
DEBUG:urllib3.util.retry:Converted retries value: 2 -> Retry(total=2, connect=None, read=None, redirect=None, status=None)
Converted retries value: 2 -> Retry(total=2, connect=None, read=None, redirect=None, status=None)
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.openai.com:443
Starting new HTTPS connection (1): api.openai.com:443
DEBUG:urllib3.connectionpool:https://api.openai.com:443 "POST /v1/embeddings HTTP/1.1" 200 None
https://api.openai.com:443 "POST /v1/embeddings HTTP/1.1" 200 None
DEBUG:openai:message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=54 request_id=a7e125fa2315015fdf9ab9b9366f3472 response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=54 request_id=a7e125fa2315015fdf9ab9b9366f3472 response_code=200
DEBUG:llama_index.indices.utils:> Top 2 nodes:
> [Node 521da104-730c-44d7-bab2-72fbcc6c8a59] [Similarity score:             0.820828] What I Worked On

February 2021

Before college the two main things I worked on, outside of schoo...
> [Node 198fe1c0-3553-4725-b4a1-b467236c8dc2] [Similarity score:             0.812763] There, right on the wall, was something you could make that would last.Paintings didn't become ob...
> Top 2 nodes:
> [Node 521da104-730c-44d7-bab2-72fbcc6c8a59] [Similarity score:             0.820828] What I Worked On

February 2021

Before college the two main things I worked on, outside of schoo...
> [Node 198fe1c0-3553-4725-b4a1-b467236c8dc2] [Similarity score:             0.812763] There, right on the wall, was something you could make that would last.Paintings didn't become ob...
DEBUG:openai:message='Request to OpenAI API' method=post path=https://api.openai.com/v1/chat/completions
message='Request to OpenAI API' method=post path=https://api.openai.com/v1/chat/completions
DEBUG:openai:api_version=None data='{"messages": [{"role": "system", "content": "You are an expert Q&A system that is trusted around the world.\\nAlways answer the query using the provided context information, and not prior knowledge.\\nSome rules to follow:\\n1. Never directly reference the given context in your answer.\\n2. Avoid statements like \'Based on the context, ...\' or \'The context information ...\' or anything along those lines."}, {"role": "user", "content": "Context information is below.\\n---------------------\\nWhat I Worked On\\n\\nFebruary 2021\\n\\nBefore college the two main things I worked on, outside of school, were writing and programming.I didn\'t write essays.I wrote what beginning writers were supposed to write then, and probably still are: short stories.My stories were awful.They had hardly any plot, just characters with strong feelings, which I imagined made them deep.The first programs I tried writing were on the IBM 1401 that our school district used for what was then called \\"data processing.\\"This was in 9th grade, so I was 13 or 14.The school district\'s 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it.It was like a mini Bond villain\'s lair down there, with all these alien-looking machines \\u2014 CPU, disk drives, printer, card reader \\u2014 sitting up on a raised floor under bright fluorescent lights.The language we used was an early version of Fortran.You had to type programs on punch cards, then stack them in the card reader and press a button to load the program into memory and run it.The result would ordinarily be to print something on the spectacularly loud printer.I was puzzled by the 1401.I couldn\'t figure out what to do with it.And in retrospect there\'s not much I could have done with it.The only form of input to programs was data stored on punched cards, and I didn\'t have any data stored on punched cards.The only other option was to do things 
that didn\'t rely on any input, like calculate approximations of pi, but I didn\'t know enough math to do anything interesting of that type.So I\'m not surprised I can\'t remember any programs I wrote, because they can\'t have done much.My clearest memory is of the moment I learned it was possible for programs not to terminate, when one of mine didn\'t.On a machine without time-sharing, this was a social as well as a technical error, as the data center manager\'s expression made clear.With microcomputers, everything changed.Now you could have a computer sitting right in front of you, on a desk, that could respond to your keystrokes as it was running instead of just churning through a stack of punch cards and then stopping.[1]\\n\\nThe first of my friends to get a microcomputer built it himself.It was sold as a kit by Heathkit.I remember vividly how impressed and envious I felt watching him sitting in front of it, typing programs right into the computer.Computers were expensive in those days and it took me years of nagging before I convinced my father to buy one, a TRS-80, in about 1980.The gold standard then was the Apple II, but a TRS-80 was good enough.This was when I really started programming.I wrote simple games, a program to predict how high my model rockets would fly, and a word processor that my father used to write at least one book.There was only room in memory for about 2 pages of text, so he\'d write 2 pages at a time and then print them out, but it was a lot better than a typewriter.Though I liked programming, I didn\'t plan to study it in college.In college I was going to study philosophy, which sounded much more powerful.It seemed, to my naive high school self, to be the study of the ultimate truths, compared to which the things studied in other fields would be mere domain knowledge.What I discovered when I got to college was that the other fields took up so much of the space of ideas that there wasn\'t much left for these supposed ultimate 
truths.All that seemed left for philosophy were edge cases that people in other fields felt could safely be ignored.I couldn\'t have put this into words when I was 18.All I knew at the time was that I kept taking philosophy courses and they kept being boring.So I decided to switch to AI.AI was in the air in the mid 1980s, but there were two things especially that made me want to work on it: a novel by Heinlein called The Moon is a Harsh Mistress, which featured an intelligent computer called Mike, and a PBS documentary that showed Terry Winograd using SHRDLU.I haven\'t tried rereading The Moon is a Harsh Mistress, so I don\'t know how well it has aged, but when I read it I was drawn entirely into its world.It seemed only a matter of time before we\'d have Mike, and when I saw Winograd using SHRDLU, it seemed like that time would be a few years at most.All you had to do was teach SHRDLU more words.There weren\'t any classes in AI at Cornell then, not even graduate classes, so I started trying to teach myself.Which meant learning Lisp, since in those days Lisp was regarded as the language of AI.\\n\\nThere, right on the wall, was something you could make that would last.Paintings didn\'t become obsolete.Some of the best ones were hundreds of years old.And moreover this was something you could make a living doing.Not as easily as you could by writing software, of course, but I thought if you were really industrious and lived really cheaply, it had to be possible to make enough to survive.And as an artist you could be truly independent.You wouldn\'t have a boss, or even need to get research funding.I had always liked looking at paintings.Could I make them?I had no idea.I\'d never imagined it was even possible.I knew intellectually that people made art \\u2014 that it didn\'t just appear spontaneously \\u2014 but it was as if the people who made it were a different species.They either lived long ago or were mysterious geniuses doing strange things in profiles in Life 
magazine.The idea of actually being able to make art, to put that verb before that noun, seemed almost miraculous.That fall I started taking art classes at Harvard.Grad students could take classes in any department, and my advisor, Tom Cheatham, was very easy going.If he even knew about the strange classes I was taking, he never said anything.So now I was in a PhD program in computer science, yet planning to be an artist, yet also genuinely in love with Lisp hacking and working away at On Lisp.In other words, like many a grad student, I was working energetically on multiple projects that were not my thesis.I didn\'t see a way out of this situation.I didn\'t want to drop out of grad school, but how else was I going to get out?I remember when my friend Robert Morris got kicked out of Cornell for writing the internet worm of 1988, I was envious that he\'d found such a spectacular way to get out of grad school.Then one day in April 1990 a crack appeared in the wall.I ran into professor Cheatham and he asked if I was far enough along to graduate that June.I didn\'t have a word of my dissertation written, but in what must have been the quickest bit of thinking in my life, I decided to take a shot at writing one in the 5 weeks or so that remained before the deadline, reusing parts of On Lisp where I could, and I was able to respond, with no perceptible delay \\"Yes, I think so.I\'ll give you something to read in a few days.\\"I picked applications of continuations as the topic.In retrospect I should have written about macros and embedded languages.There\'s a whole world there that\'s barely been explored.But all I wanted was to get out of grad school, and my rapidly written dissertation sufficed, just barely.Meanwhile I was applying to art schools.I applied to two: RISD in the US, and the Accademia di Belli Arti in Florence, which, because it was the oldest art school, I imagined would be good.RISD accepted me, and I never heard back from the Accademia, so off to 
Providence I went.I\'d applied for the BFA program at RISD, which meant in effect that I had to go to college again.This was not as strange as it sounds, because I was only 25, and art schools are full of people of different ages.RISD counted me as a transfer sophomore and said I had to do the foundation that summer.The foundation means the classes that everyone has to take in fundamental subjects like drawing, color, and design.Toward the end of the summer I got a big surprise: a letter from the Accademia, which had been delayed because they\'d sent it to Cambridge England instead of Cambridge Massachusetts, inviting me to take the entrance exam in Florence that fall.This was now only weeks away.My nice landlady let me leave my stuff in her attic.I had some money saved from consulting work I\'d done in grad school; there was probably enough to last a year if I lived cheaply.Now all I had to do was learn Italian.Only stranieri (foreigners) had to take this entrance exam.In retrospect it may well have been a way of excluding them, because there were so many stranieri attracted by the idea of studying art in Florence that the Italian students would otherwise have been outnumbered.I was in decent shape at painting and drawing from the RISD foundation that summer, but I still don\'t know how I managed to pass the written exam.I remember that I answered the essay question by writing about Cezanne, and that I cranked up the intellectual level as high as I could to make the most of my limited vocabulary.[2]\\n\\nI\'m only up to age 25 and already there are such conspicuous patterns.Here I was, yet again about to attend some august institution in the hopes of learning about some prestigious subject, and yet again about to be disappointed.\\n---------------------\\nGiven the context information and not prior knowledge, answer the query.\\nQuery: What did the author do growing up?\\nAnswer: "}], "stream": false, "model": "gpt-3.5-turbo", "temperature": 0, "max_tokens": 
null}' message='Post details'
api_version=None data='{"messages": [{"role": "system", "content": "You are an expert Q&A system that is trusted around the world.\\nAlways answer the query using the provided context information, and not prior knowledge.\\nSome rules to follow:\\n1. Never directly reference the given context in your answer.\\n2. Avoid statements like \'Based on the context, ...\' or \'The context information ...\' or anything along those lines."}, {"role": "user", "content": "Context information is below.\\n---------------------\\nWhat I Worked On\\n\\nFebruary 2021\\n\\nBefore college the two main things I worked on, outside of school, were writing and programming.I didn\'t write essays.I wrote what beginning writers were supposed to write then, and probably still are: short stories.My stories were awful.They had hardly any plot, just characters with strong feelings, which I imagined made them deep.The first programs I tried writing were on the IBM 1401 that our school district used for what was then called \\"data processing.\\"This was in 9th grade, so I was 13 or 14.The school district\'s 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it.It was like a mini Bond villain\'s lair down there, with all these alien-looking machines \\u2014 CPU, disk drives, printer, card reader \\u2014 sitting up on a raised floor under bright fluorescent lights.The language we used was an early version of Fortran.You had to type programs on punch cards, then stack them in the card reader and press a button to load the program into memory and run it.The result would ordinarily be to print something on the spectacularly loud printer.I was puzzled by the 1401.I couldn\'t figure out what to do with it.And in retrospect there\'s not much I could have done with it.The only form of input to programs was data stored on punched cards, and I didn\'t have any data stored on punched cards.The only other option was to do things that didn\'t 
rely on any input, like calculate approximations of pi, but I didn\'t know enough math to do anything interesting of that type.So I\'m not surprised I can\'t remember any programs I wrote, because they can\'t have done much.My clearest memory is of the moment I learned it was possible for programs not to terminate, when one of mine didn\'t.On a machine without time-sharing, this was a social as well as a technical error, as the data center manager\'s expression made clear.With microcomputers, everything changed.Now you could have a computer sitting right in front of you, on a desk, that could respond to your keystrokes as it was running instead of just churning through a stack of punch cards and then stopping.[1]\\n\\nThe first of my friends to get a microcomputer built it himself.It was sold as a kit by Heathkit.I remember vividly how impressed and envious I felt watching him sitting in front of it, typing programs right into the computer.Computers were expensive in those days and it took me years of nagging before I convinced my father to buy one, a TRS-80, in about 1980.The gold standard then was the Apple II, but a TRS-80 was good enough.This was when I really started programming.I wrote simple games, a program to predict how high my model rockets would fly, and a word processor that my father used to write at least one book.There was only room in memory for about 2 pages of text, so he\'d write 2 pages at a time and then print them out, but it was a lot better than a typewriter.Though I liked programming, I didn\'t plan to study it in college.In college I was going to study philosophy, which sounded much more powerful.It seemed, to my naive high school self, to be the study of the ultimate truths, compared to which the things studied in other fields would be mere domain knowledge.What I discovered when I got to college was that the other fields took up so much of the space of ideas that there wasn\'t much left for these supposed ultimate truths.All that seemed 
left for philosophy were edge cases that people in other fields felt could safely be ignored.I couldn\'t have put this into words when I was 18.All I knew at the time was that I kept taking philosophy courses and they kept being boring.So I decided to switch to AI.AI was in the air in the mid 1980s, but there were two things especially that made me want to work on it: a novel by Heinlein called The Moon is a Harsh Mistress, which featured an intelligent computer called Mike, and a PBS documentary that showed Terry Winograd using SHRDLU.I haven\'t tried rereading The Moon is a Harsh Mistress, so I don\'t know how well it has aged, but when I read it I was drawn entirely into its world.It seemed only a matter of time before we\'d have Mike, and when I saw Winograd using SHRDLU, it seemed like that time would be a few years at most.All you had to do was teach SHRDLU more words.There weren\'t any classes in AI at Cornell then, not even graduate classes, so I started trying to teach myself.Which meant learning Lisp, since in those days Lisp was regarded as the language of AI.\\n\\nThere, right on the wall, was something you could make that would last.Paintings didn\'t become obsolete.Some of the best ones were hundreds of years old.And moreover this was something you could make a living doing.Not as easily as you could by writing software, of course, but I thought if you were really industrious and lived really cheaply, it had to be possible to make enough to survive.And as an artist you could be truly independent.You wouldn\'t have a boss, or even need to get research funding.I had always liked looking at paintings.Could I make them?I had no idea.I\'d never imagined it was even possible.I knew intellectually that people made art \\u2014 that it didn\'t just appear spontaneously \\u2014 but it was as if the people who made it were a different species.They either lived long ago or were mysterious geniuses doing strange things in profiles in Life magazine.The idea of 
actually being able to make art, to put that verb before that noun, seemed almost miraculous.That fall I started taking art classes at Harvard.Grad students could take classes in any department, and my advisor, Tom Cheatham, was very easy going.If he even knew about the strange classes I was taking, he never said anything.So now I was in a PhD program in computer science, yet planning to be an artist, yet also genuinely in love with Lisp hacking and working away at On Lisp.In other words, like many a grad student, I was working energetically on multiple projects that were not my thesis.I didn\'t see a way out of this situation.I didn\'t want to drop out of grad school, but how else was I going to get out?I remember when my friend Robert Morris got kicked out of Cornell for writing the internet worm of 1988, I was envious that he\'d found such a spectacular way to get out of grad school.Then one day in April 1990 a crack appeared in the wall.I ran into professor Cheatham and he asked if I was far enough along to graduate that June.I didn\'t have a word of my dissertation written, but in what must have been the quickest bit of thinking in my life, I decided to take a shot at writing one in the 5 weeks or so that remained before the deadline, reusing parts of On Lisp where I could, and I was able to respond, with no perceptible delay \\"Yes, I think so.I\'ll give you something to read in a few days.\\"I picked applications of continuations as the topic.In retrospect I should have written about macros and embedded languages.There\'s a whole world there that\'s barely been explored.But all I wanted was to get out of grad school, and my rapidly written dissertation sufficed, just barely.Meanwhile I was applying to art schools.I applied to two: RISD in the US, and the Accademia di Belli Arti in Florence, which, because it was the oldest art school, I imagined would be good.RISD accepted me, and I never heard back from the Accademia, so off to Providence I went.I\'d 
applied for the BFA program at RISD, which meant in effect that I had to go to college again.This was not as strange as it sounds, because I was only 25, and art schools are full of people of different ages.RISD counted me as a transfer sophomore and said I had to do the foundation that summer.The foundation means the classes that everyone has to take in fundamental subjects like drawing, color, and design.Toward the end of the summer I got a big surprise: a letter from the Accademia, which had been delayed because they\'d sent it to Cambridge England instead of Cambridge Massachusetts, inviting me to take the entrance exam in Florence that fall.This was now only weeks away.My nice landlady let me leave my stuff in her attic.I had some money saved from consulting work I\'d done in grad school; there was probably enough to last a year if I lived cheaply.Now all I had to do was learn Italian.Only stranieri (foreigners) had to take this entrance exam.In retrospect it may well have been a way of excluding them, because there were so many stranieri attracted by the idea of studying art in Florence that the Italian students would otherwise have been outnumbered.I was in decent shape at painting and drawing from the RISD foundation that summer, but I still don\'t know how I managed to pass the written exam.I remember that I answered the essay question by writing about Cezanne, and that I cranked up the intellectual level as high as I could to make the most of my limited vocabulary.[2]\\n\\nI\'m only up to age 25 and already there are such conspicuous patterns.Here I was, yet again about to attend some august institution in the hopes of learning about some prestigious subject, and yet again about to be disappointed.\\n---------------------\\nGiven the context information and not prior knowledge, answer the query.\\nQuery: What did the author do growing up?\\nAnswer: "}], "stream": false, "model": "gpt-3.5-turbo", "temperature": 0, "max_tokens": null}' message='Post details'
DEBUG:urllib3.connectionpool:https://api.openai.com:443 "POST /v1/chat/completions HTTP/1.1" 200 None
https://api.openai.com:443 "POST /v1/chat/completions HTTP/1.1" 200 None
DEBUG:openai:message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1813 request_id=8ac0fd5cd12d9dde845fab44243dad1e response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/chat/completions processing_ms=1813 request_id=8ac0fd5cd12d9dde845fab44243dad1e response_code=200
DEBUG:llama_index.llm_predictor.base:The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer. They also built a microcomputer kit and started programming on it, writing simple games and a word processor.
The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer. They also built a microcomputer kit and started programming on it, writing simple games and a word processor.
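
Read from top to bottom, the log suggests three stages: the query is embedded, the two most similar nodes are selected, and their text is placed into a prompt sent to the chat model. The following toy sketch mimics that flow so the stages are easier to see. It is not LlamaIndex's implementation: the bag-of-words embed function is a stand-in for the text-embedding-ada-002 call, the sample nodes are shortened strings, and only the prompt wording is copied from the log above.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, nodes: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Score every node against the query and keep the k best."""
    q = embed(query)
    scored = [(cosine(q, embed(node)), node) for node in nodes]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(query: str, contexts: List[str]) -> str:
    """Assemble the user message in the shape seen in the log."""
    context = "\n\n".join(contexts)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {query}\n"
        "Answer: "
    )


# Shortened sample nodes, for illustration only.
nodes = [
    "Before college the two main things I worked on were writing and programming.",
    "That fall I started taking art classes at Harvard.",
    "I wrote simple games and a word processor.",
]
query = "What did the author do growing up?"
hits = top_k(query, nodes, k=2)
prompt = build_prompt(query, [text for _, text in hits])
print(prompt)
```

A word-count embedding only matches on shared words, so its ranking is far cruder than the similarity scores in the log; the point is the retrieve-then-prompt structure.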

8. Writing the index to files

By default, the index that was created is held only in memory.
Here it is written out to files, as in the tutorial.

index.storage_context.persist()

Running the code above creates a storage folder with JSON files inside it.
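
To confirm what was written, the folder can be listed from the notebook. This is a minimal standard-library sketch; list_json_files is a hypothetical helper, and the exact file names depend on the installed LlamaIndex version, so none are assumed here.

```python
from pathlib import Path
from typing import Dict


def list_json_files(directory: str = "./storage") -> Dict[str, int]:
    """Map each JSON file name in the directory to its size in bytes."""
    folder = Path(directory)
    if not folder.is_dir():
        return {}
    return {p.name: p.stat().st_size for p in sorted(folder.glob("*.json"))}


# Prints nothing if the storage folder has not been created yet.
for name, size in list_json_files().items():
    print(f"{name}: {size} bytes")
```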

9. Loading the index from files

Once the index has been written to files, it no longer needs to be rebuilt; from the next run onward, loading the files is enough.

from llama_index import StorageContext, load_index_from_storage

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="./storage")
# load index
index = load_index_from_storage(storage_context)
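
Steps 5, 8, and 9 can be combined into a "load if present, otherwise build and persist" pattern. The sketch below is one way to arrange that, assuming the same llama_index imports used in this article. The helpers load_or_build, load_index, and build_index are illustrative names, and passing persist_dir to persist() is used here so that both paths refer to the same folder.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(persist_dir: str, load: Callable[[str], T], build: Callable[[str], T]) -> T:
    """Call load() when persist_dir already holds files, otherwise call build()."""
    if os.path.isdir(persist_dir) and os.listdir(persist_dir):
        return load(persist_dir)
    return build(persist_dir)


def load_index(persist_dir: str):
    """Load a previously persisted index (same calls as step 9)."""
    from llama_index import StorageContext, load_index_from_storage

    storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
    return load_index_from_storage(storage_context)


def build_index(persist_dir: str):
    """Build the index from the data folder and persist it (steps 5 and 8)."""
    from llama_index import VectorStoreIndex, SimpleDirectoryReader

    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=persist_dir)
    return index


# In the notebook:
# index = load_or_build("./storage", load_index, build_index)
```

Building the index calls the embeddings API, so skipping the rebuild on later runs also avoids repeating those requests.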

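pipon Inc. holds regular technical study sessions.
If you are interested in ChatGPT, AI, or data science, please join us.
https://chatgptllm.connpass.com/
pipon Inc. also publishes industry-by-industry case studies on ChatGPT, AI, and data science. If you are interested, please see our owned-media site.
https://bigdata-tools.com/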
