Trying MS GraphRAG
I am trying out MS GraphRAG.
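The following article lacks a description of the crucial index-building step.
(It seems I should have looked at its Colab notebook. That said, it does not go beyond a Japanese translation of the official Getting Started.)
In the end, I follow the official Getting Started and install the package from PyPI.
Building the index. For now, I will call the OpenAI API even though it costs money.
If only an OpenAI-compatible API is required, shouldn't it be easy to use another LLM as well? At least as long as tool choice and function calling are not used.
There is a report of swapping in a local model: global search reportedly works, but local search returns an error, and the answer quality is reportedly poor.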
morioka@legion:~$ pyenv virtualenv 3.11.8 graphrag
morioka@legion:~$ mkdir graphgrag
morioka@legion:~$ cd graphgrag/
morioka@legion:~/graphgrag$ pyenv local graphrag
Download the source material and prepare for building the index.
(graphrag) morioka@legion:~/graphgrag$ curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt > ./ragtest/input/book.txt
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 184k 100 184k 0 0 114k 0 0:00:01 0:00:01 --:--:-- 114k
(graphrag) morioka@legion:~/graphgrag$ python -m graphrag.index --init --root ./ragtest
Initializing project at ./ragtest
⠋ GraphRAG Indexer (graphrag)
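The curl command above writes into ./ragtest/input, so that directory has to exist before the download (the log does not show it being created). As a minimal sketch, the same preparation could be done from Python with only the standard library. The function names `prepare_input_dir` and `download_book` are hypothetical helpers for illustration, not part of GraphRAG.

```python
from pathlib import Path
from urllib.request import urlopen

# URL taken from the curl command above.
BOOK_URL = "https://www.gutenberg.org/cache/epub/24022/pg24022.txt"


def prepare_input_dir(root: str) -> Path:
    """Create <root>/input if needed (like `mkdir -p`) and return its path."""
    input_dir = Path(root) / "input"
    input_dir.mkdir(parents=True, exist_ok=True)
    return input_dir


def download_book(root: str, url: str = BOOK_URL, filename: str = "book.txt") -> Path:
    """Download the text at `url` into <root>/input/<filename>."""
    target = prepare_input_dir(root) / filename
    with urlopen(url) as response:
        target.write_bytes(response.read())
    return target


# Usage (performs a network request):
# download_book("./ragtest")
```

After this, the `--init` and indexing commands shown in the log can be run unchanged.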
Building the index... done.
(graphrag) morioka@legion:~/graphgrag$ python -m graphrag.index --root ./ragtest
🚀 Reading settings from ragtest/settings.yaml
/home/morioka/.pyenv/versions/graphrag/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is
deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
🚀 create_base_text_units
id chunk ... document_ids n_tokens
0 680dd6d2a970a49082fa4f34bf63a34e The Project Gutenberg eBook of A Christmas Ca... ... 300
1 95f1f8f5bdbf0bee3a2c6f2f4a4907f6 THE PROJECT GUTENBERG EBOOK A CHRISTMAS CAROL... ... 300
2 3a450ed2b7fb1e5fce66f92698c13824 1958,\n 1962, 1964, 1966, 1967, 1969, 1971, 1... ... 300
3 95b143eba145d91eacae7be3e4ebaf0c .\n Mr. Fezziwig, a kind-hearted, jovial old ... ... 300
4 c390f1b92e2888f78b58f6af5b12afa0 debtors.\n Mrs. Cratchit, wife of Bob Cratch... ... 300
.. ... ... ... ... ...
226 972bb34ddd371530f06d006480526d3e harmless from all liability, costs and expens... ... 300
227 2f918cd94d1825eb5cbdc2a9d3ce094e \nGutenberg Literary Archive Foundation was cr... ... 300
228 eec5fc1a2be814473698e220b303dc1b . Email contact links and up\nto date contact ... ... 300
229 535f6bed392a62760401b1d4f2aa5e2f compliance. To SEND\nDONATIONS or determine t... ... 300
230 9e59af410db84b25757e3bf90e036f39 could be\nfreely shared with anyone. For fort... ... 155
[231 rows x 5 columns]
⠙ GraphRAG Indexer
🚀 create_base_extracted_entities
entity_graph
0 <graphml xmlns="http://graphml.graphdrawing.or...
🚀 create_summarized_entities
entity_graph
0 <graphml xmlns="http://graphml.graphdrawing.or...
🚀 create_base_entity_graph
level clustered_graph
0 0 <graphml xmlns="http://graphml.graphdrawing.or...
1 1 <graphml xmlns="http://graphml.graphdrawing.or...
2 2 <graphml xmlns="http://graphml.graphdrawing.or...
/home/morioka/.pyenv/versions/graphrag/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is
deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
/home/morioka/.pyenv/versions/graphrag/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is
deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
🚀 create_final_entities
id ... description_embedding
0 b45241d70f0e43fca764df95b2b81f77 ... [0.025451932102441788, 0.03978753089904785, -0...
1 4119fd06010c494caa07f439b333f4c5 ... [0.021342700347304344, 0.024907737970352173, -...
2 d3835bf3dda84ead99deadbeac5d0d7d ... [0.0048078252002596855, 0.02126268297433853, -...
3 077d2820ae1845bcbb1803379a3d1eae ... [0.02204022742807865, -0.007841000333428383, -...
4 3671ea0dd4e84c1a9b02c5ab2c8f4bac ... [-0.024365561082959175, -0.006287779193371534,...
.. ... ... ...
149 9a6f414210e14841a5b0e661aedc898d ... [-0.05559733510017395, -0.012709809467196465, ...
150 db541b7260974db8bac94e953009f60e ... [-0.003418784821406007, -0.01189712155610323, ...
151 f2ff8044718648e18acef16dd9a65436 ... [-0.04298556223511696, 0.008937294594943523, 0...
152 00d785e7d76b47ec81b508e768d40584 ... [-0.03399103134870529, -0.008763907477259636, ...
153 87915637da3e474c9349bd0ae604bd95 ... [-0.02068764716386795, 0.0021940593142062426, ...
[154 rows x 8 columns]
/home/morioka/.pyenv/versions/graphrag/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is
deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
/home/morioka/.pyenv/versions/graphrag/lib/python3.11/site-packages/datashaper/engine/verbs/convert.py:72: FutureWarning: errors='ignore' is
deprecated and will raise in a future version. Use to_datetime without passing `errors` and catch exceptions explicitly instead
datetime_column = pd.to_datetime(column, errors="ignore")
/home/morioka/.pyenv/versions/graphrag/lib/python3.11/site-packages/datashaper/engine/verbs/convert.py:72: UserWarning: Could not infer
format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please
specify a format.
datetime_column = pd.to_datetime(column, errors="ignore")
🚀 create_final_nodes
level title type ... top_level_node_id x y
0 0 "BOB CRATCHIT" "PERSON" ... b45241d70f0e43fca764df95b2b81f77 0 0
1 0 "PETER CRATCHIT" "PERSON" ... 4119fd06010c494caa07f439b333f4c5 0 0
2 0 "TIM CRATCHIT" "PERSON" ... d3835bf3dda84ead99deadbeac5d0d7d 0 0
3 0 "MR. FEZZIWIG" "PERSON" ... 077d2820ae1845bcbb1803379a3d1eae 0 0
4 0 "FRED" "PERSON" ... 3671ea0dd4e84c1a9b02c5ab2c8f4bac 0 0
.. ... ... ... ... ... .. ..
457 2 "INTERNAL REVENUE SERVICE" "ORGANIZATION" ... 9a6f414210e14841a5b0e661aedc898d 0 0
458 2 "MISSISSIPPI" "GEO" ... db541b7260974db8bac94e953009f60e 0 0
459 2 "SALT LAKE CITY, UT" "GEO" ... f2ff8044718648e18acef16dd9a65436 0 0
460 2 "IRS" "ORGANIZATION" ... 00d785e7d76b47ec81b508e768d40584 0 0
461 2 "MICHAEL S. HART" "PERSON" ... 87915637da3e474c9349bd0ae604bd95 0 0
[462 rows x 14 columns]
/home/morioka/.pyenv/versions/graphrag/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is
deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
/home/morioka/.pyenv/versions/graphrag/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is
deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
🚀 create_final_communities
id title ... relationship_ids text_unit_ids
0 6 Community 6 ... [8f1eba29f39e411188200bf0d14628ec, 7282c73622b... [0d6bc6e701a0025632e41dc3387c641d,13f70e4c705f...
1 1 Community 1 ... [59c726a8792d443e84ab052cb7942b4a, 4f2c665decf... [13f70e4c705fb134466c125b05af3440,3a450ed2b7fb...
2 0 Community 0 ... [896d2a51e8de47de85ba8ced108c3d53, f97011b2a99... [3a450ed2b7fb1e5fce66f92698c13824,694fe3a93b65...
3 4 Community 4 ... [4517768fc4e24bd2a790be0e08a7856e, 586bccefb1e... [1d5a3ea2bdc7eb02878c9733fae3924b,715dd9466e12...
4 9 Community 9 ... [9376ce8940e647a99e5e087514b88fa4, 35489ca6a63...
5 11 Community 11 ... [8169efeea3ce473d9fd2f1c688126a1c, d203efdbfb2... [a31b7f9a68e95c6a014501ba8710513c,f567abf22400...
6 10 Community 10 ... [c2d48b75af6a4d7989ccf9eceabd934e, 68e0c60d2e8...
7 5 Community 5 ... [f9005e5c01b44bb489f7112322fd1162, d9ef0175497... [320c285a98f252d567b2005902763e5c,ddc8697a7671...
8 7 Community 7 ... [cf6115e69d6649cc99ef2bd11854ccfb, 496f17c2f74... [8abfa46c9318e287361dc792381e06e5,9875af54dfa7...
9 2 Community 2 ... [9ed7e3d187b94ab0a90830b17d66615e, b4c7432f712... [eab0a98a24212548adc9252b20b29dce, db04de01e9b...
10 12 Community 12 ... [40450f2c91944a81944621b94f190b49, ed559fb4ebd...
11 3 Community 3 ... [71a0a8c1beb64da08124205e9a803d98, f84314943be... [d6d510a8b60a7597b6b907023d156777,eea518e2ef1c...
12 8 Community 8 ... [5c13c7d61e6c4bfe839f21e7ad3530a7, a621663edba... [2b16778c9beeb9bbb5c770960d7bd492,7b4ad7c69598...
13 18 Community 18 ... [59c726a8792d443e84ab052cb7942b4a, 4f2c665decf... [13f70e4c705fb134466c125b05af3440,3a450ed2b7fb...
14 13 Community 13 ... [896d2a51e8de47de85ba8ced108c3d53, f97011b2a99... [3a450ed2b7fb1e5fce66f92698c13824,694fe3a93b65...
15 20 Community 20 ... [14555b518e954637b83aa762dc03164e, b1f6164116d... [0e13fd0aca5720eb614104772f20077b,0eb69b9f79f6...
16 17 Community 17 ... [545edff337344e518f68d1301d745455, d405c3154d0... [45ac76a7dea29addc4542c64d7eae68f,715dd9466e12...
17 19 Community 19 ... [b38a636e86984600bb4b57c2e2df9747, 4bc7440b8f4... [547563001cad1df48dfcd4ee4ecc8ee9,8abfa46c9318...
18 14 Community 14 ... [222f0ea8a5684123a7045986640ec844, 668cf1fdfd6... [1d5a3ea2bdc7eb02878c9733fae3924b,3dc28534d844...
19 15 Community 15 ... [82b0446e7c9d4fc793f7b97f890e9049, 70634e10a5e... [da8b22fcbea495d042facb17b364be42, 9c7e56ef067...
20 16 Community 16 ... [5f1fc373a8f34050a5f7dbd8ac852c1b, c725babdb14... [5cf14e5d111ef8cbfd7d32e30b6cdb8a,7348862d5fd2...
21 23 Community 23 ... [9ed7e3d187b94ab0a90830b17d66615e, b4c7432f712... [eab0a98a24212548adc9252b20b29dce, db04de01e9b...
22 21 Community 21 ... [5b9fa6a959294dc29c8420b2d7d3096f, b84d71ed9c3... [56649632e1d3a637a756905477c99002,6997e1ff5fab...
23 22 Community 22 ... [0111777c4e9e4260ab2e5ddea7cbcf58, 785f7f32471... [0d6bc6e701a0025632e41dc3387c641d,25666ca46011...
24 24 Community 24 ... [bcfdc48e5f044e1d84c5d217c1992d4b, b232fb0f2ac... [7ed8b64d3fcf6b96c9c86c53e3fb7ce7, 7ed8b64d3fc...
25 25 Community 25 ... [896d2a51e8de47de85ba8ced108c3d53, f97011b2a99... [3a450ed2b7fb1e5fce66f92698c13824,694fe3a93b65...
26 26 Community 26 ... [c2999bdca08a478b84b10219875b285e, 351abba16e5... [21acfa3b8dca20f03f4a2d7133013952,3dc28534d844...
27 27 Community 27 ... [263d07354a1b4336b462024288f9bcd3, 50ea7d3b696... [320c285a98f252d567b2005902763e5c,45ac76a7dea2...
[28 rows x 6 columns]
🚀 join_text_units_to_entity_ids
text_unit_ids entity_ids id
0 0d6bc6e701a0025632e41dc3387c641d [b45241d70f0e43fca764df95b2b81f77, de988724cfd... 0d6bc6e701a0025632e41dc3387c641d
1 13f70e4c705fb134466c125b05af3440 [b45241d70f0e43fca764df95b2b81f77, 3671ea0dd4e... 13f70e4c705fb134466c125b05af3440
2 25666ca46011e54363d13007959f45fb [b45241d70f0e43fca764df95b2b81f77, de988724cfd... 25666ca46011e54363d13007959f45fb
3 2818d4194a37f4573f7a83b49cd59b21 [b45241d70f0e43fca764df95b2b81f77, de988724cfd... 2818d4194a37f4573f7a83b49cd59b21
4 3a450ed2b7fb1e5fce66f92698c13824 [b45241d70f0e43fca764df95b2b81f77, 4119fd06010... 3a450ed2b7fb1e5fce66f92698c13824
.. ... ... ...
99 9e59af410db84b25757e3bf90e036f39 [eeef6ae5c464400c8755900b4f1ac37a, 422433aa458... 9e59af410db84b25757e3bf90e036f39
100 da3ca9f93aac15c67f6acf3cca2fc229 da3ca9f93aac15c67f6acf3cca2fc229
101 e8cf7d2eec5c3bcbeefc60d9f15941ed [eeef6ae5c464400c8755900b4f1ac37a, 1af9faf341e... e8cf7d2eec5c3bcbeefc60d9f15941ed
102 eec5fc1a2be814473698e220b303dc1b [422433aa45804c7ebb973b2fafce5da6, 1af9faf341e... eec5fc1a2be814473698e220b303dc1b
103 b3c35247f91923027d9bd7d476467f4f [1af9faf341e14a5bbf4ddc9080e8dc0b, 353d91abc68... b3c35247f91923027d9bd7d476467f4f
[104 rows x 3 columns]
/home/morioka/.pyenv/versions/graphrag/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is
deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
/home/morioka/.pyenv/versions/graphrag/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: FutureWarning: 'DataFrame.swapaxes' is
deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
/home/morioka/.pyenv/versions/graphrag/lib/python3.11/site-packages/datashaper/engine/verbs/convert.py:65: FutureWarning: errors='ignore' is
deprecated and will raise in a future version. Use to_numeric without passing `errors` and catch exceptions explicitly instead
column_numeric = cast(pd.Series, pd.to_numeric(column, errors="ignore"))
🚀 create_final_relationships
source target weight ... source_degree target_degree rank
0 "BOB CRATCHIT" "EBENEZER SCROOGE" 1.0 ... 9 4 13
1 "BOB CRATCHIT" "PETER CRATCHIT" 2.0 ... 9 2 11
2 "BOB CRATCHIT" "TIM CRATCHIT" 1.0 ... 9 1 10
3 "BOB CRATCHIT" "SCROOGE" 6.0 ... 9 69 78
4 "BOB CRATCHIT" "CORNHILL" 1.0 ... 9 1 10
.. ... ... ... ... ... ... ...
157 "PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION" "TRADEMARK OWNER" 1.0 ... 4 1 5
158 "PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION" "IRS" 1.0 ... 4 1 5
159 "GUTENBERG LITERARY ARCHIVE FOUNDATION" "INTERNAL REVENUE SERVICE" 1.0 ... 4 1 5
160 "GUTENBERG LITERARY ARCHIVE FOUNDATION" "MISSISSIPPI" 1.0 ... 4 1 5
161 "GUTENBERG LITERARY ARCHIVE FOUNDATION" "SALT LAKE CITY, UT" 1.0 ... 4 1 5
[162 rows x 10 columns]
🚀 join_text_units_to_relationship_ids
id relationship_ids
0 3a450ed2b7fb1e5fce66f92698c13824 [8f1eba29f39e411188200bf0d14628ec, 7282c73622b...
1 d95d1ec14f9c4293fab4e36bbe5d9fd1 [7282c73622b8408e97289d959faff483, af7a1584dd1...
2 0d6bc6e701a0025632e41dc3387c641d [af7a1584dd15492cb9a4940e285f57fc, 6090e736374...
3 13f70e4c705fb134466c125b05af3440 [af7a1584dd15492cb9a4940e285f57fc, 4f2c665decf...
4 25666ca46011e54363d13007959f45fb [af7a1584dd15492cb9a4940e285f57fc, f422035f8b7...
.. ... ...
91 e8cf7d2eec5c3bcbeefc60d9f15941ed [089b9b9841714b8da043777e2cda3767]
92 10bab8e9773ee6dfbb465bfa45794c34 [38f1e44579d0437dac1203c34678d3c3]
93 2f918cd94d1825eb5cbdc2a9d3ce094e [1ca24718a96b47f3a8855550506c4b41, 4b8aa4587c7...
94 eec5fc1a2be814473698e220b303dc1b [f23484b1b45d44c3b7847e1906dddd37, 4920fda0318...
95 b3c35247f91923027d9bd7d476467f4f [929f30875e1744b49e7b416eaf5a790c]
[96 rows x 2 columns]
🚀 create_final_community_reports
community ... id
0 25 ... 88b49b31-f8d7-4e2d-8d1a-8d6cc623f120
1 26 ... a8731b73-a194-42c2-acb5-46660cd47971
2 27 ... d18abd4d-2c8f-4d11-8fb6-4955c3f43565
3 13 ... c1730928-6a64-4958-b483-b88cbf2fc829
4 14 ... 7eadaa3d-599b-4f27-b8f0-eefc1591c972
5 15 ... a35bd39e-9acb-4ed9-9c33-f5ecdebd8335
6 16 ... aed0422f-ee68-45a6-aa66-bba3a0658720
7 17 ... 51e137a4-18fa-42ca-9b52-ec8cd8af44af
8 18 ... 3bbe1bbb-2486-42d7-9184-b8e8af4f4813
9 19 ... bd33037c-2e6f-4ea1-829d-15f8801f75d5
10 20 ... 3369e9aa-c808-4009-a832-08f3a6cdb8e3
11 21 ... 60331a17-1ecf-48a5-8bfb-8808814d598f
12 23 ... 873fc6bc-1f5b-4a55-be5f-3c1b51d155b2
13 24 ... c82ad56f-0734-4efb-9036-65c73eaa26e4
14 0 ... 8b08c146-dd6d-4e2a-bc97-3f581b731f22
15 1 ... 2f6d74c3-0d66-4a9b-875d-408ef7934815
16 10 ... ccd21290-7825-4ea2-8d8b-7392f48e0352
17 11 ... 79fd9cbb-5854-49bc-a779-ffd157f53a6a
18 12 ... fe0add06-b770-45dd-acc5-f308d3fc6698
19 2 ... 55d9c915-188e-475b-b7ac-79400fbe049b
20 3 ... d3f034e6-4269-493b-beb0-b035c3e14907
21 4 ... e2141b76-c398-436e-87da-bde2b892a939
22 5 ... 17b8c50f-9f53-4222-bdf0-87d6363a6029
23 6 ... c46bcb13-36d2-451c-9961-0bda356d8337
24 7 ... b8a65762-0c89-4d03-b35a-caf1072311cb
25 8 ... bda48fd2-d260-42c7-9662-23374c0ccf10
26 9 ... df636add-aa7d-444f-8af8-38793ecf73eb
[27 rows x 10 columns]
🚀 create_final_text_units
id ... relationship_ids
0 3a450ed2b7fb1e5fce66f92698c13824 ... [8f1eba29f39e411188200bf0d14628ec, 7282c73622b...
1 b6a337c6f91c648c7432dc9e9e01b797 ...
2 bcd3d11eb719b981ca5c674cbc9a123e ... [35489ca6a63b47d6a8913cf333818bc1, 5d3344f45e6...
3 547563001cad1df48dfcd4ee4ecc8ee9 ... [5d3344f45e654d2c808481672f2f08dd, 68762e6f0d1...
4 da8b22fcbea495d042facb17b364be42 ... [5d3344f45e654d2c808481672f2f08dd, 70634e10a5e...
.. ... ... ...
226 01e84646075b255eab0a34d872336a89 ... None
227 879b3fc36c9a2427cdb8d5d41b60e11b ... None
228 28f242c45159426edb8589f5ca3c10e6 ... None
229 f96b5ddf7fae853edbc4d916f66c623f ... None
230 958e8453c6299cf980b3e6f962240699 ... None
[231 rows x 6 columns]
/home/morioka/.pyenv/versions/graphrag/lib/python3.11/site-packages/datashaper/engine/verbs/convert.py:72: FutureWarning: errors='ignore' is
deprecated and will raise in a future version. Use to_datetime without passing `errors` and catch exceptions explicitly instead
datetime_column = pd.to_datetime(column, errors="ignore")
🚀 create_base_documents
id ... title
0 c305886e4aa2f6efcf64b57762777055 ... book.txt
[1 rows x 4 columns]
🚀 create_final_documents
id ... title
0 c305886e4aa2f6efcf64b57762777055 ... book.txt
[1 rows x 4 columns]
⠦ GraphRAG Indexer
├── Loading Input (InputFileType.text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
├── create_final_entities
├── create_final_nodes
├── create_final_communities
├── join_text_units_to_entity_ids
├── create_final_relationships
├── join_text_units_to_relationship_ids
├── create_final_community_reports
├── create_final_text_units
├── create_base_documents
└── create_final_documents
🚀 All workflows completed successfully.
(graphrag) morioka@legion:~/graphgrag$
(graphrag) morioka@legion:~/graphgrag$
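The create_base_text_units table in the log shows 231 chunks of 300 tokens each, except the last one with 155, and the settings.yaml later in this scrap sets `chunks.size: 300` and `chunks.overlap: 100`. The sketch below shows how those numbers fit together, assuming a plain sliding window in which each chunk starts `size - overlap` tokens after the previous one. This is an assumption for illustration and has not been checked against GraphRAG's actual implementation. The total of 46,155 tokens is back-calculated to match the log, not measured.

```python
def chunk_lengths(total_tokens: int, size: int = 300, overlap: int = 100) -> list[int]:
    """Return the token length of each chunk for a simple sliding window.

    Assumption: every window starts `size - overlap` tokens after the previous one.
    """
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    return [min(size, total_tokens - start) for start in range(0, total_tokens, step)]


# Hypothetical total token count, chosen to be consistent with the log above.
lengths = chunk_lengths(46_155)
print(len(lengths), lengths[0], lengths[-1])  # 231 300 155
```

Under this assumption, a larger overlap increases the number of chunks, and with it the number of LLM calls made during entity extraction.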
Running queries, part 1.
"Here is an example using Global search to ask a high-level question"
(graphrag) morioka@legion:~/graphgrag$ python -m graphrag.query \
--root ./ragtest \
--method global \
"What are the top themes in this story?"
INFO: Reading settings from ragtest/settings.yaml
creating llm client with {'api_key': 'REDACTED,len=51', 'type': "openai_chat", 'model': 'gpt-4-turbo-preview', 'max_tokens': 4000, 'request_timeout': 180.0, 'api_base': None, 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
SUCCESS: Global Search Response: ### Top Themes in the Story
The story weaves together a rich tapestry of themes, each contributing to the narrative's depth and the characters' journeys. Central to these is the **transformation and redemption** of Ebenezer Scrooge, who evolves from a figure of miserliness to one of generosity. This change is not just a personal journey but a reflection of the broader potential for human growth and redemption through relationships and self-reflection [Data: Reports (6, 18, 26, 27, 25, +more)].
**Family and social connections** emerge as another significant theme, illustrated through the interactions of characters like the Cratchit family and Scrooge's nephew. These relationships highlight the importance of community and the impact of personal bonds on individuals' lives [Data: Reports (6, 19, 20, 24, +more)].
The narrative places a strong emphasis on the **Christmas spirit**, which acts as a catalyst for reflection, joy, and change. This theme underscores the transformative power of the season and its ability to inspire generosity and kindness [Data: Reports (14, 19, 24)].
**Generosity versus greed** is explored through contrasting behaviors and attitudes, particularly through Scrooge's interactions with others. This theme delves into the moral implications of both sets of values, advocating for a life led by kindness and compassion [Data: Reports (6, 14, 18, 20)].
Lastly, the **role of supernatural guidance** in personal growth is highlighted through the visits of the Ghosts of Christmas Past, Present, and Yet to Come. These spectral figures guide Scrooge on a journey of self-discovery, emphasizing the importance of mentorship and positive influence in catalyzing change [Data: Reports (26, 27, 25)].
In summary, the story's themes are deeply interwoven, each playing a crucial role in the narrative's exploration of human nature, the importance of community, and the potential for personal transformation. Through the lens of Christmas and the supernatural, the story offers a timeless message of hope, redemption, and the enduring power of kindness and generosity.
(graphrag) morioka@legion:~/graphgrag$
(graphrag) morioka@legion:~/graphgrag$ python -m graphrag.query --root ./ragtest --method global "What are the top themes in this story? 日本語で回答して"
INFO: Reading settings from ragtest/settings.yaml
creating llm client with {'api_key': 'REDACTED,len=51', 'type': "openai_chat", 'model': 'gpt-4-turbo-preview', 'max_tokens': 4000, 'request_timeout': 180.0, 'api_base': None, 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
SUCCESS: Global Search Response: この物語における主要なテーマは、希望と変容、家族とコミュニティの絆、贖罪と個人の成長、そして社会的責任と慈善です。これらのテーマは、エベネーザー・スクルージの人生と彼の周りの人々との関係を通じて探求されています。
### 希望と変容
物語の中心は、スクルージの人格と関係性の変化にあります。彼の変化やクラチット家族の経済的困難にもかかわらず見せる強さと希望が、このテーマを際立たせています。スクルージの変化を促す重要な要素として、ジェイコブ・マーリーとの関係や三人の霊の訪問が挙げられます [Data: Reports (25, 6, 20, 26, 27)]。
### 家族とコミュニティの絆
家族とコミュニティの絆の重要性が強調されています。クラチット家族の結束力やスクルージの甥との関係が、このテーマを具体化しています。フェジウィグのクリスマスイブの祝賀会は、コミュニティの絆と喜びの価値を強調しており、物語における肯定的な影響力の例です [Data: Reports (6, 19, 20, 4)]。
### 贖罪と個人の成長
贖罪と個人の成長のテーマが、スクルージの人生の変化を通じて探求されています。彼の過去、現在、未来を訪れることで、スクルージは自己反省と変化の旅を経験します [Data: Reports (26, 27, 18)]。
### 社会的責任と慈善
社会的責任と慈善のテーマが、スクルージと他のキャラクターの間の対話や行動を通じて展開されます。特に貧困層への支援の必要性が強調されています。物語全体を通じて、慈善、家族、コミュニティの絆の重要性が強調されています [Data: Reports (11, 3, 25, 4)]。
これらのテーマは、物語を通じて織り交ぜられ、読者に深い印象を与える要素となっています。スクルージの人生とコミュニティへの影響は、彼の変化が個人だけでなく、広い社会にも良い影響を与えることを示しています [Data: Reports (25)]。
(graphrag) morioka@legion:~/graphgrag$
Running queries, part 2.
"Here is an example using Local search to ask a more specific question about a particular character"
(graphrag) morioka@legion:~/graphgrag$ python -m graphrag.query \
--root ./ragtest \
--method local \
"Who is Scrooge, and what are his main relationships?"
INFO: Reading settings from ragtest/settings.yaml
[2024-07-13T06:19:41Z WARN lance::dataset] No existing dataset at /home/morioka/graphgrag/lancedb/description_embedding.lance, it will be created
creating llm client with {'api_key': 'REDACTED,len=51', 'type': "openai_chat", 'model': 'gpt-4-turbo-preview', 'max_tokens': 4000, 'request_timeout': 180.0, 'api_base': None, 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
creating embedding llm client with {'api_key': 'REDACTED,len=51', 'type': "openai_embedding", 'model': 'text-embedding-3-small', 'max_tokens': 4000, 'request_timeout': 180.0, 'api_base': None, 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': None, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
SUCCESS: Local Search Response: ### Who is Scrooge?
Ebenezer Scrooge is a central character in a narrative that explores themes of redemption, compassion, and the transformative power of the Christmas spirit. Initially depicted as a miserly, solitary, and bitter individual, Scrooge is known for his disdain towards Christmas and his lack of empathy towards others. His life is characterized by gloom, isolation, and a meticulous nature that prioritizes wealth accumulation over human connections [Data: Entities (9, 11)].
### Scrooge's Main Relationships
#### Jacob Marley
Jacob Marley, Scrooge's late business partner, plays a pivotal role in initiating Scrooge's transformation. Marley's ghost visits Scrooge to warn him of the dire consequences of continuing his current path of greed and isolation, setting the stage for the visits from the three spirits [Data: Entities (42)].
#### The Three Spirits
The Ghosts of Christmas Past, Present, and Yet to Come are supernatural entities that guide Scrooge through a journey of self-reflection. Each spirit shows Scrooge the impact of his actions on himself and others, leading him to realize the importance of compassion and community. These spirits are instrumental in breaking down Scrooge's walls of indifference, showcasing the consequences of his miserliness, and ultimately guiding him towards redemption [Data: Entities (43, 48, 114)].
#### Bob Cratchit and Family
Bob Cratchit, Scrooge's underpaid and overworked clerk, represents the human face of Scrooge's harshness. Initially, Scrooge's relationship with Cratchit is emblematic of his miserliness. However, as Scrooge's transformation unfolds, he becomes a figure of generosity towards the Cratchit family, symbolizing his broader change from a life of isolation to one of engagement and benevolence. This shift is particularly highlighted by Scrooge's newfound concern for Tiny Tim, Cratchit's ailing son, and his actions to assist the family [Data: Entities (0, 95)].
#### Scrooge's Nephew, Fred
Fred, Scrooge's nephew, embodies the spirit of Christmas that Scrooge initially rejects. Fred's persistent invitations to Scrooge to join the Christmas celebrations and his optimistic outlook on life play a crucial role in highlighting the contrast between Scrooge's miserliness and the potential for joy and familial warmth. Scrooge's eventual acceptance of Fred's invitation symbolizes his reintegration into family and society [Data: Entities (4, 15, 131)].
#### The Broader Community
Scrooge's interactions with the broader community, including his charitable actions and his newfound willingness to engage with others, reflect the essence of his transformation. His change impacts not only his immediate circle but also the wider community, as he becomes a symbol of hope and generosity [Data: Entities (21, 104, 138)].
### Conclusion
Ebenezer Scrooge's journey from a miserly recluse to a generous benefactor is marked by significant relationships that catalyze his transformation. Through encounters with Jacob Marley, the three spirits, Bob Cratchit and his family, and his nephew Fred, Scrooge learns the value of compassion, generosity, and community. These relationships are central to the narrative, illustrating the profound impact of personal change on oneself and society at large.
(graphrag) morioka@legion:~/graphgrag$
(graphrag) morioka@legion:~/graphgrag$ python -m graphrag.query --root ./ragtest --method local "Who is Scrooge, and what are his main relationships? 日本語で回答して"
INFO: Reading settings from ragtest/settings.yaml
creating llm client with {'api_key': 'REDACTED,len=51', 'type': "openai_chat", 'model': 'gpt-4-turbo-preview', 'max_tokens': 4000, 'request_timeout': 180.0, 'api_base': None, 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
creating embedding llm client with {'api_key': 'REDACTED,len=51', 'type': "openai_embedding", 'model': 'text-embedding-3-small', 'max_tokens': 4000, 'request_timeout': 180.0, 'api_base': None, 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': None, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
SUCCESS: Local Search Response: スクルージは、チャールズ・ディケンズの『クリスマス・キャロル』に登場する架空のキャラクターであり、物語の主人公です。彼は最初、非常にケチで心の冷たい人物として描かれていますが、物語の進行と共に大きな変化を遂げます。スクルージの人生と変化において重要な役割を果たす主な関係性には、ジェイコブ・マーリー、ボブ・クラチットとその家族、そしてクリスマスの三霊が含まれます。
### ジェイコブ・マーリーとの関係
ジェイコブ・マーリーはスクルージの亡くなったビジネスパートナーで、物語の始まりで幽霊としてスクルージの前に現れます。マーリーの訪問は、スクルージの変化のきっかけとなります。彼はスクルージに対し、自分のように後悔と苦しみの鎖を担ぐ運命を避けるために、生き方を変えるよう警告します[Data: Entities (42)]。
### ボブ・クラチットとその家族との関係
ボブ・クラチットはスクルージの事務員であり、彼の家族は物語の中で重要な役割を果たします。スクルージは当初、クラチット家に対して冷たく、ケチな態度を取っていましたが、クリスマスの霊たちの訪問を通じて彼らの苦労と温かさを知り、心を開いていきます。特に、クラチット家の息子であるタイニー・ティムの純粋さと弱さは、スクルージの心に深い影響を与えます[Data: Entities (0), Relationships (117, 119)]。
### クリスマスの三霊との関係
クリスマスの過去の霊、現在の霊、そして未来の霊は、スクルージに自分の過去、現在、そして未来を見せることで、彼の人生と行動の影響を理解させます。これらの霊たちの訪問は、スクルージが自分自身と周りの世界に対する彼の態度を根本的に変えるきっかけとなります[Data: Entities (48, 43, 114)]。
### まとめ
スクルージの物語は、彼の人生における重要な関係性を通じて語られます。ジェイコブ・マーリーの警告、ボブ・クラチットとその家族との絆、そしてクリスマスの三霊の教えは、スクルージを変えるための重要な要素です。これらの関係性は、スクルージが自己中心的で冷たい心を捨て、愛と慈悲の心を持つようになる過程を描いています。
(graphrag) morioka@legion:~/graphgrag$
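The query type (global or local) has to be passed explicitly, but couldn't it be inferred from the question instead?
If cost is not a concern, another option is to generate an answer with each type and pick one of the two.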
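Both ideas can be sketched with a thin wrapper around the same `python -m graphrag.query` command line used in the logs above. `guess_method` is a toy keyword heuristic and purely an assumption for illustration (a real router would more likely ask an LLM to classify the question). `ask_both` simply runs the CLI once per method and returns the raw output so that a person or another model can choose. None of these helpers are part of GraphRAG.

```python
import subprocess
import sys

METHODS = ("global", "local")  # the two --method values used in the logs above


def build_query_command(root: str, method: str, question: str) -> list[str]:
    """Build the `python -m graphrag.query` command line used above."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    return [sys.executable, "-m", "graphrag.query",
            "--root", root, "--method", method, question]


def guess_method(question: str) -> str:
    """Toy heuristic (hypothetical): broad, corpus-wide questions go to global search."""
    broad_markers = ("theme", "overall", "summary", "summarize", "main ideas")
    lowered = question.lower()
    return "global" if any(marker in lowered for marker in broad_markers) else "local"


def ask_both(root: str, question: str) -> dict[str, str]:
    """Run the query once per method and return the raw stdout of each run."""
    answers = {}
    for method in METHODS:
        result = subprocess.run(build_query_command(root, method, question),
                                capture_output=True, text=True, check=True)
        answers[method] = result.stdout
    return answers


# Usage (calls the LLM API twice, so it costs roughly twice as much):
# answers = ask_both("./ragtest", "Who is Scrooge, and what are his main relationships?")
```

With the two example questions from the Getting Started, the heuristic picks global for the question about themes and local for the question about Scrooge.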
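TODO:
- Understand the details of how it works
- Survey and compare other GraphRAG approaches, such as the GraphRAG from neo4j and others, and LlamaIndex's property graph.
↑ This is a translation of the following, with the data swapped out.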
settings.yaml looks like this.
My impression is that pointing the API at a different endpoint is easy. But, the LLM aside, what should be done about the embedding model?
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: gpt-4-turbo-preview
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  # api_base: https://<instance>.openai.azure.com
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing
async_mode: threaded # or asyncio
embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: text-embedding-3-small
    # api_base: https://<instance>.openai.azure.com
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents
input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"
cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>
storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>
reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>
entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0
summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500
claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0
community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000
cluster_graph:
  max_cluster_size: 10
embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832
umap:
  enabled: false # if true, will generate UMAP embeddings for nodes
snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false
local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000
global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32