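Running ChatRWKV on an M1 Max to build a ChatGPT-like chat system in a local environment
Introduction
After reading shi3z's article, I knew that an AI chatbot like ChatGPT could be run in a local environment.
Then I saw the article below and thought that, by following its steps, I could easily do the same on the Mac I have at hand, so I tried it.
It worked, so I am leaving a log of the steps here.
Environment
Hardware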
% system_profiler SPHardwareDataType
Hardware:
Hardware Overview:
Model Name: Mac Studio
Model Identifier: Mac13,1
Chip: Apple M1 Max
Total Number of Cores: 10 (8 performance and 2 efficiency)
Memory: 32 GB
OS
% sw_vers
ProductName: macOS
ProductVersion: 12.6.3
BuildVersion: 21G419
Git
I use Git installed via Homebrew.
% git --version
git version 2.40.0
Python
I use Python installed via Homebrew.
% python3 --version
Python 3.11.2
% pip3 --version
pip 23.0.1 from /opt/homebrew/lib/python3.11/site-packages/pip (python 3.11)
Visual Studio Code
Any editor will do, as long as it can edit text.
% code --version
1.76.2
ee2b180d582a7f601fa6ecfdad8d9fd269ab1884
arm64
Setup
Preparing ChatRWKV
Move to the working folder
Move to a directory that suits your environment.
$ cd develpoment/Python/
Clone ChatRWKV
% git clone https://github.com/BlinkDL/ChatRWKV
Add packages
Install the missing packages.
% pip3 install -r ChatRWKV/requirements.txt
% pip3 install numpy
% pip3 install torch
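Before moving on, it can help to confirm that the packages are visible to the same Python that will run chat.py. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the package list (numpy, torch) is simply taken from the commands above.

```python
import importlib.util


def missing_packages(names):
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages installed explicitly in the steps above.
missing = missing_packages(["numpy", "torch"])
print("missing:", missing if missing else "none")
```

If a name is reported as missing, the most likely cause is that `pip3` and `python3` point to different Python installations.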
Edit the source
Open chat.py in an editor and edit it.
$ cd ChatRWKV
$ code .
Change three settings.
Change RUN_DEVICE to cpu.
Change FLOAT_MODE to bf16.
########################################################################################################
args.RUN_DEVICE = "cpu" # cuda // cpu
# fp16 (good for GPU, does NOT support CPU) // fp32 (good for CPU) // bf16 (worse accuracy, supports CPU)
args.FLOAT_MODE = "bf16"
Change MODEL_NAME to RWKV-4-Pile-7B-Instruct-test4-20230326.[1]
# Download RWKV-4 models from https://huggingface.co/BlinkDL (don't use Instruct-test models unless you use their prompt templates)
if CHAT_LANG == 'English':
    args.MODEL_NAME = 'RWKV-4-Pile-7B-Instruct-test4-20230326'
    # args.MODEL_NAME = '/fsx/BlinkDL/HF-MODEL/rwkv-4-pile-7b/RWKV-4-Pile-7B-20221115-8047'
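The comment in chat.py describes bf16 as "worse accuracy". As a rough illustration of why, bfloat16 keeps only the upper 16 bits of a float32 value (1 sign bit, 8 exponent bits, 7 mantissa bits). The sketch below simulates this in pure Python by truncating the lower bits. The helper name `to_bf16_truncated` is hypothetical, and truncation is a simplification: real libraries typically round to nearest instead, so the last bit can differ.

```python
import struct


def to_bf16_truncated(x: float) -> float:
    """Keep only the upper 16 bits of the float32 encoding of x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFF0000  # drop the lower 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Compare a few values before and after the simulated conversion.
for value in (1.0, 3.14159, 1000.1):
    print(value, "->", to_bf16_truncated(value))
```

The range of representable values stays the same as float32, but only about two to three significant decimal digits survive, which is the trade-off accepted here in exchange for halving the memory use.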
Preparing the model
Download
RWKV-4-Pile-7B-Instruct-test4-20230326.pth · BlinkDL/rwkv-4-pile-7b at main
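On the page above, click "↓download", located just below "NeverlandPeter", to download the model.
This model uses about 14.35 GB of memory. With 32 GB of memory, it runs entirely in memory.
If you want to use a model of a different size, look for one at BlinkDL (BlinkDL).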
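As a rough cross-check of the memory figure, the size can be estimated as parameter count times bytes per parameter. The sketch below counts parameters from the tensor shapes printed in the startup log further down (n_layer 32, n_embd 4096, a 50277-row head). The function names are hypothetical, and the embedding table is an assumption: it does not appear in the log, so it is taken here to have the same shape as head.weight.

```python
def estimate_params(n_layer, n_embd, vocab_size, ffn_mult=4):
    """Estimate the parameter count from the shapes shown in the log."""
    att = 4 * n_embd * n_embd                  # key, value, receptance, output
    ffn = 2 * n_embd * (ffn_mult * n_embd)     # ffn key and ffn value
    ffn += n_embd * n_embd                     # ffn receptance
    vectors = 11 * n_embd                      # ln1/ln2, time_* vectors
    per_block = att + ffn + vectors
    head = vocab_size * n_embd
    emb = vocab_size * n_embd                  # assumption: same shape as head
    ln_out = 2 * n_embd
    return n_layer * per_block + head + emb + ln_out


def estimate_gb(n_params, bytes_per_param):
    """Convert a parameter count to gigabytes (10**9 bytes)."""
    return n_params * bytes_per_param / 1e9


params = estimate_params(n_layer=32, n_embd=4096, vocab_size=50277)
print(f"parameters: {params:,}")
print(f"bf16 (2 bytes): {estimate_gb(params, 2):.1f} GB")
print(f"fp32 (4 bytes): {estimate_gb(params, 4):.1f} GB")
```

Under these assumptions the bf16 estimate lands in the same range as the figure reported above, while fp32 would need roughly twice as much, which would leave little headroom on a 32 GB machine.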
Placement
Move the file "RWKV-4-Pile-7B-Instruct-test4-20230326.pth" directly under the ChatRWKV directory.[2]
The command is an example. Adjust the file location and file name as needed.
$ mv ~/Downloads/RWKV-4-Pile-7B-Instruct-test4-20230326.pth .
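Since the model is several gigabytes and startup takes minutes, it may be worth checking that the file is where it is expected before launching. The following is a minimal sketch. It assumes, based on the settings and file name above, that the script looks for MODEL_NAME with ".pth" appended, relative to the current directory. The helper name `find_model` is hypothetical.

```python
from pathlib import Path

MODEL_NAME = "RWKV-4-Pile-7B-Instruct-test4-20230326"


def find_model(model_name, base_dir="."):
    """Return the path to <model_name>.pth under base_dir, or None if absent."""
    path = Path(base_dir) / f"{model_name}.pth"
    return path if path.is_file() else None


# Run this from the ChatRWKV directory.
found = find_model(MODEL_NAME)
if found is None:
    print(f"{MODEL_NAME}.pth not found in the current directory")
else:
    print(f"found {found} ({found.stat().st_size / 1e9:.2f} GB)")
```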
Results
English
Running "python3 chat.py" starts the chat.
There is a wait around "Run prompt...". It starts after a few minutes.[3]
See the log below for the results.[4]
% python3 chat.py
ChatRWKV project: https://github.com/BlinkDL/ChatRWKV
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
Loading ChatRWKV - English - cpu - bf16 - QA_PROMPT False
RWKV_JIT_ON 1
Loading model - RWKV-4-Pile-7B-Instruct-test4-20230326
blocks.0.ln1.weight bfloat16 cpu 4096
blocks.0.ln1.bias bfloat16 cpu 4096
blocks.0.ln2.weight bfloat16 cpu 4096
blocks.0.ln2.bias bfloat16 cpu 4096
blocks.0.att.time_decay float32 cpu 4096
blocks.0.att.time_first float32 cpu 4096
blocks.0.att.time_mix_k bfloat16 cpu 4096
blocks.0.att.time_mix_v bfloat16 cpu 4096
blocks.0.att.time_mix_r bfloat16 cpu 4096
blocks.0.att.key.weight bfloat16 cpu 4096 4096
blocks.0.att.value.weight bfloat16 cpu 4096 4096
blocks.0.att.receptance.weight bfloat16 cpu 4096 4096
blocks.0.att.output.weight bfloat16 cpu 4096 4096
blocks.0.ffn.time_mix_k bfloat16 cpu 4096
blocks.0.ffn.time_mix_r bfloat16 cpu 4096
blocks.0.ffn.key.weight bfloat16 cpu 4096 16384
blocks.0.ffn.receptance.weight bfloat16 cpu 4096 4096
blocks.0.ffn.value.weight bfloat16 cpu 16384 4096
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
ln_out.weight bfloat16 cpu 4096
ln_out.bias bfloat16 cpu 4096
head.weight bfloat16 cpu 50277 4096
n_layer 32 n_embd 4096 ctx_len 1024
Run prompt...
Commands:
say something --> chat with bot. use \n for new line.
+ --> alternate chat reply
+reset --> reset chat
+gen YOUR PROMPT --> free generation with any prompt. use \n for new line.
+qa YOUR QUESTION --> free generation - ask any question (just ask the question). use \n for new line.
+++ --> continue last free generation (only for +gen / +qa)
++ --> retry last free generation (only for +gen / +qa)
Now talk with the bot and enjoy. Remember to +reset periodically to clean up the bot's memory. Use RWKV-4 14B for best results.
This is not instruct-tuned for conversation yet, so don't expect good quality. Better use +gen for free generation.
Prompt is VERY important. Try all prompts on https://github.com/BlinkDL/ChatRWKV first.
Ready - English cpu bf16 QA_PROMPT=False RWKV-4-Pile-7B-Instruct-test4-20230326
The following is a verbose detailed conversation between Bob and a young girl Alice. Alice is intelligent, friendly and cute. Alice is unlikely to disagree with Bob.
Bob: Hello Alice, how are you doing?
Alice: Hi Bob! Thanks, I'm fine. What about you?
Bob: I am very good! It's nice to see you. Would you mind me chatting with you for a while?
Alice: Not at all! I'm listening.
Bob: hello!
Alice: hello!
Bob: Who are you?
Alice: I am Alice. Nice to meet you, how about you?
Bob: good evening
Alice: hello! nice to meet you too!
Bob: Do you know how high Mount Fuji ?
Alice: yes, Mount Fuji is a famous volcanic peak, its currently the third highest in the world, at 2986.7 meters above sea level. It is an iconic landmark of Japan and one of the Seven Summits.
Bob:
Japanese
When I spoke to it in Japanese with "RWKV-4-Pile-7B-Instruct-test4-20230326.pth" left unchanged, it was able to converse in Japanese.
% python3 chat.py
ChatRWKV project: https://github.com/BlinkDL/ChatRWKV
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
Loading ChatRWKV - English - cpu - bf16 - QA_PROMPT False
RWKV_JIT_ON 1
Loading model - RWKV-4-Pile-7B-Instruct-test4-20230326
blocks.0.ln1.weight bfloat16 cpu 4096
blocks.0.ln1.bias bfloat16 cpu 4096
blocks.0.ln2.weight bfloat16 cpu 4096
blocks.0.ln2.bias bfloat16 cpu 4096
blocks.0.att.time_decay float32 cpu 4096
blocks.0.att.time_first float32 cpu 4096
blocks.0.att.time_mix_k bfloat16 cpu 4096
blocks.0.att.time_mix_v bfloat16 cpu 4096
blocks.0.att.time_mix_r bfloat16 cpu 4096
blocks.0.att.key.weight bfloat16 cpu 4096 4096
blocks.0.att.value.weight bfloat16 cpu 4096 4096
blocks.0.att.receptance.weight bfloat16 cpu 4096 4096
blocks.0.att.output.weight bfloat16 cpu 4096 4096
blocks.0.ffn.time_mix_k bfloat16 cpu 4096
blocks.0.ffn.time_mix_r bfloat16 cpu 4096
blocks.0.ffn.key.weight bfloat16 cpu 4096 16384
blocks.0.ffn.receptance.weight bfloat16 cpu 4096 4096
blocks.0.ffn.value.weight bfloat16 cpu 16384 4096
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
ln_out.weight bfloat16 cpu 4096
ln_out.bias bfloat16 cpu 4096
head.weight bfloat16 cpu 50277 4096
n_layer 32 n_embd 4096 ctx_len 1024
Run prompt...
Commands:
say something --> chat with bot. use \n for new line.
+ --> alternate chat reply
+reset --> reset chat
+gen YOUR PROMPT --> free generation with any prompt. use \n for new line.
+qa YOUR QUESTION --> free generation - ask any question (just ask the question). use \n for new line.
+++ --> continue last free generation (only for +gen / +qa)
++ --> retry last free generation (only for +gen / +qa)
Now talk with the bot and enjoy. Remember to +reset periodically to clean up the bot's memory. Use RWKV-4 14B for best results.
This is not instruct-tuned for conversation yet, so don't expect good quality. Better use +gen for free generation.
Prompt is VERY important. Try all prompts on https://github.com/BlinkDL/ChatRWKV first.
Ready - English cpu bf16 QA_PROMPT=False RWKV-4-Pile-7B-Instruct-test4-20230326
The following is a verbose detailed conversation between Bob and a young girl Alice. Alice is intelligent, friendly and cute. Alice is unlikely to disagree with Bob.
Bob: Hello Alice, how are you doing?
Alice: Hi Bob! Thanks, I'm fine. What about you?
Bob: I am very good! It's nice to see you. Would you mind me chatting with you for a while?
Alice: Not at all! I'm listening.
Bob: +i こんにちは
Alice: +i こんにちは Bob.
(simultaneous conversation)
Bob: +qa 富士山の高さを教えてください
山の高さは約3600メートルです。
Bob: まあそうか。たしかに山の高さはすごいですね。
Alice: そうですね。たしかに高いですね。
Bob: たしか最大の山だね。この山の海面をひっくり返せるほどの勢いがあるのかもしれませんね。
時計の分により固定した。眺望が広がりますよね。
Alice: そうかい。世界中にそのような大きな山があれば、その印象は変わってきたかもしれないと思いますね。海地によって見たら固定されたことなんかなくなります