🤖

Trying ChatRWKV: building a ChatGPT-like chat system locally on an M1 Max

Published 2023/03/28

Introduction

From shi3z's article I already knew that a ChatGPT-like AI chatbot could run in a local environment.
https://note.com/shi3zblog/n/na991171b8fdd

Then I came across the article below and figured I could easily reproduce its steps on my own Mac, so I gave it a try.
https://note.com/ohnuma/n/nad3dec051684

It worked, so I'm leaving a log of the steps here.

Environment

Hardware

% system_profiler SPHardwareDataType
Hardware:

    Hardware Overview:

      Model Name: Mac Studio
      Model Identifier: Mac13,1
      Chip: Apple M1 Max
      Total Number of Cores: 10 (8 performance and 2 efficiency)
      Memory: 32 GB

OS

% sw_vers
ProductName:	macOS
ProductVersion:	12.6.3
BuildVersion:	21G419

Git

I'm using Git installed via Homebrew.

% git --version
git version 2.40.0

Python

I'm using Python installed via Homebrew.

% python3 --version
Python 3.11.2
% pip3 --version
pip 23.0.1 from /opt/homebrew/lib/python3.11/site-packages/pip (python 3.11)

Visual Studio Code

Any editor that can edit text will do.

% code --version
1.76.2
ee2b180d582a7f601fa6ecfdad8d9fd269ab1884
arm64

Setup

Preparing ChatRWKV

Move to the working folder

Move to wherever suits your environment.

% cd development/Python/

Clone ChatRWKV

% git clone https://github.com/BlinkDL/ChatRWKV

Installing packages

Install the missing packages. Note that requirements.txt is inside the cloned ChatRWKV folder, so run the first command from there.

% pip3 install -r requirements.txt
% pip3 install numpy
% pip3 install torch
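Before moving on, here is a small stdlib-only sketch (my own addition, not part of the original steps) to confirm the installed packages are importable. The package list is an assumption: numpy and torch come from the commands above, and tokenizers is what I believe ChatRWKV's requirements.txt pulls in for its tokenizer file.

```python
import importlib.util

# Check that each package can be located without actually importing it.
# "tokenizers" is an assumption about ChatRWKV's requirements.txt.
for name in ("numpy", "torch", "tokenizers"):
    found = importlib.util.find_spec(name) is not None
    print(f"{name}: {'OK' if found else 'missing'}")
```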

Editing the source

Open chat.py in your editor.

% cd ChatRWKV
% code .

Change the following three settings.

Change RUN_DEVICE to cpu.
Change FLOAT_MODE to bf16.

chat.py
########################################################################################################

args.RUN_DEVICE = "cpu"  # cuda // cpu
# fp16 (good for GPU, does NOT support CPU) // fp32 (good for CPU) // bf16 (worse accuracy, supports CPU)
args.FLOAT_MODE = "bf16"
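As the comment in chat.py notes, bf16 trades some accuracy for CPU support. A pure-Python sketch of why (no torch needed): bfloat16 keeps float32's 8 exponent bits but only 7 mantissa bits, which you can roughly emulate by zeroing the low 16 bits of a float32 (real conversions usually round to nearest rather than truncate).

```python
import struct

def to_bfloat16(x: float) -> float:
    """Emulate float32 -> bfloat16 by truncating the low 16 mantissa bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (out,) = struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))
    return out

print(to_bfloat16(0.1))     # 0.099609375 -- visible precision loss
print(to_bfloat16(1024.5))  # 1024.0 -- the .5 no longer fits
```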

Change MODEL_NAME to RWKV-4-Pile-7B-Instruct-test4-20230326.[1]

chat.py
# Download RWKV-4 models from https://huggingface.co/BlinkDL (don't use Instruct-test models unless you use their prompt templates)

if CHAT_LANG == 'English':
    args.MODEL_NAME = 'RWKV-4-Pile-7B-Instruct-test4-20230326'
    # args.MODEL_NAME = '/fsx/BlinkDL/HF-MODEL/rwkv-4-pile-7b/RWKV-4-Pile-7B-20221115-8047'

Preparing the model

Download

RWKV-4-Pile-7B-Instruct-test4-20230326.pth · BlinkDL/rwkv-4-pile-7b at main
On the page above, click the "↓download" link (just below "NeverlandPeter") to download the model.
This model uses about 14.35 GB of memory, so with 32 GB of RAM it runs entirely in memory.

If you want a model of a different size, browse BlinkDL's Hugging Face page (BlinkDL).
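The ~14.35 GB figure is consistent with back-of-the-envelope arithmetic: roughly 7 billion parameters at 2 bytes each in bfloat16 (a rough estimate of mine, ignoring the few fp32 tensors and runtime buffers).

```python
# Rough memory estimate for a 7B-parameter model held in bfloat16.
params = 7_000_000_000
bytes_per_param = 2          # bfloat16 = 16 bits
total_bytes = params * bytes_per_param
print(f"{total_bytes / 1024**3:.2f} GiB")  # 13.04 GiB
```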

Placement

Move the RWKV-4-Pile-7B-Instruct-test4-20230326.pth file directly under ChatRWKV.[2]

The command below is just an example; adjust the path and file name for your setup.

% mv ~/Downloads/RWKV-4-Pile-7B-Instruct-test4-20230326.pth .
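A small pre-flight sketch (my own addition; swap MODEL_FILE for whichever model you chose) to confirm the weights ended up next to chat.py before launching:

```python
from pathlib import Path

def model_in_place(path: Path) -> bool:
    """Return True if the weight file exists at the given path."""
    return path.is_file()

MODEL_FILE = Path("RWKV-4-Pile-7B-Instruct-test4-20230326.pth")
if model_in_place(MODEL_FILE):
    size_gib = MODEL_FILE.stat().st_size / 1024**3
    print(f"OK: {MODEL_FILE} ({size_gib:.1f} GiB)")
else:
    print(f"Missing: move {MODEL_FILE} into the ChatRWKV folder first")
```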

Results

English

Run "python3 chat.py" to start it.
There is a startup wait around the "Run prompt..." message; it starts after a few minutes.[3]

See the log below for the output.[4]

% python3 chat.py

ChatRWKV project: https://github.com/BlinkDL/ChatRWKV
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.
NOTE: This code is v1 and only for reference. Use v2 instead.

Loading ChatRWKV - English - cpu - bf16 - QA_PROMPT False

RWKV_JIT_ON 1

Loading model - RWKV-4-Pile-7B-Instruct-test4-20230326
blocks.0.ln1.weight              bfloat16   cpu    4096      
blocks.0.ln1.bias                bfloat16   cpu    4096      
blocks.0.ln2.weight              bfloat16   cpu    4096      
blocks.0.ln2.bias                bfloat16   cpu    4096      
blocks.0.att.time_decay          float32    cpu    4096      
blocks.0.att.time_first          float32    cpu    4096      
blocks.0.att.time_mix_k          bfloat16   cpu    4096      
blocks.0.att.time_mix_v          bfloat16   cpu    4096      
blocks.0.att.time_mix_r          bfloat16   cpu    4096      
blocks.0.att.key.weight          bfloat16   cpu    4096  4096
blocks.0.att.value.weight        bfloat16   cpu    4096  4096
blocks.0.att.receptance.weight   bfloat16   cpu    4096  4096
blocks.0.att.output.weight       bfloat16   cpu    4096  4096
blocks.0.ffn.time_mix_k          bfloat16   cpu    4096      
blocks.0.ffn.time_mix_r          bfloat16   cpu    4096      
blocks.0.ffn.key.weight          bfloat16   cpu    4096 16384
blocks.0.ffn.receptance.weight   bfloat16   cpu    4096  4096
blocks.0.ffn.value.weight        bfloat16   cpu   16384  4096
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
ln_out.weight                    bfloat16   cpu    4096      
ln_out.bias                      bfloat16   cpu    4096      
head.weight                      bfloat16   cpu   50277  4096

n_layer 32 n_embd 4096 ctx_len 1024

Run prompt...
Commands:
say something --> chat with bot. use \n for new line.
+ --> alternate chat reply
+reset --> reset chat

+gen YOUR PROMPT --> free generation with any prompt. use \n for new line.
+qa YOUR QUESTION --> free generation - ask any question (just ask the question). use \n for new line.
+++ --> continue last free generation (only for +gen / +qa)
++ --> retry last free generation (only for +gen / +qa)

Now talk with the bot and enjoy. Remember to +reset periodically to clean up the bot's memory. Use RWKV-4 14B for best results.
This is not instruct-tuned for conversation yet, so don't expect good quality. Better use +gen for free generation.

Prompt is VERY important. Try all prompts on https://github.com/BlinkDL/ChatRWKV first.

Ready - English cpu bf16 QA_PROMPT=False RWKV-4-Pile-7B-Instruct-test4-20230326

The following is a verbose detailed conversation between Bob and a young girl Alice. Alice is intelligent, friendly and cute. Alice is unlikely to disagree with Bob.

Bob: Hello Alice, how are you doing?
Alice: Hi Bob! Thanks, I'm fine. What about you?

Bob: I am very good! It's nice to see you. Would you mind me chatting with you for a while?
Alice: Not at all! I'm listening.

Bob: hello!
Alice: hello!

Bob: Who are you?
Alice: I am Alice. Nice to meet you, how about you?

Bob: good evening
Alice: hello! nice to meet you too!

Bob: Do you know how high Mount Fuji ?
Alice: yes, Mount Fuji is a famous volcanic peak, its currently the third highest in the world, at 2986.7 meters above sea level. It is an iconic landmark of Japan and one of the Seven Summits.

Bob:

Japanese

When I tried talking to it in Japanese, still using RWKV-4-Pile-7B-Instruct-test4-20230326.pth, it could hold a conversation in Japanese.

% python3 chat.py                 


(startup log omitted here; it is identical to the English run above)

Ready - English cpu bf16 QA_PROMPT=False RWKV-4-Pile-7B-Instruct-test4-20230326

The following is a verbose detailed conversation between Bob and a young girl Alice. Alice is intelligent, friendly and cute. Alice is unlikely to disagree with Bob.

Bob: Hello Alice, how are you doing?
Alice: Hi Bob! Thanks, I'm fine. What about you?

Bob: I am very good! It's nice to see you. Would you mind me chatting with you for a while?
Alice: Not at all! I'm listening.

Bob: +i こんにちは
Alice: +i こんにちは Bob.
(simultaneous conversation)

Bob: +qa 富士山の高さを教えてください
 山の高さは約3600メートルです。

Bob: まあそうか。たしかに山の高さはすごいですね。
Alice: そうですね。たしかに高いですね。

 Bob: たしか最大の山だね。この山の海面をひっくり返せるほどの勢いがあるのかもしれませんね。
時計の分により固定した。眺望が広がりますよね。

Alice: そうかい。世界中にそのような大きな山があれば、その印象は変わってきたかもしれないと思いますね。海地によって見たら固定されたことなんかなくなります
Footnotes
  1. Change this to the name of the model you are using. ↩︎

  2. Run this from the folder where chat.py is located. ↩︎

  3. You can confirm it is running by watching CPU and memory usage in Activity Monitor. ↩︎

  4. Since I had to chat in English, I translated back and forth with DeepL as I went. ↩︎
