
Running inference with the Hugging Face version of Command R+

Published 2024/04/26

LLM

Generative AI

tech

About Command R+

A model with 104 billion (104B) parameters. It is big.
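To put "big" into numbers, the weight memory can be estimated roughly as parameter count × bytes per parameter. This is a back-of-the-envelope sketch that assumes exactly 104 × 10⁹ parameters and ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
# Rough estimate of the memory needed just to hold the model weights.
# Assumption: 104e9 parameters; activations, KV cache and overhead are ignored.
NUM_PARAMS = 104e9

BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16/bfloat16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the size of the weights in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype:>17}: {weight_memory_gb(NUM_PARAMS, nbytes):6.0f} GB")
```

In 16-bit precision the weights alone come to about 208 GB, which is more than a single 80 GB GPU can hold.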

Environment

pip install 'git+https://github.com/huggingface/transformers.git' bitsandbytes accelerate

I ran this on Databricks, so install the libraries in whatever way suits your own environment.

Downloading the model from the Hub and saving it locally

Note: an HF_TOKEN is not required.

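from huggingface_hub import snapshot_download

LOCAL_PATH = "/path/to/command-r-plus"  # directory to save the model in
HF_TOKEN = None  # optional: replace with your Hugging Face access token if needed

# Download every file in the repository into LOCAL_PATH.
# local_dir_use_symlinks=False stores real files instead of symlinks into the cache
# (this argument is deprecated in newer huggingface_hub versions).
download_path = snapshot_download(repo_id="CohereForAI/c4ai-command-r-plus",
                                  local_dir=LOCAL_PATH,
                                  token=HF_TOKEN,
                                  local_dir_use_symlinks=False)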
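A download of this size can be interrupted, so it may be worth checking that every weight shard arrived. The sketch below assumes the repository is stored as sharded safetensors with a `model.safetensors.index.json` file whose `"weight_map"` maps tensor names to shard file names (the usual Hugging Face layout for large checkpoints; I have not checked the exact file list of this repository). `missing_shards` is a hypothetical helper, not part of `huggingface_hub`.

```python
import json
from pathlib import Path


def missing_shards(local_path: str, index_name: str = "model.safetensors.index.json") -> list[str]:
    """Return shard files listed in the index but absent from local_path.

    Assumes a sharded safetensors checkpoint whose index JSON has a
    "weight_map" dict mapping tensor names to shard file names.
    """
    root = Path(local_path)
    index = json.loads((root / index_name).read_text())
    shards = sorted(set(index["weight_map"].values()))
    return [name for name in shards if not (root / name).is_file()]


if __name__ == "__main__":
    LOCAL_PATH = "/path/to/command-r-plus"  # hypothetical; use your own download directory
    if (Path(LOCAL_PATH) / "model.safetensors.index.json").is_file():
        missing = missing_shards(LOCAL_PATH)
        print("download complete" if not missing else f"missing shards: {missing}")
    else:
        print("no sharded safetensors index found in", LOCAL_PATH)
```

If anything is listed as missing, running `snapshot_download` again with the same `local_dir` should fetch the remaining files.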

Loading the model from local storage

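import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

LOCAL_PATH = "/Path/to/command-r-plus"  # directory the model was downloaded to

tokenizer = AutoTokenizer.from_pretrained(LOCAL_PATH)
# device_map='auto' lets Accelerate place layers on the available GPU(s) and offload
# whatever does not fit to the CPU; torch_dtype="auto" uses the dtype from config.json
model = AutoModelForCausalLM.from_pretrained(LOCAL_PATH,
                                             device_map='auto',
                                             torch_dtype="auto")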
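With `device_map='auto'`, Accelerate fills the GPU first and places any remaining layers on the CPU (or disk). After loading, `model.hf_device_map` records where each module ended up. The helpers below are a small, hypothetical sketch for checking whether any offloading happened. They only read a plain dict, so the example uses a made-up map rather than the real one.

```python
from collections import Counter


def summarize_device_map(device_map: dict) -> Counter:
    """Count how many modules were placed on each device.

    device_map is expected to look like model.hf_device_map after loading with
    device_map="auto": module name -> GPU index (int), "cpu" or "disk".
    """
    return Counter(str(device) for device in device_map.values())


def is_offloaded(device_map: dict) -> bool:
    """True if any module ended up on the CPU or on disk instead of a GPU."""
    return any(str(device) in ("cpu", "disk") for device in device_map.values())


if __name__ == "__main__":
    # Hypothetical placement; with the real model use: device_map = model.hf_device_map
    example = {
        "model.embed_tokens": 0,
        "model.layers.0": 0,
        "model.layers.1": "cpu",
        "model.norm": "cpu",
    }
    print(summarize_device_map(example))
    print("offloaded:", is_offloaded(example))
```

Modules on the CPU still work, but their weights have to be moved to the GPU during each forward pass, which slows generation considerably.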

Inference

# Format the message with the Command R+ chat template
messages = [{"role": "user", "content": "8 + 17はいくらになりますか？"}]  # "What is 8 + 17?"
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
input_ids = input_ids.to(model.device)  # put the prompt on the same device as the model's first layer
# Resulting prompt:
# <BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>8 + 17はいくらになりますか？<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
    )

# Decode without skip_special_tokens, so the template markers stay visible in the output
gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
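To make the chat template less of a black box, here is a hypothetical re-implementation of the prompt string that `apply_chat_template` produced for the first example in this article. It covers only user and assistant turns. System prompts and tool use are left out, and the tokenizer's own template remains the authoritative source. `format_command_r_prompt` is an illustrative name, not a library function.

```python
# Hypothetical sketch of the Command R+ prompt format for user/assistant turns only.
BOS = "<BOS_TOKEN>"
START = "<|START_OF_TURN_TOKEN|>"
END = "<|END_OF_TURN_TOKEN|>"
ROLE_TOKENS = {"user": "<|USER_TOKEN|>", "assistant": "<|CHATBOT_TOKEN|>"}


def format_command_r_prompt(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Build the prompt string for a list of {"role", "content"} messages."""
    parts = [BOS]
    for message in messages:
        role_token = ROLE_TOKENS[message["role"]]  # raises KeyError for unsupported roles
        parts.append(f"{START}{role_token}{message['content']}{END}")
    if add_generation_prompt:
        # Open an empty chatbot turn so the model writes its reply next
        parts.append(f"{START}{ROLE_TOKENS['assistant']}")
    return "".join(parts)
```

Comparing the result with `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is a quick way to confirm that the template behaves as expected.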

Inputs and outputs

Inference takes a while.

Prompt: "8 + 17はいくらになりますか？" ("What is 8 + 17?")

<BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>8 + 17はいくらになりますか？<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>8 + 17 = 25<|END_OF_TURN_TOKEN|>

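Prompt: "あなたは優秀なエンジニアです。上記の文を英訳してください。" ("You are an excellent engineer. Please translate the sentence above into English.")

<BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>あなたは優秀なエンジニアです。
上記の文を英訳してください。<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>You are an excellent engineer.<|END_OF_TURN_TOKEN|>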
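The decoded outputs above still contain the prompt and the template tokens. The usual approach is to decode only the new tokens, for example `tokenizer.decode(gen_tokens[0][input_ids.shape[-1]:], skip_special_tokens=True)`. If you already have the full string, a hypothetical helper like the one below pulls out the last chatbot turn. It assumes the special tokens are present, as in these outputs.

```python
CHATBOT = "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
END = "<|END_OF_TURN_TOKEN|>"


def extract_reply(decoded: str) -> str:
    """Return the text of the last chatbot turn in a decoded Command R+ output.

    Assumes the decoded text still contains the special tokens
    (tokenizer.decode called without skip_special_tokens=True).
    """
    start = decoded.rfind(CHATBOT)
    if start == -1:
        raise ValueError("no chatbot turn found")
    reply = decoded[start + len(CHATBOT):]
    # Generation may stop at max_new_tokens before the end-of-turn token appears
    end = reply.find(END)
    return (reply if end == -1 else reply[:end]).strip()
```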

GPU

69570MiB / 81920MiB

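About 70 GB of GPU memory was in use, and inference ran on a single 80 GB A100 or H100. Note, however, that the 16-bit weights alone are about 208 GB (104 × 10⁹ parameters × 2 bytes), so they cannot all fit in 80 GB. With device_map='auto', the layers that do not fit are presumably offloaded to CPU memory, which would also explain why inference is slow.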
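nvidia-smi reports memory in MiB, while "70 GB" leaves the unit loose. A small, hypothetical helper makes the conversion explicit. `parse_usage` and `mib_to_gb` are illustrative names, and the input string is the reading shown above.

```python
MIB = 1024 ** 2  # bytes per mebibyte


def parse_usage(text: str) -> tuple[int, int]:
    """Parse an nvidia-smi style "used / total" string such as "69570MiB / 81920MiB"."""
    used, total = (part.strip().removesuffix("MiB") for part in text.split("/"))
    return int(used), int(total)


def mib_to_gb(mib: int) -> float:
    """Convert mebibytes to decimal gigabytes (10**9 bytes)."""
    return mib * MIB / 1e9


if __name__ == "__main__":
    used, total = parse_usage("69570MiB / 81920MiB")
    print(f"used {mib_to_gb(used):.1f} GB of {mib_to_gb(total):.1f} GB ({used / total:.0%})")
```

By this arithmetic, 69,570 MiB is about 68 GiB or 73 GB, roughly 85% of the card.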
