🤪

Running CyberAgent's further-trained R1 on a Mac and using it with Cline: DeepSeek-R1-Distill-Qwen-32B-Japanese

Published 2025/01/27

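CyberAgent (CA) has released a model that further trains DeepSeek R1 on Japanese data.
So I run it on an M2 Ultra Mac with 128GB of memory and have Cline build a TODO app, the go-to demo of AI hype accounts.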

Prerequisites
M2 Ultra 128GB
The model is not that heavy, so an M4 with plenty of memory might be able to run it too.
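To get a feel for how much memory "plenty" might mean, here is a back-of-the-envelope sketch of the weight size alone. The numbers are assumptions, not measurements from this article: a round 32e9 parameters, 16 bits per weight for the unquantized model, and about 4.5 effective bits per weight for 4-bit quantization (4 bits plus per-group scale and bias overhead). KV cache and activations are ignored, so real usage will be higher.

```python
# Rough weight-memory estimate for a "32B" model.
# All figures below are illustrative assumptions, not values from the article.

PARAMS = 32e9  # hypothetical round parameter count


def weight_gib(params: float, bits_per_weight: float) -> float:
    """Return the size of the weights alone, in GiB."""
    return params * bits_per_weight / 8 / 2**30


# 16-bit weights (bf16/fp16) versus an assumed ~4.5 effective bits for 4-bit.
for label, bits in [("16-bit", 16), ("4-bit (assumed 4.5 effective bits)", 4.5)]:
    print(f"{label}: {weight_gib(PARAMS, bits):.1f} GiB")
```

Under these assumptions the unquantized weights come to roughly 60 GiB and the 4-bit weights to roughly 17 GiB, which is why a 128GB machine is comfortable and a smaller one may only manage the quantized model.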

Trying it out

conda create --name ca python=3.10
conda activate ca

Install the libraries

pip install transformers
pip3 install torch torchvision torchaudio
python3 -m pip install tensorflow
python3 -c "import tensorflow as tf"
pip install 'accelerate>=0.26.0'
pip install mlx-lm

Run it as-is, without quantizing it or converting it to an MLX / Core ML style format: main.py

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the model and tokenizer; device_map="auto" lets accelerate place the weights.
model = AutoModelForCausalLM.from_pretrained("cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese", device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese")
# Print tokens to stdout as they are generated, without echoing the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Prompt: "How will AI change our lives?"
messages = [
    {"role": "user", "content": "AIによって私たちの暮らしはどのように変わりますか？"}
]

# Apply the model's chat template and move the token IDs to the model's device.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids,
                            max_new_tokens=4096,
                            temperature=0.7,
                            streamer=streamer)

It runs, but it is quite slow.

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.

<think>
まず、ユーザーが「AIによって私たちの暮らしはどのように変わりますか？」と尋ねているので、この質問に対する包括的な回答を考える必要があります。まず、AIが既にどのように影響を与えているかと、今後さらにどのように進化するかを整理します。

次
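The attention-mask warnings in the log above are harmless for a single unpadded prompt, but they can likely be silenced by passing the mask explicitly. The following is a minimal sketch, not the author's code: `generate_with_mask` is a hypothetical helper name, and it assumes `apply_chat_template` is called with `return_dict=True` so that it returns both `input_ids` and `attention_mask`.

```python
def generate_with_mask(model, tokenizer, messages, max_new_tokens=4096, streamer=None):
    """Generate with an explicit attention mask and pad token (hypothetical helper)."""
    # return_dict=True makes the tokenizer return input_ids and attention_mask together.
    encoded = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)

    # Fall back to the EOS token for padding, which is what the log says
    # transformers does on its own when no pad token is given.
    pad_id = tokenizer.pad_token_id
    if pad_id is None:
        pad_id = tokenizer.eos_token_id

    return model.generate(
        **encoded,  # expands to input_ids=..., attention_mask=...
        max_new_tokens=max_new_tokens,
        temperature=0.7,
        pad_token_id=pad_id,
        streamer=streamer,
    )
```

With `model`, `tokenizer`, `streamer`, and `messages` set up as in main.py, the call would be `generate_with_mask(model, tokenizer, messages, streamer=streamer)`. This does not make generation any faster; it only removes the ambiguity the warnings complain about.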

Convert it to MLX (the Core ML-ish route) and quantize it to 4-bit. (Without quantization it was just as slow.)

mlx_lm.convert --hf-path cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese --mlx-path ./mlx_model_4bit -q --q-bits 4

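Prompt, asking for the real name of "Bocchi-chan", the protagonist of the anime "Bocchi the Rock!" (this is not really the right way to call it)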

mlx_lm.generate --model ./mlx_model_4bit --prompt "アニメ「ぼっち・ざ・ろっく！」の主人公である「ぼっちちゃん」の本名を教えて下さい。" --max-tokens 4096

It runs! It's fast! No idea about Bocchi the Rock!, though!
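The article does not say what is off about the call above. One guess is that the question should be wrapped in the model's chat template rather than passed as a bare string; depending on the mlx-lm version, the CLI may already do this for you. A minimal sketch of the explicit route through the mlx-lm Python API follows. `build_messages` and `chat_once` are hypothetical helper names, and `./mlx_model_4bit` is the output folder from the convert step.

```python
def build_messages(user_text: str) -> list[dict]:
    """Wrap a plain question in the message format that chat templates expect."""
    return [{"role": "user", "content": user_text}]


def chat_once(model_path: str, user_text: str, max_tokens: int = 4096) -> str:
    """Load an MLX model, apply its chat template, and generate one reply."""
    # Imported lazily so build_messages can be used without MLX installed.
    from mlx_lm import load, generate

    model, tokenizer = load(model_path)
    # Let the tokenizer insert the model's own role markers and generation prompt.
    prompt = tokenizer.apply_chat_template(
        build_messages(user_text), add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens, verbose=True)
```

A call would look like `chat_once("./mlx_model_4bit", "...")` with the question as the second argument. Whether this changes the answer quality for this particular model is untested here.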

Running it with Cline

Install LM Studio however you like. Link: https://lmstudio.ai/

Move the converted mlx_model_4bit folder to /Users/admin/.lmstudio/models/ca.

Configure Cline
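Before pointing Cline at LM Studio, it can help to confirm that the local server actually sees the converted model. The sketch below assumes LM Studio's OpenAI-compatible server is running at `http://localhost:1234/v1` (believed to be the default, but check your own settings); the helper names and any model ID you pass in are hypothetical.

```python
import json
import urllib.request

# Assumption: LM Studio's local server is running at this address.
BASE_URL = "http://localhost:1234/v1"


def chat_payload(model_id: str, user_text: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
    }


def list_models(base_url: str = BASE_URL) -> list[str]:
    """Return the model IDs the server reports."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        data = json.load(resp)
    return [m["id"] for m in data["data"]]


def chat(model_id: str, user_text: str, base_url: str = BASE_URL) -> str:
    """Send one chat message and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(chat_payload(model_id, user_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `list_models()` with the server running should include an entry for the folder moved into `.lmstudio/models`; that ID is what you would then select in Cline's LM Studio provider settings.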

To go easy on it, I run create-next-app myself.

npx create-next-app@latest .

Have it do the work

create-next-appしました。簡単なTODOアプリを作成してください。
(In English: "I ran create-next-app. Please build a simple TODO app.")

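It worked (4-bit is fast, so this feels pretty good).
(When it died with an error at this point, restarting the Mac fixed it. Also, don't forget to set LM Studio's maximum token count to MAX.)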

Well, it works, more or less.

The source code is here

The commit Cline wrote is here
