Fine-tuning Llama-2-7B with QLoRA to speak in a gozaru tone

Koichiro Mori

I want to reproduce the following article.

https://note.com/npaka/n/na7c631175111

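Overview

Try QLoRA, which fine-tunes Llama2-7B while the model stays quantized to 4 bits
Change the LLM's tone using the gozaru dataset (gozaru is an archaic, samurai-style sentence ending in Japanese)
Use the original QLoRA repository
- Internally it uses Hugging Face PEFT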

https://github.com/artidoro/qlora
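To get a feel for why this combination fits on a single Colab GPU, here is a rough back-of-the-envelope sketch. It assumes the standard LoRA formulation (two low-rank matrices per adapted linear layer), a hidden size of 4096 for Llama-2-7B, and roughly 7 billion base parameters; the helper name lora_params is made up for illustration, and the memory figures ignore quantization constants, activations, and optimizer state.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) linear layer."""
    # LoRA learns two low-rank matrices: A with shape (r, d_in) and B with shape (d_out, r).
    return r * (d_in + d_out)


hidden = 4096  # hidden size of Llama-2-7B (assumption stated above)
r = 64         # matches --lora_r 64 in the training command in these notes

full = hidden * hidden                  # parameters in one square projection matrix
added = lora_params(hidden, hidden, r)  # parameters in its LoRA adapter
print(f"one projection matrix: {full:,} params")
print(f"its LoRA adapter:      {added:,} params ({added / full:.1%} of the matrix)")

# Rough weight memory of the frozen base model only.
n_params = 7e9                        # approximate parameter count
fp16_gb = n_params * 2 / 1e9          # 2 bytes per parameter
four_bit_gb = n_params * 0.5 / 1e9    # 0.5 bytes per parameter
print(f"base weights: about {fp16_gb:.1f} GB in fp16, about {four_bit_gb:.1f} GB in 4-bit")
```

The point is only the order of magnitude: the frozen weights shrink by about 4x, and the trainable part is a small fraction of each adapted matrix.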

Base model

Instruct model
Specified via the arguments of the peft command

Dataset

Colab

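Clone the QLoRA repository and run it from there
The gozaru dataset uses the same format as Alpaca
A few code changes are required
- Replace the dataset link with the link to the gozaru dataset
- Add model.config.pretraining_tp = 1 (needed for the 7B model)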
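Since the dataset is swapped in by hand, a quick sanity check of the record format can save a failed run. The sketch below assumes the usual Alpaca field names (instruction, input, output); the sample record is hypothetical and written only for illustration, not taken from the gozaru dataset, and is_alpaca_record is a made-up helper name.

```python
REQUIRED_KEYS = ("instruction", "input", "output")


def is_alpaca_record(record: dict) -> bool:
    """Return True if the record has the three Alpaca-style string fields."""
    return all(isinstance(record.get(key), str) for key in REQUIRED_KEYS)


# Hypothetical record for illustration only (not copied from the dataset).
sample = {
    "instruction": "富士山とは？",
    "input": "",  # Alpaca records leave this empty when there is no extra context
    "output": "富士山は、日本の山でござる。",
}

print(is_alpaca_record(sample))
print(is_alpaca_record({"instruction": "侍とは？"}))  # missing fields
```

Running a check like this over the first few rows of the downloaded dataset should confirm whether it really is a drop-in replacement for Alpaca.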

!python qlora.py \
    --model_name meta-llama/Llama-2-7b-hf \
    --output_dir "./output/test_peft" \
    --dataset "alpaca" \
    --max_steps 1000 \
    --use_auth \
    --logging_steps 10 \
    --save_strategy steps \
    --data_seed 42 \
    --save_steps 50 \
    --save_total_limit 40 \
    --max_new_tokens 32 \
    --dataloader_num_workers 1 \
    --group_by_length \
    --logging_strategy steps \
    --remove_unused_columns False \
    --do_train \
    --lora_r 64 \
    --lora_alpha 16 \
    --lora_modules all \
    --double_quant \
    --quant_type nf4 \
    --bf16 \
    --bits 4 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type constant \
    --gradient_checkpointing \
    --source_max_len 16 \
    --target_max_len 512 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --eval_steps 187 \
    --learning_rate 0.0002 \
    --adam_beta2 0.999 \
    --max_grad_norm 0.3 \
    --lora_dropout 0.1 \
    --weight_decay 0.0 \
    --seed 0 \
    --load_in_4bit \
    --use_peft \
    --batch_size 4 \
    --gradient_accumulation_steps 2
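Note that the command above passes --gradient_accumulation_steps twice (16, then 2). qlora.py parses its arguments with HfArgumentParser, which is built on Python's argparse, so the later value presumably wins. The sketch below reproduces that behavior with plain argparse and works out the resulting effective batch size; it assumes a single GPU and models only the three flags involved, not the full qlora.py argument set.

```python
import argparse

# Minimal stand-in parser: only the flags needed for the calculation.
parser = argparse.ArgumentParser()
parser.add_argument("--per_device_train_batch_size", type=int, default=1)
parser.add_argument("--gradient_accumulation_steps", type=int, default=1)
parser.add_argument("--max_steps", type=int, default=-1)

# Same values and ordering as in the command above.
argv = [
    "--max_steps", "1000",
    "--per_device_train_batch_size", "1",
    "--gradient_accumulation_steps", "16",
    "--gradient_accumulation_steps", "2",  # repeated flag: the last occurrence is kept
]
args = parser.parse_args(argv)

# Effective batch size per optimizer step on one GPU (assumption).
effective_batch = args.per_device_train_batch_size * args.gradient_accumulation_steps
examples_seen = effective_batch * args.max_steps

print("gradient_accumulation_steps:", args.gradient_accumulation_steps)
print("effective batch size:", effective_batch)
print("training examples consumed in max_steps:", examples_seen)
```

If an effective batch size of 16 was intended, dropping the second occurrence of the flag would be the simplest change.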

Inference

For inference, load the base model first and then load the LoRA adapter on top of it

# Inference
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the base model in 4-bit (same quantization settings as training)
model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    ),
    device_map={"":0}
)

# Load the LoRA adapter from the training checkpoint
model = PeftModel.from_pretrained(
    model,
    "./output/test_peft/checkpoint-1000/adapter_model/",
    device_map={"":0}
)
model.eval()

Prompt

Tuning used an Instruct-style prompt
Is it better for the inference prompt to match the tuning prompt exactly?

# Same prompt as the one the repository uses for tuning
# https://github.com/artidoro/qlora/blob/main/qlora.py#L517-L528
prompt = """Below is an instruction that describes a task.
Write a response that appropriately completes the request.

### Instruction:
富士山とは？

### Response: """
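To try several questions without retyping the template, the prompt above can be wrapped in a small helper. This is only a sketch: the template text is copied from the prompt in these notes, and the names PROMPT_TEMPLATE and build_prompt are made up for illustration.

```python
# Template copied from the prompt above, with the instruction as a placeholder.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task.\n"
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response: "
)


def build_prompt(instruction: str) -> str:
    """Fill the instruction into the tuning-style prompt."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


for question in ["富士山とは？", "侍とは？", "人工知能とは？"]:
    print(build_prompt(question))
    print("-" * 20)
```

Keeping the template in one place also makes it easier to test whether small differences from the tuning prompt (whitespace, line breaks) change the output.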

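Inference results

The tone carried over well, but sometimes the end of the answer is not detected and the model keeps generating
Could the cause be that no end-of-sequence token is appended to the prompts used for tuning?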

<s> Below is an instruction that describes a task.
Write a response that appropriately completes the request.

### Instruction:
富士山とは？

### Response: 
富士山は、日本の山でござる。知らんけど。</s>

<s> Below is an instruction that describes a task.
Write a response that appropriately completes the request.

### Instruction:
侍とは？

### Response: 
侍は、戦士や武士として、独特の価値観と規範を持つ人間の集団でござる。知らんけど。</s>

<s> Below is an instruction that describes a task.
Write a response that appropriately completes the request.

### Instruction:
人工知能とは？

### Response: 
人工知能は、人間が持つ知識、感情、行動を模倣したコンピュータプログラムでござる。知らんけど。</s>
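Until the cause of the runaway generation is pinned down, one workaround is to post-process the decoded text: drop the prompt and cut at the first end-of-sequence marker. The sketch below assumes the decoded string looks like the outputs above (a leading <s>, the prompt, then the answer and possibly </s>); extract_response is a hypothetical helper, and this does not address the root cause. Capping max_new_tokens in generate is a complementary safeguard.

```python
def extract_response(decoded: str, prompt: str,
                     bos_token: str = "<s>", eos_token: str = "</s>") -> str:
    """Return only the answer part of a decoded generation."""
    text = decoded
    # Drop the leading BOS marker if the tokenizer kept it in the decoded text.
    if text.startswith(bos_token):
        text = text[len(bos_token):]
    text = text.lstrip()
    # Drop the prompt so only newly generated text remains.
    if text.startswith(prompt):
        text = text[len(prompt):]
    # Keep everything before the first EOS marker; ignore anything after it.
    text = text.split(eos_token, 1)[0]
    return text.strip()


prompt = (
    "Below is an instruction that describes a task.\n"
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n富士山とは？\n\n### Response: "
)
decoded = "<s> " + prompt + "\n富士山は、日本の山でござる。知らんけど。</s>"
print(extract_response(decoded, prompt))
```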
