Fine-tuning Phi-3 with AutoTrain Advanced

kun432

Preparation

Environment

  • CPU: Intel Core i9-13900F
  • Memory: 96GB
  • GPU: NVIDIA GeForce RTX 4090 24GB
  • OS: Ubuntu 22.04

Everything is run in JupyterLab inside a virtual environment (setup omitted here).

Package installation

!pip install autotrain-advanced torch torchvision torchaudio
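
Before moving on, it can help to confirm that the packages actually landed in the active virtual environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is my own and not part of AutoTrain.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names taken from the pip command above.
for name in ("autotrain-advanced", "torch", "torchvision", "torchaudio"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```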

The autotrain command options look like this:

!autotrain llm --help
usage: autotrain <command> [<args>] llm [-h] [--train] [--deploy]
                                        [--inference] [--username USERNAME]
                                        [--backend {local-cli,spaces-a10gl,spaces-a10gs,spaces-a100,spaces-t4m,spaces-t4s,spaces-cpu,spaces-cpuf}]
                                        [--token TOKEN] [--push-to-hub]
                                        --model MODEL --project-name
                                        PROJECT_NAME [--data-path DATA_PATH]
                                        [--train-split TRAIN_SPLIT]
                                        [--valid-split VALID_SPLIT]
                                        [--batch-size BATCH_SIZE]
                                        [--seed SEED] [--epochs EPOCHS]
                                        [--gradient_accumulation GRADIENT_ACCUMULATION]
                                        [--disable_gradient_checkpointing]
                                        [--lr LR]
                                        [--log {none,wandb,tensorboard}]
                                        [--text_column TEXT_COLUMN]
                                        [--rejected_text_column REJECTED_TEXT_COLUMN]
                                        [--prompt-text-column PROMPT_TEXT_COLUMN]
                                        [--model-ref MODEL_REF]
                                        [--warmup_ratio WARMUP_RATIO]
                                        [--optimizer OPTIMIZER]
                                        [--scheduler SCHEDULER]
                                        [--weight_decay WEIGHT_DECAY]
                                        [--max_grad_norm MAX_GRAD_NORM]
                                        [--add_eos_token]
                                        [--block_size BLOCK_SIZE] [--peft]
                                        [--lora_r LORA_R]
                                        [--lora_alpha LORA_ALPHA]
                                        [--lora_dropout LORA_DROPOUT]
                                        [--logging_steps LOGGING_STEPS]
                                        [--evaluation_strategy {epoch,steps,no}]
                                        [--save_total_limit SAVE_TOTAL_LIMIT]
                                        [--auto_find_batch_size]
                                        [--mixed_precision {fp16,bf16,None}]
                                        [--quantization {int4,int8,None}]
                                        [--model_max_length MODEL_MAX_LENGTH]
                                        [--max_prompt_length MAX_PROMPT_LENGTH]
                                        [--max_completion_length MAX_COMPLETION_LENGTH]
                                        [--trainer {default,dpo,sft,orpo,reward}]
                                        [--target_modules TARGET_MODULES]
                                        [--merge_adapter]
                                        [--use_flash_attention_2]
                                        [--dpo-beta DPO_BETA]
                                        [--chat_template {tokenizer,chatml,zephyr,None}]
                                        [--padding {left,right,None}]

✨ Run AutoTrain LLM

options:
  -h, --help            show this help message and exit
  --train               Command to train the model
  --deploy              Command to deploy the model (limited availability)
  --inference           Command to run inference (limited availability)
  --username USERNAME   Hugging Face Hub Username
  --backend {local-cli,spaces-a10gl,spaces-a10gs,spaces-a100,spaces-t4m,spaces-t4s,spaces-cpu,spaces-cpuf}
                        Backend to use: default or spaces. Spaces backend
                        requires push_to_hub & username. Advanced users only.
  --token TOKEN         Your Hugging Face API token. Token must have write
                        access to the model hub.
  --push-to-hub         Push to hub after training will push the trained model
                        to the Hugging Face model hub.
  --model MODEL         Base model to use for training
  --project-name PROJECT_NAME
                        Output directory / repo id for trained model (must be
                        unique on hub)
  --data-path DATA_PATH
                        Train dataset to use. When using cli, this should be a
                        directory path containing training and validation data
                        in appropriate formats
  --train-split TRAIN_SPLIT
                        Train dataset split to use
  --valid-split VALID_SPLIT
                        Validation dataset split to use
  --batch-size BATCH_SIZE, --train-batch-size BATCH_SIZE
                        Training batch size to use
  --seed SEED           Random seed for reproducibility
  --epochs EPOCHS       Number of training epochs
  --gradient_accumulation GRADIENT_ACCUMULATION, --gradient-accumulation GRADIENT_ACCUMULATION
                        Gradient accumulation steps
  --disable_gradient_checkpointing, --disable-gradient-checkpointing, --disable-gc
                        Disable gradient checkpointing
  --lr LR               Learning rate
  --log {none,wandb,tensorboard}
                        Use experiment tracking
  --text_column TEXT_COLUMN, --text-column TEXT_COLUMN
                        Specify the dataset column to use for text data. This
                        parameter is essential for models processing textual
                        information. Default is 'text'.
  --rejected_text_column REJECTED_TEXT_COLUMN, --rejected-text-column REJECTED_TEXT_COLUMN
                        Define the column to use for storing rejected text
                        entries, which are typically entries that do not meet
                        certain criteria for processing. Default is
                        'rejected'. Used only for orpo, dpo and reward
                        trainerss
  --prompt-text-column PROMPT_TEXT_COLUMN, --prompt-text-column PROMPT_TEXT_COLUMN
                        Identify the column that contains prompt text for
                        tasks requiring contextual inputs, such as
                        conversation or completion generation. Default is
                        'prompt'. Used only for dpo trainer
  --model-ref MODEL_REF
                        Reference model to use for DPO when not using PEFT
  --warmup_ratio WARMUP_RATIO, --warmup-ratio WARMUP_RATIO
                        Set the proportion of training allocated to warming up
                        the learning rate, which can enhance model stability
                        and performance at the start of training. Default is
                        0.1
  --optimizer OPTIMIZER
                        Choose the optimizer algorithm for training the model.
                        Different optimizers can affect the training speed and
                        model performance. 'adamw_torch' is used by default.
  --scheduler SCHEDULER
                        Select the learning rate scheduler to adjust the
                        learning rate based on the number of epochs. 'linear'
                        decreases the learning rate linearly from the initial
                        lr set. Default is 'linear'. Try 'cosine' for a cosine
                        annealing schedule.
  --weight_decay WEIGHT_DECAY, --weight-decay WEIGHT_DECAY
                        Define the weight decay rate for regularization, which
                        helps prevent overfitting by penalizing larger
                        weights. Default is 0.0
  --max_grad_norm MAX_GRAD_NORM, --max-grad-norm MAX_GRAD_NORM
                        Set the maximum norm for gradient clipping, which is
                        critical for preventing gradients from exploding
                        during backpropagation. Default is 1.0.
  --add_eos_token, --add-eos-token
                        Toggle whether to automatically add an End Of Sentence
                        (EOS) token at the end of texts, which can be critical
                        for certain types of models like language models. Only
                        used for `default` trainer
  --block_size BLOCK_SIZE, --block-size BLOCK_SIZE
                        Specify the block size for processing sequences. This
                        is maximum sequence length or length of one block of
                        text. Setting to -1 determines block size
                        automatically. Default is -1.
  --peft, --use-peft    Enable LoRA-PEFT
  --lora_r LORA_R, --lora-r LORA_R
                        Set the 'r' parameter for Low-Rank Adaptation (LoRA).
                        Default is 16.
  --lora_alpha LORA_ALPHA, --lora-alpha LORA_ALPHA
                        Specify the 'alpha' parameter for LoRA. Default is 32.
  --lora_dropout LORA_DROPOUT, --lora-dropout LORA_DROPOUT
                        Set the dropout rate within the LoRA layers to help
                        prevent overfitting during adaptation. Default is
                        0.05.
  --logging_steps LOGGING_STEPS, --logging-steps LOGGING_STEPS
                        Determine how often to log training progress in terms
                        of steps. Setting it to '-1' determines logging steps
                        automatically.
  --evaluation_strategy {epoch,steps,no}, --evaluation-strategy {epoch,steps,no}
                        Choose how frequently to evaluate the model's
                        performance, with 'epoch' as the default, meaning at
                        the end of each training epoch
  --save_total_limit SAVE_TOTAL_LIMIT, --save-total-limit SAVE_TOTAL_LIMIT
                        Limit the total number of saved model checkpoints to
                        manage disk usage effectively. Default is to save only
                        the latest checkpoint
  --auto_find_batch_size, --auto-find-batch-size
                        Automatically determine the optimal batch size based
                        on system capabilities to maximize efficiency.
  --mixed_precision {fp16,bf16,None}, --mixed-precision {fp16,bf16,None}
                        Choose the precision mode for training to optimize
                        performance and memory usage. Options are 'fp16',
                        'bf16', or None for default precision. Default is
                        None.
  --quantization {int4,int8,None}, --quantization {int4,int8,None}
                        Choose the quantization level to reduce model size and
                        potentially increase inference speed. Options include
                        'int4', 'int8', or None. Enabling requires --peft
  --model_max_length MODEL_MAX_LENGTH, --model-max-length MODEL_MAX_LENGTH
                        Set the maximum length for the model to process in a
                        single batch, which can affect both performance and
                        memory usage. Default is 1024
  --max_prompt_length MAX_PROMPT_LENGTH, --max-prompt-length MAX_PROMPT_LENGTH
                        Specify the maximum length for prompts used in
                        training, particularly relevant for tasks requiring
                        initial contextual input. Used only for `orpo`
                        trainer.
  --max_completion_length MAX_COMPLETION_LENGTH, --max-completion-length MAX_COMPLETION_LENGTH
                        Completion length to use, for orpo: encoder-decoder
                        models only
  --trainer {default,dpo,sft,orpo,reward}
                        Trainer type to use
  --target_modules TARGET_MODULES, --target-modules TARGET_MODULES
                        Identify specific modules within the model
                        architecture to target with adaptations or
                        optimizations, such as LoRA. Comma separated list of
                        module names. Default is 'all-linear'.
  --merge_adapter, --merge-adapter
                        Use this flag to merge PEFT adapter with the model
  --use_flash_attention_2, --use-flash-attention-2, --use-fa2
                        Use flash attention 2
  --dpo-beta DPO_BETA, --dpo-beta DPO_BETA
                        Beta for DPO trainer
  --chat_template {tokenizer,chatml,zephyr,None}, --chat-template {tokenizer,chatml,zephyr,None}
                        Apply a specific template for chat-based interactions,
                        with options including 'tokenizer', 'chatml',
                        'zephyr', or None. This setting can shape the model's
                        conversational behavior.
  --padding {left,right,None}, --padding {left,right,None}
                        Specify the padding direction for sequences, critical
                        for models sensitive to input alignment. Options
                        include 'left', 'right', or None

SFT

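Two kinds of datasets appear to be usable for SFT:

  • A dataset with a single text column
  • A chat-template dataset
    • Specified with the --chat-template parameter.
    • The available choices are:
      • zephyr
      • chatml
      • tokenizer
      • None

I did not fully understand chat templates, so I followed the simpler first option and created a Japanese dataset with a single text column.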
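
For orientation, `chatml` refers to the ChatML convention, which wraps each message in `<|im_start|>` and `<|im_end|>` markers. The sketch below only illustrates that layout; it is not AutoTrain's implementation, and `render_chatml` is a hypothetical helper written for this note.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts in ChatML layout."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = [
    {"role": "user", "content": "What is LoRA?"},
    {"role": "assistant", "content": "A parameter-efficient fine-tuning method."},
]
print(render_chatml(example))
```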

https://huggingface.co/datasets/kun432/databricks-dolly-15k-ja-gozaru-simple
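
A single-text-column dataset means each training example is one string in one column (the help output above gives `text` as the default column name). Here is a sketch of writing such a file locally. The row layout (instruction and response joined under labels), the sample row, and the file name `train.csv` are assumptions for illustration, not the actual format of the dataset linked above.

```python
import csv
import tempfile
from pathlib import Path


def write_text_dataset(rows, out_dir):
    """Write (instruction, response) pairs to <out_dir>/train.csv with one 'text' column."""
    out_path = Path(out_dir) / "train.csv"
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text"])
        writer.writeheader()
        for instruction, response in rows:
            # Hypothetical layout: both turns packed into a single string.
            text = f"### Instruction:\n{instruction}\n\n### Response:\n{response}"
            writer.writerow({"text": text})
    return out_path


rows = [("What is the capital of Japan?", "It is Tokyo, de gozaru.")]
with tempfile.TemporaryDirectory() as tmp:
    path = write_text_dataset(rows, tmp)
    print(path.read_text(encoding="utf-8"))
```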

I fine-tune with this dataset. I had already downloaded Phi-3, so I point --model at a local directory, but a Hugging Face model path works as well.

!autotrain llm \
--train \
--model ./models/Phi-3-mini-4k-instruct \
--data-path kun432/databricks-dolly-15k-ja-gozaru-simple \
--lr 2e-4 \
--batch-size 2 \
--epochs 1 \
--trainer sft \
--peft \
--project-name kun432-Phi-3-mini-4k-instruct
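
If you would rather launch the run from a script than from a notebook cell, the same command can be assembled as an argument list. This is a minimal sketch that uses only the flags shown above; `build_sft_command` is a hypothetical helper, and the actual launch line is left commented out.

```python
import shlex


def build_sft_command(model, data_path, project_name,
                      lr=2e-4, batch_size=2, epochs=1, peft=True):
    """Assemble the `autotrain llm` SFT invocation as an argv list."""
    cmd = [
        "autotrain", "llm", "--train",
        "--model", model,
        "--data-path", data_path,
        "--lr", str(lr),
        "--batch-size", str(batch_size),
        "--epochs", str(epochs),
        "--trainer", "sft",
    ]
    if peft:
        cmd.append("--peft")  # LoRA-PEFT, as in the run above
    cmd += ["--project-name", project_name]
    return cmd


cmd = build_sft_command(
    model="./models/Phi-3-mini-4k-instruct",
    data_path="kun432/databricks-dolly-15k-ja-gozaru-simple",
    project_name="kun432-Phi-3-mini-4k-instruct",
)
print(shlex.join(cmd))
# To actually start training:
# import subprocess; subprocess.run(cmd, check=True)
```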

Adding the following options apparently also uploads the result to Hugging Face, but I did not specify them this time.

(snip)
--username kun432 \
--push-to-hub \
--token $HF_TOKEN
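
Since the help text says the token must have write access, it is probably worth failing early when the environment variable is unset rather than after a long training run. Here is a sketch under that assumption; `hub_args` and its `environ` parameter are hypothetical names, and I have not run the upload path myself.

```python
import os


def hub_args(username, token_env="HF_TOKEN", environ=os.environ):
    """Return the extra CLI arguments for pushing to the Hub, or raise if no token is set."""
    token = environ.get(token_env)
    if not token:
        raise RuntimeError(f"{token_env} is not set; cannot use --push-to-hub")
    return ["--username", username, "--push-to-hub", "--token", token]


# Demonstration with a placeholder token instead of a real secret.
print(hub_args("kun432", environ={"HF_TOKEN": "hf_dummy"}))
```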

SFT then runs like this:

INFO     | 2024-04-26 08:44:25 | autotrain.trainers.common:on_train_begin:231 - Starting to train...
  0%|                                                  | 0/1813 [00:00<?, ?it/s]You are not running the flash-attention implementation, expect numerical differences.
  1%|▌                                        | 25/1813 [00:40<47:45,  1.60s/it]INFO     | 2024-04-26 08:45:05 | autotrain.trainers.common:on_log:226 - {'loss': 1.8595, 'grad_norm': 0.8130138516426086, 'learning_rate': 2.7472527472527476e-05, 'epoch': 0.013789299503585218}
{'loss': 1.8595, 'grad_norm': 0.8130138516426086, 'learning_rate': 2.7472527472527476e-05, 'epoch': 0.01}
(snip)
 99%|██████████████████████████████████████▋| 1800/1813 [48:32<00:21,  1.62s/it]INFO     | 2024-04-26 09:32:57 | autotrain.trainers.common:on_log:226 - {'loss': 1.2187, 'grad_norm': 0.4867367744445801, 'learning_rate': 1.594114040465972e-06, 'epoch': 0.9928295642581357}
{'loss': 1.2187, 'grad_norm': 0.4867367744445801, 'learning_rate': 1.594114040465972e-06, 'epoch': 0.99}
100%|███████████████████████████████████████| 1813/1813 [48:52<00:00,  1.37s/it]INFO     | 2024-04-26 09:33:17 | autotrain.trainers.common:on_log:226 - {'train_runtime': 2932.306, 'train_samples_per_second': 1.236, 'train_steps_per_second': 0.618, 'train_loss': 1.313210038670636, 'epoch': 1.0}
{'train_runtime': 2932.306, 'train_samples_per_second': 1.236, 'train_steps_per_second': 0.618, 'train_loss': 1.313210038670636, 'epoch': 1.0}
100%|███████████████████████████████████████| 1813/1813 [48:52<00:00,  1.62s/it]
INFO     | 2024-04-26 09:33:17 | __main__:train:574 - Finished training, saving model...
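
As a sanity check, the summary numbers in the log are consistent with each other. The short calculation below uses only values copied from the log above (runtime and step count).

```python
# Values copied from the final log line above.
train_runtime_s = 2932.306
total_steps = 1813

minutes = train_runtime_s / 60
steps_per_second = total_steps / train_runtime_s
seconds_per_step = train_runtime_s / total_steps

print(f"runtime: {minutes:.1f} min")               # runtime: 48.9 min
print(f"steps/s: {steps_per_second:.3f}")          # steps/s: 0.618
print(f"s/step:  {seconds_per_step:.2f}")          # s/step:  1.62
```

The logged `train_samples_per_second` of 1.236 is about twice the step rate, which matches `--batch-size 2`.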

This run finished in about 50 minutes, and VRAM usage was around 18GB.

Fri Apr 26 08:45:51 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        On  | 00000000:01:00.0 Off |                  Off |
| 70%   77C    P2             425W / 450W |  18624MiB / 24564MiB |    100%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
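
The 18GB figure comes from the Memory-Usage column above. Converting it shows how much headroom was left on the 24GB card; the numbers are copied from the nvidia-smi output, taken once during training, so peak usage may have been higher.

```python
# Values copied from the nvidia-smi snapshot above.
used_mib = 18624
total_mib = 24564

used_gib = used_mib / 1024
fraction = used_mib / total_mib
headroom_mib = total_mib - used_mib

print(f"used:     {used_gib:.1f} GiB")      # used:     18.2 GiB
print(f"fraction: {fraction:.1%}")          # fraction: 75.8%
print(f"headroom: {headroom_mib} MiB")      # headroom: 5940 MiB
```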

Run inference with the resulting model

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "./kun432-Phi-3-mini-4k-instruct"
)

model = AutoModelForCausalLM.from_pretrained(
    "./kun432-Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)

chat = [
    # Prompt (Japanese): "Who is the cutest character in Madoka Magica?"
    {"role": "user", "content": "まどか☆マギカでは誰が一番かわいい?"},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        do_sample=True,
        temperature=0.6,
        max_new_tokens=256,
    )
output = tokenizer.decode(output_ids.tolist()[0][token_ids.size(1) :], skip_special_tokens=True)
print(output)

この質問は人工知能によって答えることができますが、人間が答えることが多いので、私は人間の答えを提供することになりますでござる。 人間は、ある程度固定されたランクであることを知っていますでござる。 まどか☆マギカのチャンネルでは、最も人気のあるキャラクターは、拙者たちの拙者たちを愛しているのですでござる。 そのキャラクターは、拙者たちの拙者たちを愛しているのですでござる。 最も人気のあるキャラクターは、拙者たちの拙者たちを愛しているので

chat = [
    # Prompt (Japanese): "How can I pick winning horse racing tickets? Give five points."
    {"role": "user", "content": "競馬で馬券を当てるにはどうすればいい?ポイントを5つ挙げて。"},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        do_sample=True,
        temperature=0.6,
        max_new_tokens=512,
    )
output = tokenizer.decode(output_ids.tolist()[0][token_ids.size(1) :], skip_special_tokens=True)
print(output)

ポイント1:競馬というアクティビティの習得を通じて、様々な種類の馬を知ることができ、それにより、より多くの選択肢ができるようになりますでござる。

ポイント2:馬の運動能力、健康状態、種類、そしてそれが競馬でどのように機能するかを理解することです。

ポイント3:準備をするための練習をすることです。準備の過程で、馬の運動能力や旅程を学び、それらを採用する必要があることを学ぶことができますでござる。

ポイント4:馬の選択と対戦相手の選択を分析することです。馬の特性、旅程、ランニングスピードなど、マスターに適したものを選択することができますでござる。

ポイント5:自信を持って、よくマスターにある優勝を収めることができますでござる。
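
The two inference cells above differ only in the prompt and `max_new_tokens`, so they could be folded into one helper. Here is a sketch assuming the `model` and `tokenizer` objects loaded earlier; `generate_reply` is a hypothetical name. The `torch.no_grad()` wrapper is left out to keep the sketch free of imports; add it back to match the original cells exactly.

```python
def generate_reply(model, tokenizer, user_message,
                   max_new_tokens=256, temperature=0.6):
    """Generate one sampled reply to a single user message."""
    chat = [{"role": "user", "content": user_message}]
    prompt = tokenizer.apply_chat_template(
        chat, tokenize=False, add_generation_prompt=True
    )
    token_ids = tokenizer.encode(
        prompt, add_special_tokens=False, return_tensors="pt"
    )
    output_ids = model.generate(
        token_ids.to(model.device),
        do_sample=True,
        temperature=temperature,
        max_new_tokens=max_new_tokens,
    )
    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids.tolist()[0][token_ids.size(1):]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Usage, with the objects loaded above:
# print(generate_reply(model, tokenizer, "まどか☆マギカでは誰が一番かわいい?"))
```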

The following trainer types appear to be supported.

 --trainer {default,dpo,sft,orpo,reward}
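
Going by the help text alone, the trainers differ in which dataset columns they read. The table below is my reading of those option descriptions, using the default column names from the help; I have only run `sft`, so treat the other rows as unverified (for example, `orpo` may need a prompt column in practice as well). `missing_columns` is a hypothetical helper.

```python
# Column requirements as described in `autotrain llm --help` (unverified beyond sft).
TRAINER_COLUMNS = {
    "default": ["text"],
    "sft": ["text"],
    "reward": ["text", "rejected"],
    "orpo": ["text", "rejected"],
    "dpo": ["prompt", "text", "rejected"],
}


def missing_columns(trainer, dataset_columns):
    """Return the expected columns that the dataset does not provide."""
    expected = TRAINER_COLUMNS[trainer]
    return [col for col in expected if col not in dataset_columns]


print(missing_columns("sft", ["text"]))   # []
print(missing_columns("dpo", ["text"]))   # ['prompt', 'rejected']
```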

My impression is that AutoTrain Advanced makes fine-tuning very easy, even with no prior fine-tuning experience. There also seems to be a GUI.
