Closed on 2024/04/15

HuggingFace PEFT

Hugging Face

LLM

Koichiro Mori

Parameter-Efficient Fine-Tuning

https://huggingface.co/docs/peft/index

Examples of PEFT methods include:

LoRA
Prefix tuning
P-tuning
Prompt tuning

https://huggingface.co/blog/peft

Overview

Fine-tune bigscience/bloom-7b1 on the Abirate/english_quotes dataset.
bloom-7b1 is a model with about 7 billion parameters, but with LoRA only about 7.9 million of them (0.11% of the total) need to be trained.
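
Why LoRA needs so few trainable parameters can be sketched in plain Python. This is a toy illustration, not PEFT's implementation: the sizes, `matvec`, and `lora_forward` are hypothetical names chosen for the example. The frozen weight W is left untouched and only two small matrices A (r x d_in) and B (d_out x r) would be trained, with the update scaled by lora_alpha / r.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, lora_alpha, r):
    """Return W x + (lora_alpha / r) * B (A x). W is never modified."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = lora_alpha / r
    return [b + scale * u for b, u in zip(base, update)]

random.seed(0)
d_out, d_in, r, lora_alpha = 8, 8, 2, 4  # toy sizes, assumptions for illustration

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen weight
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]      # trainable, r x d_in
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, d_out x r, zero init
x = [random.gauss(0, 1) for _ in range(d_in)]

# With B initialised to zero, the adapter is a no-op before training starts.
assert lora_forward(W, A, B, x, lora_alpha, r) == matvec(W, x)

full_params = d_out * d_in        # parameters if W itself were trained
lora_params = r * (d_in + d_out)  # parameters in A and B only
print(full_params, lora_params)
```

The saving grows with the layer size: for a fixed r, the adapter cost is linear in d_in + d_out while the full matrix is quadratic.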

Loading the model

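import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in 8-bit (requires bitsandbytes) and place it on the available devices
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1",
    load_in_8bit=True,
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-7b1")

for param in model.parameters():
    param.requires_grad = False  # freeze the model - train adapters later
    if param.ndim == 1:
        # cast the small parameters (e.g. layernorm) to fp32 for stability
        param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable()  # reduce number of stored activations
model.enable_input_require_grads()  # make embedding outputs require grad so checkpointing works with a frozen base

class CastOutputToFloat(nn.Sequential):
    """Wrap lm_head so that its output (the logits) is returned as float32."""

    def forward(self, x):
        return super().forward(x).to(torch.float32)

model.lm_head = CastOutputToFloat(model.lm_head)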

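All parameters of the base model are frozen; only the adapters added later are trained.
The small 1-D parameters (e.g. layernorm) and the output of the head (lm_head) are cast to float32.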
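
As a rough check of why `load_in_8bit=True` matters, the memory taken by the weights alone can be estimated from the parameter count. This is a back-of-the-envelope sketch: it uses the total of 7,076,880,384 parameters reported by `print_trainable_parameters()` in this scrap, and it ignores activations, gradients, optimizer state, and the fact that a few parameters stay in higher precision. `weight_memory_gib` is a hypothetical helper, not part of any library.

```python
ALL_PARAMS = 7_076_880_384  # "all params" as reported by print_trainable_parameters()

# Bytes needed to store one parameter in each dtype
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

def weight_memory_gib(n_params, dtype):
    """Approximate size of the weights in GiB for the given dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(ALL_PARAMS, dtype):.1f} GiB")
```

Under these assumptions the 8-bit weights take about a quarter of the float32 footprint, roughly 6.6 GiB instead of about 26 GiB.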

LoRA Adapters

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,  # rank of the LoRA update matrices
    lora_alpha=32,  # alpha scaling (the update is scaled by lora_alpha / r)
    # target_modules=["q_proj", "v_proj"], #if you know the
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM" # set this for CLM or Seq2Seq
)

model = get_peft_model(model, config)
model.print_trainable_parameters()

trainable params: 7,864,320 || all params: 7,076,880,384 || trainable%: 0.11112693126452029
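
The trainable parameter count above can be reproduced by hand. The shapes below are assumptions not stated in this scrap: that LoRA is attached only to BLOOM's fused `query_key_value` projection (input 4096, output 3 x 4096) and that bloom-7b1 has 30 transformer layers. `lora_param_count` is a hypothetical helper for the arithmetic.

```python
def lora_param_count(in_features, out_features, r):
    """Parameters in one LoRA adapter: A is r x in_features, B is out_features x r."""
    return r * in_features + out_features * r

# Assumed bloom-7b1 shapes (not given in the text above)
HIDDEN_SIZE = 4096
NUM_LAYERS = 30
R = 16  # same r as in the LoraConfig above

per_layer = lora_param_count(HIDDEN_SIZE, 3 * HIDDEN_SIZE, R)
trainable = per_layer * NUM_LAYERS

all_params = 7_076_880_384  # total from the printed output, adapters included
trainable_pct = 100 * trainable / all_params

print(f"trainable params: {trainable:,} || trainable%: {trainable_pct:.4f}")
```

Under these assumptions each layer contributes 262,144 adapter parameters, and 30 layers give the 7,864,320 shown in the output.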

PEFT types
