Notes on getting a taste of GPT implementation and training with nanoGPT
I'd like to understand ChatGPT-like models well enough to implement and train one myself...
And there is nanoGPT! (https://github.com/karpathy/nanoGPT)
Thank you.
If you already have a standard Python/PyTorch environment set up, it runs as-is.
$ conda create -n nanogpt python=3.10
$ conda activate nanogpt
$ git clone https://github.com/karpathy/nanoGPT
There is no requirements.txt, so install the dependencies manually with pip.
$ python -m pip install torch numpy transformers datasets tiktoken wandb tqdm
Prepare the dataset.
$ python data/shakespeare_char/prepare.py
length of dataset in characters: 1,115,394
all the unique characters:
!$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
vocab size: 65
train has 1,003,854 tokens
val has 111,540 tokens
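For reference, data/shakespeare_char/prepare.py is only a few dozen lines; the gist is roughly the sketch below (simplified from the actual script, so treat names and details as approximate; the real script also pickles the stoi/itos mapping to meta.pkl). It builds a character-level vocab, maps each character to an integer id, and dumps 90/10 train/val splits as uint16 binaries.

import numpy as np

# Minimal sketch of what data/shakespeare_char/prepare.py does (simplified).
with open('data/shakespeare_char/input.txt', 'r') as f:
    data = f.read()

chars = sorted(set(data))                      # the 65 unique characters
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> integer id

ids = [stoi[ch] for ch in data]                # char-level "tokenization"
n = int(0.9 * len(ids))                        # 90/10 train/val split

np.array(ids[:n], dtype=np.uint16).tofile('train.bin')
np.array(ids[n:], dtype=np.uint16).tofile('val.bin')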
Now train!
$ python train.py config/train_shakespeare_char.py
If you don't have a GPU, pass --device=cpu to train on the CPU.
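Config values can be overridden on the command line as --key=value. For a CPU run you will also want a much smaller model and context; the README suggests a configuration along these lines (the exact values are only a rough starting point):

$ python train.py config/train_shakespeare_char.py --device=cpu --compile=False \
    --eval_iters=20 --log_interval=1 --block_size=64 --batch_size=12 \
    --n_layer=4 --n_head=4 --n_embd=128 --max_iters=2000 --lr_decay_iters=2000 --dropout=0.0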
iter 4930: loss 0.8147, time 83.83ms, mfu 4.40%
iter 4940: loss 0.8028, time 83.77ms, mfu 4.40%
iter 4950: loss 0.8258, time 82.90ms, mfu 4.41%
iter 4960: loss 0.8346, time 83.06ms, mfu 4.42%
iter 4970: loss 0.7972, time 83.55ms, mfu 4.42%
iter 4980: loss 0.8012, time 83.56ms, mfu 4.43%
iter 4990: loss 0.8412, time 83.07ms, mfu 4.43%
step 5000: train loss 0.6253, val loss 1.6949
iter 5000: loss 0.8252, time 9155.13ms, mfu 3.99%
$ python sample.py --out_dir=out-shakespeare-char
GLOUCESTER:
BUCKINGHAM:
How now, and you first: you rest like the crown,
'Tis some could in the swelling hath a soldier?
GLOUCESTER:
Then is Angelo, the time of the Bolingbroke,
I shall not be contented by him.
Come, know your age, rascal!
You will not conclude among by his households,
But not he to drink a humble for 'twas put to us,
And he was not to be but a wife of it.
ANGELO:
Happily, here, my lord, my lord, I have a virtue,
And I am slain. As you are you thrue as my soul.
ISABELLA:
Wha
---------------
Her charge is there, having through
That strived that his letter did trunk our interious
Embargard than this courtty bids thee down,
Destroy'd this married by me with the prince,
And do I now breathe most doubt of his night
To the royal disloyal drumset here: do not feel
In land-discontented upon her hate,
Bring in his war
To the earth o' the feast, whereof he does said him the
necessessity in it. A prisoner of the habit
To father of his haste. These of face bitings
Would you keep, I did the Cre
Something reasonably Shakespeare-like comes out.
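sample.py also accepts a few handy overrides (a prompt via start, number of samples, generation length, temperature, top-k), e.g.:

$ python sample.py --out_dir=out-shakespeare-char \
    --start="ROMEO:" --num_samples=3 --max_new_tokens=200 --temperature=0.8 --top_k=200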
Fine-tuning!
Fine-tuning is not all that different from training.
Prepare the non-char-based (BPE-tokenized) dataset.
$ python data/shakespeare/prepare.py
$ python train.py config/finetune_shakespeare.py
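Unlike the char version, data/shakespeare/prepare.py tokenizes the text with GPT-2's BPE via tiktoken; roughly like this (a simplified sketch, not the literal script):

import tiktoken
import numpy as np

enc = tiktoken.get_encoding("gpt2")            # GPT-2 BPE, vocab size 50257
# train_data / val_data: the raw text split into train/val, as in the char version
train_ids = enc.encode_ordinary(train_data)    # encode, ignoring special tokens
val_ids = enc.encode_ordinary(val_data)
np.array(train_ids, dtype=np.uint16).tofile('train.bin')
np.array(val_ids, dtype=np.uint16).tofile('val.bin')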
By default it downloads the gpt2-xl pretrained model (1558M params, 6.4 GB) from Hugging Face, so reduce the model size beforehand if needed.
This time I fine-tuned on a Tesla P100 16GB.
The P100 does not support bf16, and the Triton compiler used by PyTorch 2.0 does not support the Tesla P100 (CUDA capability 6.0; only CUDA capability 7.0 or later is supported), so in train.py I switch to float16 and set compile = False.
In config/finetune_shakespeare.py I specify gpt2-medium (1.5 GB).
gpt2-large (3.2 GB) and larger ran out of CUDA memory.
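If I read train.py correctly, the same changes can also be passed as command-line overrides instead of editing the files (dtype, compile and init_from are config keys picked up by the configurator):

$ python train.py config/finetune_shakespeare.py \
    --init_from=gpt2-medium --dtype=float16 --compile=False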
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla P100-PCIE-16GB On | 00000000:03:00.0 Off | 0 |
| N/A 75C P0 129W / 125W| 9584MiB / 16384MiB | 100% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
Memory consumption was about 9.5 GB.
So (with fp16) it seems you need roughly 8-10x the size of the base model in memory.
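As a rough back-of-the-envelope check (assuming, from my reading of train.py, that with float16 nanoGPT only autocasts the forward/backward pass while weights, gradients and AdamW state stay in fp32):

# Rough memory estimate for fine-tuning gpt2-medium (~350M params).
# Assumes fp32 weights/grads/AdamW state; numbers are approximate.
params = 350e6
weights = params * 4            # fp32 weights           ~1.4 GB
grads   = params * 4            # fp32 gradients         ~1.4 GB
adam_mv = params * 4 * 2        # AdamW m and v (fp32)   ~2.8 GB
static_gb = (weights + grads + adam_mv) / 1e9
print(f"static model/optimizer memory: ~{static_gb:.1f} GB")   # ~5.6 GB
# Activations, fp16 autocast copies and CUDA workspaces plausibly account
# for the rest of the ~9.5 GB observed above.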
...
step 15: train loss 3.2630, val loss 3.1418
iter 15: loss 3.1496, time 45894.53ms, mfu 0.89%
iter 16: loss 3.2725, time 25267.80ms, mfu 0.90%
iter 17: loss 2.8428, time 25303.24ms, mfu 0.91%
iter 18: loss 2.7150, time 25267.56ms, mfu 0.92%
iter 19: loss 2.8012, time 25260.80ms, mfu 0.93%
step 20: train loss 3.1946, val loss 3.1525
iter 20: loss 3.2055, time 47285.73ms, mfu 0.89%
WHAT IS THIS?
A GUY
THIS IS A MESSAGE FROM A BEAUTIFUL GUY
* * *
A GUY
This is a message from a lovely lady.
A GUY
This is a message from a mighty lady.
A GUY
This is a message from a not so very beautiful lady.
A GUY
This is a message from a very ugly lady.
A GUY
This is a message from a very ugly man.
A GUY
This is a message from a very ugly man.
...
The result ended up being something I can't quite make sense of...
Did I maybe get the fine-tuning parameters wrong?
<|endoftext|> also shows up in the output, so is something off with the tokenizer?
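For what it's worth, <|endoftext|> is GPT-2's end-of-text special token (id 50256): the pretrained model was trained with it as a document separator and readily generates it, and sample.py just decodes whatever ids come out, so seeing it in the samples is not necessarily a tokenizer bug. Quick check with tiktoken:

import tiktoken

enc = tiktoken.get_encoding("gpt2")
print(enc.eot_token)                    # 50256
print(enc.decode([enc.eot_token]))      # <|endoftext|>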
But I got my taste of it, so good enough for now!
Reproducing GPT-2 (TODO)
With 8 x A100, GPT-2 (gpt2-xl) can be trained in 4 days...!!!
With fp16, or with PyTorch Lightning (DeepSpeed), it might be trainable on 2 x 3090 or 4 x 3090?
Or something like 2 x 3090 x 4 nodes.
For multinode training, I recommend the InfiniBand-connected mini cluster that every household naturally has lying around.
An IS5022 switch with 8 x QDR ports (40 Gbps, ~32 Gbps effective) can be picked up on eBay for about 15,000 JPY.
They occasionally show up on Mercari or Yahoo! Auctions in the same price range.
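For reference, train.py already supports DDP via torchrun, so multi-GPU / multinode runs look roughly like the following (based on the README; addresses, ports and ranks are placeholders):

# single node with 8 GPUs
$ torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py

# 2 nodes x 8 GPUs: run this on the master node, and the same command
# with --node_rank=1 on the worker node
$ torchrun --nproc_per_node=8 --nnodes=2 --node_rank=0 \
    --master_addr=192.168.0.1 --master_port=1234 train.py config/train_gpt2.py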
Otherwise, the modest option is to train gpt2-medium (350M params) on a single Tesla P100 16GB and just let it take its time.
Actually reproducing GPT-2 training remains a TODO.