Closed on 2023/04/09

Running llama.cpp on Windows

John.K.Happy 2023/04/08

Trying out llama.cpp by following Gerganov's instructions:
https://github.com/ggerganov/llama.cpp

I had already confirmed that llama runs on Ubuntu, so this time I am running it on Windows 11.
With git installed, run the following commands in PowerShell.

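# Clone the repository and build
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# On Windows, build with CMake:
mkdir build
cd build
# Install CMake with winget
winget install cmake
# Restart PowerShell so the updated PATH takes effect, cd back into llama.cpp\build, then build
cmake ..
cmake --build . --config Release
# Return to the repository root; the commands below use paths relative to it
cd ..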

# obtain the original LLaMA model weights and place them in ./models
ls ./models
65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model

# install Python dependencies
python3 -m pip install torch numpy sentencepiece

# convert the 7B model to ggml FP16 format
python3 convert-pth-to-ggml.py models/7B/ 1

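# quantize the model to 4 bits (using method 2 = q4_0)
# Run from the repository root. With the default Visual Studio generator the executables
# typically end up under build\bin\Release; adjust the path if your build puts them elsewhere.
.\build\bin\Release\quantize.exe .\models\7B\ggml-model-f16.bin .\models\7B\ggml-model-q4_0.bin 2

# run the inference
.\build\bin\Release\main.exe -m .\models\7B\ggml-model-q4_0.bin -n 128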
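
The steps above depend on each other: the conversion needs the original weights, the quantization needs the FP16 file, and inference needs the quantized file. As a rough aid, here is a minimal Python sketch that reports which of those artifacts already exist. The function names `pipeline_status` and `next_step` are hypothetical helpers and not part of llama.cpp. Only file and directory names that appear in the commands above are checked.

```python
from pathlib import Path


def pipeline_status(models_dir, size="7B"):
    """Report which artifacts of the convert/quantize pipeline already exist.

    Hypothetical helper: it only looks for names used in the commands above.
    """
    models = Path(models_dir)
    model_dir = models / size
    return {
        "tokenizer": (models / "tokenizer.model").is_file(),
        "weights_dir": model_dir.is_dir(),
        "f16": (model_dir / "ggml-model-f16.bin").is_file(),
        "q4_0": (model_dir / "ggml-model-q4_0.bin").is_file(),
    }


def next_step(status):
    """Return a short hint for the first step whose input or output is missing."""
    if not (status["tokenizer"] and status["weights_dir"]):
        return "place the original LLaMA weights in ./models"
    if not status["f16"]:
        return "run the FP16 conversion"
    if not status["q4_0"]:
        return "run the 4-bit quantization"
    return "ready for inference"


if __name__ == "__main__":
    # Assumes the script is started from the repository root.
    print(next_step(pipeline_status("./models")))
```

Run from the repository root, it prints a single hint naming the next step to perform. It does not validate file contents, so a truncated or corrupted file still counts as present.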
