Errors encountered when using GPTQ with FastChat

Command run

python setup_cuda.py install

Error

OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

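Solution

Edit the system environment variables.
Add a variable named CUDA_HOME and set its value to the install root of the CUDA toolkit whose version matches the CUDA version PyTorch was built with.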
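
As a quick sanity check after editing the variable, a minimal sketch like the following can confirm that CUDA_HOME is visible to Python and points at a directory containing nvcc. The helper name `check_cuda_home` is hypothetical, and the assumption that nvcc lives under `bin` reflects the usual toolkit layout, not anything verified against this particular setup.

```python
import os
from pathlib import Path


def check_cuda_home(env):
    """Return a short status string for the CUDA_HOME entry in `env`."""
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        return "CUDA_HOME is not set"
    bin_dir = Path(cuda_home) / "bin"
    # Assumption: nvcc is nvcc.exe on Windows and plain nvcc elsewhere.
    if any((bin_dir / name).is_file() for name in ("nvcc", "nvcc.exe")):
        return f"CUDA_HOME looks usable: {cuda_home}"
    return f"CUDA_HOME is set to {cuda_home}, but no nvcc was found under bin"


if __name__ == "__main__":
    print(check_cuda_home(os.environ))
```

Environment variable changes only reach newly started processes, so run this from a terminal opened after the edit.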

How to check the CUDA version

nvcc -V

しろ, 3 months ago

Match the CUDA and PyTorch versions

The detected CUDA version (12.3) mismatches the version that was used to compile
PyTorch (11.8). Please make sure to use the same CUDA versions.
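
To see both sides of this mismatch in one place, a rough sketch along these lines compares the toolkit release reported by `nvcc -V` with the CUDA version PyTorch was built with (`torch.version.cuda`). The function names are hypothetical, and the regular expression assumes the usual `release X.Y` wording in nvcc's output.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract the 'major.minor' release from `nvcc -V` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def versions_match(toolkit_version, torch_cuda_version):
    """True only when the toolkit version is known and equals PyTorch's."""
    return toolkit_version is not None and toolkit_version == torch_cuda_version


if __name__ == "__main__":
    toolkit = None
    if shutil.which("nvcc"):
        result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True)
        toolkit = parse_nvcc_release(result.stdout)
    try:
        import torch
        built_with = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        built_with = None
    print(f"toolkit={toolkit} pytorch={built_with} "
          f"match={versions_match(toolkit, built_with)}")
```

With the versions from the message above, the comparison would be between 12.3 and 11.8, which this check reports as a mismatch.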

しろ, 3 months ago

Error when loading a model in 8-bit

Command run

py -m fastchat.serve.cli --model-path rinna/youri-7b-chat --load-8bit

Error

AttributeError: 'LlamaRotaryEmbedding' object has no attribute 'cos_cached'. Did you mean: 'sin_cached'?

Solution

Downgrade transformers

pip install transformers==4.37.2
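
To confirm the downgrade took effect in the environment that FastChat actually runs in, a small check like this one reads the installed version through the standard library's `importlib.metadata`. The helper names and the `PINNED` constant are hypothetical; the version string is simply the one from the pip command above.

```python
from importlib import metadata

PINNED = "4.37.2"  # version taken from the pip command in this note


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pin_status(installed, pinned=PINNED):
    """Describe how the installed version compares with the pinned one."""
    if installed is None:
        return "not installed"
    if installed == pinned:
        return "ok"
    return f"installed {installed}, expected {pinned}"


if __name__ == "__main__":
    print("transformers:", pin_status(installed_version("transformers")))
```

Running it with the same launcher used for FastChat (here, `py`) avoids checking a different Python installation by mistake.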

しろ, updated 3 months ago

Error when using GPTQ with FastChat

This error does not occur if you follow the setup steps in order.

Commands run

py -m fastchat.serve.controller
py -m fastchat.serve.model_worker --model-path models/youri-7b-chat-gptq --gptq-wbits 4 --gptq-groupsize 128
py -m fastchat.serve.openai_api_server --host localhost --port 8000

Error

openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'Internal Server Error', 'code': 50001}

Solution

In the environment you set up, install the required libraries listed in requirements.txt inside repositories/GPTQ-for-LLaMa.
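
One way to make sure the libraries land in the same interpreter that runs FastChat is to call pip through `sys.executable`. This is only a sketch: the helper name `pip_install_command` is hypothetical, and it assumes the script is started from the directory that contains `repositories/`.

```python
import subprocess
import sys
from pathlib import Path

# Path from this note, relative to the assumed working directory.
REQUIREMENTS = Path("repositories") / "GPTQ-for-LLaMa" / "requirements.txt"


def pip_install_command(requirements=REQUIREMENTS):
    """Build the pip command for the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements)]


if __name__ == "__main__":
    if REQUIREMENTS.is_file():
        # check=True raises CalledProcessError if pip exits with an error.
        subprocess.run(pip_install_command(), check=True)
    else:
        print(f"{REQUIREMENTS} not found; check the working directory")
```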

Note

The same error also appears when a request is sent at the point where py -m fastchat.serve.model_worker is able to accept requests.

しろ, 3 months ago

Error when using GPTQ 4-bit inference with FastChat

Command run

python -m fastchat.serve.cli --model-path models/vicuna-7B-1.1-GPTQ-4bit-128g --gptq-wbits 4 --gptq-groupsize 128

Error

KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'

Solution

In the environment you set up, install the required libraries listed in requirements.txt inside repositories/GPTQ-for-LLaMa.