# Running VILA-7B on a Raspberry Pi 4

## About VILA-7B

The GitHub README at https://github.com/NVlabs/VILA links to
https://github.com/mit-han-lab/TinyChatEngine,
which reportedly makes it possible to run VILA-7B on edge devices.

I have already confirmed that it runs under WSL on Windows, so next I check whether it also runs on a Raspberry Pi.
| Key | Value |
|---|---|
| Model | Raspberry Pi 4 Model B |
| RAM | 8GB |
| OS | Raspberry Pi OS (64-bit) |
| PRETTY_NAME | Debian GNU/Linux 12 (bookworm) |
```
raspi@raspberrypi4:~ $ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
raspi@raspberrypi4:~ $ uname -a
Linux raspberrypi4 6.6.62+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.62-1+rpt1 (2024-11-25) aarch64 GNU/Linux
```

Clone and build TinyChatEngine:

```sh
git clone --recursive https://github.com/mit-han-lab/TinyChatEngine
cd TinyChatEngine
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
cd llm
python tools/download_model.py --model LLaMA_3_8B_Instruct_awq_int4 --QM QM_ARM
# make chat -j  # the parallel build option makes the Pi freeze
make chat
```
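The freeze with `-j` is most likely the unbounded parallel build exhausting the Pi's RAM. As an untested middle ground, capping the job count should still shorten the build:

```sh
# Assumption: two parallel compile jobs fit in 8GB of RAM;
# fall back to the serial `make chat` if this also freezes.
make chat -j2
```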
Try a plain text chat first:

```sh
./chat
```
Not enough memory:

```
raspi@raspberrypi4:~/work/TinyChatEngine/llm $ ./chat
TinyChatEngine by MIT HAN Lab: https://github.com/mit-han-lab/TinyChatEngine
Using model: LLaMA_3_8B_Instruct
Using AWQ for 4bit quantization: https://github.com/mit-han-lab/llm-awq
Loading model... Killed
```
Switch the default boot target to console-only (no desktop) to free up RAM:

```sh
sudo systemctl set-default multi-user.target
```
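`set-default` only changes the target used from the next boot onward. To drop to the console session immediately and see how much RAM that freed (standard systemd/Linux commands, not part of the original run):

```sh
sudo systemctl isolate multi-user.target  # switch targets now, without rebooting
free -h                                   # check available memory afterwards
```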
Check the swap size. The default is 512MB:

```
raspi@raspberrypi4:~ $ sudo swapon --show
NAME      TYPE SIZE  USED PRIO
/var/swap file 512M 50.2M   -2
```
Increase the swap size:

```sh
sudo dphys-swapfile swapoff
cat /etc/dphys-swapfile
```

/etc/dphys-swapfile:
```
# /etc/dphys-swapfile - user settings for dphys-swapfile package
# author Neil Franklin, last modification 2010.05.05
# copyright ETH Zuerich Physics Departement
# use under either modified/non-advertising BSD or GPL license

# this file is sourced with . so full normal sh syntax applies

# the default settings are added as commented out CONF_*=* lines

# where we want the swapfile to be, this is the default
#CONF_SWAPFILE=/var/swap

# set size to absolute value, leaving empty (default) then uses computed value
#   you most likely don't want this, unless you have an special disk situation
CONF_SWAPSIZE=512

# set size to computed value, this times RAM size, dynamically adapts,
#   guarantees that there is enough swap without wasting disk space on excess
#CONF_SWAPFACTOR=2

# restrict size (computed and absolute!) to maximally this limit
#   can be set to empty for no limit, but beware of filled partitions!
#   this is/was a (outdated?) 32bit kernel limit (in MBytes), do not overrun it
#   but is also sensible on 64bit to prevent filling /var or even / partition
#CONF_MAXSWAP=2048
```
Change `CONF_SWAPSIZE=512` to `CONF_SWAPSIZE=8192`:

```sh
sudo vi /etc/dphys-swapfile
sudo dphys-swapfile setup
```
```
raspi@raspberrypi4:~ $ sudo dphys-swapfile setup
want /var/swap=8192MByte, restricting to config limit: 2048MBytes, checking existing: deleting wrong size file (536870912), generating swapfile ... of 2048MBytes
```

I asked for 8192MB, but it was capped at 2048MB.
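Judging from the comments in /etc/dphys-swapfile above, the 2048MB cap is the commented-out default `CONF_MAXSWAP=2048`. Raising it alongside `CONF_SWAPSIZE` should lift the limit (an assumption based on those comments; not verified in this run):

```sh
# /etc/dphys-swapfile: assumed settings to allow an 8GB swapfile
CONF_SWAPSIZE=8192
CONF_MAXSWAP=8192
```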
```sh
sudo dphys-swapfile swapon
sudo reboot
```

The swap size has been expanded to 2GB:
```
raspi@raspberrypi4:~ $ sudo swapon --show
NAME      TYPE SIZE USED PRIO
/var/swap file   2G   0B   -2
```
Even so, it still ends up getting Killed...

```
raspi@raspberrypi4:~/work/TinyChatEngine/llm $ ./chat
TinyChatEngine by MIT HAN Lab: https://github.com/mit-han-lab/TinyChatEngine
Using model: LLaMA_3_8B_Instruct
Using AWQ for 4bit quantization: https://github.com/mit-han-lab/llm-awq
Loading model... Killed
```
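A bare `Killed` with no other output is the signature of the kernel OOM killer, which can be confirmed in the kernel log (a standard check, not from the original run):

```sh
# Look for OOM-killer entries naming the chat process
sudo dmesg | grep -iE "out of memory|oom|killed process"
```

The 4-bit 8B weights are roughly 4-5GB on disk, so they nominally fit in 8GB of RAM plus 2GB of swap; presumably the peak memory usage while loading is considerably higher than the final footprint.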
## Checking whether the multimodal model runs

```sh
python -m pip install termvisage
python tools/download_model.py --model VILA_7B_awq_int4_CLIP_ViT-L --QM QM_ARM
./vila ../assets/figures/vlm_demo/pedestrian.png
```
Error:

```
(.venv) raspi@raspberrypi4:~/work/TinyChatEngine/llm $ ./vila ../assets/figures/vlm_demo/pedestrian.png
TinyChatEngine by MIT HAN Lab: https://github.com/mit-han-lab/TinyChatEngine
Using model: VILA1.5_8B
Using AWQ for 4bit quantization: https://github.com/mit-han-lab/llm-awq
Loading model... No such file or directory: models/CLIP_ViT_Large/encoder/layer0/layer_norm1/weight.bin
terminate called after throwing an instance of 'char const*'
./vila: line 7: 857 Aborted ./chat VILA1.5_8B INT4 5 $image_path
```
For some reason the VILA1.5_8B model is being used, so change the script to use VILA_7B:
```diff
--- a/llm/vila
+++ b/llm/vila
@@ -4,4 +4,4 @@ image_path="$1"
 termvisage $image_path -w 70
-./chat VILA1.5_8B INT4 5 $image_path
+./chat VILA_7B INT4 5 $image_path
```
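For context, llm/vila is just a small wrapper script: it renders the image in the terminal with termvisage and then launches ./chat in VLM mode. Reconstructed from the diff above (the surrounding lines are assumptions), the patched script looks roughly like this:

```sh
#!/bin/bash
# Sketch of llm/vila after the patch; reconstructed from the
# diff context above, so the exact contents may differ.
image_path="$1"                     # image passed on the command line
termvisage $image_path -w 70        # draw the image in the terminal
./chat VILA_7B INT4 5 $image_path   # run TinyChatEngine on the image
```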
```
(.venv) raspi@raspberrypi4:~/work/TinyChatEngine/llm $ ./vila ../assets/figures/vlm_demo/pedestrian.png
TinyChatEngine by MIT HAN Lab: https://github.com/mit-han-lab/TinyChatEngine
Using model: VILA_7B
Using AWQ for 4bit quantization: https://github.com/mit-han-lab/llm-awq
Loading model... No such file or directory: models/CLIP_ViT_Large/encoder/layer0/layer_norm1/weight.bin
terminate called after throwing an instance of 'char const*'
./vila: line 8: 892 Aborted ./chat VILA_7B INT4 5 $image_path
```
`No such file or directory: models/CLIP_ViT_Large/encoder/layer0/layer_norm1/weight.bin`

Weights from a different model directory are still being loaded. It looks like the source code itself will need to be modified.
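Before patching the C++ side, it is worth checking where download_model.py actually put the CLIP encoder weights, since the hard-coded path models/CLIP_ViT_Large/... clearly does not exist. The layout below is a guess; the `find` is just a way to locate the real path:

```sh
# List what was actually downloaded, then search for the missing weight file
ls models/
find models -name "weight.bin" -path "*layer_norm1*" 2>/dev/null
```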