
Memo: Stable Diffusion on EC2

sun-yryr

Environment

EC2
instance type: g4dn.xlarge
os: ubuntu 22.04
arch: x86
ebs: 30GB (20GB at first, enlarged later), gp3 at the minimum spec

Approximate cost: $0.7/hour + $2.88/month
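
To make the cost figure concrete, here is a minimal sketch that adds the hourly charge for a given number of running hours to the monthly EBS charge. The function name `estimate_monthly_cost` and the two variables are made up for illustration; the rates are just the approximate numbers from this memo, not a price list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate one month's bill from the rough figures in this memo.
HOURLY_USD=0.7          # approximate g4dn.xlarge rate noted above
EBS_MONTHLY_USD=2.88    # approximate charge for the 30GB gp3 volume noted above

estimate_monthly_cost() {
  local hours="$1"
  # awk does the floating-point arithmetic that plain bash lacks.
  awk -v h="$hours" -v rate="$HOURLY_USD" -v ebs="$EBS_MONTHLY_USD" \
    'BEGIN { printf "%.2f\n", h * rate + ebs }'
}

estimate_monthly_cost 10   # 10 running hours in the month
```

The hourly part only applies while the instance is running, while the EBS part is charged as long as the volume exists, so stopping the instance between sessions keeps the total close to the EBS charge.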


Install the NVIDIA GPU driver and CUDA

Error 1

sudo apt update
sudo apt install -y nvidia-cuda-toolkit ubuntu-drivers-common
sudo ubuntu-drivers install
nvidia-smi
# NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Ran apt autoremove and retried

$ sudo apt-key adv --fetch-keys "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub"
$ echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
$ sudo apt update
$ sudo apt upgrade -y
$ sudo apt install -y "linux-headers-$(uname -r)" build-essential
$ sudo apt install -y cuda-drivers

$ nvidia-smi
Sat Aug 27 08:31:40 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   79C    P0    33W /  70W |      2MiB / 15360MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

$ sudo apt install -y nvidia-cuda-toolkit
$ df -h
Filesystem       Size  Used Avail Use% Mounted on
/dev/root         20G  8.8G   11G  46% /
tmpfs            7.7G     0  7.7G   0% /dev/shm
tmpfs            3.1G  868K  3.1G   1% /run
tmpfs            5.0M     0  5.0M   0% /run/lock
/dev/nvme0n1p15  105M  5.3M  100M   5% /boot/efi
tmpfs            1.6G  4.0K  1.6G   1% /run/user/1000

Reference
https://zenn.dev/190ikp/articles/how_to_install_nvidia_drivers


Install Anaconda

$ curl -o anaconda.sh "https://repo.anaconda.com/archive/Anaconda3-5.3.1-Linux-x86_64.sh"
# Check the hash just in case. Official site: https://repo.anaconda.com/archive/
$ md5sum anaconda.sh
$ bash anaconda.sh
# Step through the install wizard
$ source ~/.bashrc
$ conda --version
conda 4.5.11
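
The hash check above can be turned into a gate in front of the installer. This is only a sketch: `verify_md5` and `EXPECTED_MD5` are hypothetical names, the expected value is a placeholder that has to be copied by hand from the archive page, and it assumes the published value is an MD5 (if the page lists a different digest, swap `md5sum` for the matching tool such as `sha256sum`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when the file's MD5 matches the expected value.
verify_md5() {
  local file="$1" expected="$2" actual
  # md5sum prints "<hash>  <filename>"; keep only the hash.
  actual="$(md5sum "$file" 2>/dev/null | awk '{print $1}')"
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Placeholder: paste the value published for the installer on
# https://repo.anaconda.com/archive/ before running this.
EXPECTED_MD5="paste-the-published-md5-here"

if verify_md5 anaconda.sh "$EXPECTED_MD5"; then
  bash anaconda.sh
else
  echo "MD5 mismatch (or file missing): not running the installer" >&2
fi
```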

Disk usage at this point

$ df -h
Filesystem       Size  Used Avail Use% Mounted on
/dev/root         20G   13G  6.6G  66% /
tmpfs            7.7G     0  7.7G   0% /dev/shm
tmpfs            3.1G  868K  3.1G   1% /run
tmpfs            5.0M     0  5.0M   0% /run/lock
/dev/nvme0n1p15  105M  5.3M  100M   5% /boot/efi
tmpfs            1.6G  4.0K  1.6G   1% /run/user/1000
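
Instead of reading the df table by eye, the free space can be checked in a script before a large install. A rough sketch, assuming GNU df (as on Ubuntu); `avail_gib`, `require_free_gib` and the 10 GiB threshold are hypothetical and only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the free space of the filesystem holding a path, in whole GiB.
avail_gib() {
  local path="${1:-/}"
  # --output=avail limits df to the "Avail" column; -BG reports it in GiB (e.g. "7G").
  df -BG --output=avail "$path" | tail -n 1 | tr -dc '0-9'
}

# Hypothetical helper: fail with a message when less than the requested space is free.
require_free_gib() {
  local need="$1" path="${2:-/}" have
  have="$(avail_gib "$path")"
  if [ "$have" -lt "$need" ]; then
    echo "only ${have}GiB free on ${path}, need ${need}GiB" >&2
    return 1
  fi
}

# The threshold is a guess; adjust it to whatever is about to be installed.
require_free_gib 10 / && echo "enough space for the next step"
```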

Install Stable Diffusion

$ git clone https://github.com/CompVis/stable-diffusion
$ cd stable-diffusion
$ conda env create -f environment.yaml
OSError(28, 'No space left on device')
# Sigh......

Changed the EBS volume to 30GB

$ conda env create -f environment.yaml
# It told me to update conda, so I did
$ conda update -n base -c defaults conda
$ conda activate ldm
$ pip install huggingface_hub
$ huggingface-cli login
$ cd ../
$ sudo apt install -y git-lfs
$ git clone https://huggingface.co/CompVis/stable-diffusion-v-1-4-original

# Out of space again......

The model is huge, so move it to the instance store

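$ sudo mkdir /data
# A fresh instance store volume has no filesystem, so format it before mounting
$ sudo mkfs -t xfs /dev/nvme1n1
$ sudo mount /dev/nvme1n1 /data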
$ sudo chmod 757 /data
$ cd /data
$ git clone https://huggingface.co/CompVis/stable-diffusion-v-1-4-original
$ cd ~/stable-diffusion
$ python scripts/txt2img.py --prompt "Mt.Fuji" --plms --outdir ../outputs --ckpt "/data/stable-diffusion-v-1-4-original/sd-v1-4.ckpt"

RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination

nvidia-smi returned an error. For some reason the NVIDIA driver had disappeared, so I reinstalled it

$ sudo apt install -y cuda-drivers
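
Since the driver can vanish without notice, a small pre-flight check before running txt2img.py may save a confusing CUDA error. A sketch only: `gpu_ready` is a hypothetical name, and it relies on nvidia-smi exiting with a non-zero status when it cannot talk to the driver.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed only when nvidia-smi can reach the driver.
gpu_ready() {
  command -v nvidia-smi >/dev/null 2>&1 || return 1
  # nvidia-smi exits non-zero when it cannot communicate with the driver.
  nvidia-smi >/dev/null 2>&1
}

if gpu_ready; then
  # Print the driver version and GPU name as one CSV line.
  nvidia-smi --query-gpu=driver_version,name --format=csv,noheader
else
  echo "NVIDIA driver not available; try: sudo apt install -y cuda-drivers" >&2
fi
```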

Copy the outputs to the local machine

# On the local machine
scp -r -i "{private key file}" ubuntu@{address}:~/outputs ./

Because it is an instance store, do this every time the instance is started

# sudo mkdir /data is probably not needed
sudo mkfs -t xfs /dev/nvme1n1
sudo mount /dev/nvme1n1 /data
sudo chmod 757 /data
cd /data
git clone https://huggingface.co/CompVis/stable-diffusion-v-1-4-original
cd ~/stable-diffusion
conda activate ldm
python scripts/txt2img.py --prompt "Mt.Fuji" --plms --outdir ../outputs --ckpt "/data/stable-diffusion-v-1-4-original/sd-v1-4.ckpt"
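
The mount part of the steps above could be wrapped so that it is safe to run more than once. This is a sketch under assumptions, not a tested setup: `setup_instance_store` and the `RUN` dry-run switch are hypothetical, the defaults are the device and mount point from this memo, and it assumes passwordless sudo as on the stock Ubuntu AMI. It skips everything when /data is already mounted and formats the device only when blkid finds no filesystem on it, because instance store data survives a reboot but not a stop/start.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepare the instance store only when needed.
# Set RUN=echo to print the commands instead of executing them (dry run).
setup_instance_store() {
  local device="${1:-/dev/nvme1n1}" mount_point="${2:-/data}"
  local run="${RUN:-}"

  # Nothing to do when the mount point is already a mounted filesystem.
  if mountpoint -q "$mount_point" 2>/dev/null; then
    echo "$mount_point is already mounted"
    return 0
  fi

  # blkid exits non-zero when it finds no filesystem signature on the device.
  # sudo -n never prompts, so this fails fast without passwordless sudo.
  if ! sudo -n blkid "$device" >/dev/null 2>&1; then
    $run sudo mkfs -t xfs "$device"
  fi
  $run sudo mkdir -p "$mount_point"
  $run sudo mount "$device" "$mount_point"
  $run sudo chmod 757 "$mount_point"
}

# Dry run: only prints what would be executed.
RUN=echo setup_instance_store
```

Dropping the `RUN=echo` prefix runs the commands for real. The model clone into /data still has to be repeated whenever the volume comes up empty.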
This scrap was closed on 2022/08/27