Memo: Stable Diffusion on EC2
Environment
EC2
instance type: g4dn.xlarge
os: ubuntu 22.04
arch: x86
ebs: 30GB, gp3 at its minimum spec
Approximate cost: $0.7/hour + $2.88/month
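For budgeting, the two figures above combine as hourly usage plus a flat monthly storage charge. A minimal sketch, assuming the approximate rates listed above; `estimate_cost` is a hypothetical helper name, not part of any AWS tool:

```shell
# Rough monthly cost: $0.7 per instance-hour plus $2.88 flat for the EBS volume.
# estimate_cost is a hypothetical helper; the rates are the approximate values listed above.
estimate_cost() {
  awk -v h="$1" 'BEGIN { printf "%.2f\n", h * 0.7 + 2.88 }'
}

estimate_cost 10   # cost in dollars for 10 hours of use within one month
```

The EBS charge keeps accruing while the instance is stopped, which is why it is added as a constant rather than scaled by hours.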
Install the NVIDIA GPU driver and CUDA
Error 1
sudo apt update
sudo apt install -y nvidia-cuda-toolkit ubuntu-drivers-common
sudo ubuntu-drivers install
nvidia-smi
# NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Run apt autoremove and retry
$ sudo apt-key adv --fetch-keys "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub"
$ echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
$ sudo apt update
$ sudo apt upgrade -y
$ sudo apt install -y "linux-headers-$(uname -r)" build-essential
$ sudo apt install -y cuda-drivers
$ nvidia-smi
Sat Aug 27 08:31:40 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   79C    P0    33W /  70W |      2MiB / 15360MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
$ sudo apt install -y nvidia-cuda-toolkit
$ df -h
Filesystem       Size  Used Avail Use% Mounted on
/dev/root         20G  8.8G   11G  46% /
tmpfs            7.7G     0  7.7G   0% /dev/shm
tmpfs            3.1G  868K  3.1G   1% /run
tmpfs            5.0M     0  5.0M   0% /run/lock
/dev/nvme0n1p15  105M  5.3M  100M   5% /boot/efi
tmpfs            1.6G  4.0K  1.6G   1% /run/user/1000
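Once the driver is installed, the check can be scripted instead of reading the full nvidia-smi table. A minimal sketch; `check_gpu` is a hypothetical helper name, while `--query-gpu` and `--format` are standard nvidia-smi options:

```shell
# Print GPU name, driver version and total memory on one line,
# or report that nvidia-smi is missing. check_gpu is a hypothetical helper.
check_gpu() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found"
    return 1
  fi
  nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
}

check_gpu || echo "driver is not usable yet"
```

A non-zero exit status here is a cheap signal to fix the driver before moving on to the Python environment.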
Reference
Install Anaconda
$ curl -o anaconda.sh "https://repo.anaconda.com/archive/Anaconda3-5.3.1-Linux-x86_64.sh"
# Verify the hash just in case. Official site: https://repo.anaconda.com/archive/
$ md5sum anaconda.sh
$ bash anaconda.sh
# Step through the install wizard
$ source ~/.bashrc
$ conda --version
conda 4.5.11
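The hash check above only prints a value, so the comparison with the official listing still has to be done by eye. A minimal sketch of automating it; `verify_md5` is a hypothetical helper, and the expected value is deliberately left as a placeholder to be copied from the archive page:

```shell
# verify_md5 FILE EXPECTED: succeed only when the file's MD5 matches EXPECTED.
# verify_md5 is a hypothetical helper; EXPECTED must be copied from the archive page by hand.
verify_md5() {
  actual="$(md5sum "$1" | awk '{ print $1 }')"
  if [ "$actual" = "$2" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1 (got $actual)" >&2
    return 1
  fi
}

# Usage (replace the placeholder with the value published for the installer):
# verify_md5 anaconda.sh "<md5 from https://repo.anaconda.com/archive/>"
```

Returning a non-zero status on mismatch lets the check be chained, for example `verify_md5 ... && bash anaconda.sh`.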
Disk usage at this point
$ df -h
Filesystem       Size  Used Avail Use% Mounted on
/dev/root         20G   13G  6.6G  66% /
tmpfs            7.7G     0  7.7G   0% /dev/shm
tmpfs            3.1G  868K  3.1G   1% /run
tmpfs            5.0M     0  5.0M   0% /run/lock
/dev/nvme0n1p15  105M  5.3M  100M   5% /boot/efi
tmpfs            1.6G  4.0K  1.6G   1% /run/user/1000
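The 6.6G left here turned out to be too little for the next step, so a quick guard before large installs can save a failed run. A minimal sketch, assuming GNU df as shipped with Ubuntu; `avail_gb` is a hypothetical helper and the 10 GiB threshold is a guess, not a measured requirement:

```shell
# avail_gb PATH: print the free space of the filesystem holding PATH, in whole GiB.
# Hypothetical helper; relies on GNU df's --output option.
avail_gb() {
  df -BG --output=avail "$1" | tail -n 1 | tr -dc '0-9'
  echo
}

need=10   # assumed threshold in GiB, not a measured requirement
if [ "$(avail_gb /)" -lt "$need" ]; then
  echo "less than ${need}G free on /; consider growing the volume first"
fi
```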
Install Stable Diffusion
$ git clone https://github.com/CompVis/stable-diffusion
$ cd stable-diffusion
$ conda env create -f environment.yaml
OSError(28, 'No space left on device')
# Sigh......
Resized the EBS volume from 20GB to 30GB
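Resizing the volume in the console does not by itself enlarge the filesystem; the partition and filesystem have to grow too, although Ubuntu cloud images typically handle this on the next boot. A sketch for doing it by hand, assuming the root partition is /dev/nvme0n1p1 with ext4 (consistent with the df output in this memo, but worth confirming with lsblk); `root_source` is a hypothetical helper:

```shell
# Print the device backing / so the commands below can be adjusted if needed.
root_source() {
  df --output=source / | tail -n 1
}
echo "root filesystem is on: $(root_source)"

# Assumed device names (disk nvme0n1, partition 1, ext4). Uncomment after checking lsblk.
# sudo growpart /dev/nvme0n1 1      # extend partition 1 to fill the disk
# sudo resize2fs /dev/nvme0n1p1     # grow the ext4 filesystem to fill the partition
```

Running `df -h` afterwards should show the larger size on `/`.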
$ conda env create -f environment.yaml
# conda asked for an update, so update it
$ conda update -n base -c defaults conda
$ conda activate ldm
$ pip install huggingface_hub
$ huggingface-cli login
$ cd ../
$ sudo apt install -y git-lfs
$ git clone https://huggingface.co/CompVis/stable-diffusion-v-1-4-original
# Out of space again......
The model is huge, so move it to the instance store
$ sudo mkdir /data
$ sudo mkfs -t xfs /dev/nvme1n1
$ sudo mount /dev/nvme1n1 /data
$ sudo chmod 757 /data
$ cd /data
$ git clone https://huggingface.co/CompVis/stable-diffusion-v-1-4-original
$ cd ~/stable-diffusion
$ python scripts/txt2img.py --prompt "Mt.Fuji" --plms --outdir ../outputs --ckpt "/data/stable-diffusion-v-1-4-original/sd-v1-4.ckpt"
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination
nvidia-smi
failed as well. For some reason the NVIDIA driver had disappeared
$ sudo apt install -y cuda-drivers
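Error 803 generally indicates a mismatch between the loaded kernel driver and the user-space CUDA driver libraries, so comparing the two can narrow things down before or after reinstalling. A minimal sketch; `loaded_nvidia_version` is a hypothetical helper, and a reboot after the reinstall may also be needed:

```shell
# Print the version of the NVIDIA kernel module that is currently loaded, if any.
loaded_nvidia_version() {
  if [ -r /proc/driver/nvidia/version ]; then
    head -n 1 /proc/driver/nvidia/version
  else
    echo "nvidia kernel module not loaded"
  fi
}

loaded_nvidia_version
# List installed driver-related packages to compare versions against the line above.
dpkg -l 2>/dev/null | grep -i -E 'nvidia|cuda-drivers' || echo "no nvidia packages found"
```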
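Copy the outputs to the local machine
# On the local machine
scp -r -i "<key file>" ubuntu@{address}:~/outputs ./
Because /data is on the instance store, its contents are lost when the instance is stopped, so do this on every start
# sudo mkdir /data (may not be needed)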
sudo mkfs -t xfs /dev/nvme1n1
sudo mount /dev/nvme1n1 /data
sudo chmod 757 /data
cd /data
git clone https://huggingface.co/CompVis/stable-diffusion-v-1-4-original
cd ~/stable-diffusion
conda activate ldm
python scripts/txt2img.py --prompt "Mt.Fuji" --plms --outdir ../outputs --ckpt "/data/stable-diffusion-v-1-4-original/sd-v1-4.ckpt"
With the default settings (six 512x512 images plus the grid image), this took 2 minutes 30 seconds
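The per-boot mount steps above can be folded into one function that is safe to rerun. A sketch, assuming the instance store still appears as /dev/nvme1n1 (NVMe device numbering is not guaranteed to be stable, so check with lsblk first); `prepare_data_dir` is a hypothetical name:

```shell
DEV="${DEV:-/dev/nvme1n1}"   # instance store device, as in the commands above
MNT="${MNT:-/data}"          # mount point used in this memo

# Mount the instance store at $MNT, formatting it only when it has no filesystem.
prepare_data_dir() {
  if mountpoint -q "$MNT"; then
    echo "already mounted"
    return 0
  fi
  sudo mkdir -p "$MNT"
  # blkid fails when it finds no filesystem signature; format only in that case.
  if ! sudo blkid "$DEV" >/dev/null 2>&1; then
    sudo mkfs -t xfs "$DEV"
  fi
  sudo mount "$DEV" "$MNT"
  sudo chmod 757 "$MNT"
}

# Run only when the device actually exists on this machine.
if [ -b "$DEV" ]; then
  prepare_data_dir
fi
```

Checking with blkid before mkfs avoids wiping the downloaded model after a plain reboot, where instance store data survives; after a stop and start the disk is blank again and gets reformatted.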
This scrap was closed on 2022/08/27