
Setting up a CompVis/stable-diffusion environment on Apple Silicon

Published 2022/08/24

Python

PyTorch

Stable Diffusion

tech

If you only want to generate images on a Mac, I think you can take what the Colab notebook from "Google Colab ではじめる Stable Diffusion v1.4｜npaka｜note" does, run it locally, and use the CPU instead of CUDA.

main.py

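from diffusers import StableDiffusionPipeline

# Placeholder: replace with your own Hugging Face access token.
YOUR_TOKEN = "<your Hugging Face access token>"

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=YOUR_TOKEN)
pipe.to("cpu")  # run on the CPU instead of CUDA

# A single-line prompt keeps newlines out of the output file name.
prompt = "a photograph of an astronaut riding a horse"

# diffusers releases from around the time of writing return the images under "sample".
images = pipe(prompt)["sample"]
for i, image in enumerate(images):
    # Number the files so that multiple images do not overwrite each other.
    image.save(f"{prompt}_{i}.png")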
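Since main.py uses the prompt itself as the file name, here is a minimal sketch of turning an arbitrary prompt into a safer name. The helper `prompt_to_filename` and its `max_length` parameter are hypothetical names of my own, not part of diffusers.

```python
import re


def prompt_to_filename(prompt, index=0, max_length=80):
    """Build a file name from a prompt (hypothetical helper, not a diffusers API)."""
    # Collapse every run of characters other than letters, digits, "-" and "_"
    # (spaces, newlines, slashes, ...) into a single underscore.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "_", prompt).strip("_")
    # Keep the name reasonably short and fall back to "image" for empty prompts.
    slug = slug[:max_length] or "image"
    return f"{slug}_{index:03d}.png"


print(prompt_to_filename("\na photograph of an astronaut riding a horse\n"))
```

In the loop above this could be used as `image.save(prompt_to_filename(prompt, i))`, with `i` coming from `enumerate(images)`.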

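This time I wanted to swap in other models, try PyTorch's MPS support, and tinker with the source code while working out what CompVis/stable-diffusion actually does, so I built the environment from the repository.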

The steps are almost the same as in "M1 MacBook ProでStable Diffusionを動かすまでのメモ", except that I use a Python 3.10 environment installed under asdf instead of conda.

Checking out the repository

git clone --depth 1 git@github.com:CompVis/stable-diffusion.git

Track magnusviri's MPS support branch. It looks as if it has already been merged, but apparently that was a mistake.

Once this change lands in the CompVis main branch, this step will no longer be needed.

git remote add magnusviri git@github.com:magnusviri/stable-diffusion.git
git fetch magnusviri
git rebase -i magnusviri/apple-silicon-mps-support

Installing the libraries

pip install opencv-python omegaconf invisible-watermark einops clip kornia
pip install -e scripts/taming-transformers/

Adjusting PyTorch

PyTorch needs to be a nightly build with MPS support.

pip3 install --pre torch torchvision pytorch_lightning --extra-index-url https://download.pytorch.org/whl/nightly/cpu

$ python3 -c 'import torch; print(torch.__version__) '
1.13.0.dev20220824

$ python -c"import torch; print(torch.backends.mps.is_available())"
True
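The availability check above can be folded into a small device picker. This is a minimal sketch under my own assumptions: `pick_device` is a hypothetical helper, and falling back to "cpu" when PyTorch is not installed is just a convenience choice.

```python
def pick_device():
    """Return "mps", "cuda", or "cpu" depending on what this PyTorch build supports."""
    try:
        import torch
    except ImportError:
        # Assumption of this sketch: without PyTorch, just report "cpu".
        return "cpu"
    # torch.backends.mps is missing in builds that predate MPS support.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


print(pick_device())
```

On the nightly build shown above I would expect this to print mps, but the result depends on the machine and the installed build.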

In addition, site-packages/torch/nn/functional.py apparently needs a monkey patch.

Inside layer_norm(), input is replaced with input.contiguous():

def layer_norm(
    input: Tensor,
    normalized_shape: List[int],
    weight: Optional[Tensor] = None,
    bias: Optional[Tensor] = None,
    eps: float = 1e-5,
) -> Tensor:
    r"""Applies Layer Normalization for last certain number of dimensions.

    See :class:`~torch.nn.LayerNorm` for details.
    """
    if has_torch_function_variadic(input, weight, bias):
        return handle_torch_function(
            layer_norm, (input.contiguous(), weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
        )
    return torch.layer_norm(input.contiguous(), normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
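An edit made directly under site-packages is lost whenever PyTorch is reinstalled, so the same change could in principle be applied at runtime instead. The following is only a sketch of that idea, not the approach used above: `with_contiguous_input` and `patch_layer_norm` are hypothetical helper names, and it assumes callers look up `layer_norm` through the `torch.nn.functional` module at call time.

```python
import functools


def with_contiguous_input(fn):
    """Wrap fn so that its first argument is made contiguous before the call."""
    @functools.wraps(fn)
    def wrapper(input, *args, **kwargs):
        return fn(input.contiguous(), *args, **kwargs)
    return wrapper


def patch_layer_norm():
    """Swap torch.nn.functional.layer_norm for the wrapped version, once per process."""
    import torch.nn.functional as F  # imported lazily; requires PyTorch

    # The marker attribute prevents wrapping the function twice.
    if not getattr(F.layer_norm, "_contiguous_patch", False):
        patched = with_contiguous_input(F.layer_norm)
        patched._contiguous_patch = True
        F.layer_norm = patched
```

Calling `patch_layer_norm()` once before the model is built would stand in for the manual edit, provided the assumption above holds for the installed build.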

Downloading the model

git-lfs is the tool for handling huge files in Git.

brew install git-lfs
# Enter your Hugging Face login credentials at the prompt
git lfs clone https://huggingface.co/CompVis/stable-diffusion-v-1-4-original  models/ldm/stable-diffusion-v1
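If the LFS download did not actually run, the checkpoint ends up as a small text pointer file instead of the real weights. A minimal sketch for checking this, assuming the checkpoint path used later in this article; `is_lfs_pointer` and `check_checkpoint` are hypothetical helper names.

```python
from pathlib import Path

# Git LFS pointer files are small text files that start with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at path begins like a Git LFS pointer."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def check_checkpoint(path):
    """Classify a checkpoint path as "missing", "lfs-pointer", or "ok"."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    if is_lfs_pointer(p):
        return "lfs-pointer"
    return "ok"


print(check_checkpoint("models/ldm/stable-diffusion-v1/sd-v1-4.ckpt"))
```

A result of lfs-pointer would suggest the clone finished without fetching the large file itself.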

Running txt2img.py

PYTHONPATH is set so that ./ldm is found (note that the ldm package on PyPI does not work here).

PYTHONPATH=. python scripts/txt2img.py --ckpt models/ldm/stable-diffusion-v1/sd-v1-4.ckpt --prompt "a photograph of an astronaut riding a horse"
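To see which ldm Python would pick up, it may help to ask the import system directly. A minimal sketch using only the standard library; `module_locations` is a hypothetical helper name.

```python
import importlib.util


def module_locations(name):
    """Return the paths a top-level module or package would be loaded from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return []  # nothing importable under this name
    if spec.submodule_search_locations:
        # Packages (including namespace packages) report their directories.
        return list(spec.submodule_search_locations)
    return [spec.origin]  # plain modules report a single file


print(module_locations("ldm"))
```

Run as `PYTHONPATH=. python` from the repository root, the printed path should point into the checkout rather than into site-packages; an empty list means no ldm is importable at all.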

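That completes the environment setup. However, txt2img.py with MPS and DDIM sampling takes far longer than the plain-CPU StableDiffusionPipeline run from the first example, and I am still scratching my head over why.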