
Trying out Kandinsky 2.1 with diffusers and Google Colab

Published 2023/06/14

What is Kandinsky?

Kandinsky 2.1 is a text-to-image diffusion model built from unCLIP (a diffusion prior that maps text to CLIP image embeddings, as in DALL·E 2) combined with latent diffusion. The output quality is seriously impressive. It has been supported in diffusers since v0.17.0.
https://huggingface.co/kandinsky-community/kandinsky-2-1

Links

Colab
GitHub

Preparation

Open Google Colab and change the runtime to GPU via "Runtime → Change runtime type" in the menu.
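To make sure the GPU is actually visible from the notebook, a quick sanity check:

import torch

# Should print True on a GPU runtime
print(torch.cuda.is_available())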

Environment setup

Install the required packages:

!pip install diffusers transformers accelerate
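Kandinsky support landed in diffusers v0.17.0, so you can quickly confirm that the installed version is new enough:

import diffusers

print(diffusers.__version__)  # should be 0.17.0 or later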

Inference

(1) Text-to-Image

# library
from diffusers import DiffusionPipeline
import torch

# load model
pipe_prior = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16)
pipe_prior.to("cuda")
t2i_pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
t2i_pipe.to("cuda")

# inference
prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
negative_prompt = "low quality, bad quality"

generator = torch.Generator(device="cuda").manual_seed(12)
image_embeds, negative_image_embeds = pipe_prior(prompt, negative_prompt, guidance_scale=1.0, generator=generator).to_tuple()

image = t2i_pipe(prompt, negative_prompt=negative_prompt, image_embeds=image_embeds, negative_image_embeds=negative_image_embeds).images[0]
# image.save("cheeseburger_monster.png")
display(image)

Here is the result. So artistic...
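As a side note, if you hit out-of-memory errors on a smaller Colab GPU, diffusers' model offloading can help. A minimal sketch (use it instead of the .to("cuda") calls above; it needs accelerate, which we already installed):

# Optional: keep submodules on the CPU and move each one to the GPU
# only while it runs, trading speed for lower VRAM usage
pipe_prior.enable_model_cpu_offload()
t2i_pipe.enable_model_cpu_offload()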

(2) Text-guided Image2Image

# library
from diffusers import KandinskyImg2ImgPipeline, KandinskyPriorPipeline
import torch

from PIL import Image
import requests
from io import BytesIO

# prepare input image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
original_image = Image.open(BytesIO(response.content)).convert("RGB")
original_image = original_image.resize((768, 512))
display(original_image)

Here is the input image.

# load model
# create prior
pipe_prior = KandinskyPriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16
)
pipe_prior.to("cuda")

# create img2img pipeline
pipe = KandinskyImg2ImgPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
pipe.to("cuda")

prompt = "A fantasy landscape, Cinematic lighting"
negative_prompt = "low quality, bad quality"

generator = torch.Generator(device="cuda").manual_seed(30)
image_embeds, negative_image_embeds = pipe_prior(prompt, negative_prompt, generator=generator).to_tuple()

out = pipe(
    prompt,
    image=original_image,
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
    height=768,
    width=768,
    strength=0.3,
)

display(out.images[0])

Here is the result.

Whoa... amazing.
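The strength argument controls how far the output may drift from the input image (near 0 keeps it almost unchanged, near 1 mostly ignores it). A small sketch that reuses the pipelines and embeddings from above to compare a few values (the values themselves are arbitrary):

# Lower strength stays closer to the input sketch,
# higher strength gives the model more freedom
for s in (0.3, 0.5, 0.8):
    out = pipe(
        prompt,
        image=original_image,
        image_embeds=image_embeds,
        negative_image_embeds=negative_image_embeds,
        height=768,
        width=768,
        strength=s,
    )
    display(out.images[0])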

(3) Interpolate

# library
from diffusers import KandinskyPriorPipeline, KandinskyPipeline
from diffusers.utils import load_image
import PIL

import torch

# prepare input image
img1 = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png"
)

img2 = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/starry_night.jpeg"
)

# add all the conditions we want to interpolate, can be either text or image
images_texts = ["a cat", img1, img2]

# specify the weights for each condition in images_texts
weights = [0.3, 0.3, 0.4]


# inference
pipe_prior = KandinskyPriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16
)
pipe_prior.to("cuda")

# We can leave the prompt empty
prompt = ""
prior_out = pipe_prior.interpolate(images_texts, weights)

pipe = KandinskyPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
pipe.to("cuda")

image = pipe(prompt, **prior_out, height=768, width=768).images[0]

display(image)

Style transfer works like a charm, too.
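If you want to lean harder into the Van Gogh style, you can simply re-weight the conditions and run the prior again; a sketch with arbitrary weights:

# Hypothetical re-weighting: emphasize the starry_night image
prior_out = pipe_prior.interpolate(images_texts, [0.1, 0.2, 0.7])
image = pipe("", **prior_out, height=768, width=768).images[0]
display(image)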

Closing thoughts

In this post I tried out Kandinsky 2.1, which became officially available in diffusers v0.17.0. The results are really quite good; if you want to generate artistic images, this model feels like the one to reach for.

I plan to keep posting hands-on articles about LLMs, diffusion models, image analysis, and 3D, so stay tuned.
