1 Introducing Stable Diffusion

Evolution of the Diffusion model

Before Transformer and Attention

Convolutional Neural Networks (CNNs) and Residual Neural Networks (ResNets) dominated

Transformer transforms machine learning

CLIP from OpenAI makes a big difference

Generate images

Generative Adversarial Networks (GANs) can generate highly photorealistic images. However, they cannot use a text prompt during the generation process.

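Denoising diffusion probabilistic model (DDPM).
Starting from a noisy image and removing the noise step by step yields a progressively more refined image. The denoising process gradually transforms the noisy image into a sharp version of the original image.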

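The idea behind this approach is ingenious. For any given image, a finite number of normally distributed noise images are added to the original, effectively turning it into pure noise. What if we trained a model to reverse this diffusion process, guided by the CLIP model? Surprisingly, this approach works [4].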
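The image-to-noise half of this idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's code: the function name `forward_diffusion`, the 1,000-step count, and the linear noise schedule from 1e-4 to 0.02 are assumptions chosen for the example, and a constant array stands in for a real image.

```python
import numpy as np

def forward_diffusion(image, num_steps=1000, beta_start=1e-4, beta_end=0.02, seed=0):
    """Add normally distributed noise to an image over a fixed number of steps.

    The step count and the linear beta schedule are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # One noise strength (beta) per step, growing linearly.
    betas = np.linspace(beta_start, beta_end, num_steps)
    x = image.astype(np.float64)
    for beta in betas:
        noise = rng.standard_normal(x.shape)
        # Shrink the current image slightly and mix in a little fresh noise.
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
    return x

# Stand-in "image": a 32x32 array of ones (an assumption for the demo).
image = np.ones((32, 32))
noisy = forward_diffusion(image)
print(noisy.mean(), noisy.std())
```

After all steps, almost nothing of the original signal remains and the result is statistically close to standard normal noise. A denoising model is then trained to undo these steps one at a time.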

Control Method	Functioning Stage	Usage Scenario
Textual Embedding	Text encoder	Add a new style, a new concept, or a new face
LoRA	Merge LoRA weights to the UNet model (and the CLIP text encoder, optional)	Add a set of styles, concepts, and generate content
Image-to-Image	Provide the initial latent image	Fix images, or add styles and concepts to images
ControlNet	ControlNet participates in denoising alongside the checkpoint model's UNet	Control shape, pose, and content detail

1 Introducing Stable Diffusion

Evolution of the Diffusion model

Before Transformer and Attention

Transformer transforms machine learning

CLIP from OpenAI makes a big difference

Generate images

DALL-E 2 and Stable Diffusion

Why Stable Diffusion

Which Stable Diffusion to use

2 Setting Up the Environment for Stable Diffusion

Hardware requirements to run Stable Diffusion

Storage

3 Generating Images Using Stable Diffusion

Generation seed

Sampling scheduler

Changing a model

Guidance scale

4 Understanding the Theory Behind Diffusion Models

Understanding the image-to-noise process

A more efficient forward diffusion process

The noise-to-image training process

Understanding Classifier Guidance denoising

Other references

5 Understanding How Stable Diffusion Works

Stable Diffusion in latent space

Generating latent vectors using diffusers

Generating text embeddings using CLIP

Initializing time step embeddings

Initializing the Stable Diffusion UNet

Implementing a text-to-image Stable Diffusion inference pipeline

Implementing a text-guided image-to-image Stable Diffusion inference pipeline

Summary

6 Using Stable Diffusion Models

Loading the Diffusers model

Loading model checkpoints from safetensors and ckpt files

Using ckpt and safetensors files with Diffusers

Turning off the model safety checker

Converting the checkpoint model file to the Diffusers format

Using Stable Diffusion XL

7 Optimizing Performance and VRAM Usage

Optimization solution 1 – using the float16 or bfloat16 data type

Optimization solution 2 – enabling VAE tiling

Optimization solution 3 – enabling Xformers or using PyTorch 2.0

Optimization solution 4 – enabling sequential CPU offload

Optimization solution 5 – enabling model CPU offload

Optimization solution 6 – Token Merging (ToMe)

8 Using Community-Shared LoRAs

How does LoRA work?

Using LoRA with Diffusers

Applying a LoRA weight during loading

Diving into the internal structure of LoRA

Finding the A and B weight matrix from the LoRA file

Finding the corresponding checkpoint model layer name

Updating the checkpoint model weights

Making a function to load LoRA

Why LoRA works

9 Using Textual Inversion

Diffusers inference using TI

How TI works

Building a custom TI loader

TI in the pt file format

TI in bin file format

Detailed steps to build a TI loader

10 Overcoming 77-Token Limitations and Enabling Prompt Weighting

Understanding the 77-token limitation

Overcoming the 77-token limitation

Enabling long prompts with weighting

Overcoming the 77-token limitation using community pipelines

11 Image Restore and Super-Resolution

Understanding the terminologies

Upscaling images using Img2img diffusion

Img-to-Img limitations

ControlNet Tile image upscaling

Steps to use ControlNet Tile to upscale an image

Additional ControlNet Tile upscaling samples

12 Scheduled Prompt Parsing

Using the Compel package

Building a custom scheduled prompt pipeline

13 Generating Images with ControlNet

What is ControlNet and how is it different?