
Verifying stable-fast

だだっこぱんだ

First, a plain run with Diffusers as the baseline.
Method: run generation 10 times and take the average, min, and max of the times.
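For reference, a minimal sketch of the kind of timing loop behind these numbers (the actual script isn't shown in this thread, so the model ID, prompt, and step count are assumptions):

```python
import time
import torch
from diffusers import StableDiffusionPipeline

# Assumed model/settings; the ones actually used in this thread are not recorded here.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def bench(width, height, runs=10, warmup=1):
    times = []
    for i in range(warmup + runs):
        torch.cuda.synchronize()
        start = time.perf_counter()
        pipe("a photo of a cat", width=width, height=height, num_inference_steps=20)
        torch.cuda.synchronize()
        if i >= warmup:  # drop warmup runs so compilation/caching doesn't skew the stats
            times.append(time.perf_counter() - start)
    print(f"{width}x{height}  Avg {sum(times)/len(times):.3f}  "
          f"Min {min(times):.3f}  Max {max(times):.3f}")

bench(512, 512)
bench(1024, 1024)
```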

Environment


Plain Diffusers

Size        Avg (s)  Min (s)  Max (s)
512x512     1.962    1.937    2.021
1024x1024   9.582    9.528    9.637

Stable Fast

Size        Avg (s)  Min (s)  Max (s)
512x512     2.495    1.808    8.477
1024x1024   9.793    7.722    15.699
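
The Stable Fast rows measure the same pipeline after compiling it with stable-fast. A rough sketch of the usual pattern, based on the stable-fast README around this version; the import path (`stable_diffusion_pipeline_compiler`) and the config flags may differ between releases, and the model ID is again an assumption:

```python
import torch
from diffusers import StableDiffusionPipeline
from sfast.compilers.stable_diffusion_pipeline_compiler import (
    compile, CompilationConfig,
)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Turn on the optimizations available in this environment.
config = CompilationConfig.Default()
config.enable_xformers = True    # requires xformers
config.enable_triton = False     # Triton is generally unavailable on Windows
config.enable_cuda_graph = True

pipe = compile(pipe, config)

# The first call traces/compiles the model and is slow; later calls hit the fast path.
pipe("a photo of a cat", width=512, height=512, num_inference_steps=20)
```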
だだっこぱんだ

Environment


Plain Diffusers

Size        Avg (s)  Min (s)  Max (s)
512x512     1.482    1.477    1.486
1024x1024   7.748    7.614    7.861

Stable Fast

Size        Avg (s)  Min (s)  Max (s)
512x512     1.391    1.217    2.935
1024x1024   6.562    6.319    8.031
R0w9h

I tried it on the ComfyUI side, but it threw an error, so for now I filed an issue:
https://github.com/gameltb/ComfyUI_stable_fast/issues/3

** ComfyUI start up time: 2023-11-13 12:24:24.636300

Prestartup times for custom nodes:
   0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Total VRAM 10240 MB, total RAM 64664 MB
xformers version: 0.0.22.post7+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3080 : cudaMallocAsync
VAE dtype: torch.bfloat16
Using xformers cross attention
### Loading: ComfyUI-Manager (V0.30.4)
### ComfyUI Revision: 1677 [4aeef781]

Import times for custom nodes:
   0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_stable_fast
   0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WD14-Tagger
   0.4 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Starting server

To see the GUI go to: http://127.0.0.1:8188
FETCH DATA from: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json
got prompt
model_type EPS
adm 2816
Using xformers attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using xformers attention in VAE
missing {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
triton not installed, skip
Requested to load SDXLClipModel
Loading 1 new model
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Requested to load SDXL
Loading 1 new model
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:157: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  obj_type = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:216: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:226: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:212: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return bytes(tensors[start].tolist()), start + 1
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:203: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return int(tensors[start].item()), start + 1
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:619: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert y.shape[0] == x.shape[0]
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:125: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1] == self.channels
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:83: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1] == self.channels
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:21: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  return torch.tensor([num], dtype=torch.int64)

だだっこぱんだ

With SDXL the results were rather disappointing.
Warmup takes about 15 minutes.

Plain Diffusers

Size        Avg (s)  Min (s)  Max (s)
1024x1024   8.420    8.276    8.571

Stable Fast

Size        Avg (s)  Min (s)  Max (s)
1024x1024   8.428    7.986    11.197
R0w9h

It does work, at least:
Comfy + Nightly + SDXL + LoRA

** ComfyUI start up time: 2023-11-13 17:38:24.795915

Prestartup times for custom nodes:
   0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Total VRAM 10240 MB, total RAM 64664 MB
xformers version: 0.0.22.post7+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3080 : cudaMallocAsync
VAE dtype: torch.bfloat16
Using xformers cross attention
### Loading: ComfyUI-Manager (V0.30.4)
### ComfyUI Revision: 1677 [4aeef781]

Import times for custom nodes:
   0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_stable_fast
   0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WD14-Tagger
   0.3 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Starting server

To see the GUI go to: http://127.0.0.1:8188
FETCH DATA from: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json
got prompt
model_type EPS
adm 2816
Using xformers attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using xformers attention in VAE
missing {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
triton not installed, skip
Requested to load SDXLClipModel
Loading 1 new model
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Requested to load SDXL
Loading 1 new model
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:157: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  obj_type = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:216: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:226: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:212: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return bytes(tensors[start].tolist()), start + 1
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:203: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return int(tensors[start].item()), start + 1
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:619: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert y.shape[0] == x.shape[0]
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:125: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1] == self.channels
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:83: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1] == self.channels
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:21: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  return torch.tensor([num], dtype=torch.int64)
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\cuda\graphs.py:88: UserWarning: The CUDA Graph is empty. This usually means that the graph was attempted to be captured on wrong device or stream. (Triggered internally at ..\aten\src\ATen\cuda\CUDAGraph.cpp:193.)
  super().capture_end()

100%|██████████████████████████████████████████████████████████████████████████████████| 55/55 [08:32<00:00,  1.50it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 55/55 [08:32<00:00,  9.32s/it]
Warning: Your graphics card doesn't have enough video memory to keep the model. Disable stable fast cuda graph, Flexibility will be improved but speed will be lost.
Prompt executed in 517.57 seconds

R0w9h

ComfyUI + stable-fast setup guide

  1. Install stable-fast (use the Nightly wheel):
     pip3 install 'diffusers>=0.19.3' 'xformers>=0.0.20' 'torch>=1.12.0' <Nightly wheel file>
  2. Install ComfyUI_stable_fast
  3. Add Node > loaders > Apply StableFast Unet
  4. Wire up the node
  5. Generate (the first run takes a while)
R0w9h

Trying the new custom-node commit and the new Nightly build:

https://github.com/gameltb/ComfyUI_stable_fast/commit/1cf79b9149ceb9ec790d0819b4fee6081a8daf53

R0w9h

With the new ComfyUI_stable_fast commit, the first Apply dropped from about 500 seconds to 470 seconds, roughly 30 seconds faster.
Generations from the third run onward went from 25 seconds to 17 seconds, a decent speedup.

だだっこぱんだ

So, here is a comparison in the new environment.

Windows

Plain Diffusers

Size               Avg (s)  Min (s)  Max (s)
512x512 (SD1.5)    1.811    1.644    2.108
1024x1024 (SD1.5)  11.252   9.584    12.069
1024x1024 (SDXL)   11.501   11.112   12.119

Stable Fast

Size               Avg (s)  Min (s)  Max (s)
512x512 (SD1.5)    1.434    1.425    1.444
1024x1024 (SD1.5)  7.676    7.638    7.720
1024x1024 (SDXL)   9.352    8.168    14.683
だだっこぱんだ

Linux

Plain Diffusers

Size               Avg (s)  Min (s)  Max (s)
512x512 (SD1.5)    1.502    1.495    1.508
1024x1024 (SD1.5)  7.893    7.801    7.967
1024x1024 (SDXL)   8.569    8.447    8.672

Stable Fast

Size               Avg (s)  Min (s)  Max (s)
512x512 (SD1.5)    1.124    1.122    1.126
1024x1024 (SD1.5)  5.623    5.590    5.647
1024x1024 (SDXL)   5.994    5.932    6.030
だだっこぱんだ

stable-fast v0.0.12.post3

Environment

  • WSL (Ubuntu 22.04)
  • RTX 3090 Ti

Plain Diffusers

Size             Avg (s)  Min (s)  Max (s)
512x512 (SD1.5)  1.331    1.177    1.772

Stable Fast

Size             Avg (s)  Min (s)  Max (s)
512x512 (SD1.5)  0.882    0.803    0.973