
Verifying stable-fast

だだっこぱんだ

First, a plain run with Diffusers as the baseline.
Method: run generation 10 times and take the average, min, and max of the times.
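For reference, a minimal sketch of the kind of timing loop behind these numbers (the actual script isn't shown in this thread, so the model ID, prompt, and step count are assumptions):

```python
import time
import torch
from diffusers import StableDiffusionPipeline

# Assumed model/settings; the ones actually used in this thread are not recorded here.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def bench(width, height, runs=10, warmup=1):
    times = []
    for i in range(warmup + runs):
        torch.cuda.synchronize()
        start = time.perf_counter()
        pipe("a photo of a cat", width=width, height=height, num_inference_steps=20)
        torch.cuda.synchronize()
        if i >= warmup:  # drop warmup runs so compilation/caching doesn't skew the stats
            times.append(time.perf_counter() - start)
    print(f"{width}x{height}  Avg {sum(times)/len(times):.3f}  "
          f"Min {min(times):.3f}  Max {max(times):.3f}")

bench(512, 512)
bench(1024, 1024)
```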

Environment


Plain Diffusers

Size        Avg (s)  Min (s)  Max (s)
512x512     1.962    1.937    2.021
1024x1024   9.582    9.528    9.637

Stable Fast

Size        Avg (s)  Min (s)  Max (s)
512x512     2.495    1.808    8.477
1024x1024   9.793    7.722    15.699
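
The Stable Fast rows measure the same pipeline after compiling it with stable-fast. A rough sketch of the usual pattern, based on the stable-fast README around this version; the import path (`stable_diffusion_pipeline_compiler`) and the config flags may differ between releases, and the model ID is again an assumption:

```python
import torch
from diffusers import StableDiffusionPipeline
from sfast.compilers.stable_diffusion_pipeline_compiler import (
    compile, CompilationConfig,
)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Turn on the optimizations available in this environment.
config = CompilationConfig.Default()
config.enable_xformers = True    # requires xformers
config.enable_triton = False     # Triton is generally unavailable on Windows
config.enable_cuda_graph = True

pipe = compile(pipe, config)

# The first call traces/compiles the model and is slow; later calls hit the fast path.
pipe("a photo of a cat", width=512, height=512, num_inference_steps=20)
```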
だだっこぱんだ

Environment


Plain Diffusers

Size        Avg (s)  Min (s)  Max (s)
512x512     1.482    1.477    1.486
1024x1024   7.748    7.614    7.861

Stable Fast

Size        Avg (s)  Min (s)  Max (s)
512x512     1.391    1.217    2.935
1024x1024   6.562    6.319    8.031
R0w9h

I tried it on the ComfyUI side, but it threw an error, so for now I filed an issue:
https://github.com/gameltb/ComfyUI_stable_fast/issues/3

** ComfyUI start up time: 2023-11-13 12:24:24.636300

Prestartup times for custom nodes:
   0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Total VRAM 10240 MB, total RAM 64664 MB
xformers version: 0.0.22.post7+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3080 : cudaMallocAsync
VAE dtype: torch.bfloat16
Using xformers cross attention
### Loading: ComfyUI-Manager (V0.30.4)
### ComfyUI Revision: 1677 [4aeef781]

Import times for custom nodes:
   0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_stable_fast
   0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WD14-Tagger
   0.4 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Starting server

To see the GUI go to: http://127.0.0.1:8188
FETCH DATA from: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json
got prompt
model_type EPS
adm 2816
Using xformers attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using xformers attention in VAE
missing {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
triton not installed, skip
Requested to load SDXLClipModel
Loading 1 new model
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Requested to load SDXL
Loading 1 new model
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:157: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  obj_type = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:216: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:226: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:212: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return bytes(tensors[start].tolist()), start + 1
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:203: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return int(tensors[start].item()), start + 1
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:619: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert y.shape[0] == x.shape[0]
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:125: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1] == self.channels
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:83: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1] == self.channels
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:21: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  return torch.tensor([num], dtype=torch.int64)

だだっこぱんだ

With SDXL the results were rather disappointing.
Warmup takes about 15 minutes.

Plain Diffusers

Size        Avg (s)  Min (s)  Max (s)
1024x1024   8.420    8.276    8.571

Stable Fast

Size        Avg (s)  Min (s)  Max (s)
1024x1024   8.428    7.986    11.197
R0w9h

It does work, at least:
Comfy + Nightly + SDXL + LoRA

** ComfyUI start up time: 2023-11-13 17:38:24.795915

Prestartup times for custom nodes:
   0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Total VRAM 10240 MB, total RAM 64664 MB
xformers version: 0.0.22.post7+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3080 : cudaMallocAsync
VAE dtype: torch.bfloat16
Using xformers cross attention
### Loading: ComfyUI-Manager (V0.30.4)
### ComfyUI Revision: 1677 [4aeef781]

Import times for custom nodes:
   0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_stable_fast
   0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WD14-Tagger
   0.3 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Starting server

To see the GUI go to: http://127.0.0.1:8188
FETCH DATA from: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json
got prompt
model_type EPS
adm 2816
Using xformers attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using xformers attention in VAE
missing {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
triton not installed, skip
Requested to load SDXLClipModel
Loading 1 new model
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Requested to load SDXL
Loading 1 new model
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:157: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  obj_type = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:216: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:226: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:212: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return bytes(tensors[start].tolist()), start + 1
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:203: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return int(tensors[start].item()), start + 1
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:619: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert y.shape[0] == x.shape[0]
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:125: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1] == self.channels
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:83: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1] == self.channels
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:21: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  return torch.tensor([num], dtype=torch.int64)
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\cuda\graphs.py:88: UserWarning: The CUDA Graph is empty. This usually means that the graph was attempted to be captured on wrong device or stream. (Triggered internally at ..\aten\src\ATen\cuda\CUDAGraph.cpp:193.)
  super().capture_end()

100%|██████████████████████████████████████████████████████████████████████████████████| 55/55 [08:32<00:00,  1.50it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 55/55 [08:32<00:00,  9.32s/it]
Warning: Your graphics card doesn't have enough video memory to keep the model. Disable stable fast cuda graph, Flexibility will be improved but speed will be lost.
Prompt executed in 517.57 seconds

R0w9h

ComfyUI + stable-fast setup guide

  1. Install stable-fast (use the Nightly wheel):
     pip3 install 'diffusers>=0.19.3' 'xformers>=0.0.20' 'torch>=1.12.0' <Nightly wheel file>
  2. Install ComfyUI_stable_fast
  3. Add Node > loaders > Apply StableFast Unet
  4. Wire up the node
  5. Generate (the first run takes a while)
R0w9h

Trying the new custom-node commit and the new Nightly build:

https://github.com/gameltb/ComfyUI_stable_fast/commit/1cf79b9149ceb9ec790d0819b4fee6081a8daf53

R0w9h

With the new ComfyUI_stable_fast commit, the first Apply dropped from about 500 seconds to 470 seconds, roughly 30 seconds faster.
Generations from the third run onward went from 25 seconds to 17 seconds, a decent speedup.

だだっこぱんだ

So, here is a comparison in the new environment.

Windows

Plain Diffusers

Size               Avg (s)  Min (s)  Max (s)
512x512 (SD1.5)    1.811    1.644    2.108
1024x1024 (SD1.5)  11.252   9.584    12.069
1024x1024 (SDXL)   11.501   11.112   12.119

Stable Fast

Size               Avg (s)  Min (s)  Max (s)
512x512 (SD1.5)    1.434    1.425    1.444
1024x1024 (SD1.5)  7.676    7.638    7.720
1024x1024 (SDXL)   9.352    8.168    14.683
だだっこぱんだ

Linux

Plain Diffusers

Size               Avg (s)  Min (s)  Max (s)
512x512 (SD1.5)    1.502    1.495    1.508
1024x1024 (SD1.5)  7.893    7.801    7.967
1024x1024 (SDXL)   8.569    8.447    8.672

Stable Fast

Size               Avg (s)  Min (s)  Max (s)
512x512 (SD1.5)    1.124    1.122    1.126
1024x1024 (SD1.5)  5.623    5.590    5.647
1024x1024 (SDXL)   5.994    5.932    6.030
だだっこぱんだ

stable-fast v0.0.12.post3

Environment

  • WSL (Ubuntu 22.04)
  • RTX 3090 Ti

Plain Diffusers

Size             Avg (s)  Min (s)  Max (s)
512x512 (SD1.5)  1.331    1.177    1.772

Stable Fast

Size             Avg (s)  Min (s)  Max (s)
512x512 (SD1.5)  0.882    0.803    0.973