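Benchmarking stable-fast
Supposedly it works with both ControlNet and LoRA and delivers speedups on par with TensorRT or AITemplate (really?)
First, a plain benchmark with Diffusers
Method: run generation 10 times and take the average, minimum, and maximum.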
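The method above can be sketched as a small timing harness. This is a minimal sketch, not the script actually used for these numbers: the `benchmark` function and its `generate` callable are hypothetical names, and the stand-in workload only exists so the snippet runs without a GPU.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(generate: Callable[[], object], runs: int = 10) -> Dict[str, float]:
    """Call `generate` `runs` times and return avg/min/max wall time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()  # hypothetical: one full image generation
        timings.append(time.perf_counter() - start)
    return {
        "avg": statistics.mean(timings),
        "min": min(timings),
        "max": max(timings),
    }


if __name__ == "__main__":
    # Stand-in workload (assumption); replace with a real pipeline call.
    stats = benchmark(lambda: sum(range(100_000)), runs=10)
    print(f"Avg {stats['avg']:.3f} | Min {stats['min']:.3f} | Max {stats['max']:.3f}")
```

In a real run, `generate` would wrap one pipeline call with a fixed prompt and resolution. If the timed call returns before the GPU work has finished, a synchronization step is needed before stopping the clock.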
Environment
- Windows10
- RTX3090Ti
- cpython@3.10.13
- xformers==0.0.22.post7+cu118
- torch==2.1.0+cu118
- stable-fast @ https://github.com/chengzeyi/stable-fast/releases/download/v0.0.9/stable_fast-0.0.9+torch210cu118-cp310-cp310-win_amd64.whl
- diffusers==0.23.0
- ninja==1.11.1.1
- transformers==4.35.0
Plain Diffusers
Resolution | Avg | Min | Max |
---|---|---|---|
512x512 | 1.962 | 1.937 | 2.021 |
1024x1024 | 9.582 | 9.528 | 9.637 |
Stable Fast
Resolution | Avg | Min | Max |
---|---|---|---|
512x512 | 2.495 | 1.808 | 8.477 |
1024x1024 | 9.793 | 7.722 | 15.699 |
Environment
- Ubuntu 20.04
- RTX3090
- python@3.10.12
- xformers==0.0.22.post7
- triton==2.1.0
- torch==2.1.0+cu118
- stable-fast @ https://github.com/chengzeyi/stable-fast/releases/download/v0.0.9/stable_fast-0.0.9+torch210cu118-cp310-cp310-manylinux2014_x86_64.whl
- diffusers==0.21.2
- ninja==1.11.1.1
- transformers==4.31.0
Plain Diffusers
Resolution | Avg | Min | Max |
---|---|---|---|
512x512 | 1.482 | 1.477 | 1.486 |
1024x1024 | 7.748 | 7.614 | 7.861 |
Stable Fast
Resolution | Avg | Min | Max |
---|---|---|---|
512x512 | 1.391 | 1.217 | 2.935 |
1024x1024 | 6.562 | 6.319 | 8.031 |
Interim summary
- Warmup seems to matter (it is probably best to do about 5 runs beforehand)
- Linux is faster, as expected
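The warmup point can be checked against the Stable Fast tables above with a little arithmetic. The sketch below assumes that the reported Max in each row comes from a single slow run (most likely the first one) and recomputes the mean of the other nine runs; `mean_without_slowest` is a hypothetical helper, not part of any library.

```python
def mean_without_slowest(avg: float, slowest: float, runs: int = 10) -> float:
    """Mean of the remaining runs after dropping one run that took `slowest`."""
    if runs < 2:
        raise ValueError("need at least two runs")
    return (avg * runs - slowest) / (runs - 1)


# (Avg, Max) pairs copied from the Stable Fast tables above, 10 runs each.
stable_fast = {
    "Windows 512x512": (2.495, 8.477),
    "Windows 1024x1024": (9.793, 15.699),
    "Linux 512x512": (1.391, 2.935),
    "Linux 1024x1024": (6.562, 8.031),
}

for name, (avg, slowest) in stable_fast.items():
    adjusted = mean_without_slowest(avg, slowest)
    print(f"{name}: avg {avg:.3f} -> {adjusted:.3f} without the slowest run")
```

Under that assumption, the adjusted averages land much closer to the Min column, which is consistent with one slow run dominating the unwarmed average.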
I also tried it in ComfyUI, but it threw an error, so I opened an issue for now
** ComfyUI start up time: 2023-11-13 12:24:24.636300
Prestartup times for custom nodes:
0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager
Total VRAM 10240 MB, total RAM 64664 MB
xformers version: 0.0.22.post7+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3080 : cudaMallocAsync
VAE dtype: torch.bfloat16
Using xformers cross attention
### Loading: ComfyUI-Manager (V0.30.4)
### ComfyUI Revision: 1677 [4aeef781]
Import times for custom nodes:
0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_stable_fast
0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WD14-Tagger
0.4 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager
Starting server
To see the GUI go to: http://127.0.0.1:8188
FETCH DATA from: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json
got prompt
model_type EPS
adm 2816
Using xformers attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using xformers attention in VAE
missing {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
triton not installed, skip
Requested to load SDXLClipModel
Loading 1 new model
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
Requested to load SDXL
Loading 1 new model
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:157: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
obj_type = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:216: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
size = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:226: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
size = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:212: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
return bytes(tensors[start].tolist()), start + 1
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:203: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
return int(tensors[start].item()), start + 1
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:619: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert y.shape[0] == x.shape[0]
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:125: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert x.shape[1] == self.channels
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:83: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert x.shape[1] == self.channels
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:21: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
return torch.tensor([num], dtype=torch.int64)
With SDXL the results were rather disappointing
Warmup takes about 15 minutes
Plain Diffusers
Resolution | Avg | Min | Max |
---|---|---|---|
1024x1024 | 8.420 | 8.276 | 8.571 |
Stable Fast
Resolution | Avg | Min | Max |
---|---|---|---|
1024x1024 | 8.428 | 7.986 | 11.197 |
Trying the nightly build
It worked, more or less
Comfy + Nightly + SDXL + LoRA
** ComfyUI start up time: 2023-11-13 17:38:24.795915
Prestartup times for custom nodes:
0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager
Total VRAM 10240 MB, total RAM 64664 MB
xformers version: 0.0.22.post7+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3080 : cudaMallocAsync
VAE dtype: torch.bfloat16
Using xformers cross attention
### Loading: ComfyUI-Manager (V0.30.4)
### ComfyUI Revision: 1677 [4aeef781]
Import times for custom nodes:
0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_stable_fast
0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WD14-Tagger
0.3 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager
Starting server
To see the GUI go to: http://127.0.0.1:8188
FETCH DATA from: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json
got prompt
model_type EPS
adm 2816
Using xformers attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using xformers attention in VAE
missing {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
triton not installed, skip
Requested to load SDXLClipModel
Loading 1 new model
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
Requested to load SDXL
Loading 1 new model
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:157: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
obj_type = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:216: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
size = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:226: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
size = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:212: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
return bytes(tensors[start].tolist()), start + 1
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:203: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
return int(tensors[start].item()), start + 1
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:619: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert y.shape[0] == x.shape[0]
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:125: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert x.shape[1] == self.channels
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:83: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert x.shape[1] == self.channels
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:21: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
return torch.tensor([num], dtype=torch.int64)
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\cuda\graphs.py:88: UserWarning: The CUDA Graph is empty. This usually means that the graph was attempted to be captured on wrong device or stream. (Triggered internally at ..\aten\src\ATen\cuda\CUDAGraph.cpp:193.)
super().capture_end()
100%|██████████████████████████████████████████████████████████████████████████████████| 55/55 [08:32<00:00, 1.50it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 55/55 [08:32<00:00, 9.32s/it]
Warning: Your graphics card doesn't have enough video memory to keep the model. Disable stable fast cuda graph, Flexibility will be improved but speed will be lost.
Prompt executed in 517.57 seconds
ComfyUI + stable-fast setup guide
- Install stable-fast (use the Nightly wheel)
pip3 install 'diffusers>=0.19.3' 'xformers>=0.0.20' 'torch>=1.12.0' <Nightly wheel file>
- Install ComfyUI_stable_fast
- Add Node → loaders → Apply StableFast Unet
- Wire up the nodes
- Generate (the first run takes a while)
Trying the new custom node commit and the new Nightly build
So here is a comparison in the new environment
Windows
Plain Diffusers
Resolution (model) | Avg | Min | Max |
---|---|---|---|
512x512 (SD1.5) | 1.811 | 1.644 | 2.108 |
1024x1024 (SD1.5) | 11.252 | 9.584 | 12.069 |
1024x1024 (SDXL) | 11.501 | 11.112 | 12.119 |
Stable Fast
Resolution (model) | Avg | Min | Max |
---|---|---|---|
512x512 (SD1.5) | 1.434 | 1.425 | 1.444 |
1024x1024 (SD1.5) | 7.676 | 7.638 | 7.720 |
1024x1024 (SDXL) | 9.352 | 8.168 | 14.683 |
Linux
Plain Diffusers
Resolution (model) | Avg | Min | Max |
---|---|---|---|
512x512 (SD1.5) | 1.502 | 1.495 | 1.508 |
1024x1024 (SD1.5) | 7.893 | 7.801 | 7.967 |
1024x1024 (SDXL) | 8.569 | 8.447 | 8.672 |
Stable Fast
Resolution (model) | Avg | Min | Max |
---|---|---|---|
512x512 (SD1.5) | 1.124 | 1.122 | 1.126 |
1024x1024 (SD1.5) | 5.623 | 5.590 | 5.647 |
1024x1024 (SDXL) | 5.994 | 5.932 | 6.030 |
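To make the Windows and Linux comparison above easier to read, the speedup can be computed as the plain Diffusers average divided by the Stable Fast average. The snippet below is only an illustrative sketch: the numbers are copied from the Avg columns above, and `speedup` is a hypothetical helper.

```python
# Average times copied from the tables above: (plain Diffusers, Stable Fast).
results = {
    "Windows": {
        "512x512 (SD1.5)": (1.811, 1.434),
        "1024x1024 (SD1.5)": (11.252, 7.676),
        "1024x1024 (SDXL)": (11.501, 9.352),
    },
    "Linux": {
        "512x512 (SD1.5)": (1.502, 1.124),
        "1024x1024 (SD1.5)": (7.893, 5.623),
        "1024x1024 (SDXL)": (8.569, 5.994),
    },
}


def speedup(baseline: float, optimized: float) -> float:
    """How many times faster the optimized average is than the baseline."""
    return baseline / optimized


for os_name, rows in results.items():
    for label, (plain, fast) in rows.items():
        print(f"{os_name} {label}: {speedup(plain, fast):.2f}x")
```

A ratio above 1 means Stable Fast was faster on average for that row. The Windows SDXL row should be read with care, since its Max (14.683) is far above its Min (8.168).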
stable-fast v0.0.12.post3
Environment
- WSL (Ubuntu 22.04)
- RTX 3090Ti
Plain Diffusers
Resolution (model) | Avg | Min | Max |
---|---|---|---|
512x512 (SD1.5) | 1.331 | 1.177 | 1.772 |
Stable Fast
Resolution (model) | Avg | Min | Max |
---|---|---|---|
512x512 (SD1.5) | 0.882 | 0.803 | 0.973 |