
Painting textures onto a 3D mesh with Blender and Stable Diffusion XL + ControlNet

dycoon

Published 2024/08/15

Blender

Stable Diffusion

tech

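I tried using image-generation AI to create textures for a 3D mesh.
This article explains a procedure that uses Blender, Stable Diffusion XL, and ControlNet to paint a texture onto a 3D mesh so that it looks reasonable from any viewing direction.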

Others have already run similar experiments;
this work takes its inspiration from Texture Diffusion, a plugin available on Blender Market.

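There is almost no programming involved: apart from building a Composite Node setup in Blender, the work is mostly manual Texture Paint. The article is aimed at people who work in Blender, and at people looking for hints toward automated texture generation.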

Experiment

First, prepare the 3D model to be textured. (I made this one myself.)

Assign an empty texture to it and unwrap its UVs with Smart UV Project.
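For readers who want to script this step, here is a minimal sketch using Blender's Python API. The article did this by hand in the UI, so this is only one possible equivalent. The function name `prepare_blank_texture`, the image and material names, and the 4096 default size are assumptions made for illustration (the article does not state the texture size). The `bpy` module is passed in as an argument only so that the sketch can be imported outside Blender.

```python
def prepare_blank_texture(bpy, obj, size=4096):
    """Give `obj` a blank image texture and unwrap it with Smart UV Project.

    `size` is a hypothetical default; the article does not state the texture size.
    """
    # Blank image that Texture Paint will later write into.
    image = bpy.data.images.new("generated_texture", width=size, height=size)

    # New material whose Base Color reads from that image.
    material = bpy.data.materials.new("generated_material")
    material.use_nodes = True
    nodes = material.node_tree.nodes
    tex_node = nodes.new("ShaderNodeTexImage")
    tex_node.image = image
    bsdf = nodes.get("Principled BSDF")  # default shader node of a new material
    if bsdf is not None:
        material.node_tree.links.new(
            tex_node.outputs["Color"], bsdf.inputs["Base Color"]
        )
    obj.data.materials.append(material)

    # Smart UV Project is an Edit Mode operator that acts on the selected faces.
    bpy.context.view_layer.objects.active = obj
    obj.select_set(True)
    bpy.ops.object.mode_set(mode="EDIT")
    bpy.ops.mesh.select_all(action="SELECT")
    bpy.ops.uv.smart_project()
    bpy.ops.object.mode_set(mode="OBJECT")
    return image
```

In Blender's Python console this could be called as `prepare_blank_texture(bpy, bpy.context.active_object)`. Note that it appends a new material slot rather than replacing an existing one.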

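Arrange the model so that it is seen from various directions.
For this model, parts such as the legs easily end up in blind spots, so I separated the model into parts and laid them out individually.
I then generated a depth image of this layout with a composite node setup like the one below.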

Rendering	depth
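The node graph used in the article is only shown as an image, so the following is a sketch of one common way to get a depth image out of the compositor, not a reconstruction of the author's setup. It maps the Z pass from an assumed `near`..`far` range to 1..0 so that near surfaces come out bright and the empty background comes out black, which is the convention depth ControlNet models generally expect. The function name and the `near`/`far` defaults are hypothetical, and the node and socket names assume a Blender 4.x-era API.

```python
def build_depth_nodes(scene, view_layer, near=1.0, far=10.0):
    """Wire Render Layers depth -> Map Range (inverted, clamped) -> Composite.

    `near` and `far` are hypothetical distances in scene units; tune them so
    the model fills most of the 0..1 range.
    """
    view_layer.use_pass_z = True  # expose the Depth output on Render Layers
    scene.use_nodes = True        # enable the compositor node tree
    tree = scene.node_tree
    tree.nodes.clear()

    render_layers = tree.nodes.new("CompositorNodeRLayers")
    map_range = tree.nodes.new("CompositorNodeMapRange")
    composite = tree.nodes.new("CompositorNodeComposite")

    # Map near -> 1.0 (white) and far -> 0.0 (black); clamp everything else.
    map_range.use_clamp = True
    map_range.inputs["From Min"].default_value = near
    map_range.inputs["From Max"].default_value = far
    map_range.inputs["To Min"].default_value = 1.0
    map_range.inputs["To Max"].default_value = 0.0

    tree.links.new(render_layers.outputs["Depth"], map_range.inputs["Value"])
    tree.links.new(map_range.outputs["Value"], composite.inputs["Image"])
    return tree
```

Inside Blender this could be called as `build_depth_nodes(bpy.context.scene, bpy.context.view_layer)` before rendering. An explicit range is used here instead of a Normalize node because empty background pixels have a very large depth value, which would otherwise compress the model into a narrow band of gray.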

In stable-diffusion-webui,
I passed the depth image to ControlNet in txt2img to generate an image, then upscaled it to 8192x8192.

parameters

The ultra realistic detailed computer graphics ((3d model sheet)) of ((same battle multi-legged tank with tank turret)) in ruin city in desert. post-apocalyptic. ball joint, The painting evokes a sense of history, decay, and natural beauty. 8k. ((dirt)), ((front lighting)), ((rust)), (mud) ,((scratch)), ((dark gray robot coated)), ((science fiction)), mechanical
Negative prompt: Deformed, blurry, bad anatomy, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, poorly drawn hands, missing limb, blurry, floating limbs, disconnected limbs, malformed hands, blur, out of focus, long neck, long body, ((((mutated hands and fingers)))), (((out of frame))), cartoon, (disfigured), (bad art), (deformed), (poorly drawn), (extra limbs), strange colours, blurry, boring, sketch, lacklustre, repetitive, cropped, ((shadow)), ((mechanical defect))
Steps: 100, Sampler: DPM++ 3M SDE Karras, CFG scale: 7, Seed: 1132989525, Size: 2048x2048, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, ControlNet 0: "Module: none, Model: diffusers_xl_depth_full [2f51180b], Weight: 1, Resize Mode: Crop and Resize, Low Vram: True, Processor Res: 512, Threshold A: 0.5, Threshold B: 0.5, Guidance Start: 0, Guidance End: 1, Pixel Perfect: True, Control Mode: Balanced, Hr Option: Both, Save Detected Map: True", Version: v1.7.0


postprocessing

Postprocess upscale by: 4, Postprocess upscaler: R-ESRGAN 4x+


extras

Postprocess upscale by: 4, Postprocess upscaler: R-ESRGAN 4x+

Generation with txt2img takes about 14 minutes on a GeForce RTX 4080.
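The article used the web UI directly, but the same settings could in principle be sent through the web UI's HTTP API. The sketch below only builds the request bodies and does not send them. The endpoint paths and key names reflect my understanding of the stable-diffusion-webui API and the ControlNet extension; the ControlNet keys in particular vary between extension versions, so treat them as assumptions and check the `/docs` page of your own install. The function names are hypothetical.

```python
import base64
import json


def build_txt2img_payload(prompt, negative_prompt, depth_png_bytes):
    """Build (but do not send) a body for POST /sdapi/v1/txt2img."""
    controlnet_unit = {
        # Key names are assumptions; they differ between extension versions.
        "image": base64.b64encode(depth_png_bytes).decode("ascii"),
        "module": "none",  # the depth map is already rendered: no preprocessor
        "model": "diffusers_xl_depth_full [2f51180b]",
        "weight": 1.0,
        "resize_mode": "Crop and Resize",
        "pixel_perfect": True,
        "control_mode": "Balanced",
        "guidance_start": 0.0,
        "guidance_end": 1.0,
    }
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": 100,
        "sampler_name": "DPM++ 3M SDE Karras",
        "cfg_scale": 7,
        "seed": 1132989525,
        "width": 2048,
        "height": 2048,
        "alwayson_scripts": {"controlnet": {"args": [controlnet_unit]}},
    }


def build_upscale_payload(image_png_bytes, factor=4):
    """Build (but do not send) a body for POST /sdapi/v1/extra-single-image."""
    return {
        "image": base64.b64encode(image_png_bytes).decode("ascii"),
        "upscaling_resize": factor,
        "upscaler_1": "R-ESRGAN 4x+",
    }


if __name__ == "__main__":
    payload = build_txt2img_payload("<prompt>", "<negative prompt>", b"<depth png>")
    upscale = build_upscale_payload(b"<generated png>")
    print(json.dumps(payload)[:80])
    # 2048 * 4 = 8192, the edge length of the final texture source image.
    print(payload["width"] * upscale["upscaling_resize"])
```

Setting `module` to `none` mirrors the "Module: none" entry in the parameters above: the depth image comes from Blender, so no depth estimator needs to run on it.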

Next, I painted the generated image onto the target object's texture using Texture Paint with a stencil.

Only one image is generated, so aligning the stencil by hand is not much work.
The following site is a useful reference for stencil-based Texture Paint.

Stencil Painting in Blender

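However, partly because the shapes are so fine, I sometimes painted the background color onto the object.
The object was also not shown from very many directions, so I don't think I could adjust the painted result well enough.
Even so, the result looks reasonably convincing.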
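One possible mitigation for the background bleeding, which the article did not try, is to make the background of the generated image transparent before using it as a stencil. The sketch below assumes a depth image in which the background is black (value 0) and which has been resized to the same resolution as the generated image. The function name `mask_background` and the nested-list pixel representation are illustrative assumptions; a real pipeline would more likely use an image library.

```python
def mask_background(rgb_rows, depth_rows, threshold=0):
    """Return RGBA rows; pixels whose depth is <= threshold become transparent.

    rgb_rows:   rows of (r, g, b) tuples from the generated image
    depth_rows: rows of depth values at the same resolution (0 = background)
    """
    masked = []
    for rgb_row, depth_row in zip(rgb_rows, depth_rows):
        masked_row = []
        for (r, g, b), depth in zip(rgb_row, depth_row):
            alpha = 255 if depth > threshold else 0
            masked_row.append((r, g, b, alpha))
        masked.append(masked_row)
    return masked


if __name__ == "__main__":
    rgb = [[(10, 20, 30), (40, 50, 60)]]
    depth = [[0, 128]]  # first pixel is background, second is on the model
    print(mask_background(rgb, depth))
```

This does not help where the generated image itself is distorted relative to the depth image, since the mask follows the depth silhouette rather than what was actually drawn.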

Results


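When I repeated the same procedure with slightly different settings such as resolution, the results were as follows. They differed in many ways, it was hard to get the expected result consistently, and touching them up took a fair amount of effort.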

Results with a different prompt

parameters

The ultra realistic detailed computer graphics ((3d model sheet)) of ((same white battle multi-legged tank mecha)) in ruin city in desert. post-apocalyptic. ball joint, The painting evokes a sense of history, decay, and natural beauty. 8k. ((dirt)), ((front lighting)), ((rust)), (mud) ,((scratch)), ((white mecha coated)), ((science fiction)), mechanical, (((high technology)))
Negative prompt: Deformed, blurry, bad anatomy, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, poorly drawn hands, missing limb, blurry, floating limbs, disconnected limbs, malformed hands, blur, out of focus, long neck, long body, ((((mutated hands and fingers)))), (((out of frame))), cartoon, (disfigured), (bad art), (deformed), (poorly drawn), (extra limbs), strange colours, blurry, boring, sketch, lacklustre, repetitive, cropped, ((shadow)), ((mechanical defect))
Steps: 150, Sampler: DPM++ 3M SDE Karras, CFG scale: 7, Seed: 1038496024, Size: 2048x2048, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, ControlNet 0: "Module: none, Model: diffusers_xl_depth_full [2f51180b], Weight: 1, Resize Mode: Crop and Resize, Low Vram: True, Processor Res: 512, Threshold A: 0, Threshold B: 0, Guidance Start: 0, Guidance End: 1, Pixel Perfect: True, Control Mode: Balanced, Hr Option: Both, Save Detected Map: True", Version: v1.7.0


postprocessing

Postprocess upscale by: 4, Postprocess upscaler: R-ESRGAN 4x+


extras

Postprocess upscale by: 4, Postprocess upscaler: R-ESRGAN 4x+


Experiments with other 3D meshes

Tire

Depth image generated for the tire	Image drawn by txt2img with the depth ControlNet specified

Result of Texture Paint with the stencil


The results here are comparatively good. This technique may suit some objects and not others.

Building

Here I changed the procedure slightly.
First, I prepared the model to be textured.

Generating directly against this model produced large distortions or images that did not match what I expected.

Large distortion	Windows did not come out well

It works better to prepare the depth image and related inputs separately, so I prepared the following.

Rendering result	depth

The ultra realistic detailed computer graphics 3d model sheet of building on cliff in ruin city in rain. post-apocalyptic. detailed cracked concrete buildings, The painting evokes a sense of history, decay, and natural beauty. 8k. (((broken windows))), (building entrance), road, ((3d building model sheet)), concrete, cloudy, concrete debris, ((dirt)), front lighting
Negative prompt: Deformed, blurry, bad anatomy, disfigured, poorly drawn face, mutation, mutated, extra limb, ugly, poorly drawn hands, missing limb, blurry, floating limbs, disconnected limbs, malformed hands, blur, out of focus, long neck, long body, ((((mutated hands and fingers)))), (((out of frame))), cartoon, 3d, (disfigured), (bad art), (deformed), (poorly drawn), (extra limbs), strange colours, blurry, boring, sketch, lacklustre, repetitive, cropped, hands, ((shading)), ((shadow))
Steps: 50, Sampler: Euler a, CFG scale: 13, Seed: 1715321586, Size: 1600x1600, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Denoising strength: 1, ControlNet 0: "Module: none, Model: diffusers_xl_depth_full [2f51180b], Weight: 1, Resize Mode: Crop and Resize, Low Vram: True, Guidance Start: 0, Guidance End: 1, Pixel Perfect: False, Control Mode: My prompt is more important, Save Detected Map: True", Version: v1.7.0


Advantages and drawbacks

The advantages appear to be the following.

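- The object as a whole shows relatively little inconsistency, such as mismatches between left and right.
  - When I tried the TEXTurePaper demo earlier, the result looked asymmetric between left and right.
  - I think this is because views of the object from various directions are placed in a single image for generation.
  - When generating several separate images, ControlNet's Reference could be an option, but combining it with depth made generation considerably slower in my environment.
- For common shapes such as a tire, and for objects made of several materials, it can sometimes produce effective results.
- As with image-generation AI in general, it can produce things you did not imagine yourself, which is useful for getting ideas.
- It seems effective for simple shapes.
- The PC is left unattended during image generation, so you can do other work in the meantime.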

The drawbacks appear to be the following.

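- For objects expected to have a single material, other methods such as procedural textures may be better suited. The time this workflow takes is likely to be a disadvantage.
- The texture has lighting baked in, so manual retouching may be needed.
- Complex shapes are probably a weak point.
- Unwanted elements are sometimes generated.
- Distortion can cause the background to be painted onto the object.
- It is hard to get exactly the result you want.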

If automation or similar measures can mitigate these drawbacks, the approach may become practical in the future.