Running the High-Quality Video Generation AI [Allegro] Locally [12GB VRAM]
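What is Allegro
Allegro is one of the new video generation AIs, with its paper published in October 2024. It is an open-source video generation model (Diffusion Transformer: DiT), and the paper states that, despite being open source, it achieved user ratings comparable to commercial models.
(The paper's title alone shows how strongly it is positioned against commercial models.)
What I particularly like about Allegro is this: while offering performance comparable to commercial models, it has been integrated into the Diffusers library as of November 2024, and it can generate high-quality videos even on an inexpensive (about 30,000 yen) consumer GPU (RTX 3060). I had to try it, so I wrote this article.
(On top of that, it is released under a license that permits commercial use.)
The only weak point is the frame rate of 15 FPS, but as the official team also recommends, frame interpolation can raise it to 60 FPS or so, so this is not a real problem. (I'll say it as often as needed: for a local model, running on an RTX 3060 is what counts.)
In the second half of the article, I also show videos converted to 60 FPS.
The officially published demo videos can be viewed below. The quality is quite high.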
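Environment setup
Because Allegro is integrated into the Diffusers library, it is very easy to use.
(That said, at the time of writing it had not yet shipped in a Diffusers release, so you need to install Diffusers directly from GitHub.)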
Execution environment
OS: Ubuntu 20.04
GPU: RTX 3060 12GB
CUDA: 12.2
RAM: 64GB
Python: 3.11.7
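To compare your own machine with the environment above, and (after the installation step below) to confirm that the GitHub build of Diffusers actually exposes Allegro, a small check script can help. This is a minimal sketch of my own: the function name and report keys are arbitrary choices, and it only uses `importlib` plus the standard `torch.cuda.is_available()` / `torch.cuda.get_device_properties()` calls.

```python
import importlib
import importlib.util


def check_environment() -> dict:
    """Report torch/CUDA/VRAM status and whether diffusers exposes AllegroPipeline."""
    report = {
        "torch_available": False,
        "cuda_available": False,
        "vram_gib": None,
        "allegro_available": False,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_available"] = True
        report["torch_version"] = torch.__version__
        if torch.cuda.is_available():
            report["cuda_available"] = True
            props = torch.cuda.get_device_properties(0)
            # total_memory is in bytes; convert to GiB
            report["vram_gib"] = round(props.total_memory / 2**30, 1)
    if importlib.util.find_spec("diffusers") is not None:
        try:
            diffusers = importlib.import_module("diffusers")
            # Older releases do not have this class; the GitHub build should
            report["allegro_available"] = hasattr(diffusers, "AllegroPipeline")
        except Exception:
            # A broken install (e.g. missing dependencies) counts as unavailable
            report["allegro_available"] = False
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

On an RTX 3060 the reported VRAM should be a little under 12 GiB; if `allegro_available` is `False`, Diffusers was most likely installed from PyPI rather than from GitHub.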
Create a virtual environment with venv:
python -m venv env
source env/bin/activate
Install the required packages
Install them with the following commands.
pip install git+https://github.com/huggingface/diffusers.git
pip install torch==2.4.1 transformers accelerate sentencepiece beautifulsoup4 ftfy opencv-python imageio imageio-ffmpeg
I have also exported my complete Python environment.
If the commands above do not work for you, try the pinned list below.
requirements.lock
To use it, run the following command:
pip install -r requirements.lock
accelerate==1.0.1
beautifulsoup4==4.12.3
certifi==2024.8.30
charset-normalizer==3.4.0
diffusers @ git+https://github.com/huggingface/diffusers.git@a98a839de75f1ad82d8d200c3bc2e4ff89929081
filelock==3.16.1
fsspec==2024.10.0
ftfy==6.3.1
huggingface-hub==0.26.2
idna==3.10
imageio==2.36.0
imageio-ffmpeg==0.5.1
importlib_metadata==8.5.0
Jinja2==3.1.4
MarkupSafe==3.0.2
mpmath==1.3.0
networkx==3.4.2
numpy==2.1.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.6.77
nvidia-nvtx-cu12==12.1.105
opencv-python==4.10.0.84
packaging==24.1
pillow==11.0.0
psutil==6.1.0
PyYAML==6.0.2
regex==2024.9.11
requests==2.32.3
safetensors==0.4.5
sentencepiece==0.2.0
soupsieve==2.6
sympy==1.13.3
tokenizers==0.20.1
torch==2.4.1
tqdm==4.66.6
transformers==4.46.1
triton==3.0.0
typing_extensions==4.12.2
urllib3==2.2.3
wcwidth==0.2.13
zipp==3.20.2
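If the pinned list does not install cleanly, it can help to see exactly which packages differ from it. The sketch below is my own helper, not part of the original setup: it compares every `name==version` line against what `importlib.metadata` reports, and skips VCS lines such as the `diffusers @ git+...` entry because they carry no `==` pin.

```python
from importlib import metadata


def compare_with_lock(lock_text: str) -> dict:
    """Return {package: (pinned, installed)} for packages that differ from the lock.

    Only "name==version" lines are checked; blank lines, comments and VCS lines
    like "diffusers @ git+..." are skipped. installed is None if not installed.
    """
    mismatches = {}
    for raw in lock_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    with open("requirements.lock", encoding="utf-8") as f:
        for name, (pinned, installed) in compare_with_lock(f.read()).items():
            print(f"{name}: lock={pinned}, installed={installed}")
```

An empty result means every pinned package matches; the `torch` and `nvidia-*` lines are usually the ones worth checking first.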
Creating the script
Prepare a `main.py` like the following.
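I wrote it in the usual Diffusers style while referring to the Diffusers implementation code.
I also referred to the official sample code and the official implementation.
Full code for running Allegro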
import torch
from diffusers import AutoencoderKLAllegro, AllegroPipeline
from diffusers.utils import export_to_video
import os
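# The VAE is loaded separately in float32, as in the official implementation
vae = AutoencoderKLAllegro.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32)
# The remaining components (text encoder, transformer) are loaded in bfloat16
pipe = AllegroPipeline.from_pretrained("rhymes-ai/Allegro", vae = vae, torch_dtype=torch.bfloat16)
# Layer-by-layer CPU offload so the model fits in 12GB of VRAM
pipe.enable_sequential_cpu_offload()
# VAE slicing/tiling reduce VRAM; tiling is also required for decoding to work at all
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()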
prompts = [
"A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this location might be a popular spot for docking fishing boats.",
"Timelapse at the snow land with aurora in the sky.",
"A Japanese tram glides gracefully through the snowy city streets, its streamlined design elegantly cutting through the falling snowflakes. The tram’s illuminated windows cast warm light onto the snow-white surroundings, creating a cozy atmosphere. Snowflakes dance in the air, swirling around the tram. Outside the tram, the city is covered in a layer of snow, turning familiar streets into a winter wonderland. The cherry blossom trees are now bare, quietly standing beside the tram tracks, their branches covered in snow. People hurry along the street, wrapped up warmly.",
"A robot is dancing in Times Square.",
"A side profile shot of a woman with fireworks exploding in the distance beyond her",
"A garden comes to life as a kaleidoscope of butterflies flutters amidst the blossoms, their delicate wings casting shadows on the petals below. In the background, a grand fountain cascades water with a gentle splendor, its rhythmic sound providing a soothing backdrop. Beneath the cool shade of a mature tree, a solitary wooden chair invites solitude and reflection, its smooth surface worn by the touch of countless visitors seeking a moment of tranquility in nature's embrace.",
"An elderly gentleman, with a serene expression, sits at the water's edge, a steaming cup of tea by his side. He is engrossed in his artwork, brush in hand, as he renders an oil painting on a canvas that's propped up against a small, weathered table. The sea breeze whispers through his silver hair, gently billowing his loose-fitting white shirt, while the salty air adds an intangible element to his masterpiece in progress. The scene is one of tranquility and inspiration, with the artist's canvas capturing the vibrant hues of the setting sun reflecting off the tranquil sea.",
"A golden retriever, sporting sleek black sunglasses, with its lengthy fur flowing in the breeze, sprints playfully across a rooftop terrace, recently refreshed by a light rain. The scene unfolds from a distance, the dog's energetic bounds growing larger as it approaches the camera, its tail wagging with unrestrained joy, while droplets of water glisten on the concrete behind it. The overcast sky provides a dramatic backdrop, emphasizing the vibrant golden coat of the canine as it dashes towards the viewer.",
"On a brilliant sunny day, the lakeshore is lined with an array of willow trees, their slender branches swaying gently in the soft breeze. The tranquil surface of the lake reflects the clear blue sky, while several elegant swans glide gracefully through the still water, leaving behind delicate ripples that disturb the mirror-like quality of the lake. The scene is one of serene beauty, with the willows' greenery providing a picturesque frame for the peaceful avian visitors.",
"A small boy, head bowed and determination etched on his face, sprints through the torrential downpour as lightning crackles and thunder rumbles in the distance. The relentless rain pounds the ground, creating a chaotic dance of water droplets that mirror the dramatic sky's anger. In the far background, the silhouette of a cozy home beckons, a faint beacon of safety and warmth amidst the fierce weather. The scene is one of perseverance and the unyielding spirit of a child braving the elements.",
]
positive_prompt = """
(masterpiece), (best quality), (ultra-detailed), (unwatermarked),
{}
emotional, harmonious, vignette, 4k epic detailed, shot on kodak, 35mm photo,
sharp focus, high budget, cinemascope, moody, epic, gorgeous
"""
negative_prompt = """
nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality,
low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry.
"""
os.makedirs("output", exist_ok=True)
for step, prompt in enumerate(prompts):
    print(f"Step {step+1}/{len(prompts)}")
    user_prompt = positive_prompt.format(prompt.lower().strip())
    video = pipe(
        user_prompt,
        negative_prompt = negative_prompt,
        num_inference_steps=100,
        guidance_scale=7.5,
        max_sequence_length=512,
        generator = torch.Generator(device="cuda:0").manual_seed(42)
    ).frames[0]
    export_to_video(video, "./output/output{}.mp4".format(step), fps=15)
Running it
Run the following command to generate the videos.
python main.py
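Generated videos
Trying Text to Video
Unfortunately, Allegro currently supports only T2V (text to video), so that is what I ran.
(Image to Video is more practical for real-world use, so I would be glad if it becomes available too.)
I experimented with ten prompts: four official Allegro demo prompts, five prompts I used with CogVideoX, and one prompt I used with Pyramid-Flow.
The prompts are as follows.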
prompts = [
"A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this location might be a popular spot for docking fishing boats.",
"Timelapse at the snow land with aurora in the sky.",
"A Japanese tram glides gracefully through the snowy city streets, its streamlined design elegantly cutting through the falling snowflakes. The tram’s illuminated windows cast warm light onto the snow-white surroundings, creating a cozy atmosphere. Snowflakes dance in the air, swirling around the tram. Outside the tram, the city is covered in a layer of snow, turning familiar streets into a winter wonderland. The cherry blossom trees are now bare, quietly standing beside the tram tracks, their branches covered in snow. People hurry along the street, wrapped up warmly.",
"A robot is dancing in Times Square.",
"A side profile shot of a woman with fireworks exploding in the distance beyond her",
"A garden comes to life as a kaleidoscope of butterflies flutters amidst the blossoms, their delicate wings casting shadows on the petals below. In the background, a grand fountain cascades water with a gentle splendor, its rhythmic sound providing a soothing backdrop. Beneath the cool shade of a mature tree, a solitary wooden chair invites solitude and reflection, its smooth surface worn by the touch of countless visitors seeking a moment of tranquility in nature's embrace.",
"An elderly gentleman, with a serene expression, sits at the water's edge, a steaming cup of tea by his side. He is engrossed in his artwork, brush in hand, as he renders an oil painting on a canvas that's propped up against a small, weathered table. The sea breeze whispers through his silver hair, gently billowing his loose-fitting white shirt, while the salty air adds an intangible element to his masterpiece in progress. The scene is one of tranquility and inspiration, with the artist's canvas capturing the vibrant hues of the setting sun reflecting off the tranquil sea.",
"A golden retriever, sporting sleek black sunglasses, with its lengthy fur flowing in the breeze, sprints playfully across a rooftop terrace, recently refreshed by a light rain. The scene unfolds from a distance, the dog's energetic bounds growing larger as it approaches the camera, its tail wagging with unrestrained joy, while droplets of water glisten on the concrete behind it. The overcast sky provides a dramatic backdrop, emphasizing the vibrant golden coat of the canine as it dashes towards the viewer.",
"On a brilliant sunny day, the lakeshore is lined with an array of willow trees, their slender branches swaying gently in the soft breeze. The tranquil surface of the lake reflects the clear blue sky, while several elegant swans glide gracefully through the still water, leaving behind delicate ripples that disturb the mirror-like quality of the lake. The scene is one of serene beauty, with the willows' greenery providing a picturesque frame for the peaceful avian visitors.",
"A small boy, head bowed and determination etched on his face, sprints through the torrential downpour as lightning crackles and thunder rumbles in the distance. The relentless rain pounds the ground, creating a chaotic dance of water droplets that mirror the dramatic sky's anger. In the far background, the silhouette of a cozy home beckons, a faint beacon of safety and warmth amidst the fierce weather. The scene is one of perseverance and the unyielding spirit of a child braving the elements.",
]
Following the official implementation, I also wrap each prompt above with the following quality-boosting prompt.
positive_prompt = """
(masterpiece), (best quality), (ultra-detailed), (unwatermarked),
{}
emotional, harmonious, vignette, 4k epic detailed, shot on kodak, 35mm photo,
sharp focus, high budget, cinemascope, moody, epic, gorgeous
"""
user_prompt = positive_prompt.format(prompt.lower().strip())
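The `format` call above simply drops the lower-cased, trimmed scene description into the `{}` slot of the template. Pulling it into a small function makes that explicit and easy to reuse; the names `POSITIVE_TEMPLATE` and `build_user_prompt` are my own, and the template text is copied from above.

```python
POSITIVE_TEMPLATE = """
(masterpiece), (best quality), (ultra-detailed), (unwatermarked),
{}
emotional, harmonious, vignette, 4k epic detailed, shot on kodak, 35mm photo,
sharp focus, high budget, cinemascope, moody, epic, gorgeous
"""


def build_user_prompt(prompt: str, template: str = POSITIVE_TEMPLATE) -> str:
    """Lower-case and trim the scene description, then wrap it in the quality tags."""
    return template.format(prompt.lower().strip())


if __name__ == "__main__":
    print(build_user_prompt("  A robot is dancing in Times Square.  "))
```

Note that `.lower()` applies to the whole description, so proper nouns such as "Times Square" are lower-cased too, exactly as in `main.py`.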
Allegro also supports a negative prompt, so I used the one from the official implementation.
negative_prompt = """
nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality,
low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry.
"""
Generated videos
The generated videos are below.
(1280x720 resolution, 6 seconds at 15 FPS)
Official demo prompts
A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this location might be a popular spot for docking fishing boats.
Timelapse at the snow land with aurora in the sky.
A Japanese tram glides gracefully through the snowy city streets, its streamlined design elegantly cutting through the falling snowflakes. The tram’s illuminated windows cast warm light onto the snow-white surroundings, creating a cozy atmosphere. Snowflakes dance in the air, swirling around the tram. Outside the tram, the city is covered in a layer of snow, turning familiar streets into a winter wonderland. The cherry blossom trees are now bare, quietly standing beside the tram tracks, their branches covered in snow. People hurry along the street, wrapped up warmly.
A robot is dancing in Times Square.
Official Pyramid-Flow demo prompt
A side profile shot of a woman with fireworks exploding in the distance beyond her
Official CogVideoX demo prompts
A garden comes to life as a kaleidoscope of butterflies flutters amidst the blossoms, their delicate wings casting shadows on the petals below. In the background, a grand fountain cascades water with a gentle splendor, its rhythmic sound providing a soothing backdrop. Beneath the cool shade of a mature tree, a solitary wooden chair invites solitude and reflection, its smooth surface worn by the touch of countless visitors seeking a moment of tranquility in nature's embrace.
An elderly gentleman, with a serene expression, sits at the water's edge, a steaming cup of tea by his side. He is engrossed in his artwork, brush in hand, as he renders an oil painting on a canvas that's propped up against a small, weathered table. The sea breeze whispers through his silver hair, gently billowing his loose-fitting white shirt, while the salty air adds an intangible element to his masterpiece in progress. The scene is one of tranquility and inspiration, with the artist's canvas capturing the vibrant hues of the setting sun reflecting off the tranquil sea.
A golden retriever, sporting sleek black sunglasses, with its lengthy fur flowing in the breeze, sprints playfully across a rooftop terrace, recently refreshed by a light rain. The scene unfolds from a distance, the dog's energetic bounds growing larger as it approaches the camera, its tail wagging with unrestrained joy, while droplets of water glisten on the concrete behind it. The overcast sky provides a dramatic backdrop, emphasizing the vibrant golden coat of the canine as it dashes towards the viewer.
On a brilliant sunny day, the lakeshore is lined with an array of willow trees, their slender branches swaying gently in the soft breeze. The tranquil surface of the lake reflects the clear blue sky, while several elegant swans glide gracefully through the still water, leaving behind delicate ripples that disturb the mirror-like quality of the lake. The scene is one of serene beauty, with the willows' greenery providing a picturesque frame for the peaceful avian visitors.
A small boy, head bowed and determination etched on his face, sprints through the torrential downpour as lightning crackles and thunder rumbles in the distance. The relentless rain pounds the ground, creating a chaotic dance of water droplets that mirror the dramatic sky's anger. In the far background, the silhouette of a cozy home beckons, a faint beacon of safety and warmth amidst the fierce weather. The scene is one of perseverance and the unyielding spirit of a child braving the elements.
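The frame rate is a bit low, but I found the quality of the generated videos very high. (Getting this level of quality even from prompts other than the ones used in the demos is impressive.)
On top of that, it runs locally on an inexpensive GPU, which makes it a very welcome model. I hope more models like this appear. (I want an expensive GPU...)
Reducing the number of steps
That said, each of the videos above took a full 5.5 hours to generate, so the cost in time is very high.
One reason is that they were generated with 100 steps (the default).
Reducing the step count lowers video quality, but generation time drops roughly in proportion to the number of steps.
So I compared the quality of the videos generated with different step counts.
I used the first prompt in the list.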
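As a rough back-of-the-envelope check, assuming generation time is proportional to the step count and using the single measurement from this article (100 steps took about 5.5 hours), the expected time for other step counts can be estimated. The `fixed_hours` parameter is my own addition for step-independent work such as text encoding and VAE decoding; I did not measure it, so it defaults to 0.

```python
def estimate_hours(steps: int, ref_steps: int = 100, ref_hours: float = 5.5,
                   fixed_hours: float = 0.0) -> float:
    """Estimate generation time, assuming cost grows linearly with denoising steps.

    ref_steps / ref_hours: one measured run (100 steps took about 5.5 h here).
    fixed_hours: step-independent work (text encoding, VAE decoding); unmeasured,
    so the default of 0 makes the estimate purely proportional.
    """
    per_step = (ref_hours - fixed_hours) / ref_steps
    return fixed_hours + per_step * steps


if __name__ == "__main__":
    for s in (30, 50, 80, 100):
        print(f"{s:3d} steps: about {estimate_hours(s):.2f} h")
```

With these assumptions, 30 steps would take about 1.65 hours and 50 steps about 2.75 hours per video.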
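The comparison only changes `num_inference_steps`, keeping the prompt, negative prompt and seed fixed. A minimal sketch of such a sweep is below; `step_sweep` and the `steps_XXX.mp4` file names are hypothetical, and the pipeline, generator factory and export function are passed in so the same function works with the `pipe` and `export_to_video` from `main.py`.

```python
from typing import Callable, Iterable, List


def step_sweep(pipe, prompt: str, negative_prompt: str,
               step_counts: Iterable[int],
               make_generator: Callable[[], object],
               export: Callable[..., object],
               out_dir: str = "./output", fps: int = 15) -> List[str]:
    """Render one prompt with several step counts and return the saved paths.

    make_generator is called once per run so every run starts from the same seed.
    """
    paths = []
    for steps in step_counts:
        frames = pipe(
            prompt,
            negative_prompt=negative_prompt,
            num_inference_steps=steps,
            guidance_scale=7.5,
            max_sequence_length=512,
            generator=make_generator(),
        ).frames[0]
        path = f"{out_dir}/steps_{steps:03d}.mp4"
        export(frames, path, fps=fps)
        paths.append(path)
    return paths
```

With the objects from `main.py`, a call would look like `step_sweep(pipe, positive_prompt.format(prompts[0].lower().strip()), negative_prompt, [30, 50, 80, 100], lambda: torch.Generator(device="cuda:0").manual_seed(42), export_to_video)`. Creating a fresh generator per run matters: reusing one generator would give each run a different random state.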
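30 steps
Looking closely, there are subtle artifacts in the ocean waves, and the image also looks slightly less sharp.
50 steps
Not much different from 30 steps.
80 steps
This looks different from 30 and 50 steps, but looking closely, many parts of the scene break down.
I had hoped that a good video at around 30 to 80 steps would shorten generation time, but unfortunately the 100-step video is by far the best, so 100 steps looks like a requirement... (5.5 hours...)
A brief explanation of the code
Basically, it follows standard Diffusers conventions.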
Model definition
For example, the models are defined as follows.
vae = AutoencoderKLAllegro.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32)
pipe = AllegroPipeline.from_pretrained("rhymes-ai/Allegro", vae = vae, torch_dtype=torch.bfloat16)
The official implementation also defines the VAE separately, so I do the same here.
Loading only the VAE in `float32` also follows the official implementation.
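The dtype choice directly affects memory: `float32` uses 4 bytes per parameter and `bfloat16` uses 2, so loading the large components in bfloat16 halves their weight memory, while the VAE kept in float32 costs twice as much per parameter. The helper below is my own sketch for that arithmetic; it counts weights only (activations, the text encoder's working memory, etc. come on top), and any parameter count you plug in is your own assumption.

```python
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_gib(num_params: float, dtype: str) -> float:
    """Memory needed just to hold the weights, in GiB (activations not included)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 2**30


if __name__ == "__main__":
    # Hypothetical 3-billion-parameter component, for illustration only
    for dtype in ("float32", "bfloat16"):
        print(dtype, round(weight_gib(3e9, dtype), 2), "GiB")
```

Summing such estimates over all components and comparing the total with 12GB shows why offloading is needed on an RTX 3060.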
Reducing VRAM with model offloading
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
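To make it run on an RTX 3060, layer-by-layer (sequential) CPU offloading is enabled.
The `enable_slicing` and `enable_tiling` calls on the VAE also help reduce VRAM, but note that they are needed in the first place: without tiling, decoding fails with the following error.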
raise NotImplementedError("Decoding without tiling has not been implemented yet.")
NotImplementedError: Decoding without tiling has not been implemented yet.
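This error is raised during VAE decoding, that is, only after the entire denoising loop has finished, so forgetting `enable_tiling()` can waste hours. A cheap pre-flight check before the generation loop avoids that. This is a sketch under an assumption: that the VAE exposes a boolean `use_tiling` attribute after `enable_tiling()`, as Diffusers autoencoders typically do; if the attribute is missing, the check treats tiling as disabled.

```python
def preflight_vae(vae) -> None:
    """Fail fast if VAE tiling is off, instead of failing after hours of denoising.

    Assumes tiling state is stored in a boolean `use_tiling` attribute
    (typical for Diffusers autoencoders); a missing attribute counts as off.
    """
    if not getattr(vae, "use_tiling", False):
        raise RuntimeError(
            "VAE tiling is disabled; call pipe.vae.enable_tiling() before generating."
        )
```

In `main.py`, calling `preflight_vae(pipe.vae)` right after `pipe.vae.enable_tiling()` would be enough.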
Prompt definition
prompts = [
"A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this location might be a popular spot for docking fishing boats.",
"Timelapse at the snow land with aurora in the sky.",
"A Japanese tram glides gracefully through the snowy city streets, its streamlined design elegantly cutting through the falling snowflakes. The tram’s illuminated windows cast warm light onto the snow-white surroundings, creating a cozy atmosphere. Snowflakes dance in the air, swirling around the tram. Outside the tram, the city is covered in a layer of snow, turning familiar streets into a winter wonderland. The cherry blossom trees are now bare, quietly standing beside the tram tracks, their branches covered in snow. People hurry along the street, wrapped up warmly.",
"A robot is dancing in Times Square.",
"A side profile shot of a woman with fireworks exploding in the distance beyond her",
"A garden comes to life as a kaleidoscope of butterflies flutters amidst the blossoms, their delicate wings casting shadows on the petals below. In the background, a grand fountain cascades water with a gentle splendor, its rhythmic sound providing a soothing backdrop. Beneath the cool shade of a mature tree, a solitary wooden chair invites solitude and reflection, its smooth surface worn by the touch of countless visitors seeking a moment of tranquility in nature's embrace.",
"An elderly gentleman, with a serene expression, sits at the water's edge, a steaming cup of tea by his side. He is engrossed in his artwork, brush in hand, as he renders an oil painting on a canvas that's propped up against a small, weathered table. The sea breeze whispers through his silver hair, gently billowing his loose-fitting white shirt, while the salty air adds an intangible element to his masterpiece in progress. The scene is one of tranquility and inspiration, with the artist's canvas capturing the vibrant hues of the setting sun reflecting off the tranquil sea.",
"A golden retriever, sporting sleek black sunglasses, with its lengthy fur flowing in the breeze, sprints playfully across a rooftop terrace, recently refreshed by a light rain. The scene unfolds from a distance, the dog's energetic bounds growing larger as it approaches the camera, its tail wagging with unrestrained joy, while droplets of water glisten on the concrete behind it. The overcast sky provides a dramatic backdrop, emphasizing the vibrant golden coat of the canine as it dashes towards the viewer.",
"On a brilliant sunny day, the lakeshore is lined with an array of willow trees, their slender branches swaying gently in the soft breeze. The tranquil surface of the lake reflects the clear blue sky, while several elegant swans glide gracefully through the still water, leaving behind delicate ripples that disturb the mirror-like quality of the lake. The scene is one of serene beauty, with the willows' greenery providing a picturesque frame for the peaceful avian visitors.",
"A small boy, head bowed and determination etched on his face, sprints through the torrential downpour as lightning crackles and thunder rumbles in the distance. The relentless rain pounds the ground, creating a chaotic dance of water droplets that mirror the dramatic sky's anger. In the far background, the silhouette of a cozy home beckons, a faint beacon of safety and warmth amidst the fierce weather. The scene is one of perseverance and the unyielding spirit of a child braving the elements.",
]
positive_prompt = """
(masterpiece), (best quality), (ultra-detailed), (unwatermarked),
{}
emotional, harmonious, vignette, 4k epic detailed, shot on kodak, 35mm photo,
sharp focus, high budget, cinemascope, moody, epic, gorgeous
"""
negative_prompt = """
nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality,
low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry.
"""
for step, prompt in enumerate(prompts):
    user_prompt = positive_prompt.format(prompt.lower().strip())
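I wanted it to keep running overnight or while I was out, so the script processes multiple prompts one after another in a single run.
That is why the prompts are written as a list.
Following the official implementation, the default quality prompt is applied to each user prompt, and the default negative prompt is used.
Calling the pipeline and saving the video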
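With roughly 5.5 hours per video, an overnight batch of ten prompts can easily be interrupted. One way to make the loop restartable is to skip prompts whose output file already exists. The helper below is my own sketch; it assumes the same `output{index}.mp4` naming as `main.py`.

```python
import os
from typing import List, Sequence, Tuple


def pending_jobs(prompts: Sequence[str], out_dir: str = "output") -> List[Tuple[int, str]]:
    """Return (index, prompt) pairs whose output file does not exist yet.

    Uses main.py's naming (output{index}.mp4), so a restarted batch only
    generates the videos that are still missing.
    """
    jobs = []
    for index, prompt in enumerate(prompts):
        path = os.path.join(out_dir, f"output{index}.mp4")
        if not os.path.exists(path):
            jobs.append((index, prompt))
    return jobs
```

In `main.py`, the loop header would become `for step, prompt in pending_jobs(prompts):`. One caveat: a file left half-written by a crash during export would be treated as finished, so delete any suspicious file before restarting.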
video = pipe(
user_prompt,
negative_prompt = negative_prompt,
num_inference_steps=100,
guidance_scale=7.5,
max_sequence_length=512,
generator = torch.Generator(device="cuda:0").manual_seed(42)
).frames[0]
export_to_video(video, "./output/output{}.mp4".format(step), fps=15)
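The call is almost identical to ordinary Diffusers code. The parameter values follow the official ones and the defaults of the Diffusers pipeline's `__call__` method.
`num_inference_steps` is the number of steps the diffusion model takes to solve its differential equation (that is, the number of denoising steps). This is the value I varied in the step-count experiment.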
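To give some intuition for why the step count trades time for quality, here is a toy example, not Allegro's actual scheduler: solving the simple ODE dx/dt = -x from t = 0 to 1 with fixed-step Euler. Each step costs one evaluation (in a diffusion model, roughly one transformer pass), and the error against the exact solution exp(-1) shrinks as the number of steps grows.

```python
import math


def euler_solve(x0: float, t_end: float, num_steps: int) -> float:
    """Integrate dx/dt = -x from t=0 to t_end with fixed-step Euler."""
    dt = t_end / num_steps
    x = x0
    for _ in range(num_steps):
        x = x + dt * (-x)  # one "model evaluation" per step
    return x


def euler_error(num_steps: int) -> float:
    """Absolute error against the exact solution exp(-1) for x0=1, t_end=1."""
    return abs(euler_solve(1.0, 1.0, num_steps) - math.exp(-1.0))


if __name__ == "__main__":
    for n in (30, 50, 80, 100):
        print(n, euler_error(n))
```

In this toy the error falls smoothly with more steps; a real video diffusion sampler is far more complex, so it does not explain the specific breakdowns seen at 30 to 80 steps.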
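Improving FPS
As I introduced in a previous article, low-FPS videos can be improved with frame interpolation.
The Allegro team recommends a different frame interpolation method, but this time I simply used RIFE, which I used with CogVideoX.
(See that previous article for how to use it.)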
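Going from 15 FPS to 60 FPS means interpolating by a factor of 60 / 15 = 4, which corresponds to two rounds of 2x interpolation. The arithmetic below is a sketch using one common convention: inserting (factor - 1) new frames between each adjacent pair, so no frame is added after the last one. The 88-frame input is an assumption on my part (I believe it is the `AllegroPipeline` default, and 88 / 15 is close to the 6 seconds mentioned above).

```python
def interpolated_frame_count(num_frames: int, factor: int) -> int:
    """Frames after inserting (factor - 1) new frames between each adjacent pair."""
    if num_frames < 1 or factor < 1:
        raise ValueError("num_frames and factor must be positive")
    return (num_frames - 1) * factor + 1


def duration_seconds(num_frames: int, fps: float) -> float:
    """Playback length of a clip in seconds."""
    return num_frames / fps


if __name__ == "__main__":
    src_frames, src_fps, dst_fps = 88, 15, 60  # 88 frames is an assumption
    factor = dst_fps // src_fps  # 4, i.e. two rounds of 2x
    dst_frames = interpolated_frame_count(src_frames, factor)
    print(dst_frames, round(duration_seconds(src_frames, src_fps), 2),
          round(duration_seconds(dst_frames, dst_fps), 2))
```

Under these assumptions, 88 frames become 349 frames, and the clip length stays essentially the same (about 5.87 s before, about 5.82 s after). Some tools instead duplicate the last frame to get exactly `num_frames * factor` frames.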
Frame interpolation results
The resulting videos are below.
(1280x720 resolution, 6 seconds at 60 FPS)
Official demo prompts
A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this location might be a popular spot for docking fishing boats.
Timelapse at the snow land with aurora in the sky.
A Japanese tram glides gracefully through the snowy city streets, its streamlined design elegantly cutting through the falling snowflakes. The tram’s illuminated windows cast warm light onto the snow-white surroundings, creating a cozy atmosphere. Snowflakes dance in the air, swirling around the tram. Outside the tram, the city is covered in a layer of snow, turning familiar streets into a winter wonderland. The cherry blossom trees are now bare, quietly standing beside the tram tracks, their branches covered in snow. People hurry along the street, wrapped up warmly.
A robot is dancing in Times Square.
Official Pyramid-Flow demo prompt
A side profile shot of a woman with fireworks exploding in the distance beyond her
Official CogVideoX demo prompts
A garden comes to life as a kaleidoscope of butterflies flutters amidst the blossoms, their delicate wings casting shadows on the petals below. In the background, a grand fountain cascades water with a gentle splendor, its rhythmic sound providing a soothing backdrop. Beneath the cool shade of a mature tree, a solitary wooden chair invites solitude and reflection, its smooth surface worn by the touch of countless visitors seeking a moment of tranquility in nature's embrace.
An elderly gentleman, with a serene expression, sits at the water's edge, a steaming cup of tea by his side. He is engrossed in his artwork, brush in hand, as he renders an oil painting on a canvas that's propped up against a small, weathered table. The sea breeze whispers through his silver hair, gently billowing his loose-fitting white shirt, while the salty air adds an intangible element to his masterpiece in progress. The scene is one of tranquility and inspiration, with the artist's canvas capturing the vibrant hues of the setting sun reflecting off the tranquil sea.
A golden retriever, sporting sleek black sunglasses, with its lengthy fur flowing in the breeze, sprints playfully across a rooftop terrace, recently refreshed by a light rain. The scene unfolds from a distance, the dog's energetic bounds growing larger as it approaches the camera, its tail wagging with unrestrained joy, while droplets of water glisten on the concrete behind it. The overcast sky provides a dramatic backdrop, emphasizing the vibrant golden coat of the canine as it dashes towards the viewer.
On a brilliant sunny day, the lakeshore is lined with an array of willow trees, their slender branches swaying gently in the soft breeze. The tranquil surface of the lake reflects the clear blue sky, while several elegant swans glide gracefully through the still water, leaving behind delicate ripples that disturb the mirror-like quality of the lake. The scene is one of serene beauty, with the willows' greenery providing a picturesque frame for the peaceful avian visitors.
A small boy, head bowed and determination etched on his face, sprints through the torrential downpour as lightning crackles and thunder rumbles in the distance. The relentless rain pounds the ground, creating a chaotic dance of water droplets that mirror the dramatic sky's anger. In the far background, the silhouette of a cozy home beckons, a faint beacon of safety and warmth amidst the fierce weather. The scene is one of perseverance and the unyielding spirit of a child braving the elements.
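Summary
Generation takes a very long time, but it is truly impressive that a commercial-grade video generation AI runs locally at all.
The quality of the output is also very high. Video generation AI is steadily becoming practical to use.
(Generative models usually assume you will reroll with different seeds, so this level of performance from a single shot with a fixed seed is very attractive.)
At the time of writing, the best-performing video generation model is said to be "Mochi 1", but it currently requires at least 24GB of VRAM (already much lower than its original requirement), so it needs a fairly expensive GPU.
By comparison, this model, which delivers commercial-grade video generation and still runs on an RTX 3060, is very competitive.
Ideally, I hope quantized models will appear and make video generation even faster.
Thank you for reading this far!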