Generating Videos After Analyzing the Sora 2 Prompting Guide

I'm Beagle 🐶

AI videos generated by Sora 2 are currently a hot topic.
Although some people view it negatively because of issues such as copyright infringement, Sora (website, iOS app) and the Sora 2 API let you generate convincing-looking videos from natural language.
However, whether you can generate the exact video you intended with this method is a matter of luck, essentially like a "gacha."

So, how can we generate something even slightly closer to the intended video?
Recently, the official Sora 2 Prompting Guide was released, so creating prompts based on its content should help in generating better videos.

In this article, I work through the prompting guide and test whether following its instructions lets me create the videos I intend.

Update Information

  • 2025/10/23 Posted
  • 2025/11/03 Updated information on Sora and Sora 2 to reflect the current status

About Sora 2

I'll touch on this briefly.

Among the many articles about Sora 2, the following one clearly explains the basics, such as how to get started, usage, and features.

https://www.ai-souken.com/article/what-is-openai-sora-2

What is Sora 2?

It is the name of the video generation model released by OpenAI.
It can be used through the Sora website or the iOS app.
There is also a higher-tier model called Sora 2 Pro, available to ChatGPT Pro subscribers.

What is Sora (Website, iOS App)?

https://sora.chatgpt.com/

This is the name of the website and iOS app for using Sora 2, and it takes the form of a social networking service.
Currently, it is generally invitation-only, meaning you cannot use it without being a paid ChatGPT subscriber or having an invitation code.
The invitation system seems to be gradually being phased out, as I have heard that even free ChatGPT members have been able to register without an invitation code.
(Updated November 3, 2025)
It seems that invitation codes have been abolished, though this is said to be only for a limited time.

Video Generation and Posting

By entering a prompt in the chat input field, you can generate 10-second or 15-second videos.
Even simple natural language inputs like "a video of a dog running in a meadow" will generate a video based on that input.
Videos can be posted directly and are viewable by other users.

Currently, non-Pro members can generate up to 30 videos every 24 hours.
It follows a so-called "rolling 24-hour window" method, where you have 30 slots that become available again 24 hours after the generation time.
As a side note, Google's Jules is another service that uses this method.
(Updated November 3, 2025)
Currently, Pro members can generate 100 videos per day, and other members can generate 30 videos per day.
It is a 24-hour cycle limit system that resets around 9:00 AM JST.
Additionally, generating a 15-second video consumes two counts of the 10-second video limit, and 25-second videos (available only to Pro members) consume four counts.
If you exceed the usage limit, credits are also available for additional purchase at $4 for 10 videos.
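
The count arithmetic above can be sketched in a few lines of Python. This only illustrates the numbers stated in this article (1 count for 10 seconds, 2 for 15 seconds, 4 for 25 seconds; 100 counts per day for Pro members and 30 for others). The function and constant names are hypothetical and are not part of any Sora interface.

```python
# Counts consumed per video, as described in this article (not an official API).
COUNTS_PER_DURATION = {10: 1, 15: 2, 25: 4}
DAILY_LIMIT = {"pro": 100, "other": 30}
PRO_ONLY_DURATIONS = {25}


def counts_used(durations, plan="other"):
    """Return the total counts consumed by a list of video durations (seconds)."""
    total = 0
    for seconds in durations:
        if seconds not in COUNTS_PER_DURATION:
            raise ValueError(f"unsupported duration: {seconds}s")
        if seconds in PRO_ONLY_DURATIONS and plan != "pro":
            raise ValueError(f"{seconds}s videos are available only to Pro members")
        total += COUNTS_PER_DURATION[seconds]
    return total


def remaining(durations, plan="other"):
    """Return how many counts are left in the current 24-hour cycle."""
    return DAILY_LIMIT[plan] - counts_used(durations, plan)


# Example: a non-Pro member who made five 10-second and four 15-second videos.
print(remaining([10] * 5 + [15] * 4))  # 30 - (5 + 8) = 17
```

In other words, a non-Pro member who only makes 15-second videos can generate 15 of them per cycle, not 30.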

You can select the duration of the video from the settings icon in the chat field, under Duration.

Sora 2 API

An API version of Sora 2 is also provided, enabling video generation within applications and workflows.
However, the pricing is quite high, so it might be difficult to try out casually.
This article does not cover the use or explanation of the Sora 2 API.

Sora 2 Prompting Guide

Now, let's dive into the prompting guide.
OpenAI has published the Sora 2 Prompting Guide here:

https://cookbook.openai.com/examples/sora/sora2_prompting_guide

npaka-san has published an article that explains the guide in Japanese.

https://note.com/npaka/n/n754187f14a03

How to Approach Sora 2 Prompting

Now, let's look at what should actually be described in the prompt based on my understanding of the guide.

Prompt Strategy

First, I've selected the three most important points regarding how to approach prompts.

  • Structure the prompt as if you were drawing a storyboard.
  • Use clear, specific, and concrete expressions; avoid abstract or ambiguous phrasing.
  • Since the same prompt yields different results every time, generate multiple versions and choose the best one. Use the Remix feature for fine-tuning.

Understanding "Shots"

Video production in general, not just Sora 2, has a concept called a shot.
A shot is a continuous block of footage recorded by a camera at one time, and a video work can be seen as a collection of multiple connected shots.

A "scene of brewing and drinking coffee" can, for example, be composed of the following four shots:

  1. An electric kettle is boiling in the living room.
  2. Hot water is being poured to brew coffee.
  3. The camera zooms in to show a face taking a sip.
  4. The camera pulls back, showing the person putting the cup down and relaxing.

It's easier to see it in action, so I generated a video consisting of these four shots with Sora 2.

https://sora.chatgpt.com/p/s_68f9a639827481918bc91feab01cd3b4

Regarding shots, the following points are important:

  • Each shot should have only one purpose. It should consist of one camera movement and one clear action.
  • A single prompt can contain multiple shots.
  • Different technical elements (camerawork, angle, lens, etc.) can be specified for each shot.
  • Consider the overall flow of the video and maintain continuity between shots.
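
To make the idea of "one prompt, multiple shots" concrete, here is a minimal sketch that models the coffee scene above as a list of shots and joins them into a single prompt. The `Shot` class, the `build_prompt` function, and the "Shot N:" output format are my own assumptions for illustration; the guide does not prescribe this format.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """One shot: a single clear action and, optionally, a single camera movement."""
    action: str
    camera: str = ""  # e.g. "zoom in"; left to the model's discretion if empty


def build_prompt(shots):
    """Join shots into one multi-shot prompt, numbering them in order."""
    lines = []
    for number, shot in enumerate(shots, start=1):
        line = f"Shot {number}: {shot.action}"
        if shot.camera:
            line += f" Camera: {shot.camera}."
        lines.append(line)
    return "\n".join(lines)


# The four shots of the "brewing and drinking coffee" scene.
coffee_scene = [
    Shot("An electric kettle is boiling in the living room."),
    Shot("Hot water is being poured to brew coffee."),
    Shot("A face takes a sip of coffee.", camera="zoom in"),
    Shot("The person puts the cup down and relaxes.", camera="pull back"),
]
print(build_prompt(coffee_scene))
```

Keeping each shot as its own record makes it easy to check that every shot has one action and at most one camera movement before the text goes to Sora.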

Technical Elements that can be Described in Prompts

I picked out the technical elements that the prompting guide says can be specified and summarized them in the table below. The guide notes that professional terminology can be used, which may be difficult without basic knowledge, so I have included reference links. You do not need to specify everything: elements you leave out are left to the model's discretion, and the model may also follow instructions for items not listed here.

These can be applied per shot or to the entire video. If the same element is specified for both the entire video and an individual shot, the instruction for the specific shot generally takes precedence.
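
One way to picture this precedence rule is as a dictionary merge in which shot-level values overwrite video-level ones. This is only a mental model under my own assumptions: the keys and the `effective_settings` function are hypothetical, and the model follows the rule "generally", not strictly.

```python
def effective_settings(video_settings, shot_settings):
    """Merge settings so that shot-level values override video-level ones."""
    merged = dict(video_settings)  # copy, so the video-level settings stay intact
    merged.update(shot_settings)   # shot-level instructions take precedence
    return merged


# Settings given for the entire video.
video = {"lens": "35mm", "mood": "melancholic and nostalgic", "dof": "shallow"}
# Settings given only for shot 2; "lens" conflicts with the video-level value.
shot_2 = {"lens": "anamorphic", "camera_movement": "slow dolly-in"}

print(effective_settings(video, shot_2))
```

Here shot 2 uses the anamorphic lens while still inheriting the mood and depth of field specified for the whole video.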

| Element | Content/Key Points | Example |
| --- | --- | --- |
| Overview/Purpose | Concisely state the overall concept and purpose of the video | Like a documentary; 1960s film style |
| Camera Framing | Can be specified using terminology | Medium close-up; Macro shot |
| Camera Angle | Can be specified using terminology | Low angle; Eye-level shot |
| Depth of Field (DOF) | Specify depth clearly | Shallow DOF |
| Camera Movement | One action per shot | Slow dolly-in; Tracking-out |
| Lens | Specify focal length numerically; specify lens type | Shot with a 35mm lens; Use an anamorphic lens |
| Other Shooting Settings | Lens filters, color grading, camera notes, etc. | Cinematic color grading; Keep eyeline low |
| Mood | Be specific about emotions and atmosphere | Melancholic and nostalgic |
| Location/Time of Day | Concisely state the setting and time | Urban station platform; Dawn |
| Light/Lighting | Be specific about the quality, direction, and mix of light sources | Sharp single light combined with soft light from a window |
| Subject Action | One action per shot | A man turns around and looks up |
| Action Timing | Describe actions in counts or seconds | Draw the curtains after walking 4 steps |
| Dialogue | Describe who says what and when | 3.50s: Neighbor "Hello, it's a nice day again today" |
| BGM/Sound Effects | Sound effects can be specified even if the BGM is silent | BGM: Sound of rain; BGM: Silent, Sound Effect: Sound of wind blowing |
| Other Supplementary Info | Person, costume, object, etc. Be specific about what you want to appear | Navy coat, coffee cup, commuters, departure sound; Foreground: safety line, Middle ground: passengers, Background: train |
| Negative Specification | Clearly state what you don't want to appear | Do not show brand signs |
| Reference Image | Provide a URL or attach an image (jpeg, png, webp) | Refer to https://example.com/image.jpg for the protagonist's appearance |
| Timecode | Specify specific seconds | 0.00 - 3.50 Shot 1; 3.50 - 7.50 Shot 2 |

Generating Videos by Having an LLM Create Prompts

Now, let's actually construct a prompt and try creating a video on Sora.
That said, planning a video piece, dividing it into shots, and determining detailed instructions for each shot is quite a laborious task.

So, let's rely on an LLM for this.
Since this prompting guide covers everything from concepts to specific examples, an LLM should be able to create a good prompt if we let it interpret the guide.

Let's try it right away.
I'll feed the following prompt into GPT-5.

https://cookbook.openai.com/examples/sora/sora2_prompting_guide
This URL is a prompting guide for creating videos with Sora 2.
While not explicitly stated in this guide, the current Sora 2 can create 15-second videos.

After understanding the prompting guide, please construct an appropriate prompt for creating a video on the web version of Sora 2.

Subject: Cat cafe
Purpose: Store promotion
A cute and fun video that makes people want to visit the cafe

The prompt for Sora 2 generated by GPT-5 is as follows:

Overview:
A cat cafe on a sunny afternoon with light filtering through the trees. In the bokeh, the details of the fur sparkle, and steam rises from the sweets and latte.
The background features wood, linen, and houseplants. Avoid logos or signs with specific names, depicting it as a universal "cozy space."

Cinematography:
Camera shot: Eye-level medium-wide with a slow dolly -> Macro of the cat's paws -> Medium shot of the cat dozing off on a lap.
Lens: 40mm equivalent, moving towards 50mm in the second half.
Lighting + palette: Natural light key + warm practical lamps; colors are amber/cream/walnut brown.
Mood: Happy, relaxed, high-quality.
Actions (15 seconds, 3 beats):
0–5s: A cat gently enters the frame across a table at a guest's seat.
5–10s: Show the cat's paws "kneading" in a macro-like shot, then focus shifts to the steaming latte and sweets.
10–15s: The expression of the cat melting into the guest's lap. A staff member gently places a dessert, and the cat blinks slightly. Ends with a light dolly-out to show the whole space.
Dialogue / VO (Short):
Female Narration: "A sweet afternoon with cats. Your seat is waiting."
Background Sound: Low-level store ambiance, the slight sound of cutlery clinking, and only the cat's purring (music is subtle).

The video actually generated is here:

https://sora.chatgpt.com/p/s_68f9a10076288191b4cafb4e259cbae5

Prompt Customization and Review

To customize this prompt, it would be easier to change it to a format where you can describe settings for each shot.
It's a good idea to instruct the LLM to "split it by shot."

After that, you can customize the necessary parts. It is more reliable to do this manually, but if the scope of change is too wide, asking an LLM is also an option.

Also, reviewing the prompt is important.
When having an LLM create a prompt, issues tend to occur such as cramming in too many elements, inappropriate time allocation, or inconsistent expressions.
Once you have generated a customized video, check the video and the prompt together. If there seems to be a problem, it's good to make adjustments such as removing elements or shots to give it more breathing room, or correcting expressions.
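
Part of this review can be mechanized. The sketch below checks a list of shots for gaps in the timecodes, shots that are too short, and shots with too many action beats. The function name and the thresholds (`max_beats=2`, `min_seconds=2.0`) are my own assumptions, not values from the prompting guide. The example data uses the shot timings and action beats of the customized prompt in this section.

```python
def review_shots(shots, total_seconds, max_beats=2, min_seconds=2.0):
    """Return warnings for a list of (start, end, beats) tuples."""
    warnings = []
    previous_end = 0.0
    for number, (start, end, beats) in enumerate(shots, start=1):
        if start != previous_end:
            warnings.append(f"Shot {number}: starts at {start}s, expected {previous_end}s")
        if end - start < min_seconds:
            warnings.append(f"Shot {number}: only {end - start}s long")
        if len(beats) > max_beats:
            warnings.append(f"Shot {number}: {len(beats)} action beats, consider trimming")
        previous_end = end
    if previous_end != total_seconds:
        warnings.append(f"Shots end at {previous_end}s, but the video is {total_seconds}s")
    return warnings


# Timings and action beats taken from the customized cat cafe prompt.
cat_cafe = [
    (0, 5, ["cat enters the frame", "cat sniffs the aroma"]),
    (5, 8.5, ["paws knead twice", "short pause"]),
    (8.5, 15, ["staff places a dessert", "cat blinks", "cat meows"]),
]
for warning in review_shots(cat_cafe, total_seconds=15):
    print(warning)  # prints: Shot 3: 3 action beats, consider trimming
```

A check like this cannot judge whether the video looks good, but it catches the time allocation and overcrowding problems before you spend a generation on them.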

Customized prompt
Overview: A cat cafe on a sunny afternoon with light filtering through the trees. The background features wood, linen, and houseplants. Avoid logos or signs with specific names, depicting it as a universal "cozy space." The cat is a black cat. It has a calm expression that blends into the cafe's atmosphere.

Shot 1 (0–5s): Introduction / Creating the atmosphere
Scene: A cat cafe on a sunny afternoon with light filtering through the trees. Wood, linen, and houseplants. Clean and high-quality. No specific logos or signs.
Camera shot: Eye-level medium-wide, slow dolly-in.
Lens & DOF: 40mm equivalent, shallow depth of field to make the subject stand out.
Lighting + palette: Soft natural light from the window + warm practical lamps. Palette anchors: amber / cream / walnut brown.
Action beats: A cat gently enters the frame across a table -> Steam rises from the cup, and the cat sniffs the aroma.
Mood: Happy, relaxed, high-quality. Purpose: To instantly convey comfort and texture.

Shot 2 (5–8.5s): The "kneading" cuteness
Scene: Close-up of the cat's front paws (macro-leaning), with a latte and sweets blurred in the background.
Camera shot: Fixed
Lens & DOF: 50mm equivalent close-up, shallow DOF (sharp paws, soft background bokeh).
Lighting + palette: Same light source and tone as Shot 1 (emphasizing continuity). Palette anchors: amber / cream / walnut brown.
Action beats: Paws knead twice -> short pause. In the background, steam from the latte flows softly.
Purpose: Fixate on the "cuteness you want to touch" as the focal moment.

Shot 3 (8.5–15s): Finalizing the store visit image
Scene: A cat dozing off on someone's lap. The warm atmosphere of the cafe in the background.
Camera shot: Gentle dolly-out to broaden the space slightly.
Lens & DOF: Medium shot around 50mm. Sharp eyes and whiskers, soft background.
Lighting + palette: Same lighting design as Shot 1. Palette anchors: amber / cream / walnut brown.
Action beats: Staff member quietly places a dessert on the table -> Cat slowly blinks -> Meows
Purpose: A gentle lingering feeling that solidifies the desire to "spend time here."
Dialogue / VO (Short and natural): Female narration (softly): "A sweet afternoon with cats. Your seat is available."

Background Sound (Common to all): Low-level store ambiance, slight sound of cutlery clinking. Keep BGM subtle and utilize ambient sounds.

The generated video is here:

https://sora.chatgpt.com/p/s_68f9cbfffd34819183c67be191958fe6

My adjustment skills aren't great, so the quality hasn't improved much, but I was able to make customizations such as changing the cat to a black cat and fine-tuning the scenes.

Proposal for a Prompt Template

LLMs output prompts in different formats every time.

Providing an output template is one way to stabilize the format of the prompts the LLM generates.
Because the LLM then outputs in the format you asked for, you get prompts in a style that suits your preferences.

I used a template like this:

<<Instructions for the entire video>>
(Overview/Purpose)
(Instructions)

(Timecode) <<Shot 1>>
(Overview/Purpose)
(Instructions)

(Timecode) <<Shot 2>>
(Overview/Purpose)
(Instructions)

(Timecode) <<Shot 3>>
(Overview/Purpose)
(Instructions)



(Timecode) <<Shot N>>
(Overview/Purpose)
(Instructions)
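
If you want the template to match the number of shots you plan to use, it can be generated instead of typed by hand. The following is a minimal sketch; `render_template` is a hypothetical helper of my own, and it simply reproduces the layout shown above for a given shot count.

```python
def render_template(shot_count):
    """Return the blank prompt template for the given number of shots."""
    blocks = ["<<Instructions for the entire video>>\n(Overview/Purpose)\n(Instructions)"]
    for number in range(1, shot_count + 1):
        blocks.append(
            f"(Timecode) <<Shot {number}>>\n(Overview/Purpose)\n(Instructions)"
        )
    # Separate blocks with a blank line, as in the hand-written template.
    return "\n\n".join(blocks)


print(render_template(3))
```

The resulting text can be pasted into the request to the LLM together with an instruction to fill in the template.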

I think it is best to experiment and find a template that you find easy to use.

Summary

I analyzed the Sora 2 prompting guide and actually generated some videos.

By having an LLM read the prompting guide and draft a Sora 2 prompt, and then customizing that draft, I was able to get closer to the video I intended. There is a limit to leaving every adjustment to the LLM; adjusting fine details by hand is more reliable. Knowing the prompting guide makes it easier to decide which elements to adjust, so I came to appreciate the importance of actually understanding it. That said, even a perfectly constructed prompt leaves some parts to chance because of the model's characteristics. As the prompting guide puts it, "Most importantly, be prepared to iterate": what matters most is being willing to keep trying.

What I learned in this article, especially about cinematography techniques and prompt construction, should also apply to other video generation models. If I have the capacity, I'd like to try new models as they are released and write about them.
