
FFMPEG Commands for Vision Pro


FFmpeg and Apple Devices

Hi, I'm building a video platform for Apple devices. FFmpeg is incredibly useful for encoding video, but we need to be careful when preparing videos for playback on an Apple device such as the Vision Pro.

FFmpeg Commands

I’ll share some commands for video encoding.

Basic Encoding

Let’s start with a basic command:

ffmpeg -i input.mp4 -c:v libx265 -b:v 200M -profile:v main -tag:v hvc1 -map_metadata 0 -c:a copy output.mp4

Okay, I'll explain this command.

  • -i option

    • Specifies the input file. The -i flag tells FFmpeg which file to process. In this case, input.mp4 is the source video file that will be converted or manipulated.
  • -c:v option

    • c:v stands for "codec: video" and defines the encoder to use for the video stream.
    • We can choose an encoder. Typically, we use either libx265 or hevc_videotoolbox:
      • libx265 is FFmpeg's wrapper for x265, the open-source HEVC encoder developed by MulticoreWare. It encodes in software, so it takes more time but generally gives higher quality than hardware encoding at the same bitrate.
      • hevc_videotoolbox is FFmpeg's wrapper for Apple's VideoToolbox framework. It uses Apple's hardware encoder, which makes it much faster than software encoding. However, the quality is lower at the same bitrate.
  • -b:v option

    • b:v stands for "bitrate: video" and specifies the target bitrate for the video stream.
    • If you want to constrain the bitrate, I recommend the article below. I usually use the libx265 encoder rather than hevc_videotoolbox when I need to do that, but the official workaround is the following.
  • -profile:v option

    • profile:v specifies the encoding profile for the video codec.
    • The main profile is a common profile in H.265. It supports 8-bit color depth.
    • If you want to encode with 10-bit color depth, you need to set the main10 profile.
  • -tag:v option

    • tag:v sets a codec tag for the video stream in the output container.
    • To ensure compatibility with Apple devices, you need to set this to hvc1.
  • -map_metadata option

    • map_metadata controls how metadata (e.g., title, author, creation date) is handled.
    • 0 refers to the first input file (in this case, input.mp4). This tells FFmpeg to copy all metadata from the input file to the output file (output.mp4).
  • -c:a option

    • c:a stands for "codec: audio" and specifies the audio codec.
    • copy instructs FFmpeg to pass the audio stream from input.mp4 to output.mp4 unchanged, without re-encoding.
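To make the main vs main10 choice above less error-prone, here is a minimal sketch that reads the source pixel format with ffprobe and picks the matching profile before running the basic command. The function names (hevc_profile_for_pix_fmt, probe_pix_fmt, encode_hevc_basic) are hypothetical helpers, not part of FFmpeg, and the sketch only covers 8-bit and 10-bit 4:2:0 sources.

```shell
# Sketch with hypothetical helper names: choose -profile:v from the source pix_fmt.
# Only 4:2:0 formats are handled; 4:2:2 and 4:4:4 need other x265 profiles.
hevc_profile_for_pix_fmt() {
  case "$1" in
    yuv420p)     echo main ;;    # 8-bit 4:2:0
    yuv420p10le) echo main10 ;;  # 10-bit 4:2:0
    *) echo "unsupported pix_fmt for this sketch: $1" >&2; return 1 ;;
  esac
}

# Print the pix_fmt of the first video stream.
probe_pix_fmt() {
  ffprobe -v error -select_streams v:0 -show_entries stream=pix_fmt \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Run the basic encode from this article with the matching profile.
encode_hevc_basic() {
  profile=$(hevc_profile_for_pix_fmt "$(probe_pix_fmt "$1")") || return 1
  ffmpeg -i "$1" -c:v libx265 -b:v 200M -profile:v "$profile" \
    -tag:v hvc1 -map_metadata 0 -c:a copy "$2"
}
```

For example, encode_hevc_basic input.mp4 output.mp4 would pick main10 for a yuv420p10le source and stop with an error for formats the sketch does not cover.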

Check video details

You can check video details such as the color primaries, color transfer, and so on with ffprobe.

ffprobe -v error -show_format -show_streams input.mp4

To restrict the output to specific fields, pipe the ffprobe output through grep like this.

ffprobe -v error -show_format -show_streams input.mp4 | grep -E "^pix_fmt|^bit_rate|^r_frame_rate|^color_space|^color_transfer|^color_primaries|^profile"
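If you check many files, a small awk filter over the same key=value output of ffprobe -show_streams can flag the missing hvc1 tag mentioned above. This is a sketch: apple_hevc_check is a hypothetical name, and it only looks at the codec_name, codec_tag_string and color_transfer fields.

```shell
# Sketch (hypothetical helper): read `ffprobe -show_streams` key=value lines on stdin
# and report whether the HEVC stream is tagged hvc1 and whether it uses PQ transfer.
apple_hevc_check() {
  awk -F= '
    $1 == "codec_name"       && $2 == "hevc" { hevc = 1 }
    $1 == "codec_tag_string" && $2 == "hvc1" { hvc1 = 1 }
    $1 == "color_transfer"                   { trc = $2 }
    END {
      if (!hevc) { print "no HEVC stream found"; exit 1 }
      if (!hvc1) { print "HEVC stream is not tagged hvc1 (add -tag:v hvc1)"; exit 1 }
      print "ok: hvc1" (trc == "smpte2084" ? ", PQ (HDR)" : "")
    }'
}
```

Usage: ffprobe -v error -show_streams output.mp4 | apple_hevc_check. The exit status is non-zero when the check fails, so it can gate a batch script.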

Change Frame rate and Resolution

If you want to change the frame rate and resolution of a video, you can use the following command.

ffmpeg -i input.mp4 -vf fps=<frame rate>,scale=<width>:<height> -c:v libx265 -b:v 200M -profile:v main -tag:v hvc1 -map_metadata 0 -c:a copy output.mp4
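When you fill in the placeholders, keep in mind that 4:2:0 output (as with the main profile here) needs an even width and height. A minimal sketch of a hypothetical helper, fps_scale_filter, that builds the -vf value and rejects odd sizes before FFmpeg does:

```shell
# Sketch (hypothetical helper): build "fps=R,scale=W:H" and refuse odd sizes,
# because 4:2:0 chroma subsampling needs even width and height.
fps_scale_filter() {
  rate="$1"; w="$2"; h="$3"
  if [ $((w % 2)) -ne 0 ] || [ $((h % 2)) -ne 0 ]; then
    echo "width and height must be even for 4:2:0 output: ${w}x${h}" >&2
    return 1
  fi
  printf 'fps=%s,scale=%s:%s\n' "$rate" "$w" "$h"
}
```

Usage: ffmpeg -i input.mp4 -vf "$(fps_scale_filter 90 3840 3840)" followed by the same encoder options as above.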

ProRes Encoding

This command re-encodes ProRes to ProRes. I assume the ProRes format is 4444 XQ (prores_ks -profile:v 5). Note that prores_ks does not support yuv444p12le, so use yuv444p10le.

ffmpeg -i input.mov -vf fps=<rate>,scale=<w>:<h> -c:v prores_ks -profile:v 5 -pix_fmt yuv444p10le -map_metadata 0 -c:a copy output.mov
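The number in -profile:v 5 is easy to misread. prores_ks numbers its profiles proxy (0), lt (1), standard (2), hq (3), 4444 (4) and 4444xq (5). The hypothetical helpers below map flavor names to those numbers and to the 10-bit pixel format prores_ks uses for each (4:4:4 for the 4444 flavors, 4:2:2 otherwise).

```shell
# Sketch (hypothetical helpers): ProRes flavor name -> prores_ks -profile:v number.
prores_ks_profile() {
  case "$1" in
    proxy)    echo 0 ;;
    lt)       echo 1 ;;
    standard) echo 2 ;;
    hq)       echo 3 ;;
    4444)     echo 4 ;;
    4444xq)   echo 5 ;;
    *) echo "unknown ProRes flavor: $1" >&2; return 1 ;;
  esac
}

# ProRes flavor name -> 10-bit pixel format supported by prores_ks.
prores_ks_pix_fmt() {
  case "$1" in
    4444|4444xq)          echo yuv444p10le ;;  # 4:4:4 flavors
    proxy|lt|standard|hq) echo yuv422p10le ;;  # 4:2:2 flavors
    *) echo "unknown ProRes flavor: $1" >&2; return 1 ;;
  esac
}
```

Usage: -c:v prores_ks -profile:v "$(prores_ks_profile 4444xq)" -pix_fmt "$(prores_ks_pix_fmt 4444xq)" reproduces the profile 5 / yuv444p10le pair used above.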

ProRes: merge left and right eyes

hstack requires all inputs to have the same pixel format and height, so both eyes are converted to yuv444p10le (prores_ks does not support 12-bit anyway).

ffmpeg -i left.mov -i right.mov -filter_complex "[0:v]setsar=1,format=yuv444p10le[l];[1:v]setsar=1,format=yuv444p10le[r];[l][r]hstack=inputs=2,format=yuv444p10le,setparams=color_primaries=smpte432:color_trc=smpte2084:colorspace=bt709:range=full[v]" -map "[v]" -map '0:a?' -map '1:a?' -c:v prores_ks -profile:v 5 -pix_fmt yuv444p10le -movflags write_colr -map_metadata 0 -c:a copy output_sbs_prores4444xq.mov

ProRes: split an SBS video into left and right eyes

ffmpeg -i sbs_prores.mov \
-filter_complex "
  [0:v]fps=90,setsar=1,format=yuv444p10le,split=2[L][R];
  [L]crop=iw/2:ih:0:0,format=yuv444p10le,setparams=color_primaries=smpte432:color_trc=smpte2084:colorspace=bt709:range=full[vL];
  [R]crop=iw/2:ih:iw/2:0,format=yuv444p10le,setparams=color_primaries=smpte432:color_trc=smpte2084:colorspace=bt709:range=full[vR];
  [0:a]apad,asplit=2[aL][aR]
" \
-map "[vL]" -map "[aL]" \
  -c:v prores_ks -profile:v 5 -pix_fmt yuv444p10le \
  -c:a pcm_s16le \
  -color_primaries smpte432 -color_trc smpte2084 -colorspace bt709 -color_range 2 \
  -movflags +write_colr+faststart \
  -video_track_timescale 90000 -shortest \
  left.mov \
-map "[vR]" -map "[aR]" \
  -c:v prores_ks -profile:v 5 -pix_fmt yuv444p10le \
  -c:a pcm_s16le \
  -color_primaries smpte432 -color_trc smpte2084 -colorspace bt709 -color_range 2 \
  -movflags +write_colr+faststart \
  -video_track_timescale 90000 -shortest \
  right.mov

Cut a video at specific times

If you want to cut a video at specific times, you can use the following commands.

Keep the first <seconds> seconds of the video.

ffmpeg -i input.mp4 -t <seconds> -c copy output.mp4

Cut out the segment between two timestamps.

ffmpeg -i input.mp4 -ss <00:00:00> -to <00:00:40> -c copy output.mp4

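Split a video into left and right halves

The following command keeps the left half. For the right half, use crop=iw/2:ih:iw/2:0 instead.

ffmpeg -i input.mp4 -vf "crop=iw/2:ih:0:0" -c:v libx265 -b:v 200M -profile:v main -tag:v hvc1 -map_metadata 0 -c:a copy output.mp4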

Zero B-frame

Sometimes we want to control the GOP structure of HEVC. The following command generates a video with zero B-frames.

ffmpeg -i input.mp4 -c:v libx265 -b:v 200M -profile:v main -tag:v hvc1 -map_metadata 0 -x265-params "bframes=0" -c:a copy output.mp4

If you are not familiar with GOPs and B-frames, you can check my other article on HEVC.
https://zenn.dev/kbrash/articles/c71c3fcd66c4b2

Why do I need this command? Some custom decoders are built to handle only IDR frames and P-frames, so decoding an HEVC video that contains B-frames with such a decoder fails.
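To confirm that an encode really has zero B-frames before handing it to such a decoder, you can count the picture types of all frames. A sketch, assuming the output of ffprobe -v error -select_streams v:0 -show_frames -show_entries frame=pict_type -of csv=p=0 output.mp4 is piped in; count_pict_types is a hypothetical helper.

```shell
# Sketch (hypothetical helper): count I/P/B picture types from ffprobe CSV lines
# where the first column is pict_type.
count_pict_types() {
  awk -F, '{ n[$1]++ } END { printf "I=%d P=%d B=%d\n", n["I"], n["P"], n["B"] }'
}
```

For the zero B-frame command above, the result should end with B=0.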

Set IDR interval

If you want to insert an IDR frame every 90 frames, you can use the -g 90 option.

ffmpeg -i input.mp4 -c:v libx265 -b:v 200M -profile:v main -tag:v hvc1 -map_metadata 0 -g 90 -x265-params "bframes=0:open-gop=0" -c:a copy output.mp4

Also, do not forget to set open-gop=0 if you want to play the video on Apple devices. Apple devices do not support open GOPs.

What are open and closed GOPs? Please check this link.
https://streaminglearningcenter.com/blogs/open-and-closed-gops-all-you-need-to-know.html
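The keyframe interval is simply frame rate times the GOP length in seconds. A sketch of a hypothetical helper, fixed_gop_params, that builds the closed, fixed-GOP x265 parameters used later in this article; it assumes an integer frame rate such as 90 (fractional rates like 29.97 are not handled by shell arithmetic).

```shell
# Sketch (hypothetical helper): closed GOP with a fixed keyframe interval of
# fps * seconds frames, no scenecut keyframes and no B-frames.
fixed_gop_params() {
  fps="$1"; secs="$2"
  keyint=$((fps * secs))
  printf 'keyint=%d:min-keyint=%d:scenecut=0:open-gop=0:bframes=0\n' "$keyint" "$keyint"
}
```

Usage: -x265-params "$(fixed_gop_params 90 1)" gives a keyframe every 90 frames, like -g 90 but without extra keyframes at scene cuts.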

Divide into multiple tiles

If you want to divide your HEVC video into multiple tiles, you can use the following command.

ffmpeg -i input.mp4 -c:v libx265 -b:v 200M -profile:v main -tag:v hvc1 -map_metadata 0 -x265-params "tiles=3x3" -c:a copy output.mp4

What is a tile? Please check my other article. Tiles are really important for decoding speed on edge devices.
https://zenn.dev/kbrash/articles/c71c3fcd66c4b2

Crop SBS video from the center position of each eye

If you have a side-by-side (SBS) video and want to crop it, you first need to split it into the left and right eyes. After that, you can crop each eye and stack them back together. Be careful: when you split the frame like this, you need to set the color properties with "-x265-params".

The following example crops each 7200x7200 eye to 6600x6000 around its center (x offset 300, y offset 600).

SDR

ffmpeg -i input.mp4 -filter_complex "[0:v]crop=7200:7200:0:0[left_full];[0:v]crop=7200:7200:7200:0[right_full];[left_full]crop=6600:6000:300:600[left];[right_full]crop=6600:6000:300:600[right];[left][right]hstack=inputs=2[stacked]" -map "[stacked]" -map 0:a -c:v libx265 -b:v 200M -profile:v main -x265-params "colorprim=bt709:transfer=bt709:colormatrix=bt709" -tag:v hvc1 -map_metadata 0 -c:a copy output.mp4

HDR

ffmpeg -i input.mp4 -filter_complex "[0:v]crop=7200:7200:0:0[left_full];[0:v]crop=7200:7200:7200:0[right_full];[left_full]crop=6600:6000:300:600[left];[right_full]crop=6600:6000:300:600[right];[left][right]hstack=inputs=2[stacked]" -map "[stacked]" -map 0:a -c:v libx265 -b:v 200M -profile:v main10 -x265-params "colorprim=smpte432:transfer=smpte2084:colormatrix=bt709" -tag:v hvc1 -map_metadata 0 -c:a copy output.mp4
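The offsets in both commands above come from centering the crop inside each eye: x = (7200 - 6600) / 2 = 300 and y = (7200 - 6000) / 2 = 600. A sketch of a hypothetical helper, sbs_center_crop_filter, that generates the same filter graph for other sizes, assuming square eyes placed side by side:

```shell
# Sketch (hypothetical helper): filter_complex for a centered crop of each eye
# in a side-by-side frame made of two square eyes of size eye x eye.
sbs_center_crop_filter() {
  eye="$1"; cw="$2"; ch="$3"
  if [ "$cw" -gt "$eye" ] || [ "$ch" -gt "$eye" ]; then
    echo "crop ${cw}x${ch} is larger than the ${eye}x${eye} eye" >&2
    return 1
  fi
  x=$(( (eye - cw) / 2 )); y=$(( (eye - ch) / 2 ))   # center the crop
  printf '[0:v]crop=%d:%d:0:0[left_full];[0:v]crop=%d:%d:%d:0[right_full];' \
    "$eye" "$eye" "$eye" "$eye" "$eye"
  printf '[left_full]crop=%d:%d:%d:%d[left];[right_full]crop=%d:%d:%d:%d[right];' \
    "$cw" "$ch" "$x" "$y" "$cw" "$ch" "$x" "$y"
  printf '[left][right]hstack=inputs=2[stacked]\n'
}
```

Usage: -filter_complex "$(sbs_center_crop_filter 7200 6600 6000)" reproduces the graph in the SDR and HDR commands.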

SBS Half-Equirectangular to Rectilinear

ffmpeg -i input.mp4 -vf v360=input=equirect:output=flat:ih_fov=180:iv_fov=180:h_fov=90:v_fov=90:in_stereo=sbs:w=3840:h=3840 -codec:v hevc_videotoolbox -b:v <bitrate> -profile:v main10 -tag:v hvc1 -map_metadata 0 output.mp4

One-Eye Equirectangular to Rectilinear

ffmpeg -i input.mp4 \
-vf v360=input=equirect:output=flat:ih_fov=180:iv_fov=180:h_fov=90:v_fov=90:w=3840:h=3840 \
-codec:v hevc_videotoolbox -b:v <bitrate> -profile:v main10 -tag:v hvc1 -map_metadata 0 \
output.mp4

Frame crop from the video

Crop the central 20% (of both width and height) from the frame at 00:00:10.

ffmpeg -ss 00:00:10 -i 7200.mov -vframes 1 -filter:v "crop=iw*0.2:ih*0.2" frame_center_20_7200.png

Crop a 20% area with offsets of 10% from the left and top edges.

ffmpeg -ss 00:00:10 -i 7200.mov -vframes 1 -filter:v "crop=iw*0.2:ih*0.2:iw*0.1:ih*0.1" frame_offset_10_7200.png

Crop a 15% area with offsets of 30% from the left edge and 70% from the top edge, and increase the brightness.

ffmpeg -ss 00:00:00 -i test1_7200.mov -vframes 1 -filter:v "crop=iw*0.15:ih*0.15:iw*0.3:ih*0.7,eq=brightness=0.15" test2_brighter.png

Using VMAF command

VMAF, created by Netflix, is an efficient way to evaluate video quality.
Both input videos must be filtered to the same resolution and frame rate. VMAF is intended to be evaluated on SDR content, so both HDR inputs are converted from PQ (smpte2084) and P3 (smpte432) to BT.709 first.

ffmpeg -i hevc_hdr.mov -i prores_hdr.mov -filter_complex "\
[0:v]zscale=transferin=smpte2084:primariesin=smpte432:matrixin=bt709:rangein=tv,\
      zscale=transfer=bt709:primaries=bt709:matrix=bt709:range=tv,\
      fps=90,scale=<width>:<height>,setsar=1,format=yuv420p10le[dist];\
[1:v]zscale=transferin=smpte2084:primariesin=smpte432:matrixin=bt709:rangein=pc,\
      zscale=transfer=bt709:primaries=bt709:matrix=bt709:range=tv,\
      fps=90,scale=<width>:<height>,setsar=1,format=yuv420p10le[ref];\
[dist][ref]libvmaf=log_fmt=json:log_path=vmaf.json" \
-an -f null -

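Check bitrate for each frame

This writes one CSV line per frame: the timestamp (pts_time), the packet size in bytes (pkt_size), and the picture type (I/P/B). Newer FFmpeg releases no longer output pkt_pts_time, so pts_time is used here.

ffprobe -v error \
  -select_streams v:0 \
  -show_frames \
  -show_entries frame=pts_time,pkt_size,pict_type \
  -of csv=p=0 \
  input.mov \
  > frames_bitrate.csv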
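The CSV holds bytes per frame, so you still have to aggregate to get an actual bitrate. A sketch that sums packet sizes per whole second of timestamp and prints kbit/s, assuming the columns are timestamp, packet size, picture type (as produced with frame=pts_time,pkt_size,pict_type); per_second_kbps is a hypothetical name.

```shell
# Sketch (hypothetical helper): read "time,bytes,type" lines and print
# "second,kbit/s" for every whole second from 0 to the last timestamp.
per_second_kbps() {
  awk -F, '
    $1 != "N/A" && $2 != "N/A" {
      sec = int($1); bytes[sec] += $2
      if (sec > max) max = sec
    }
    END { for (s = 0; s <= max; s++) printf "%d,%.1f\n", s, bytes[s] * 8 / 1000 }'
}
```

Usage: per_second_kbps < frames_bitrate.csv. Seconds without any frames are printed as 0.0.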

ProRes to HEVC that stays decodable on Vision Pro

ffmpeg -i sbs_prores4444xq.mov -i music.wav \
  -map 0:v:0 -map 1:a:0 \
  -vf "zscale=matrixin=bt709:transferin=smpte2084:primariesin=smpte432:matrix=bt709:transfer=smpte2084:primaries=smpte432,format=yuv420p10le" \
  -c:v libx265 -profile:v main10 -crf 20 -preset slow -tag:v hvc1 \
  -x265-params "bframes=0:b-adapt=0:open-gop=0:keyint=90:min-keyint=90:scenecut=0:repeat-headers=1:aud=1:hdr-opt=1" \
  -color_primaries smpte432 -color_trc smpte2084 -colorspace bt709 -color_range tv \
  -c:a copy -shortest \
  sbs_hevc.mov

CRF Definition

| Approx. CRF range | Perceived quality | Typical use case |
|---|---|---|
| 0 | Highest-quality CRF setting; in x265 this is still not mathematically lossless (true lossless needs lossless=1) | Extremely large files; rarely used in practice. |
| 1–12 | Near-lossless, extremely high quality | Film mastering, VFX intermediates, etc. Not as heavy as ProRes/EXR, but generally too large for delivery/streaming. |
| 13–17 | Very high quality | High-quality archives, offline masters, HDR reference files, etc. File sizes are still quite large. |
| 18–20 | Visually near-lossless | Quality-focused delivery and archiving. Differences from the source are hard to see unless you look very closely. |
| 21–22 | High quality with some focus on efficiency | Typical "high-quality delivery" range. For Asahi's VR/HDR use case, a good baseline to test. |
| 23–24 | Compression becomes noticeable | Standard delivery when you also want to save storage. Slight artifacts may appear in dark areas or fine textures. |
| 25–28 | Clearly looks compressed | Mobile, bandwidth-constrained environments, monitoring, etc. Usually not suitable for high-resolution VR/HDR. |
| 29 and above | Very rough / heavily compressed | Very low-bitrate experiments, security cameras, previews, etc. Prioritizes size/bandwidth over quality. |

x265 parameters

| Parameter | Value | What it controls | Effect / why you'd use it |
|---|---|---|---|
| bframes | 0 | Maximum number of consecutive B-frames | Disables B-frames. Simplifies decoding and reduces latency, but lowers compression efficiency. |
| b-adapt | 0 | Adaptive placement of B-frames | Disables adaptive B-frame placement. Has no real effect when bframes=0, but makes the intent explicit. |
| open-gop | 0 | Whether GOPs are "open" (can reference across GOP boundaries) | Forces closed GOPs. Improves random access and container/player compatibility (each GOP is self-contained). |
| keyint | 90 | Maximum distance between keyframes (in frames) | At 90 fps this gives 1-second GOPs. Trade-off between seek granularity and bitrate. |
| min-keyint | 90 | Minimum distance between keyframes | Together with keyint=90 and scenecut=0, fixes the keyframe interval at exactly 90 frames, giving a perfectly regular GOP structure. |
| scenecut | 0 | Automatic keyframe insertion on scene changes | Disables scenecut detection, so no extra keyframes appear at scene cuts and the GOP length stays constant. |
| repeat-headers | 1 | Repetition of VPS/SPS/PPS headers in the bitstream | Emits the parameter sets with every keyframe. Improves robustness and compatibility with some players. |
| aud | 1 | Access Unit Delimiter (AUD) NAL units | Adds AUD NAL units. Expected by some decoders and tools; helps with stream parsing. |
| hdr-opt | 1 | HDR-specific encoding optimizations | Adds luma and chroma QP offsets tuned for HDR/WCG content. x265 expects 10-bit 4:2:0 input for this. |
| rc-lookahead | e.g. 0 / default | Number of future frames analyzed for rate control and scenecut decisions | Higher values improve rate control and visual stability but increase latency and CPU usage. 0 minimizes latency but reduces the encoder's "intelligence." |

Divide a video based on frame number

Set start and end frame.

ffmpeg -i input.mov \
  -vf "trim=start_frame=1000:end_frame=2000,setpts=PTS-STARTPTS,fps=<rate>,scale=<w>:<h>" \
  -an \
  -c:v prores_ks -profile:v 5 -pix_fmt yuv444p10le -map_metadata 0 \
  output.mov

Set only start frame.

ffmpeg -i input.mov \
  -vf "trim=start_frame=1000,setpts=PTS-STARTPTS,fps=<rate>,scale=<w>:<h>" \
  -an \
  -c:v prores_ks -profile:v 5 -pix_fmt yuv444p10le -map_metadata 0 \
  output.mov
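Frame numbers are easier to reason about once you convert them to time. In the trim filter, end_frame is the first frame that is dropped, so start_frame=1000:end_frame=2000 keeps 1000 frames, and at 90 fps frame 1000 starts at about 11.111 s. The helpers below (hypothetical names) do that arithmetic for a constant frame rate given as 90 or as a ratio such as 30000/1001.

```shell
# Sketch (hypothetical helper): start time in seconds of frame N at a constant rate.
frame_to_seconds() {
  awk -v f="$1" -v r="$2" 'BEGIN {
    n = split(r, p, "/")                  # accept "90" or "30000/1001"
    rate = (n == 2) ? p[1] / p[2] : r
    printf "%.3f\n", f / rate
  }'
}

# Number of frames kept by trim=start_frame=S:end_frame=E (E itself is dropped).
trim_kept_frames() {
  echo $(( $2 - $1 ))
}
```

These values are handy when you want to double-check a frame-based trim against the -ss/-to timestamps used earlier.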

Merge multiple files

  • filelist.txt
file '/path/v1.mov'
file '/path/v2.mov'
file '/path/v3.mov'
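Writing filelist.txt by hand gets tedious for many clips. A sketch of a hypothetical helper, write_filelist, that lists every .mov in a directory in the concat demuxer's file '...' format, escaping single quotes in paths as '\''. It assumes the shell's glob order is the order you want (v10.mov sorts before v2.mov) and that you pass an absolute directory, because the concat demuxer resolves relative paths against the list file's location.

```shell
# Sketch (hypothetical helper): write a concat-demuxer list of all .mov files in a directory.
write_filelist() {
  dir="$1"; list="$2"
  : > "$list"                                   # start with an empty list
  for f in "$dir"/*.mov; do
    [ -e "$f" ] || continue                     # glob matched nothing
    esc=$(printf '%s' "$f" | sed "s/'/'\\\\''/g")   # ' -> '\''
    printf "file '%s'\n" "$esc" >> "$list"
  done
}
```

Usage: write_filelist /path filelist.txt, then pass filelist.txt to -f concat -safe 0 -i as in the commands below.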

ProRes

ffmpeg -f concat -safe 0 -i filelist.txt -i music.wav \
  -map 0:v:0 -map 1:a:0 \
  -vf "fps=<rate>,scale=<w>:<h>" \
  -c:v prores_ks -profile:v 5 -pix_fmt yuv444p10le \
  -c:a copy -shortest -map_metadata 0 \
  output.mov

HEVC

ffmpeg -f concat -safe 0 -i filelist.txt -i audio.mov \
  -map 0:v:0 -map 1:a:0 \
  -vf "zscale=matrixin=bt709:transferin=smpte2084:primariesin=smpte432:matrix=bt709:transfer=smpte2084:primaries=smpte432,format=yuv420p10le" \
  -c:v libx265 -profile:v main10 -crf 22 -preset slow -tag:v hvc1 \
  -x265-params "bframes=0:b-adapt=0:open-gop=0:keyint=90:min-keyint=90:scenecut=0:repeat-headers=1:aud=1:hdr-opt=1" \
  -color_primaries smpte432 -color_trc smpte2084 -colorspace bt709 -color_range tv \
  -c:a copy -shortest hevc.mov

Make a picture less bright

ffmpeg -i input.jpg -vf "eq=brightness=-0.1" output.jpg
