📱 RenderDoc cannot measure draw calls on Android: notes on measuring GPU time on TB(D)R
(Moved from a Qiita article; I would like to add figures and diagrams at some point.)
Clickbait thumbnail: (image omitted)
The Duration that RenderDoc's Event Browser displays as the GPU time of a command is meaningless on mobile devices.
Several fairly well-known blog posts and slide decks present this "measurement" as if it were useful, but they are wrong.
RenderDoc is not a profiler.
This holds whether you use UE4/UE5 or Unity.
This article argues the point from how the GPU architectures work, backed by statements from driver developers, quotations from source code, and other direct and circumstantial evidence.
TL;DR
Conclusion
On mobile GPUs, a GPU timestamp written inside a render pass does not hold a meaningful value:
- You cannot even talk about measuring GPU time per draw call (or per group of draw calls); no unit of execution that could be called that exists.
- The smallest unit of graphics-pipeline GPU time that can be measured meaningfully is the render pass (≈ render target)[1].
RenderDoc's ⏰
The value shown as GPU time (Duration) in RenderDoc's Event Browser is meaningless on mobile GPUs.
See: Why can we say that RenderDoc's Duration values are meaningless?
Pseudocode
// B - A is meaningful:
vkCmdWriteTimestamp(..., A);
vkCmdBeginRenderPass(...);
...
vkCmdDraw*(...)
...
vkCmdEndRenderPass(...);
vkCmdWriteTimestamp(..., B);
// B - A is not meaningful:
vkCmdBeginRenderPass(...);
...
vkCmdWriteTimestamp(..., A);
vkCmdDraw*(...)
vkCmdWriteTimestamp(..., B);
...
vkCmdEndRenderPass(...);
Background
Jump to the main part: if you already know about Tile-Based (Deferred) Rendering GPUs and the timestamp features of the graphics APIs, you can skip this section.
Target GPU architectures
I mainly have in mind Adreno, Mali, and PowerVR, the GPUs chiefly used on Android, but the same argument should also hold for the GPUs in Apple Silicon.
In the classification used here, TBDR is TBR that additionally performs hidden surface removal in each tile before rasterization (PowerVR's terminology[3]).
Technique | Architecture |
---|---|
Tile-Based Rendering | Adreno, Mali |
Tile-Based Deferred Rendering | PowerVR, Apple Silicon[4], VideoCore IV[5] |
Immediate Rendering | Desktop NVIDIA, AMD, etc. |
About NVIDIA Tegra
It is not Tile-Based Rendering[3].
In a question from someone who fell into the timestamp trap on a Qualcomm SoC, the asker reports that on Tegra the results were as expected.
About Vivante (VeriSilicon)
I do not know much about its IP cores at the level that can implement current Vulkan or GLES 3.x. The official site lists Vulkan® 1.0/1.1/1.2, but information is scarce, perhaps because the cores mainly target embedded use, and developments since 2015 are unclear.
Additional information 1
The Vulkan Hardware Database contains a VeriSilicon Android device. It appears to be NXP's i.MX 8M Mini EVKB.
Additional information 2
Faith Ekstrand (an Intel driver developer) has commented on the quality of Vivante's hardware design.
The thread as a whole offers very interesting insight into the hardware design quality of each vendor and is well worth reading.
Specification and implementation of GPU timestamps in the graphics APIs
Vulkan
Vulkan has had timestamp queries, one kind of query that uses a query pool, in core since 1.0. The corresponding command-buffer command is vkCmdWriteTimestamp, and the VK_KHR_synchronization2 extension adds vkCmdWriteTimestamp2KHR.
The two behave the same, except that vkCmdWriteTimestamp2KHR takes its pipeline stage in the VK_KHR_synchronization2 form.
Implementations are not required to support timestamps (see VkQueueFamilyProperties::timestampValidBits and VkPhysicalDeviceLimits::timestampComputeAndGraphics).
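To make the role of timestampValidBits concrete, here is a minimal sketch of turning two raw timestamp values into elapsed nanoseconds. It is plain Rust with no Vulkan bindings. The function name `elapsed_ns` is hypothetical, and the `period_ns` parameter models VkPhysicalDeviceLimits::timestampPeriod (nanoseconds per tick), a limit the text above does not mention and which I am bringing in from the Vulkan specification.

```rust
/// Elapsed GPU time in nanoseconds between two raw timestamp query values.
///
/// `valid_bits` models VkQueueFamilyProperties::timestampValidBits and
/// `period_ns` models VkPhysicalDeviceLimits::timestampPeriod.
/// Both are plain parameters here, not values fetched from Vulkan.
pub fn elapsed_ns(start: u64, end: u64, valid_bits: u32, period_ns: f32) -> Option<f64> {
    // timestampValidBits == 0 means the queue family has no timestamp support.
    if valid_bits == 0 || valid_bits > 64 {
        return None;
    }
    // Only the low `valid_bits` bits of each value are meaningful.
    let mask = if valid_bits == 64 {
        u64::MAX
    } else {
        (1u64 << valid_bits) - 1
    };
    // A masked wrapping subtraction tolerates a single counter wrap-around.
    let ticks = (end & mask).wrapping_sub(start & mask) & mask;
    Some(ticks as f64 * period_ns as f64)
}
```

A caller would pass the two values read back from the query pool. Whether the result means anything still depends on where the two timestamps were written, which is the subject of the rest of this article.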
Status of the Adreno Vulkan implementation
Support can be expected across a wide range of driver versions; even a version with a Build Date in 2018 (presumably not very mature) supports timestamps.
Status of the Mali Vulkan implementation
Mali accounts for the overwhelming majority of devices without support, but very recent driver versions (as of 2022/07) appear to have started supporting timestamps.
OpenGL ES
GLES has the GL_EXT_disjoint_timer_query extension, which specifies functionality roughly equivalent to desktop OpenGL's timer queries.
Android
On Android most implementations support it, but the devices without support include some with an Adreno 540 (2017). This is probably a difference between driver versions.
In the gpuinfo.org database the support rate is only about 56%, which is probably related to the following:
- "GLES implementations on Android" as a category includes some very old ones.
- Unlike in Vulkan, the feature itself is an extension rather than part of the core specification.
iOS
The GLES implementation on iOS does not support GL_EXT_disjoint_timer_query.
Together with GLES 3.1 not being implemented[8], this is part of the cold treatment GLES receives as Apple pushes Metal.
Metal's timestamp features seem to vary considerably by version and device, but they can fairly be called rich.
Feature differences between driver versions
Interestingly, on many Mali devices the driver's GLES implementation supported GL_EXT_disjoint_timer_query, yet its Vulkan implementation did not support timestamps (VkQueueFamilyProperties::timestampValidBits == 0).
As noted above, recent driver releases appear to have fixed this; most likely the Vulkan implementation had simply been shipped without the support.
Main part
Skip this section if you already know how TB(D)R behaves.
Review: how draw commands behave on non-TB(D)R and TB(D)R GPUs
The rough behavior is shown in Rust-like pseudocode.
Pipeline stages that are irrelevant to the argument, detailed per-architecture optimizations[10], and the like are omitted.
Immediate Rendering GPUs
A draw command of the graphics API behaves in a reasonably intuitive way:
for draw_command in draw_commands {
    for primitive in draw_command.primitives {
        for vertex in primitive.vertices {
            vertex.shade();
        }
        let fragments = primitive.rasterize();
        for fragment in fragments {
            fragment.shade();
        }
    }
}
TBR (Tile-Based Rendering) GPUs
On TBR, the hardware execution of a render pass is split into two major passes.
The first pass runs the vertex shader for each primitive and determines which tiles it belongs to.
The second pass, for each tile, rasterizes the primitives collected in the first pass and then runs the fragment shader.
At this point it is already clear that the notion of "per draw call" breaks down.
1. Binning Pass (Adreno) / Geometry Processing (Mali)
The render target is divided into tiles, the vertex shader is run for the primitives of the draw commands, and the primitives visible in each tile are determined.
On Mali the tile size is basically 16x16, whereas on Adreno it appears to be larger[11] and quite flexible (Snapdragon Profiler shows the actual size).
For Adreno, the specification of a Qualcomm vendor extension for querying information such as the tile size has been released.
for draw_command in draw_commands {
    for primitive in draw_command.primitives {
        // Run the vertex shader to determine the vertex positions.
        for vertex in primitive.vertices {
            vertex.shade();
        }
        // Determine the tiles the primitive belongs to and register it as visible there.
        let tiles = find_tiles_for_primitive(&primitive);
        for tile in tiles {
            tile.add_visible_primitive(&primitive);
        }
    }
}
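The binning step above can be sketched as runnable Rust. This is only an illustration under loud assumptions: primitives are reduced to axis-aligned bounding boxes (real hardware tests the actual geometry), the tile is square, `tile` is greater than zero, and the names `Bounds` and `bin_primitives` are hypothetical.

```rust
/// Axis-aligned bounding box of a primitive in pixels; `max_*` is exclusive.
#[derive(Clone, Copy)]
pub struct Bounds {
    pub min_x: u32,
    pub min_y: u32,
    pub max_x: u32,
    pub max_y: u32,
}

/// Returns, for every tile in row-major order, the indices of the
/// primitives whose bounding box overlaps that tile.
pub fn bin_primitives(prims: &[Bounds], width: u32, height: u32, tile: u32) -> Vec<Vec<usize>> {
    let tiles_x = (width + tile - 1) / tile;
    let tiles_y = (height + tile - 1) / tile;
    let mut bins: Vec<Vec<usize>> = vec![Vec::new(); (tiles_x * tiles_y) as usize];
    for (index, p) in prims.iter().enumerate() {
        // Skip empty boxes and boxes that start outside the render target.
        if p.min_x >= p.max_x || p.min_y >= p.max_y || p.min_x >= width || p.min_y >= height {
            continue;
        }
        // Clamp to the render target, then convert pixel bounds to tile indices.
        let last_x = (p.max_x.min(width) - 1) / tile;
        let last_y = (p.max_y.min(height) - 1) / tile;
        for ty in (p.min_y / tile)..=last_y {
            for tx in (p.min_x / tile)..=last_x {
                bins[(ty * tiles_x + tx) as usize].push(index);
            }
        }
    }
    bins
}
```

On a 32x32 target with 16x16 tiles, a primitive covering (8, 8) to (24, 24) lands in all four bins. That is exactly why its work can no longer be attributed to a single contiguous span of GPU time.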
2. Rendering Pass (Adreno) / Fragment Processing (Mali)
For each tile, the visible primitives are rasterized and the fragment shader is run.
The grouping called a "draw call" no longer exists.
for tile in render_pass.tiles {
    // Unless the pass clears, load the contents from the framebuffer in system RAM
    // into the on-chip tile memory.
    tile.load_framebuffer_if_not_clearing();
    // Run the fragment shader for the primitives visible in this tile and draw into
    // the on-chip tile memory, which has low latency and high bandwidth.
    for primitive in tile.visible_primitives {
        let fragments = primitive.rasterize();
        for fragment in fragments {
            fragment.shade();
        }
    }
    // Store the contents of the on-chip tile memory to the framebuffer in system RAM.
    tile.store_framebuffer();
}
Evidence
Answer from a Mali driver developer
On Arm's developer forum, Peter Harris, an engineer on the Mali driver team, gave an interesting answer to a question about the behavior of GL_EXT_disjoint_timer_query on Mali:
> Tile-based GPUs like Mali don't even implement the pipeline as a single pipeline.
> (...)
> you can't use timer queries for timing single drawcalls; they don't exist in isolation in any usable form.
> From a query point of view all drawcalls in the pass will complete when the last tile in the fragment shading completes.
> Timer queries can be used with some success for timing single renderpasses,
In other words, a tile-based GPU such as Mali does not implement the pipeline as the single pipeline that, for example, the OpenGL specification depicts. No unit that could be called "per draw call" exists, so timer queries are reasonably effective only for whole render passes.
The phrase "all drawcalls in the pass will complete when the last tile in the fragment shading completes" is especially noteworthy.
A comment in the open-source Vulkan driver for Adreno
mesa, the open-source graphics driver stack, includes a Vulkan driver for Adreno called turnip.
Let us look at turnip's implementation of vkCmdWriteTimestamp2KHR.
This command writes the timestamp at that point into the query pool at index uint32_t query, and the beginning of its implementation carries a very interesting comment:
VKAPI_ATTR void VKAPI_CALL
tu_CmdWriteTimestamp2(VkCommandBuffer commandBuffer, ..., uint32_t query)
{
   ...
   /* Inside a render pass, just write the timestamp multiple times so that
    * the user gets the last one if we use GMEM. There isn't really much
    * better we can do, and this seems to be what the blob does too.
    */
   struct tu_cs *cs = cmd->state.pass ? &cmd->draw_cs : &cmd->cs;
That is, inside a render pass the driver simply lets the timestamp be written multiple times, so that when GMEM is in use the user ends up with the last value written. The comment adds that there is not much better to be done and that the blob appears to do the same.
"GMEM" is Adreno's name for the dedicated tile memory, and in the mesa context "blob" refers to the vendor's closed-source driver.
"If we use GMEM" means exactly the case where Tile-Based Rendering is performed[14]: because the draw commands in the render pass are repeated for every tile, the timestamp write to index query in the query pool is also repeated for every tile.
Since the same location is overwritten once per tile, the value the application sees is the one written while executing the last tile.
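The overwrite behavior is easy to reproduce in a toy model. The sketch below is not turnip's code. It is a deliberately simplified simulation in which the clock only advances by hypothetical per-tile draw costs, the binning pass and load/store costs are ignored, and every draw lists one cost per tile. `TileCmd`, `run_immediate`, and `run_tiled` are names I made up for the illustration.

```rust
/// A command recorded inside one render pass.
pub enum TileCmd {
    /// Writes the current GPU clock into the given query slot.
    WriteTimestamp(usize),
    /// A draw whose fragment work costs `cost_per_tile[t]` ticks in tile `t`.
    Draw { cost_per_tile: Vec<u64> },
}

/// Immediate-mode model: commands run once and each draw runs to completion.
pub fn run_immediate(cmds: &[TileCmd], slots: usize) -> Vec<u64> {
    let mut clock = 0u64;
    let mut queries = vec![0u64; slots];
    for cmd in cmds {
        match cmd {
            TileCmd::WriteTimestamp(slot) => queries[*slot] = clock,
            TileCmd::Draw { cost_per_tile } => clock += cost_per_tile.iter().sum::<u64>(),
        }
    }
    queries
}

/// Tiled model: the whole command list is replayed once per tile, so each
/// timestamp write overwrites the value stored while running the previous tile.
pub fn run_tiled(cmds: &[TileCmd], slots: usize, tiles: usize) -> Vec<u64> {
    let mut clock = 0u64;
    let mut queries = vec![0u64; slots];
    for tile in 0..tiles {
        for cmd in cmds {
            match cmd {
                TileCmd::WriteTimestamp(slot) => queries[*slot] = clock,
                TileCmd::Draw { cost_per_tile } => clock += cost_per_tile[tile],
            }
        }
    }
    queries
}
```

Take a second draw that costs 10, 20, 30, and 1 ticks in four tiles. Bracketed by timestamps, the immediate model reports 61 ticks for it. The tiled model reports 1 tick, which is only its share of the last tile.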
A post by a developer of the open-source GLES driver for Adreno
Rob Clark is one of the developers of mesa's freedreno (the GLES driver for Adreno). On Qualcomm's developer forum he posted an answer to a question about the values obtained from GL_EXT_disjoint_timer_query:
> (since start and stop time kind of have no sensible meaning with a tiler)
> (...)
> From the cmdstream, it looks like it is overwriting the saved timestamps from the previous tile on the next tile
> (...)
Here "tiler" means a TB(D)R GPU[15], and "cmdstream" is the command stream[16].
"Overwriting the saved timestamps from the previous tile on the next tile": once again, the value the application sees is the one written while executing the last tile.
Of course, this behavior is an implementation detail and is not defined by the graphics API specifications.
Circumstantial evidence
Bugs in the Adreno Vulkan implementation
I do not know the exact range of affected driver versions, but I have run into the following bugs with vkCmdWriteTimestamp:
- A timestamp written in a secondary command buffer never becomes available; vkGetQueryPoolResults with VK_QUERY_RESULT_WAIT_BIT blocks forever.
- In a render pass that uses multiview, only one timestamp is written regardless of the number of views. Calling vkGetQueryPoolResults on the assumption that one timestamp per view is written, as the specification prescribes, waits for results that never become available.
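The multiview case can be stated as a small piece of arithmetic. As I understand the Vulkan specification, a timestamp written in a multiview subpass uses N consecutive query indices, where N is the number of bits set in the subpass view mask. The sketch below only models that count; `timestamp_queries_consumed` and `wait_would_complete` are hypothetical helper names, not Vulkan functions.

```rust
/// Number of consecutive query indices that one timestamp write consumes.
/// `view_mask` is the subpass view mask; 0 means multiview is not in use.
pub fn timestamp_queries_consumed(view_mask: u32) -> u32 {
    if view_mask == 0 {
        1
    } else {
        view_mask.count_ones()
    }
}

/// Whether a blocking read over all expected queries would ever return.
/// `written` is how many slots the (hypothetical) driver actually filled in.
pub fn wait_would_complete(view_mask: u32, written: u32) -> bool {
    written >= timestamp_queries_consumed(view_mask)
}
```

With two views (mask 0b11) and a driver that writes a single slot, `wait_would_complete(0b11, 1)` is false. That corresponds to the endless wait described in the second bug.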
What matters is that both bugs occur when timestamps are used inside a render pass.
This suggests that, at least up to some point, these use cases were not tested inside Qualcomm.
It also suggests that Qualcomm may never have given much thought to using timestamps inside a render pass in the first place.
But an Oculus Developers article recommends RenderDoc for profiling
That article is about RenderDoc for Oculus, a fork that Meta maintains internally and provides specifically for Oculus Quest (quoted here in translation from the Japanese version of the article):
> Oculus now maintains its own fork of RenderDoc.
> This fork provides access to low-level GPU profiling data from the Quest's Snapdragon 835 chip and the Quest 2's Snapdragon XR2 chip (in particular, information from their tile renderers).
It adds displays of information that previously only Snapdragon Profiler could show, such as a tile timeline detailing the execution of the bins (tiles) built in the Binning Pass, and ALU instruction counts.
Unfortunately the source code is not provided, but the fork is at the very least entirely vendor-dependent and may even depend on specific chips.
In other words, it is clearly a RenderDoc customized for Tile-Based Rendering GPUs.
Furthermore, I do not know what, if anything, was changed regarding the per-draw-call Duration.
Metal: iOS, Mac (Apple Silicon)
The GPUs in iOS devices and in SoCs such as the M1 are also TBDR, so the same argument as for Adreno and Mali should hold, but I have little hands-on experience with Metal and have not observed this directly.
In MoltenVK, too, timestamp queries were not implemented properly until fairly recent versions.
A comment by the MoltenVK developer
Little has been published about the detailed behavior of Apple's SoCs, but Bill Hollings, the main developer and maintainer of MoltenVK, commented on an issue about timestamp values:
> Apple SoC timestamps are generated at the end of the current encoding pass.
> All timestamps in the renderpass will have the same timestamp.
> IM GPU's are different and support the kind of per-draw timestamping that Vulkan defines.
Here "the end of the current encoding pass" is the end of the render pass[17], and "IM GPU's" are immediate-mode GPUs.
If this is to be believed, every timestamp in a render pass takes the value at which all tiles have finished processing, so the difference between any pair is zero and carries no meaning at all.
Metal: where timestamps can be sampled
MTLDevice has an interface, supportsCounterSampling, that takes a value of the enum MTLCounterSamplingPoint, which denotes a location in a command buffer, and reports whether timestamps can be sampled there on that device.
The defined enum values are:
- atBlitBoundary: between blit commands
- atDispatchBoundary: between kernel (compute shader) dispatch commands
- atDrawBoundary: between draw commands
- atStageBoundary: between the vertex and fragment stages of a render pass, and between compute/blit passes
- atTileDispatchBoundary: between tile dispatches inside a render pass
So, to measure per draw call, this function must return true for atDrawBoundary.
As you can already guess, however, Apple Silicon devices reportedly return true only for atStageBoundary.
In the WebGPU specification repository, it was pointed out that the timestamp feature as specified at the time could not be implemented on TBDR architectures. According to the poster:
> Apple Silicon devices only return true for atStageBoundary; all the other enum values return false.
That is, on Apple Silicon the Metal implementation itself reports, through this interface, that per-draw-call measurement is impossible.
What about the times shown next to draw commands in Xcode's GPU capture?
I wonder what they are.
Unless Xcode uses vendor-specific hardware counters to measure the time taken by each draw call's workload after it has been broken up across pipeline stages, and then adds those times back up per original draw call, are the values just as meaningless as RenderDoc's Duration?
And how was it back in the PowerVR days 🤔?
Why can we say that RenderDoc's Duration values are meaningless?
From the discussion so far, it is self-evident that on TB(D)R the Duration of a draw command in the Event Browser is meaningless, but nodes such as vkCmdBeginRenderPass and Colour Pass are equally useless.
How ⏰ Time durations for the actions works
For the captured commands, the Event Browser's ⏰ Time durations for the actions inserts vkCmdWriteTimestamp before and after each draw command, replays them on the target device, and collects the differences between the written timestamps.
See EventBrowser::on_timeActions_clicked() and VulkanReplay::FetchCounters(const rdcarray<GPUCounter> &counters).
It behaves in much the same way for graphics APIs other than Vulkan.
The value of a parent node is computed as the sum of the values of its child nodes.
See EventItemModel::CalculateTotalDuration.
So the Duration of a Colour Pass node is nothing more than the sum of the Durations of its child draw commands, and it tells you nothing useful.
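The aggregation itself is trivial, which is the point: summing cannot add information the leaves do not have. The sketch below is not RenderDoc's code. `EventNode` and `total_duration_us` are hypothetical stand-ins that model only the "parent equals sum of children" rule described above.

```rust
/// A node in a simplified event tree (a stand-in, not RenderDoc's own types).
pub struct EventNode {
    /// Measured duration in microseconds; only used for leaf actions such as draws.
    pub own_duration_us: f64,
    pub children: Vec<EventNode>,
}

/// Leaves report their own value; parents report the sum of their children.
pub fn total_duration_us(node: &EventNode) -> f64 {
    if node.children.is_empty() {
        node.own_duration_us
    } else {
        node.children.iter().map(total_duration_us).sum()
    }
}
```

If every leaf value is only "time spent in the last tile", the parent is a sum of last-tile slices. It is not the time the render pass actually took.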
An alternative
If, instead of summing the Durations of the child draw commands, RenderDoc inserted vkCmdWriteTimestamp before vkCmdBeginRenderPass and after vkCmdEndRenderPass and showed the difference between the two values in the Event Browser as the Duration of the Colour Pass, I imagine it could serve as a reasonably useful GPU time measurement even on TB(D)R.
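As a rough illustration of that alternative, the sketch below rewrites a recorded command stream so that timestamps bracket each render pass from the outside rather than each draw. It is only a model. `ReplayCmd` and `instrument_per_pass` are hypothetical names, and nothing here reflects how RenderDoc actually patches command buffers.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ReplayCmd {
    BeginRenderPass,
    Draw,
    EndRenderPass,
    /// Stands in for vkCmdWriteTimestamp; the payload is the query index.
    WriteTimestamp(u32),
}

/// Brackets every render pass with a pair of timestamp writes placed outside it.
/// Returns the instrumented stream and the number of query slots it uses.
pub fn instrument_per_pass(cmds: &[ReplayCmd]) -> (Vec<ReplayCmd>, u32) {
    let mut out = Vec::with_capacity(cmds.len());
    let mut next_query = 0u32;
    for cmd in cmds {
        match cmd {
            ReplayCmd::BeginRenderPass => {
                // Timestamp A goes before the pass begins.
                out.push(ReplayCmd::WriteTimestamp(next_query));
                next_query += 1;
                out.push(cmd.clone());
            }
            ReplayCmd::EndRenderPass => {
                out.push(cmd.clone());
                // Timestamp B goes after the pass ends.
                out.push(ReplayCmd::WriteTimestamp(next_query));
                next_query += 1;
            }
            _ => out.push(cmd.clone()),
        }
    }
    (out, next_query)
}
```

Each pass then yields one pair (A, B) whose difference corresponds to the meaningful case in the pseudocode at the top of this article. It remains subject to the pipelining error between render passes noted in [1].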
Bonus
VK_QCOM_tile_properties
Qualcomm's tile-related extension
The Vulkan 1.3.222 specification release of 2022/06/21 included proposals for two Qualcomm vendor extensions.
Setting aside VK_QCOM_image_processing, which provides image-processing functionality, the interesting one is VK_QCOM_tile_properties, which provides an interface for querying the tile size.
It is not yet available as generated documentation, but the proposal's asciidoc can be read.
The proposal contains valuable information about how Adreno's tiling (binning) works.
More practically, it seems intended to help with generating fragment density maps.
Interface
The interface currently proposed is shown below; it provides a way to query tile information for a render pass or for a VkRenderingInfo of VK_KHR_dynamic_rendering.
typedef struct VkTilePropertiesQCOM {
    VkStructureType sType;
    void* pNext;
    VkExtent3D tileSize;
    VkExtent2D apronSize;
    VkOffset2D origin;
} VkTilePropertiesQCOM;
VkResult vkGetFramebufferTilePropertiesQCOM(VkDevice, VkFramebuffer, uint32_t* pPropertiesCount, VkTilePropertiesQCOM* pProperties);
VkResult vkGetDynamicRenderingTilePropertiesQCOM(VkDevice, const VkRenderingInfo*, VkTilePropertiesQCOM* pProperties);
About FlexRender
What happens when FlexRender[14] falls back to Immediate Rendering is a natural concern, and the proposal's issues section addresses it:
> === RESOLVED: Adreno implementation may decide to execute certain workloads in direct rendering mode a.k.a Flex render. What is the interaction of this extension with Flex render?
> In those cases, the information returned by this extension may not indicate the true execution mode of the GPU.
In other words, in that case the implementation is allowed to return information that does not reflect what actually happens.
An extension for checking on-chip subpass behavior
Addendum (2023/02/13): a fairly recent Mali driver (38.1.0) on the Pixel 6 implements it.
The specification was released on 2022/06/14 and no implementation supports it yet, but VK_EXT_subpass_merge_feedback defines an interface for querying whether the subpasses of a render pass were actually merged.
When they could not be merged, it apparently also returns the reason.
Adreno bugs
Although unrelated to timestamps, several other bugs in the Adreno Vulkan implementation are known.
[1] In practice there is additional error from pipelining between render passes.
[2] You need to read vendor-specific hardware counters. Note that Vulkan and OpenGL ES (the core specifications) have no concept of a "tile". Vulkan's render pass API was intended to "allow TB(D)R-oriented optimizations without exposing the concept of tiles" (many regard this as a failure).
[4] Bring your Metal app to Apple silicon Macs - WWDC20 - Videos - Apple Developer
[5] Real World Technologies - Forums - Thread: Article: Tile-based Rasterization in Nvidia GPUs: a post by Rob Clark.
[8] That said, it is also true that the GLES 3.1 specification grew considerably in volume compared with GLES 3.0.
[9] Metal Retrospective - Roblox Blog: "there is no GL driver on iOS 10 on the newest iPhones, apparently. GL is implemented on top of Metal,"
[10] Early Z rejection, Hidden Surface Removal, Forward Pixel Kill, Transaction Elimination, etc.
[11] Adreno tiling · Wiki · freedreno / freedreno · GitLab
[12] IDVS (Index-Driven Vertex Shading): The Bifrost Shader Core, IDVS shader variants (Arm Mali Offline Compiler User Guide Version 7.3)
[13] HSR (Hidden Surface Removal): Sorting Objects and Geometry on PowerVR Hardware - Imagination
[14] Timestamps can also be used to measure compute shaders. In addition, Adreno has a feature (FlexRender) that switches, depending on the complexity of the geometry in a render pass, to a non-tile-based, desktop-like immediate mode.
[15] In English text these are often simply called "tilers".
[16] Presumably the result of analyzing how the blob sends command streams to the hardware.
[17] The end of the MTLRenderCommandEncoder, that is, the end of the render pass.