
Notes on the OpenCL performance of the RTX 5060 Ti 16GB


https://github.com/ProjectPhysX/OpenCL-Benchmark

Measured with the power limit set to 83% (150 W).

.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce RTX 5060 Ti                                 |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce RTX 5060 Ti                                 |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 576.02 (Windows)                                           |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 36 at 2587 MHz (4608 cores, 23.842 TFLOPs/s)               |
| Memory, Cache  | 16310 MB VRAM, 1152 KB global / 48 KB local                |
| Buffer Limits  | 4077 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                         0.398 TFLOPs/s (1/64) |
| FP32  compute                                        24.388 TFLOPs/s ( 1x ) |
| FP16  compute                                        25.449 TFLOPs/s ( 1x ) |
| INT64 compute                                         2.245  TIOPs/s (1/12) |
| INT32 compute                                        12.754  TIOPs/s (1/2 ) |
| INT16 compute                                        11.305  TIOPs/s (1/2 ) |
| INT8  compute                                        46.901  TIOPs/s ( 2x ) |
| Memory Bandwidth ( coalesced read      )                        422.04 GB/s |
| Memory Bandwidth ( coalesced      write)                        425.73 GB/s |
| Memory Bandwidth (misaligned read      )                        425.42 GB/s |
| Memory Bandwidth (misaligned      write)                        147.43 GB/s |
| PCIe   Bandwidth (send                 )                         13.47 GB/s |
| PCIe   Bandwidth (   receive           )                         12.85 GB/s |
| PCIe   Bandwidth (        bidirectional)            (Gen4 x16)   13.11 GB/s |
|-----------------------------------------------------------------------------|

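The results are roughly in line with the specs.
Note the drop for misaligned writes: 147.43 GB/s, versus roughly 425 GB/s for the other three access patterns.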
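
As a rough cross-check of the numbers above, the following sketch recomputes the theoretical FP32 peak from the core count and clock shown in the device header, and compares misaligned with coalesced write bandwidth. It assumes 2 FLOPs per core per clock (one fused multiply-add), which reproduces the 23.842 TFLOPs/s figure in the header; the variable names are only illustrative.

```python
# Values copied from the benchmark output above.
cores = 4608
clock_mhz = 2587
measured_fp32_tflops = 24.388
coalesced_write_gbs = 425.73
misaligned_write_gbs = 147.43

# Assumption: one fused multiply-add (2 FLOPs) per core per clock.
flops_per_core_per_clock = 2

# Theoretical FP32 peak in TFLOPs/s.
theoretical_tflops = cores * clock_mhz * 1e6 * flops_per_core_per_clock / 1e12

# How the measured FP32 figure compares with that peak.
fp32_vs_peak = measured_fp32_tflops / theoretical_tflops

# Fraction of coalesced write bandwidth that misaligned writes retain.
misaligned_write_fraction = misaligned_write_gbs / coalesced_write_gbs

print(f"theoretical FP32 peak : {theoretical_tflops:.3f} TFLOPs/s")
print(f"measured / theoretical: {fp32_vs_peak:.3f}")
print(f"misaligned / coalesced write: {misaligned_write_fraction:.2f}")
```

The measured FP32 figure comes out slightly above the peak computed from the reported clock. That would be consistent with the GPU boosting above 2587 MHz during the run, though the benchmark output does not confirm this. Misaligned writes retain only about a third of the coalesced write bandwidth.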

In terms of floating-point throughput (FLOPS), that is roughly 1/4 of the RTX 5090.

https://www.techpowerup.com/gpu-specs/geforce-rtx-5090.c4216
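
The "about 1/4" estimate can be sketched as below. The RTX 5090 core count and boost clock are assumptions taken from public spec listings, not from a benchmark run, and should be checked against the page linked above. The comparison also mixes a measured value (5060 Ti) with a theoretical peak (5090), so it is only a ballpark figure.

```python
# Measured on the RTX 5060 Ti in the benchmark output above.
rtx5060ti_fp32_tflops = 24.388

# Assumption (not measured here): RTX 5090 shader count and boost clock
# as given in public spec listings; verify against the linked page.
rtx5090_cores = 21760
rtx5090_boost_mhz = 2407

# Theoretical FP32 peak, assuming 2 FLOPs (one FMA) per core per clock.
rtx5090_peak_tflops = rtx5090_cores * rtx5090_boost_mhz * 1e6 * 2 / 1e12

# Measured 5060 Ti throughput as a fraction of the 5090's theoretical peak.
ratio = rtx5060ti_fp32_tflops / rtx5090_peak_tflops

print(f"RTX 5090 theoretical FP32 peak: {rtx5090_peak_tflops:.1f} TFLOPs/s")
print(f"5060 Ti measured / 5090 peak  : {ratio:.2f}")
```

Under these assumptions the ratio lands a little below 0.25, which matches the rough 1/4 estimate.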
