Notes on the OpenCL performance of the RTX 5060 Ti 16GB
Measured with the power limit set to 83% (150 W).
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID 0 | NVIDIA GeForce RTX 5060 Ti |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | NVIDIA GeForce RTX 5060 Ti |
| Device Vendor | NVIDIA Corporation |
| Device Driver | 576.02 (Windows) |
| OpenCL Version | OpenCL C 3.0 |
| Compute Units | 36 at 2587 MHz (4608 cores, 23.842 TFLOPs/s) |
| Memory, Cache | 16310 MB VRAM, 1152 KB global / 48 KB local |
| Buffer Limits | 4077 MB global, 64 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
| FP64 compute 0.398 TFLOPs/s (1/64) |
| FP32 compute 24.388 TFLOPs/s ( 1x ) |
| FP16 compute 25.449 TFLOPs/s ( 1x ) |
| INT64 compute 2.245 TIOPs/s (1/12) |
| INT32 compute 12.754 TIOPs/s (1/2 ) |
| INT16 compute 11.305 TIOPs/s (1/2 ) |
| INT8 compute 46.901 TIOPs/s ( 2x ) |
| Memory Bandwidth ( coalesced read ) 422.04 GB/s |
| Memory Bandwidth ( coalesced write) 425.73 GB/s |
| Memory Bandwidth (misaligned read ) 425.42 GB/s |
| Memory Bandwidth (misaligned write) 147.43 GB/s |
| PCIe Bandwidth (send ) 13.47 GB/s |
| PCIe Bandwidth ( receive ) 12.85 GB/s |
| PCIe Bandwidth ( bidirectional) (Gen4 x16) 13.11 GB/s |
|-----------------------------------------------------------------------------|
The results are pretty much in line with the specs.
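As a rough sanity check, the theoretical FP32 figure in the "Compute Units" row can be reproduced from the core count and clock that the benchmark itself reports. This is only a sketch: it assumes the usual convention of counting one fused multiply-add per core per clock as 2 FLOPs, which the memo does not state explicitly.

```python
# Figures copied from the benchmark output above.
compute_units = 36
clock_mhz = 2587
cores = 4608
measured_fp32_tflops = 24.388

# Assumption: one FMA per core per clock, counted as 2 FLOPs.
FLOPS_PER_CORE_PER_CLOCK = 2


def theoretical_tflops(cores, clock_mhz):
    """Peak throughput in TFLOPs/s for the given core count and clock."""
    return cores * FLOPS_PER_CORE_PER_CLOCK * clock_mhz * 1e6 / 1e12


peak = theoretical_tflops(cores, clock_mhz)
cores_per_cu = cores // compute_units

print(f"cores per compute unit: {cores_per_cu}")
print(f"theoretical FP32 peak:  {peak:.3f} TFLOPs/s")
print(f"measured / theoretical: {measured_fp32_tflops / peak:.1%}")
```

Under that assumption the result matches the 23.842 TFLOPs/s printed by the tool, and the measured FP32 number lands slightly above it. One possible explanation is that the GPU ran a little above the reported 2587 MHz during the test.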
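Note that bandwidth drops sharply for misaligned writes: 147.43 GB/s, roughly a third of the ~425 GB/s seen for the other access patterns.
In terms of floating-point FLOPS, it comes out to roughly 1/4 of a 5090.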
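The "about 1/4 of a 5090" estimate can be checked with a quick back-of-the-envelope calculation. The RTX 5090 numbers below (21760 cores, 2407 MHz boost clock) are not from this memo. They are assumed from NVIDIA's published specifications, and no 5090 was measured here, so this compares a measured 5060 Ti value against a theoretical 5090 peak.

```python
# From the benchmark output above (RTX 5060 Ti, power-limited to 150 W).
measured_5060ti_fp32 = 24.388  # TFLOPs/s

# Assumption, not from this memo: RTX 5090 published specs.
RTX_5090_CORES = 21760
RTX_5090_BOOST_MHZ = 2407


def peak_tflops(cores, clock_mhz, flops_per_clock=2):
    """Theoretical peak, counting one FMA per core per clock as 2 FLOPs."""
    return cores * flops_per_clock * clock_mhz * 1e6 / 1e12


peak_5090 = peak_tflops(RTX_5090_CORES, RTX_5090_BOOST_MHZ)
ratio = measured_5060ti_fp32 / peak_5090

print(f"RTX 5090 theoretical FP32: {peak_5090:.2f} TFLOPs/s")
print(f"5060 Ti measured / 5090 theoretical: {ratio:.3f}")
```

With these assumed specs the ratio comes out a little under 0.25, consistent with the "roughly 1/4" remark.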