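Baseline Performance Measurement of Valkey RDMA
Summary
- Valkey has added experimental RDMA support, so I tried it out.
- With RDMA, throughput was a little more than 2x that of ordinary TCP and average latency dropped by more than 50% for PING, SET, GET, and LPUSH. LRANGE_100 was roughly the same on both transports.
- I plan to keep investigating the RDMA implementation to find out where this difference comes from.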
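The throughput and latency figures in the summary can be reproduced from the redis-benchmark results reported later in this article. The following is a minimal sketch of that arithmetic. The numbers are copied from the logs, while the names `RESULTS`, `compare`, and `COMPARISON` are made up for illustration.

```python
# Throughput (requests/s) and average latency (ms) copied from the
# redis-benchmark summaries in this article, as (RDMA, TCP) pairs.
RESULTS = {
    "PING_INLINE": {"rps": (519426.53, 201987.56), "avg_ms": (0.056, 0.140)},
    "PING_MBULK":  {"rps": (579642.94, 210172.33), "avg_ms": (0.050, 0.135)},
    "SET":         {"rps": (395335.03, 175116.02), "avg_ms": (0.073, 0.162)},
    "GET":         {"rps": (424826.91, 183163.59), "avg_ms": (0.068, 0.155)},
    "LPUSH":       {"rps": (369740.44, 173547.84), "avg_ms": (0.079, 0.164)},
    "LRANGE_100":  {"rps": (22079.35, 22607.81),   "avg_ms": (1.047, 1.168)},
}


def compare(results):
    """Return {test: (RDMA/TCP throughput ratio, avg latency reduction in %)}."""
    out = {}
    for name, r in results.items():
        rdma_rps, tcp_rps = r["rps"]
        rdma_ms, tcp_ms = r["avg_ms"]
        speedup = rdma_rps / tcp_rps
        reduction = (1.0 - rdma_ms / tcp_ms) * 100.0
        out[name] = (speedup, reduction)
    return out


COMPARISON = compare(RESULTS)

for name, (speedup, reduction) in COMPARISON.items():
    print(f"{name:<12} throughput x{speedup:.2f}  avg latency -{reduction:.1f}%")
```

With these inputs, every test except LRANGE_100 shows a throughput ratio above 2 and an average latency reduction above 50%.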
Sources
RDMA-enabled Valkey Server Setup
$ git clone https://github.com/valkey-io/valkey.git
$ cd valkey
$ git checkout 8.0.1
$ make BUILD_RDMA=module
RDMA-enabled redis-benchmark Setup
$ git clone https://github.com/pizhenwei/redis.git
$ cd redis
$ git checkout -b feature-rdma-with-cli origin/feature-rdma-with-cli
$ make BUILD_RDMA=yes -j16
Measurement
Configuration
Component | Server | Client |
---|---|---|
CPU | Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz | AMD Ryzen 9 5950X 16-Core Processor |
Memory | 32GB | 128GB |
NIC | Mellanox Technologies ConnectX®-5 EN network interface card, 10/25GbE dual-port SFP28, PCIe3.0 x8, tall bracket; MCX512A-ACAT | Mellanox Technologies ConnectX-4 Lx Stand-up dual-port 10GbE MCX4121A-XCAT |
Valkey Server Startup Configuration
Startup settings
./src/valkey-server --port 6379 \
--loadmodule src/valkey-rdma.so port=6380 bind=0.0.0.0 \
--protected-mode no --appendonly no --maxmemory 20gb
Post-startup settings
./src/valkey-cli
> CONFIG SET maxmemory-policy volatile-lru
RDMA
redis-benchmark execution parameters
./src/redis-benchmark -h 192.168.248.229 -p 6380 -c 30 -n 10000000 -r 1000000 --threads 8 -d 512 -t ping,set,get,lrange_100 --rdma
PING_INLINE
====== PING_INLINE ======
10000000 requests completed in 19.25 seconds
30 parallel clients
512 bytes payload
keep alive: 1
multi-thread: yes
threads: 8
...
Summary:
throughput summary: 519426.53 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.056 0.000 0.055 0.079 0.087 374.271
PING_MBULK
====== PING_MBULK ======
10000000 requests completed in 17.25 seconds
30 parallel clients
512 bytes payload
keep alive: 1
multi-thread: yes
threads: 8
...
Summary:
throughput summary: 579642.94 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.050 0.000 0.055 0.063 0.071 387.839
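The two PING tests differ only in how the command is framed on the wire: PING_INLINE sends the plain-text inline form, while PING_MBULK sends the RESP array (multi-bulk) form. The following is a minimal sketch of the two encodings. The helper names `encode_inline` and `encode_mbulk` are made up for illustration and are not part of redis-benchmark.

```python
def encode_inline(*args: str) -> bytes:
    """Inline command: space-separated words terminated by CRLF."""
    return (" ".join(args) + "\r\n").encode()


def encode_mbulk(*args: str) -> bytes:
    """RESP array of bulk strings: *<argc>, then $<len> and payload per argument."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n" % len(data) + data + b"\r\n")
    return b"".join(out)


print(encode_inline("PING"))  # b'PING\r\n'
print(encode_mbulk("PING"))   # b'*1\r\n$4\r\nPING\r\n'
```

Both forms are only a few bytes long, so these two tests mostly measure per-request round-trip overhead rather than data transfer.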
SET
====== SET ======
10000000 requests completed in 25.30 seconds
30 parallel clients
512 bytes payload
keep alive: 1
multi-thread: yes
threads: 8
...
Summary:
throughput summary: 395335.03 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.073 0.000 0.071 0.103 0.127 299.007
GET
====== GET ======
10000000 requests completed in 23.54 seconds
30 parallel clients
512 bytes payload
keep alive: 1
multi-thread: yes
threads: 8
...
Summary:
throughput summary: 424826.91 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.068 0.000 0.071 0.095 0.103 308.479
LPUSH
====== LPUSH (needed to benchmark LRANGE) ======
10000000 requests completed in 27.05 seconds
30 parallel clients
512 bytes payload
keep alive: 1
multi-thread: yes
threads: 8
...
Summary:
throughput summary: 369740.44 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.079 0.000 0.079 0.111 0.119 247.295
LRANGE_100
====== LRANGE_100 (first 100 elements) ======
10000000 requests completed in 452.91 seconds
30 parallel clients
512 bytes payload
keep alive: 1
multi-thread: yes
threads: 8
...
Summary:
throughput summary: 22079.35 requests per second
latency summary (msec):
avg min p50 p95 p99 max
1.047 0.000 1.071 1.919 2.311 230.271
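Each result block above has the same layout: a `====== NAME ======` banner, a `throughput summary:` line, and a latency header followed by one row of values. That makes it possible to extract the numbers mechanically. The following is a minimal sketch, assuming the text layout shown in this article; `parse_benchmark_output` is a hypothetical helper, not part of redis-benchmark.

```python
import re

BANNER = re.compile(r"^=+ (.+?) =+$")
LATENCY_COLUMNS = ["avg", "min", "p50", "p95", "p99", "max"]


def parse_benchmark_output(text: str) -> dict:
    """Map each test name to its throughput (req/s) and latency values (ms)."""
    results = {}
    current = None
    expect_latency_row = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        banner = BANNER.match(line)
        if banner:
            # Drop a trailing parenthesised note such as "(first 100 elements)".
            current = banner.group(1).split(" (")[0]
            results[current] = {}
        elif current and line.startswith("throughput summary:"):
            results[current]["rps"] = float(line.split(":")[1].split()[0])
        elif current and line.split() == LATENCY_COLUMNS:
            expect_latency_row = True
        elif current and expect_latency_row:
            values = [float(v) for v in line.split()]
            results[current]["latency_ms"] = dict(zip(LATENCY_COLUMNS, values))
            expect_latency_row = False
    return results


SAMPLE = """\
====== GET ======
10000000 requests completed in 23.54 seconds
Summary:
throughput summary: 424826.91 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.068 0.000 0.071 0.095 0.103 308.479
"""

parsed = parse_benchmark_output(SAMPLE)
print(parsed["GET"]["rps"], parsed["GET"]["latency_ms"]["p99"])
```

Feeding the saved output of the RDMA and TCP runs through the same function gives two dictionaries that can be compared test by test.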
TCP
redis-benchmark execution parameters
$ ./redis-benchmark -h HOST -c 30 -n 10000000 -r 1000000000 \
--threads 8 -d 512 -t ping,set,get,lrange_100
PING_INLINE
====== PING_INLINE ======
10000000 requests completed in 49.51 seconds
30 parallel clients
512 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: yes
threads: 8
...
Summary:
throughput summary: 201987.56 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.140 0.024 0.135 0.223 0.271 2.647
PING_MBULK
====== PING_MBULK ======
10000000 requests completed in 47.58 seconds
30 parallel clients
512 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: yes
threads: 8
...
Summary:
throughput summary: 210172.33 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.135 0.024 0.135 0.215 0.263 1.327
SET
====== SET ======
10000000 requests completed in 57.10 seconds
30 parallel clients
512 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: yes
threads: 8
...
Summary:
throughput summary: 175116.02 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.162 0.032 0.159 0.247 0.295 9.639
GET
====== GET ======
10000000 requests completed in 54.60 seconds
30 parallel clients
512 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: yes
threads: 8
...
Summary:
throughput summary: 183163.59 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.155 0.024 0.151 0.231 0.287 17.839
LPUSH
====== LPUSH (needed to benchmark LRANGE) ======
10000000 requests completed in 57.62 seconds
30 parallel clients
512 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: yes
threads: 8
...
Summary:
throughput summary: 173547.84 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.164 0.024 0.159 0.255 0.311 31.311
LRANGE_100
====== LRANGE_100 (first 100 elements) ======
10000000 requests completed in 442.33 seconds
30 parallel clients
512 bytes payload
keep alive: 1
host configuration "save": 3600 1 300 100 60 10000
host configuration "appendonly": no
multi-thread: yes
threads: 8
...
Summary:
throughput summary: 22607.81 requests per second
latency summary (msec):
avg min p50 p95 p99 max
1.168 0.064 1.191 1.359 1.519 70.399
Discussion
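LRANGE_100 is the only test where RDMA and TCP reach almost the same throughput. A back-of-the-envelope estimate suggests one possible reason: if each reply carries 100 list elements of 512 bytes (the `-d 512` payload), the reply payload alone comes close to the 10GbE line rate of the client NIC in both runs. The sketch below works through that arithmetic. It ignores RESP and network framing overhead, and both the 100 x 512-byte reply size and the 10 Gbit/s effective link speed are assumptions, not measurements from this article.

```python
LINK_GBIT_PER_S = 10.0    # assumed link speed: client NIC is listed as 10GbE
ELEMENTS_PER_REPLY = 100  # LRANGE_100 returns the first 100 elements
BYTES_PER_ELEMENT = 512   # -d 512; assumes every list element has this size


def payload_gbit_per_s(requests_per_s: float) -> float:
    """Reply payload bandwidth in Gbit/s, ignoring protocol overhead."""
    bytes_per_s = requests_per_s * ELEMENTS_PER_REPLY * BYTES_PER_ELEMENT
    return bytes_per_s * 8 / 1e9


# Throughput values taken from the LRANGE_100 summaries above.
for transport, rps in {"RDMA": 22079.35, "TCP": 22607.81}.items():
    gbit = payload_gbit_per_s(rps)
    print(f"{transport}: {gbit:.2f} Gbit/s ({gbit / LINK_GBIT_PER_S:.0%} of 10GbE)")
```

Under these assumptions both transports move roughly 9 Gbit/s of payload. That would be consistent with LRANGE_100 being limited by link bandwidth rather than by per-request overhead, but confirming it needs a separate measurement.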