🚅

Basic Performance Measurement of Valkey over RDMA

Published 2024/11/06

Redis

Summary

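Valkey now supports RDMA as an experimental feature, so I tried running it.
For PING, SET, GET and LPUSH, RDMA delivers a bit over 2x the throughput of plain TCP (about 2.1x to 2.8x) and cuts average latency by more than 50% (about 52% to 63%). LRANGE_100 is the exception: throughput is roughly the same over both transports.
I plan to keep digging into the RDMA implementation to find out where this difference comes from.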
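The summary figures can be recomputed from the throughput and average-latency numbers reported in the runs below. This is a minimal sketch; the numbers are copied by hand from the outputs in this article, and `compare` is a hypothetical helper.

```python
# Figures copied from the redis-benchmark outputs in this article:
# (RDMA req/s, TCP req/s, RDMA avg latency ms, TCP avg latency ms)
RESULTS = {
    "PING_INLINE": (519426.53, 201987.56, 0.056, 0.140),
    "PING_MBULK": (579642.94, 210172.33, 0.050, 0.135),
    "SET": (395335.03, 175116.02, 0.073, 0.162),
    "GET": (424826.91, 183163.59, 0.068, 0.155),
    "LPUSH": (369740.44, 173547.84, 0.079, 0.164),
    "LRANGE_100": (22079.35, 22607.81, 1.047, 1.168),
}


def compare(rdma_rps, tcp_rps, rdma_avg_ms, tcp_avg_ms):
    """Return (throughput ratio RDMA/TCP, average-latency reduction in %)."""
    ratio = rdma_rps / tcp_rps
    reduction = (1.0 - rdma_avg_ms / tcp_avg_ms) * 100.0
    return ratio, reduction


if __name__ == "__main__":
    for test, row in RESULTS.items():
        ratio, reduction = compare(*row)
        print(f"{test:<12} throughput x{ratio:.2f}  avg latency -{reduction:.1f}%")
```

Run as-is, this gives ratios between 2.13 (LPUSH) and 2.76 (PING_MBULK) for the single-key tests, and 0.98 with a 10.4% latency reduction for LRANGE_100.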

Sources

See #477, linked from the post "Valkey 8.0: Delivering Enhanced Performance and Reliability". It covers:
- Server-side setup
- redis-benchmark (RDMA-enabled build)

RDMA-enabled Valkey Server Setup

$ git clone https://github.com/valkey-io/valkey.git
$ cd valkey
$ git checkout 8.0.1
$ make BUILD_RDMA=module

RDMA-enabled redis-benchmark Setup

$ git clone https://github.com/pizhenwei/redis.git
$ cd redis
$ git checkout -b feature-rdma-with-cli origin/feature-rdma-with-cli
$ make BUILD_RDMA=yes -j16
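Before running the `--rdma` benchmark, it can help to confirm that the kernel exposes an RDMA device on each host. A minimal sketch, assuming the standard Linux sysfs directory `/sys/class/infiniband`, where RDMA-capable NICs such as ConnectX cards register (for example as `mlx5_0`). The helper `list_rdma_devices` is hypothetical and not part of Valkey or redis-benchmark.

```python
from pathlib import Path


def list_rdma_devices(sysfs_root: str = "/sys/class/infiniband") -> list[str]:
    """Return the RDMA device names registered under sysfs_root.

    Assumption: the standard Linux sysfs layout. An empty list means no
    RDMA device is visible, so an RDMA listener has nothing to bind to.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir())


if __name__ == "__main__":
    devices = list_rdma_devices()
    print(devices if devices else "no RDMA devices found")
```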

Measurement

Configuration

	Server	Client
CPU	Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz	AMD Ryzen 9 5950X 16-Core Processor
memory	32GB	128GB
NIC	Mellanox Technologies ConnectX®-5 EN network interface card, 10/25GbE dual-port SFP28, PCIe3.0 x8, tall bracket; MCX512A-ACAT	Mellanox Technologies ConnectX-4 Lx Stand-up dual-port 10GbE MCX4121A-XCAT
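The two NICs are not symmetric: the server card supports 25GbE, but the client card is 10GbE only. Assuming the link therefore runs at 10 Gbit/s, a back-of-envelope calculation shows how much of it the 512-byte values use. This sketch counts value bytes only and ignores RESP framing and transport headers, so the result is a lower bound. `payload_gbps` is a hypothetical helper.

```python
# Assumption: the client's 10GbE ConnectX-4 Lx caps the link at 10 Gbit/s,
# even though the server's ConnectX-5 supports 25GbE.
LINK_GBPS = min(25, 10)
PAYLOAD_BYTES = 512  # redis-benchmark -d 512


def payload_gbps(requests_per_sec: float, payload_bytes: int = PAYLOAD_BYTES) -> float:
    """Value bytes only, in Gbit/s (RESP and RDMA/TCP/Ethernet headers ignored)."""
    return requests_per_sec * payload_bytes * 8 / 1e9


if __name__ == "__main__":
    set_rdma = payload_gbps(395335.03)  # SET over RDMA, from the results below
    print(f"SET payload ~{set_rdma:.2f} Gbit/s of a {LINK_GBPS} Gbit/s link")
```

For SET over RDMA this comes to roughly 1.6 Gbit/s, well under 10 Gbit/s. That suggests the single-key tests are not limited by link bandwidth.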

Valkey Server Startup Configuration

Startup options

./src/valkey-server --port 6379 \
  --loadmodule src/valkey-rdma.so port=6380 bind=0.0.0.0 \
  --protected-mode no --appendonly no --maxmemory 20gb

Settings applied after startup

./src/valkey-cli
> CONFIG SET maxmemory-policy volatile-lru

RDMA

redis-benchmark parameters

./src/redis-benchmark -h 192.168.248.229 -p 6380 -c 30 -n 10000000 -r 1000000 --threads 8 -d 512 -t ping,set,get,lrange_100 --rdma
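Comparing the RDMA and TCP runs means pulling the same few numbers out of each output block. This is a small parser for the summary layout shown in the outputs below. The regex is written against that exact layout (the lines elided with `...` are not needed) and is an assumption about the format, not a documented interface. `parse_summary` and `check_consistency` are hypothetical helpers; the second one checks that requests divided by elapsed seconds roughly matches the reported throughput.

```python
import re

# Pattern written against the redis-benchmark output shown in this article.
SUMMARY_RE = re.compile(
    r"(?P<requests>\d+) requests completed in (?P<secs>[\d.]+) seconds"
    r".*?throughput summary: (?P<rps>[\d.]+) requests per second"
    r".*?avg\s+min\s+p50\s+p95\s+p99\s+max\s+"
    r"(?P<lat>(?:[\d.]+\s+){5}[\d.]+)",
    re.S,
)

# Copied from the RDMA PING_INLINE run.
SAMPLE = """\
  10000000 requests completed in 19.25 seconds
Summary:
  throughput summary: 519426.53 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.056     0.000     0.055     0.079     0.087   374.271
"""


def parse_summary(text: str) -> dict:
    """Extract request count, elapsed time, req/s and latency percentiles (ms)."""
    m = SUMMARY_RE.search(text)
    if m is None:
        raise ValueError("no redis-benchmark summary found")
    keys = ("avg", "min", "p50", "p95", "p99", "max")
    latency = dict(zip(keys, map(float, m.group("lat").split())))
    return {
        "requests": int(m.group("requests")),
        "seconds": float(m.group("secs")),
        "rps": float(m.group("rps")),
        "latency_ms": latency,
    }


def check_consistency(summary: dict, tolerance: float = 0.01) -> bool:
    """True if requests / elapsed seconds is within tolerance of the reported req/s."""
    implied = summary["requests"] / summary["seconds"]
    return abs(implied - summary["rps"]) / summary["rps"] <= tolerance


if __name__ == "__main__":
    s = parse_summary(SAMPLE)
    print(s["rps"], s["latency_ms"]["p99"], check_consistency(s))
```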

PING_INLINE

====== PING_INLINE ======                                                     
  10000000 requests completed in 19.25 seconds
  30 parallel clients
  512 bytes payload
  keep alive: 1
  multi-thread: yes
  threads: 8
...
Summary:
  throughput summary: 519426.53 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.056     0.000     0.055     0.079     0.087   374.271

PING_MBULK

====== PING_MBULK ======                                                     
  10000000 requests completed in 17.25 seconds
  30 parallel clients
  512 bytes payload
  keep alive: 1
  multi-thread: yes
  threads: 8
...
Summary:
  throughput summary: 579642.94 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.050     0.000     0.055     0.063     0.071   387.839
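The two PING tests differ in how the request is framed. As I understand redis-benchmark, PING_INLINE sends the inline command form, while PING_MBULK sends a RESP multi-bulk array. The PING requests themselves do not appear to carry the `-d 512` payload. This sketch builds both wire formats; the function names are hypothetical.

```python
def inline_command(*args: str) -> bytes:
    """Inline form: space-separated words terminated by CRLF."""
    return (" ".join(args) + "\r\n").encode()


def multibulk_command(*args: str) -> bytes:
    """RESP form: an array header followed by one bulk string per argument."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


if __name__ == "__main__":
    for name, wire in (("PING_INLINE", inline_command("PING")),
                       ("PING_MBULK", multibulk_command("PING"))):
        print(f"{name}: {wire!r} ({len(wire)} bytes)")
```

Both requests are tiny (6 and 14 bytes). The difference between the two tests therefore reflects request framing and parsing, not bandwidth.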

SET

====== SET ======                                                     
  10000000 requests completed in 25.30 seconds
  30 parallel clients
  512 bytes payload
  keep alive: 1
  multi-thread: yes
  threads: 8
...
Summary:
  throughput summary: 395335.03 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.073     0.000     0.071     0.103     0.127   299.007

GET

====== GET ======                                                     
  10000000 requests completed in 23.54 seconds
  30 parallel clients
  512 bytes payload
  keep alive: 1
  multi-thread: yes
  threads: 8
...
Summary:
  throughput summary: 424826.91 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.068     0.000     0.071     0.095     0.103   308.479

LPUSH

====== LPUSH (needed to benchmark LRANGE) ======                                                     
  10000000 requests completed in 27.05 seconds
  30 parallel clients
  512 bytes payload
  keep alive: 1
  multi-thread: yes
  threads: 8
...
Summary:
  throughput summary: 369740.44 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.079     0.000     0.079     0.111     0.119   247.295

LRANGE_100

====== LRANGE_100 (first 100 elements) ======                                                    
  10000000 requests completed in 452.91 seconds
  30 parallel clients
  512 bytes payload
  keep alive: 1
  multi-thread: yes
  threads: 8
...
Summary:
  throughput summary: 22079.35 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.047     0.000     1.071     1.919     2.311   230.271

TCP

redis-benchmark parameters

$ ./redis-benchmark -h HOST -c 30 -n 10000000 -r 1000000000 \
    --threads 8 -d 512 -t ping,set,get,lrange_100
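The TCP and RDMA invocations are not identical. Besides `-p 6380` and `--rdma`, the key range `-r` differs by a factor of 1000 (1000000 vs 1000000000), which is worth keeping in mind when comparing SET and GET. This sketch lists such differences. It assumes every option except bare switches takes exactly one value, which holds for the flags used here. `parse_flags` and `diff_flags` are hypothetical helpers.

```python
import shlex

# Both command lines copied from this article (TCP one joined onto one line).
RDMA_CMD = ("./src/redis-benchmark -h 192.168.248.229 -p 6380 -c 30 -n 10000000 "
            "-r 1000000 --threads 8 -d 512 -t ping,set,get,lrange_100 --rdma")
TCP_CMD = ("./redis-benchmark -h HOST -c 30 -n 10000000 -r 1000000000 "
           "--threads 8 -d 512 -t ping,set,get,lrange_100")


def parse_flags(cmd: str) -> dict:
    """Map each option to its value, or True for bare switches such as --rdma."""
    tokens = shlex.split(cmd)[1:]  # drop the program path
    flags, i = {}, 0
    while i < len(tokens):
        opt = tokens[i]
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            flags[opt] = tokens[i + 1]
            i += 2
        else:
            flags[opt] = True
            i += 1
    return flags


def diff_flags(a: str, b: str) -> dict:
    """Options whose values differ; None marks an option missing on one side."""
    fa, fb = parse_flags(a), parse_flags(b)
    return {k: (fa.get(k), fb.get(k)) for k in sorted(fa.keys() | fb.keys())
            if fa.get(k) != fb.get(k)}


if __name__ == "__main__":
    for opt, (rdma, tcp) in diff_flags(RDMA_CMD, TCP_CMD).items():
        print(f"{opt}: RDMA={rdma} TCP={tcp}")
```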

PING_INLINE

====== PING_INLINE ======                                                     
  10000000 requests completed in 49.51 seconds
  30 parallel clients
  512 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 8
...
Summary:
  throughput summary: 201987.56 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.140     0.024     0.135     0.223     0.271     2.647

PING_MBULK

====== PING_MBULK ======                                                     
  10000000 requests completed in 47.58 seconds
  30 parallel clients
  512 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 8
...
Summary:
  throughput summary: 210172.33 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.135     0.024     0.135     0.215     0.263     1.327

SET

====== SET ======                                                     
  10000000 requests completed in 57.10 seconds
  30 parallel clients
  512 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 8
...
Summary:
  throughput summary: 175116.02 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.162     0.032     0.159     0.247     0.295     9.639

GET

====== GET ======                                                     
  10000000 requests completed in 54.60 seconds
  30 parallel clients
  512 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 8
...
Summary:
  throughput summary: 183163.59 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.155     0.024     0.151     0.231     0.287    17.839

LPUSH

====== LPUSH (needed to benchmark LRANGE) ======                                                     
  10000000 requests completed in 57.62 seconds
  30 parallel clients
  512 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 8
...
Summary:
  throughput summary: 173547.84 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.164     0.024     0.159     0.255     0.311    31.311

LRANGE_100

====== LRANGE_100 (first 100 elements) ======                                                   
  10000000 requests completed in 442.33 seconds
  30 parallel clients
  512 bytes payload
  keep alive: 1
  host configuration "save": 3600 1 300 100 60 10000
  host configuration "appendonly": no
  multi-thread: yes
  threads: 8
...
Summary:
  throughput summary: 22607.81 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.168     0.064     1.191     1.359     1.519    70.399
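One possible reason LRANGE_100 does not follow the 2x pattern is offered here as a hypothesis, not a measured result. Assume the link runs at 10 Gbit/s because of the client's 10GbE NIC, and assume each list element is the 512-byte `-d` payload pushed by the preceding LPUSH step. Under those assumptions, the reply data alone approaches line rate. This sketch sizes a RESP array reply of 100 such bulk strings; the helper names are hypothetical.

```python
LINK_GBPS = 10        # assumption: the client's 10GbE NIC bounds the link
ELEMENTS = 100        # LRANGE_100 returns the first 100 list elements
ELEMENT_BYTES = 512   # assumption: each element is the -d 512 payload from LPUSH


def resp_array_bytes(n: int, elem_bytes: int) -> int:
    """Size of a RESP array reply holding n bulk strings of elem_bytes each."""
    header = len(f"*{n}\r\n")
    bulk = len(f"${elem_bytes}\r\n") + elem_bytes + 2  # +2 for the trailing CRLF
    return header + n * bulk


def reply_gbps(requests_per_sec: float) -> float:
    """Reply bytes per second in Gbit/s, ignoring RDMA/TCP/Ethernet headers."""
    return requests_per_sec * resp_array_bytes(ELEMENTS, ELEMENT_BYTES) * 8 / 1e9


if __name__ == "__main__":
    for label, rps in (("RDMA", 22079.35), ("TCP", 22607.81)):
        print(f"{label}: ~{reply_gbps(rps):.2f} Gbit/s of replies vs {LINK_GBPS} Gbit/s link")
```

Each reply is 52,006 bytes, so about 9.2 Gbit/s (RDMA) and 9.4 Gbit/s (TCP) flow back to the client. Both are close to 10 Gbit/s before protocol headers are counted. If this estimate holds, LRANGE_100 is bounded by the link rather than by the transport, which would explain why the two results are nearly equal.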
