Running RELIC pairings on a Zynq (ARM Cortex-A9)

Published 2023/05/18

Build procedure

Add the following packages to the rootfs:

  • devel → gmp-dev
  • cmake: add CONFIG_cmake to project-spec\meta-user\conf\user-rootfsconfig
  • Filesystem Packages -> misc -> packagegroup-core-buildessential -> packagegroup-core-buildessential-dev

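Run petalinux-build and reflash the resulting rootfs.

Log in to the Zynq, check its IP address, and transfer the RELIC files with scp.
There is probably a way to add the files to the rootfs directly, but I don't know it.
After that, build RELIC by following the same steps as in
https://zenn.dev/articles/d280188ad27d60/edit
Use arm-pbc-bn254.sh as the preset.
The build takes quite a while, so if you rebuild often, setting up a cross-compilation environment is probably easier.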

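Benchmark (arm-asm-254)

With the default loop count of 10,000, the run took so long that the results were unreliable, possibly because the timer overflowed, so I specified the -DBENCH=30 option. The configuration output reports 900 runs, which presumably corresponds to 30 × 30.
Reducing the loop count has almost no effect on the heavy operations, but the timing seems to carry an offset of about 50 ns, so operations that take almost no time (such as g1_null) have a large error.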
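To put numbers on the two effects described above, here is a minimal Python sketch. It assumes that the "Number of times" value is the total number of calls per benchmark line and that the timing offset is a constant 50 ns (the author's estimate, not a measured value). The per-call times are copied from the arm-asm-254 log in this article; the helper names are made up for illustration.

```python
# Assumption: a fixed timing offset of about 50 ns, as estimated in the text.
TIMER_OFFSET_NS = 50

# Per-call times in nanoseconds, copied from the arm-asm-254 log.
measured_ns = {
    "g1_null": 65,
    "g1_add": 21093,
    "pc_map": 18494192,
}


def offset_share(measured: int, offset: int = TIMER_OFFSET_NS) -> float:
    """Fraction of a reported time that the fixed offset could account for."""
    return offset / measured


def total_seconds(per_call_ns: int, runs: int) -> float:
    """Wall-clock time needed to execute `runs` calls of one operation."""
    return per_call_ns * runs / 1e9


for name, ns in measured_ns.items():
    print(
        f"{name:8s} offset share = {offset_share(ns):9.4%}  "
        f"10000 runs = {total_seconds(ns, 10_000):8.3f} s  "
        f"900 runs = {total_seconds(ns, 900):7.3f} s"
    )
```

Under these assumptions, a single pc_map line would need roughly 185 s at 10,000 runs versus roughly 17 s at 900 runs, while a 50 ns offset would make up about three quarters of the g1_null reading and a negligible part of the pc_map reading.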

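One pairing takes about 18 ms, which is fairly fast.
The benchmark on mcl's GitHub page lists 22 ms for mcl and 36 ms for RELIC. What accounts for the difference?

  • mcl: 900MHz quad-core ARM Cortex-A7 on Raspberry Pi2, Linux 4.4.11-v7+
  • This setup: 667MHz dual-core ARM Cortex-A9 on Zynq7020, Linux 5.15.19-xilinx-v2022.1

Going by clock frequency alone the Zynq is considerably slower, so perhaps it benefits from having no GUI and almost no I/O connected?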
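One way to make the comparison less dependent on clock speed is to convert each timing into an approximate cycle count (time × frequency). The sketch below does that in Python. It assumes that the 36 ms RELIC figure quoted from mcl's table was measured on the same Raspberry Pi 2 as the mcl figure, and it ignores everything except clock frequency (microarchitecture, memory, compiler, RELIC version), so treat the result as a rough indication only.

```python
# (label, time for one pairing in ms, CPU clock in MHz)
# 18.494192 ms is pc_map from the arm-asm-254 log in this article; 22 ms and
# 36 ms are the figures quoted from mcl's benchmark table.
# Assumption: the quoted RELIC figure was taken on the same Raspberry Pi 2.
results = [
    ("RELIC on Zynq-7020 (Cortex-A9)", 18.494192, 667),
    ("mcl on Raspberry Pi 2 (Cortex-A7)", 22.0, 900),
    ("RELIC on Raspberry Pi 2 (Cortex-A7)", 36.0, 900),
]


def megacycles(time_ms: float, clock_mhz: float) -> float:
    """ms x MHz gives thousands of cycles, so divide by 1e3 for megacycles."""
    return time_ms * clock_mhz / 1e3


for label, time_ms, clock_mhz in results:
    print(f"{label:38s} {time_ms:7.2f} ms  ~{megacycles(time_ms, clock_mhz):5.1f} Mcycles")
```

On these numbers the Zynq run comes out at roughly 12 Mcycles per pairing, against roughly 20 and 32 Mcycles for the two Raspberry Pi 2 figures. The gap is therefore larger per cycle than it looks in milliseconds, which suggests that core and build differences matter more here than system load.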

-- RELIC 0.6.0 configuration:

** Allocation mode: AUTO

** Arithmetic backend: ARM_ASM_254

** Benchmarking options:
   Number of times: 900

** Multiple precision module options:
   Precision: 1024 bits, 32 words
   Arithmetic method: COMBA;COMBA;MONTY;SLIDE;BASIC;BASIC

** Prime field module options:
   Prime size: 254 bits, 8 words
   Arithmetic method: INTEG;INTEG;INTEG;MONTY;EXGCD;LOWER;SLIDE

** Prime field extension module options:
   Arithmetic method: INTEG;INTEG;LAZYR

** Prime elliptic curve module options:
   Arithmetic method: PROJC;LWNAF;COMBS;INTER

** Bilinear pairing module options:
   Arithmetic method: LAZYR;OATEP

** Binary field module options:
   Polynomial size: 283 bits, 9 words
   Arithmetic method: LODAH;QUICK;QUICK;QUICK;QUICK;QUICK;EXGCD;SLIDE;QUICK

** Binary elliptic curve module options:
   Arithmetic method: PROJC;LWNAF;COMBS;INTER

** Elliptic Curve Cryptography module options:
   Arithmetic method: PRIME

** Edwards Curve Cryptography module options:
   Arithmetic method: PROJC;LWNAF;COMBS;INTER

** Hash function module options:
   Chosen method: SH256


-- Benchmarks for the PC module:

-- Curve BN-P254:

-- Group G_1:

** Utilities:

BENCH: g1_null                          = 65 nanosec
BENCH: g1_new                           = 51 nanosec
BENCH: g1_free                          = 47 nanosec
BENCH: g1_is_infty                      = 96 nanosec
BENCH: g1_set_infty                     = 267 nanosec
BENCH: g1_copy                          = 227 nanosec
BENCH: g1_cmp                           = 5694 nanosec
BENCH: g1_cmp (1 norm)                  = 3086 nanosec
BENCH: g1_cmp (2 norm)                  = 540 nanosec
BENCH: g1_rand                          = 2300579 nanosec
BENCH: g1_is_valid                      = 6197 nanosec
BENCH: g1_size_bin (0)                  = 94 nanosec
BENCH: g1_size_bin (1)                  = 96 nanosec
BENCH: g1_write_bin (0)                 = 2831 nanosec
BENCH: g1_write_bin (1)                 = 3304 nanosec
BENCH: g1_read_bin (0)                  = 10849 nanosec
BENCH: g1_read_bin (1)                  = 562482 nanosec

** Arithmetic:

BENCH: g1_add                           = 21093 nanosec
BENCH: g1_sub                           = 21347 nanosec
BENCH: g1_dbl                           = 15141 nanosec
BENCH: g1_neg                           = 282 nanosec
BENCH: g1_mul                           = 4769538 nanosec
BENCH: g1_mul_gen                       = 2274900 nanosec
BENCH: g1_mul_pre                       = 3901155 nanosec
BENCH: g1_mul_fix                       = 2279169 nanosec
BENCH: g1_mul_sim                       = 7090799 nanosec
BENCH: g1_mul_sim_gen                   = 6509682 nanosec
BENCH: g1_mul_dig                       = 1236836 nanosec
BENCH: g1_map                           = 3690220 nanosec

-- Group G_2:

** Utilities:

BENCH: g2_null                          = 55 nanosec
BENCH: g2_new                           = 49 nanosec
BENCH: g2_free                          = 50 nanosec
BENCH: g2_is_infty                      = 118 nanosec
BENCH: g2_set_infty                     = 459 nanosec
BENCH: g2_copy                          = 460 nanosec
BENCH: g2_cmp                           = 28517 nanosec
BENCH: g2_cmp (1 norm)                  = 566093 nanosec
BENCH: g2_cmp (2 norm)                  = 1127 nanosec
BENCH: g2_rand                          = 4608947 nanosec
BENCH: g2_is_valid                      = 3478230 nanosec
BENCH: g2_size_bin (0)                  = 606 nanosec
BENCH: g2_size_bin (1)                  = 673 nanosec
BENCH: g2_write_bin (0)                 = 5405 nanosec
BENCH: g2_write_bin (1)                 = 5184 nanosec
BENCH: g2_read_bin (0)                  = 20894 nanosec
BENCH: g2_read_bin (1)                  = 1923601 nanosec

** Arithmetic:

BENCH: g2_add                           = 59057 nanosec
BENCH: g2_sub                           = 59699 nanosec
BENCH: g2_dbl                           = 24700 nanosec
BENCH: g2_neg                           = 485 nanosec
BENCH: g2_mul                           = 5917218 nanosec
BENCH: g2_mul_gen                       = 4567744 nanosec
BENCH: g2_mul_pre                       = 13152007 nanosec
BENCH: g2_mul_fix                       = 4596973 nanosec
BENCH: g2_mul_sim                       = 9717836 nanosec
BENCH: g2_mul_sim_gen                   = 9701413 nanosec
BENCH: g2_mul_dig                       = 1780691 nanosec
BENCH: g2_map                           = 10097627 nanosec

-- Group G_T:

** Utilities:

BENCH: gt_null                          = 55 nanosec
BENCH: gt_new                           = 49 nanosec
BENCH: gt_free                          = 48 nanosec
BENCH: gt_copy                          = 570 nanosec
BENCH: gt_zero                          = 762 nanosec
BENCH: gt_set_unity                     = 795 nanosec
BENCH: gt_is_unity                      = 264 nanosec
BENCH: gt_rand                          = 8837284 nanosec
BENCH: gt_cmp                           = 176 nanosec
BENCH: gt_size_bin (0)                  = 77 nanosec
BENCH: gt_write_bin (0)                 = 13980 nanosec
BENCH: gt_read_bin (0)                  = 28087 nanosec
BENCH: gt_size_bin (1)                  = 193269 nanosec
BENCH: gt_write_bin (1)                 = 203744 nanosec
BENCH: gt_read_bin (1)                  = 610806 nanosec
BENCH: gt_is_valid                      = 4951563 nanosec

** Arithmetic:

BENCH: gt_mul                           = 70669 nanosec
BENCH: gt_sqr                           = 53154 nanosec
BENCH: gt_inv                           = 787 nanosec
BENCH: gt_exp                           = 8869437 nanosec
BENCH: gt_exp_gen                       = 8775112 nanosec
BENCH: gt_exp_sim                       = 20560340 nanosec
BENCH: gt_exp_dig                       = 2709792 nanosec

-- Pairing:

** Arithmetic:

BENCH: pc_map                           = 18494192 nanosec
BENCH: pc_exp                           = 8110233 nanosec
BENCH: pc_map_sim (2)                   = 24866551 nanosec

bench_fp output:

-- RELIC 0.6.0 configuration:

** Allocation mode: AUTO

** Arithmetic backend: ARM_ASM_254

** Benchmarking options:
   Number of times: 900

** Multiple precision module options:
   Precision: 1024 bits, 32 words
   Arithmetic method: COMBA;COMBA;MONTY;SLIDE;BASIC;BASIC

** Prime field module options:
   Prime size: 254 bits, 8 words
   Arithmetic method: INTEG;INTEG;INTEG;MONTY;EXGCD;LOWER;SLIDE

** Prime field extension module options:
   Arithmetic method: INTEG;INTEG;LAZYR

** Prime elliptic curve module options:
   Arithmetic method: PROJC;LWNAF;COMBS;INTER

** Bilinear pairing module options:
   Arithmetic method: LAZYR;OATEP

** Binary field module options:
   Polynomial size: 283 bits, 9 words
   Arithmetic method: LODAH;QUICK;QUICK;QUICK;QUICK;QUICK;EXGCD;SLIDE;QUICK

** Binary elliptic curve module options:
   Arithmetic method: PROJC;LWNAF;COMBS;INTER

** Elliptic Curve Cryptography module options:
   Arithmetic method: PRIME

** Edwards Curve Cryptography module options:
   Arithmetic method: PROJC;LWNAF;COMBS;INTER

** Hash function module options:
   Chosen method: SH256


-- Benchmarks for the FP module:

-- Prime modulus:
   25236482 40000001 BA344D80 00000008 61210000 00000013 A7000000 00000013

-- Utilities:

BENCH: fp_null                          = 74 nanosec
BENCH: fp_new                           = 48 nanosec
BENCH: fp_free                          = 49 nanosec
BENCH: fp_copy                          = 109 nanosec
BENCH: fp_zero                          = 110 nanosec
BENCH: fp_is_zero                       = 67 nanosec
BENCH: fp_get_bit                       = 59 nanosec
BENCH: fp_set_bit                       = 53 nanosec
BENCH: fp_set_dig (1)                   = 126 nanosec
BENCH: fp_set_dig                       = 960 nanosec
BENCH: fp_bits                          = 79 nanosec
BENCH: fp_rand                          = 13950 nanosec
BENCH: fp_size_str (16)                 = 134044 nanosec
BENCH: fp_write_str (16)                = 269051 nanosec
BENCH: fp_read_str (16)                 = 30961 nanosec
BENCH: fp_write_bin                     = 1255 nanosec
BENCH: fp_read_bin                      = 2500 nanosec
BENCH: fp_cmp                           = 119 nanosec
BENCH: fp_cmp_dig                       = 1044 nanosec

-- Arithmetic:

BENCH: fp_add                           = 108 nanosec
BENCH: fp_add_basic                     = 230 nanosec
BENCH: fp_add_integ                     = 107 nanosec
BENCH: fp_add_dig (1)                   = 133 nanosec
BENCH: fp_add_dig                       = 1043 nanosec
BENCH: fp_sub                           = 107 nanosec
BENCH: fp_sub_basic                     = 127 nanosec
BENCH: fp_sub_integ                     = 105 nanosec
BENCH: fp_sub_dig (1)                   = 136 nanosec
BENCH: fp_sub_dig                       = 1045 nanosec
BENCH: fp_neg                           = 115 nanosec
BENCH: fp_neg_basic                     = 133 nanosec
BENCH: fp_neg_integ                     = 115 nanosec
BENCH: fp_mul                           = 1396 nanosec
BENCH: fp_mul_basic                     = 1622 nanosec
BENCH: fp_mul_integ                     = 1403 nanosec
BENCH: fp_mul_comba                     = 1397 nanosec
BENCH: fp_mul_karat                     = 3513 nanosec
BENCH: fp_mul_dig                       = 2367 nanosec
BENCH: fp_sqr                           = 1972 nanosec
BENCH: fp_sqr_basic                     = 2133 nanosec
BENCH: fp_sqr_integ                     = 1995 nanosec
BENCH: fp_sqr_comba                     = 1979 nanosec
BENCH: fp_dbl                           = 258 nanosec
BENCH: fp_dbl_basic                     = 237 nanosec
BENCH: fp_dbl_integ                     = 260 nanosec
BENCH: fp_hlv                           = 167 nanosec
BENCH: fp_hlv_basic                     = 165 nanosec
BENCH: fp_hlv_integ                     = 171 nanosec
BENCH: fp_lsh                           = 182 nanosec
BENCH: fp_rsh                           = 228 nanosec
BENCH: fp_rdc                           = 782 nanosec
BENCH: fp_rdc_basic                     = 5831 nanosec
BENCH: fp_rdc_monty                     = 785 nanosec
BENCH: fp_rdc_monty_basic               = 1025 nanosec
BENCH: fp_rdc_monty_comba               = 788 nanosec
BENCH: fp_inv                           = 562303 nanosec
BENCH: fp_inv_basic                     = 546839 nanosec
BENCH: fp_inv_binar                     = 321260 nanosec
BENCH: fp_inv_monty                     = 171846 nanosec
BENCH: fp_inv_exgcd                     = 567145 nanosec
BENCH: fp_inv_divst                     = 791675 nanosec
BENCH: fp_inv_jmpds                     = 92568 nanosec
BENCH: fp_inv_lower                     = 546442 nanosec
BENCH: fp_inv_sim (2)                   = 591567 nanosec
BENCH: fp_smb                           = 543190 nanosec
BENCH: fp_smb_basic                     = 543579 nanosec
BENCH: fp_smb_divst                     = 236745 nanosec
BENCH: fp_smb_jmpds                     = 52920 nanosec
BENCH: fp_smb_lower                     = 543063 nanosec
BENCH: fp_exp                           = 592878 nanosec
BENCH: fp_exp_basic                     = 682499 nanosec
BENCH: fp_exp_slide                     = 592337 nanosec
BENCH: fp_exp_monty                     = 914035 nanosec
BENCH: fp_srt                           = 543272 nanosec
BENCH: fp_prime_conv                    = 2549 nanosec
BENCH: fp_prime_conv_dig                = 963 nanosec
BENCH: fp_prime_back                    = 989 nanosec

Benchmark (gmp)

One pairing takes about 24 ms.

-- RELIC 0.6.0 configuration:

** Allocation mode: AUTO

** Arithmetic backend: gmp

** Benchmarking options:
   Number of times: 900

** Multiple precision module options:
   Precision: 1024 bits, 32 words
   Arithmetic method: COMBA;COMBA;MONTY;SLIDE;BASIC;BASIC

** Prime field module options:
   Prime size: 254 bits, 8 words
   Arithmetic method: INTEG;INTEG;INTEG;MONTY;EXGCD;LOWER;SLIDE

** Prime field extension module options:
   Arithmetic method: INTEG;INTEG;LAZYR

** Prime elliptic curve module options:
   Arithmetic method: PROJC;LWNAF;COMBS;INTER

** Bilinear pairing module options:
   Arithmetic method: LAZYR;OATEP

** Binary field module options:
   Polynomial size: 283 bits, 9 words
   Arithmetic method: LODAH;QUICK;QUICK;QUICK;QUICK;QUICK;EXGCD;SLIDE;QUICK

** Binary elliptic curve module options:
   Arithmetic method: PROJC;LWNAF;COMBS;INTER

** Elliptic Curve Cryptography module options:
   Arithmetic method: PRIME

** Edwards Curve Cryptography module options:
   Arithmetic method: PROJC;LWNAF;COMBS;INTER

** Hash function module options:
   Chosen method: SH256


-- Benchmarks for the PC module:

-- Curve BN-P254:

-- Group G_1:

** Utilities:

BENCH: g1_null                          = 68 nanosec
BENCH: g1_new                           = 47 nanosec
BENCH: g1_free                          = 46 nanosec
BENCH: g1_is_infty                      = 93 nanosec
BENCH: g1_set_infty                     = 269 nanosec
BENCH: g1_copy                          = 223 nanosec
BENCH: g1_cmp                           = 8146 nanosec
BENCH: g1_cmp (1 norm)                  = 4407 nanosec
BENCH: g1_cmp (2 norm)                  = 529 nanosec
BENCH: g1_rand                          = 2940856 nanosec
BENCH: g1_is_valid                      = 7102 nanosec
BENCH: g1_size_bin (0)                  = 94 nanosec
BENCH: g1_size_bin (1)                  = 91 nanosec
BENCH: g1_write_bin (0)                 = 3507 nanosec
BENCH: g1_write_bin (1)                 = 3997 nanosec
BENCH: g1_read_bin (0)                  = 12941 nanosec
BENCH: g1_read_bin (1)                  = 595090 nanosec

** Arithmetic:

BENCH: g1_add                           = 32242 nanosec
BENCH: g1_sub                           = 32700 nanosec
BENCH: g1_dbl                           = 20072 nanosec
BENCH: g1_neg                           = 350 nanosec
BENCH: g1_mul                           = 5623237 nanosec
BENCH: g1_mul_gen                       = 2919083 nanosec
BENCH: g1_mul_pre                       = 4141693 nanosec
BENCH: g1_mul_fix                       = 2921158 nanosec
BENCH: g1_mul_sim                       = 8271932 nanosec
BENCH: g1_mul_sim_gen                   = 7763096 nanosec
BENCH: g1_mul_dig                       = 1386064 nanosec
BENCH: g1_map                           = 3544844 nanosec

-- Group G_2:

** Utilities:

BENCH: g2_null                          = 57 nanosec
BENCH: g2_new                           = 46 nanosec
BENCH: g2_free                          = 42 nanosec
BENCH: g2_is_infty                      = 116 nanosec
BENCH: g2_set_infty                     = 460 nanosec
BENCH: g2_copy                          = 450 nanosec
BENCH: g2_cmp                           = 40844 nanosec
BENCH: g2_cmp (1 norm)                  = 489348 nanosec
BENCH: g2_cmp (2 norm)                  = 1173 nanosec
BENCH: g2_rand                          = 6359735 nanosec
BENCH: g2_is_valid                      = 4360231 nanosec
BENCH: g2_size_bin (0)                  = 588 nanosec
BENCH: g2_size_bin (1)                  = 611 nanosec
BENCH: g2_write_bin (0)                 = 6789 nanosec
BENCH: g2_write_bin (1)                 = 6237 nanosec
BENCH: g2_read_bin (0)                  = 27976 nanosec
BENCH: g2_read_bin (1)                  = 1912655 nanosec

** Arithmetic:

BENCH: g2_add                           = 84645 nanosec
BENCH: g2_sub                           = 85379 nanosec
BENCH: g2_dbl                           = 37840 nanosec
BENCH: g2_neg                           = 543 nanosec
BENCH: g2_mul                           = 8131156 nanosec
BENCH: g2_mul_gen                       = 6302903 nanosec
BENCH: g2_mul_pre                       = 14655278 nanosec
BENCH: g2_mul_fix                       = 6347945 nanosec
BENCH: g2_mul_sim                       = 13382756 nanosec
BENCH: g2_mul_sim_gen                   = 13356752 nanosec
BENCH: g2_mul_dig                       = 2275341 nanosec
BENCH: g2_map                           = 10563326 nanosec

-- Group G_T:

** Utilities:

BENCH: gt_null                          = 53 nanosec
BENCH: gt_new                           = 47 nanosec
BENCH: gt_free                          = 44 nanosec
BENCH: gt_copy                          = 562 nanosec
BENCH: gt_zero                          = 771 nanosec
BENCH: gt_set_unity                     = 778 nanosec
BENCH: gt_is_unity                      = 257 nanosec
BENCH: gt_rand                          = 10977691 nanosec
BENCH: gt_cmp                           = 169 nanosec
BENCH: gt_size_bin (0)                  = 76 nanosec
BENCH: gt_write_bin (0)                 = 17747 nanosec
BENCH: gt_read_bin (0)                  = 35975 nanosec
BENCH: gt_size_bin (1)                  = 269147 nanosec
BENCH: gt_write_bin (1)                 = 281477 nanosec
BENCH: gt_read_bin (1)                  = 540315 nanosec
BENCH: gt_is_valid                      = 6292718 nanosec

** Arithmetic:

BENCH: gt_mul                           = 94194 nanosec
BENCH: gt_sqr                           = 71366 nanosec
BENCH: gt_inv                           = 971 nanosec
BENCH: gt_exp                           = 12049332 nanosec
BENCH: gt_exp_gen                       = 11921140 nanosec
BENCH: gt_exp_sim                       = 27561129 nanosec
BENCH: gt_exp_dig                       = 3630160 nanosec

-- Pairing:

** Arithmetic:

BENCH: pc_map                           = 24259299 nanosec
BENCH: pc_exp                           = 10329882 nanosec
BENCH: pc_map_sim (2)                   = 33122336 nanosec
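
The two PC-module logs can be compared line by line. The following Python sketch parses `BENCH:` lines and prints the gmp/asm time ratio for a few operations. The regular expression and the function name are illustrative, not part of RELIC, and the excerpts are copied from the two logs in this article.

```python
import re

# Matches lines such as "BENCH: pc_map_sim (2)    = 24866551 nanosec".
BENCH_LINE = re.compile(r"^BENCH:\s+(.+?)\s+=\s+(\d+)\s+nanosec$")


def parse_bench(log: str) -> dict[str, int]:
    """Map each benchmark label to its reported time in nanoseconds."""
    times = {}
    for line in log.splitlines():
        match = BENCH_LINE.match(line.strip())
        if match:
            times[match.group(1)] = int(match.group(2))
    return times


# Excerpt from the arm-asm-254 log.
ASM_LOG = """
BENCH: g1_mul                           = 4769538 nanosec
BENCH: g2_mul                           = 5917218 nanosec
BENCH: gt_exp                           = 8869437 nanosec
BENCH: pc_map                           = 18494192 nanosec
BENCH: pc_map_sim (2)                   = 24866551 nanosec
"""

# Excerpt from the gmp log.
GMP_LOG = """
BENCH: g1_mul                           = 5623237 nanosec
BENCH: g2_mul                           = 8131156 nanosec
BENCH: gt_exp                           = 12049332 nanosec
BENCH: pc_map                           = 24259299 nanosec
BENCH: pc_map_sim (2)                   = 33122336 nanosec
"""

asm, gmp = parse_bench(ASM_LOG), parse_bench(GMP_LOG)
for name in asm:
    print(f"{name:16s} gmp/asm = {gmp[name] / asm[name]:.2f}")
```

For pc_map the ratio is about 1.31, so on this board the assembly backend is roughly 30% faster than the gmp backend for a single pairing. Feeding the full logs to `parse_bench` gives the same comparison for every operation.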