Running RELIC pairings on Zynq (ARM Cortex-A9)
Build procedure
Add the following packages to the rootfs:
- devel → gmp-dev
- cmake: add CONFIG_cmake to project-spec\meta-user\conf\user-rootfsconfig and enable it in the rootfs configuration
- Filesystem Packages -> misc -> packagegroup-core-buildessential -> packagegroup-core-buildessential-dev
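Run petalinux-build and re-flash the resulting rootfs.
Log in to the Zynq, check its IP address, and transfer the RELIC files with scp.
There is probably also a way to add files to the rootfs directly, but I don't know how.
After that, build RELIC by following the same procedure as in the referenced instructions.
Use arm-pbc-bn254.sh as the preset.
The build takes quite a while, so if you rebuild often it is probably easier to set up a cross-compilation environment.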
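Benchmark (arm-asm-254)
With the default loop count of 10,000 the run takes too long and the results were not trustworthy, possibly because the timer overflows, so I specified the -DBENCH=30 option (the configuration output below reports "Number of times: 900").
Reducing the loop count has almost no effect on the heavy operations, but the timing seems to carry an offset of roughly 50 ns, so operations that take almost no time (such as g1_null) have a large error.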
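To get a feel for how much a fixed timing offset matters, here is a minimal Python sketch. It assumes the offset is a constant 50 ns (only a rough estimate from the results, not a measured value) and uses a few figures copied from the arm-asm-254 output below. The names `TIMER_OFFSET_NS` and `offset_share` are made up for this illustration.

```python
# Assumed fixed timing overhead per measurement, in nanoseconds.
# This is only the rough estimate mentioned in the text, not a measured value.
TIMER_OFFSET_NS = 50

# Measured values copied from the arm-asm-254 benchmark output in this post.
measured_ns = {
    "g1_null": 65,
    "g1_add": 21093,
    "g1_mul": 4769538,
    "pc_map": 18494192,
}


def offset_share(measured, offset=TIMER_OFFSET_NS):
    """Fraction of a measured time that a fixed offset could account for."""
    return min(offset / measured, 1.0)


for name, ns in measured_ns.items():
    print(f"{name:8s} {ns:>10d} ns  offset share ~ {offset_share(ns):.4%}")
```

Under this assumption the offset could account for most of the g1_null figure, while it is negligible for scalar multiplication and pairing.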
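One pairing takes about 18 ms (pc_map), which is fairly fast.
The benchmark on mcl's GitHub page lists 22 ms for mcl and 36 ms for RELIC. What accounts for the difference?
- mcl: 900MHz quad-core ARM Cortex-A7 on Raspberry Pi2, Linux 4.4.11-v7+
- here: 667MHz dual-core ARM Cortex-A9 on Zynq7020, Linux 5.15.19-xilinx-v2022.1
In terms of clock frequency the Zynq is considerably slower, so perhaps it benefits from having no GUI and almost no I/O attached?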
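One way to make the comparison less dependent on clock frequency is to convert each time to clock cycles. The sketch below does this in Python with the figures quoted above. The helper `cycles` is a name invented here, and the 18.5 ms value is the rounded pc_map result from this run.

```python
def cycles(time_ms, clock_mhz):
    """Convert a time in milliseconds to clock cycles at the given frequency.

    1 ms at 1 MHz is 1,000 cycles, hence the factor of 1e3.
    """
    return time_ms * clock_mhz * 1e3


# (label, pairing time in ms, CPU clock in MHz), all taken from the text above.
results = [
    ("RELIC on Zynq-7020 (Cortex-A9)", 18.5, 667),
    ("mcl on Raspberry Pi 2 (Cortex-A7)", 22.0, 900),
    ("RELIC on Raspberry Pi 2 (Cortex-A7)", 36.0, 900),
]

for label, ms, mhz in results:
    print(f"{label:38s} {cycles(ms, mhz) / 1e6:6.1f} Mcycles")
```

Cycle counts are still only a rough guide. The Cortex-A9 is an out-of-order core while the Cortex-A7 is in-order, and the library versions and build options behind the mcl figures may differ from the ones used here.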
-- RELIC 0.6.0 configuration:
** Allocation mode: AUTO
** Arithmetic backend: ARM_ASM_254
** Benchmarking options:
Number of times: 900
** Multiple precision module options:
Precision: 1024 bits, 32 words
Arithmetic method: COMBA;COMBA;MONTY;SLIDE;BASIC;BASIC
** Prime field module options:
Prime size: 254 bits, 8 words
Arithmetic method: INTEG;INTEG;INTEG;MONTY;EXGCD;LOWER;SLIDE
** Prime field extension module options:
Arithmetic method: INTEG;INTEG;LAZYR
** Prime elliptic curve module options:
Arithmetic method: PROJC;LWNAF;COMBS;INTER
** Bilinear pairing module options:
Arithmetic method: LAZYR;OATEP
** Binary field module options:
Polynomial size: 283 bits, 9 words
Arithmetic method: LODAH;QUICK;QUICK;QUICK;QUICK;QUICK;EXGCD;SLIDE;QUICK
** Binary elliptic curve module options:
Arithmetic method: PROJC;LWNAF;COMBS;INTER
** Elliptic Curve Cryptography module options:
Arithmetic method: PRIME
** Edwards Curve Cryptography module options:
Arithmetic method: PROJC;LWNAF;COMBS;INTER
** Hash function module options:
Chosen method: SH256
-- Benchmarks for the PC module:
-- Curve BN-P254:
-- Group G_1:
** Utilities:
BENCH: g1_null = 65 nanosec
BENCH: g1_new = 51 nanosec
BENCH: g1_free = 47 nanosec
BENCH: g1_is_infty = 96 nanosec
BENCH: g1_set_infty = 267 nanosec
BENCH: g1_copy = 227 nanosec
BENCH: g1_cmp = 5694 nanosec
BENCH: g1_cmp (1 norm) = 3086 nanosec
BENCH: g1_cmp (2 norm) = 540 nanosec
BENCH: g1_rand = 2300579 nanosec
BENCH: g1_is_valid = 6197 nanosec
BENCH: g1_size_bin (0) = 94 nanosec
BENCH: g1_size_bin (1) = 96 nanosec
BENCH: g1_write_bin (0) = 2831 nanosec
BENCH: g1_write_bin (1) = 3304 nanosec
BENCH: g1_read_bin (0) = 10849 nanosec
BENCH: g1_read_bin (1) = 562482 nanosec
** Arithmetic:
BENCH: g1_add = 21093 nanosec
BENCH: g1_sub = 21347 nanosec
BENCH: g1_dbl = 15141 nanosec
BENCH: g1_neg = 282 nanosec
BENCH: g1_mul = 4769538 nanosec
BENCH: g1_mul_gen = 2274900 nanosec
BENCH: g1_mul_pre = 3901155 nanosec
BENCH: g1_mul_fix = 2279169 nanosec
BENCH: g1_mul_sim = 7090799 nanosec
BENCH: g1_mul_sim_gen = 6509682 nanosec
BENCH: g1_mul_dig = 1236836 nanosec
BENCH: g1_map = 3690220 nanosec
-- Group G_2:
** Utilities:
BENCH: g2_null = 55 nanosec
BENCH: g2_new = 49 nanosec
BENCH: g2_free = 50 nanosec
BENCH: g2_is_infty = 118 nanosec
BENCH: g2_set_infty = 459 nanosec
BENCH: g2_copy = 460 nanosec
BENCH: g2_cmp = 28517 nanosec
BENCH: g2_cmp (1 norm) = 566093 nanosec
BENCH: g2_cmp (2 norm) = 1127 nanosec
BENCH: g2_rand = 4608947 nanosec
BENCH: g2_is_valid = 3478230 nanosec
BENCH: g2_size_bin (0) = 606 nanosec
BENCH: g2_size_bin (1) = 673 nanosec
BENCH: g2_write_bin (0) = 5405 nanosec
BENCH: g2_write_bin (1) = 5184 nanosec
BENCH: g2_read_bin (0) = 20894 nanosec
BENCH: g2_read_bin (1) = 1923601 nanosec
** Arithmetic:
BENCH: g2_add = 59057 nanosec
BENCH: g2_sub = 59699 nanosec
BENCH: g2_dbl = 24700 nanosec
BENCH: g2_neg = 485 nanosec
BENCH: g2_mul = 5917218 nanosec
BENCH: g2_mul_gen = 4567744 nanosec
BENCH: g2_mul_pre = 13152007 nanosec
BENCH: g2_mul_fix = 4596973 nanosec
BENCH: g2_mul_sim = 9717836 nanosec
BENCH: g2_mul_sim_gen = 9701413 nanosec
BENCH: g2_mul_dig = 1780691 nanosec
BENCH: g2_map = 10097627 nanosec
-- Group G_T:
** Utilities:
BENCH: gt_null = 55 nanosec
BENCH: gt_new = 49 nanosec
BENCH: gt_free = 48 nanosec
BENCH: gt_copy = 570 nanosec
BENCH: gt_zero = 762 nanosec
BENCH: gt_set_unity = 795 nanosec
BENCH: gt_is_unity = 264 nanosec
BENCH: gt_rand = 8837284 nanosec
BENCH: gt_cmp = 176 nanosec
BENCH: gt_size_bin (0) = 77 nanosec
BENCH: gt_write_bin (0) = 13980 nanosec
BENCH: gt_read_bin (0) = 28087 nanosec
BENCH: gt_size_bin (1) = 193269 nanosec
BENCH: gt_write_bin (1) = 203744 nanosec
BENCH: gt_read_bin (1) = 610806 nanosec
BENCH: gt_is_valid = 4951563 nanosec
** Arithmetic:
BENCH: gt_mul = 70669 nanosec
BENCH: gt_sqr = 53154 nanosec
BENCH: gt_inv = 787 nanosec
BENCH: gt_exp = 8869437 nanosec
BENCH: gt_exp_gen = 8775112 nanosec
BENCH: gt_exp_sim = 20560340 nanosec
BENCH: gt_exp_dig = 2709792 nanosec
-- Pairing:
** Arithmetic:
BENCH: pc_map = 18494192 nanosec
BENCH: pc_exp = 8110233 nanosec
BENCH: pc_map_sim (2) = 24866551 nanosec
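Output in this format is easy to post-process. As a minimal sketch (not part of RELIC itself), the following Python snippet extracts the `BENCH:` lines into a dictionary. The regular expression and the function name `parse_bench` are assumptions based only on the line format shown above.

```python
import re

# Matches lines such as "BENCH: g1_cmp (1 norm) = 3086 nanosec".
BENCH_LINE = re.compile(
    r"^\s*BENCH:\s*(?P<name>.+?)\s*=\s*(?P<ns>\d+)\s+nanosec\s*$"
)


def parse_bench(text):
    """Return {benchmark name: time in ns} for every BENCH line in text."""
    results = {}
    for line in text.splitlines():
        m = BENCH_LINE.match(line)
        if m:
            results[m.group("name")] = int(m.group("ns"))
    return results


# A few lines copied from the output above; headers are skipped by the parser.
sample = """\
-- Pairing:
** Arithmetic:
BENCH: pc_map = 18494192 nanosec
BENCH: pc_exp = 8110233 nanosec
BENCH: pc_map_sim (2) = 24866551 nanosec
"""

parsed = parse_bench(sample)
for name, ns in parsed.items():
    print(f"{name:16s} {ns / 1e6:8.3f} ms")
```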
bench_fp
-- RELIC 0.6.0 configuration:
** Allocation mode: AUTO
** Arithmetic backend: ARM_ASM_254
** Benchmarking options:
Number of times: 900
** Multiple precision module options:
Precision: 1024 bits, 32 words
Arithmetic method: COMBA;COMBA;MONTY;SLIDE;BASIC;BASIC
** Prime field module options:
Prime size: 254 bits, 8 words
Arithmetic method: INTEG;INTEG;INTEG;MONTY;EXGCD;LOWER;SLIDE
** Prime field extension module options:
Arithmetic method: INTEG;INTEG;LAZYR
** Prime elliptic curve module options:
Arithmetic method: PROJC;LWNAF;COMBS;INTER
** Bilinear pairing module options:
Arithmetic method: LAZYR;OATEP
** Binary field module options:
Polynomial size: 283 bits, 9 words
Arithmetic method: LODAH;QUICK;QUICK;QUICK;QUICK;QUICK;EXGCD;SLIDE;QUICK
** Binary elliptic curve module options:
Arithmetic method: PROJC;LWNAF;COMBS;INTER
** Elliptic Curve Cryptography module options:
Arithmetic method: PRIME
** Edwards Curve Cryptography module options:
Arithmetic method: PROJC;LWNAF;COMBS;INTER
** Hash function module options:
Chosen method: SH256
-- Benchmarks for the FP module:
-- Prime modulus:
25236482 40000001 BA344D80 00000008 61210000 00000013 A7000000 00000013
-- Utilities:
BENCH: fp_null = 74 nanosec
BENCH: fp_new = 48 nanosec
BENCH: fp_free = 49 nanosec
BENCH: fp_copy = 109 nanosec
BENCH: fp_zero = 110 nanosec
BENCH: fp_is_zero = 67 nanosec
BENCH: fp_get_bit = 59 nanosec
BENCH: fp_set_bit = 53 nanosec
BENCH: fp_set_dig (1) = 126 nanosec
BENCH: fp_set_dig = 960 nanosec
BENCH: fp_bits = 79 nanosec
BENCH: fp_rand = 13950 nanosec
BENCH: fp_size_str (16) = 134044 nanosec
BENCH: fp_write_str (16) = 269051 nanosec
BENCH: fp_read_str (16) = 30961 nanosec
BENCH: fp_write_bin = 1255 nanosec
BENCH: fp_read_bin = 2500 nanosec
BENCH: fp_cmp = 119 nanosec
BENCH: fp_cmp_dig = 1044 nanosec
-- Arithmetic:
BENCH: fp_add = 108 nanosec
BENCH: fp_add_basic = 230 nanosec
BENCH: fp_add_integ = 107 nanosec
BENCH: fp_add_dig (1) = 133 nanosec
BENCH: fp_add_dig = 1043 nanosec
BENCH: fp_sub = 107 nanosec
BENCH: fp_sub_basic = 127 nanosec
BENCH: fp_sub_integ = 105 nanosec
BENCH: fp_sub_dig (1) = 136 nanosec
BENCH: fp_sub_dig = 1045 nanosec
BENCH: fp_neg = 115 nanosec
BENCH: fp_neg_basic = 133 nanosec
BENCH: fp_neg_integ = 115 nanosec
BENCH: fp_mul = 1396 nanosec
BENCH: fp_mul_basic = 1622 nanosec
BENCH: fp_mul_integ = 1403 nanosec
BENCH: fp_mul_comba = 1397 nanosec
BENCH: fp_mul_karat = 3513 nanosec
BENCH: fp_mul_dig = 2367 nanosec
BENCH: fp_sqr = 1972 nanosec
BENCH: fp_sqr_basic = 2133 nanosec
BENCH: fp_sqr_integ = 1995 nanosec
BENCH: fp_sqr_comba = 1979 nanosec
BENCH: fp_dbl = 258 nanosec
BENCH: fp_dbl_basic = 237 nanosec
BENCH: fp_dbl_integ = 260 nanosec
BENCH: fp_hlv = 167 nanosec
BENCH: fp_hlv_basic = 165 nanosec
BENCH: fp_hlv_integ = 171 nanosec
BENCH: fp_lsh = 182 nanosec
BENCH: fp_rsh = 228 nanosec
BENCH: fp_rdc = 782 nanosec
BENCH: fp_rdc_basic = 5831 nanosec
BENCH: fp_rdc_monty = 785 nanosec
BENCH: fp_rdc_monty_basic = 1025 nanosec
BENCH: fp_rdc_monty_comba = 788 nanosec
BENCH: fp_inv = 562303 nanosec
BENCH: fp_inv_basic = 546839 nanosec
BENCH: fp_inv_binar = 321260 nanosec
BENCH: fp_inv_monty = 171846 nanosec
BENCH: fp_inv_exgcd = 567145 nanosec
BENCH: fp_inv_divst = 791675 nanosec
BENCH: fp_inv_jmpds = 92568 nanosec
BENCH: fp_inv_lower = 546442 nanosec
BENCH: fp_inv_sim (2) = 591567 nanosec
BENCH: fp_smb = 543190 nanosec
BENCH: fp_smb_basic = 543579 nanosec
BENCH: fp_smb_divst = 236745 nanosec
BENCH: fp_smb_jmpds = 52920 nanosec
BENCH: fp_smb_lower = 543063 nanosec
BENCH: fp_exp = 592878 nanosec
BENCH: fp_exp_basic = 682499 nanosec
BENCH: fp_exp_slide = 592337 nanosec
BENCH: fp_exp_monty = 914035 nanosec
BENCH: fp_srt = 543272 nanosec
BENCH: fp_prime_conv = 2549 nanosec
BENCH: fp_prime_conv_dig = 963 nanosec
BENCH: fp_prime_back = 989 nanosec
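The inversion variants differ widely in this run. The default fp_inv is close to fp_inv_exgcd, which seems consistent with EXGCD appearing in the prime field method list. The Python sketch below simply ranks the variants using the numbers above and expresses each as a multiple of one fp_mul. It says nothing about how the choice would affect a full pairing.

```python
# fp_inv_* timings in nanoseconds, copied from the bench_fp output above.
fp_inv_ns = {
    "fp_inv (configured default)": 562303,
    "fp_inv_basic": 546839,
    "fp_inv_binar": 321260,
    "fp_inv_monty": 171846,
    "fp_inv_exgcd": 567145,
    "fp_inv_divst": 791675,
    "fp_inv_jmpds": 92568,
    "fp_inv_lower": 546442,
}
fp_mul_ns = 1396  # one field multiplication, from the same output

# Sort from fastest to slowest.
ranked = sorted(fp_inv_ns.items(), key=lambda kv: kv[1])
fastest_name, fastest_ns = ranked[0]

for name, ns in ranked:
    print(f"{name:28s} {ns:>7d} ns  ~ {ns / fp_mul_ns:6.1f} fp_mul")
```

In these measurements fp_inv_jmpds is the fastest variant, roughly six times faster than the default.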
Benchmark (gmp)
One pairing takes about 24 ms.
-- RELIC 0.6.0 configuration:
** Allocation mode: AUTO
** Arithmetic backend: gmp
** Benchmarking options:
Number of times: 900
** Multiple precision module options:
Precision: 1024 bits, 32 words
Arithmetic method: COMBA;COMBA;MONTY;SLIDE;BASIC;BASIC
** Prime field module options:
Prime size: 254 bits, 8 words
Arithmetic method: INTEG;INTEG;INTEG;MONTY;EXGCD;LOWER;SLIDE
** Prime field extension module options:
Arithmetic method: INTEG;INTEG;LAZYR
** Prime elliptic curve module options:
Arithmetic method: PROJC;LWNAF;COMBS;INTER
** Bilinear pairing module options:
Arithmetic method: LAZYR;OATEP
** Binary field module options:
Polynomial size: 283 bits, 9 words
Arithmetic method: LODAH;QUICK;QUICK;QUICK;QUICK;QUICK;EXGCD;SLIDE;QUICK
** Binary elliptic curve module options:
Arithmetic method: PROJC;LWNAF;COMBS;INTER
** Elliptic Curve Cryptography module options:
Arithmetic method: PRIME
** Edwards Curve Cryptography module options:
Arithmetic method: PROJC;LWNAF;COMBS;INTER
** Hash function module options:
Chosen method: SH256
-- Benchmarks for the PC module:
-- Curve BN-P254:
-- Group G_1:
** Utilities:
BENCH: g1_null = 68 nanosec
BENCH: g1_new = 47 nanosec
BENCH: g1_free = 46 nanosec
BENCH: g1_is_infty = 93 nanosec
BENCH: g1_set_infty = 269 nanosec
BENCH: g1_copy = 223 nanosec
BENCH: g1_cmp = 8146 nanosec
BENCH: g1_cmp (1 norm) = 4407 nanosec
BENCH: g1_cmp (2 norm) = 529 nanosec
BENCH: g1_rand = 2940856 nanosec
BENCH: g1_is_valid = 7102 nanosec
BENCH: g1_size_bin (0) = 94 nanosec
BENCH: g1_size_bin (1) = 91 nanosec
BENCH: g1_write_bin (0) = 3507 nanosec
BENCH: g1_write_bin (1) = 3997 nanosec
BENCH: g1_read_bin (0) = 12941 nanosec
BENCH: g1_read_bin (1) = 595090 nanosec
** Arithmetic:
BENCH: g1_add = 32242 nanosec
BENCH: g1_sub = 32700 nanosec
BENCH: g1_dbl = 20072 nanosec
BENCH: g1_neg = 350 nanosec
BENCH: g1_mul = 5623237 nanosec
BENCH: g1_mul_gen = 2919083 nanosec
BENCH: g1_mul_pre = 4141693 nanosec
BENCH: g1_mul_fix = 2921158 nanosec
BENCH: g1_mul_sim = 8271932 nanosec
BENCH: g1_mul_sim_gen = 7763096 nanosec
BENCH: g1_mul_dig = 1386064 nanosec
BENCH: g1_map = 3544844 nanosec
-- Group G_2:
** Utilities:
BENCH: g2_null = 57 nanosec
BENCH: g2_new = 46 nanosec
BENCH: g2_free = 42 nanosec
BENCH: g2_is_infty = 116 nanosec
BENCH: g2_set_infty = 460 nanosec
BENCH: g2_copy = 450 nanosec
BENCH: g2_cmp = 40844 nanosec
BENCH: g2_cmp (1 norm) = 489348 nanosec
BENCH: g2_cmp (2 norm) = 1173 nanosec
BENCH: g2_rand = 6359735 nanosec
BENCH: g2_is_valid = 4360231 nanosec
BENCH: g2_size_bin (0) = 588 nanosec
BENCH: g2_size_bin (1) = 611 nanosec
BENCH: g2_write_bin (0) = 6789 nanosec
BENCH: g2_write_bin (1) = 6237 nanosec
BENCH: g2_read_bin (0) = 27976 nanosec
BENCH: g2_read_bin (1) = 1912655 nanosec
** Arithmetic:
BENCH: g2_add = 84645 nanosec
BENCH: g2_sub = 85379 nanosec
BENCH: g2_dbl = 37840 nanosec
BENCH: g2_neg = 543 nanosec
BENCH: g2_mul = 8131156 nanosec
BENCH: g2_mul_gen = 6302903 nanosec
BENCH: g2_mul_pre = 14655278 nanosec
BENCH: g2_mul_fix = 6347945 nanosec
BENCH: g2_mul_sim = 13382756 nanosec
BENCH: g2_mul_sim_gen = 13356752 nanosec
BENCH: g2_mul_dig = 2275341 nanosec
BENCH: g2_map = 10563326 nanosec
-- Group G_T:
** Utilities:
BENCH: gt_null = 53 nanosec
BENCH: gt_new = 47 nanosec
BENCH: gt_free = 44 nanosec
BENCH: gt_copy = 562 nanosec
BENCH: gt_zero = 771 nanosec
BENCH: gt_set_unity = 778 nanosec
BENCH: gt_is_unity = 257 nanosec
BENCH: gt_rand = 10977691 nanosec
BENCH: gt_cmp = 169 nanosec
BENCH: gt_size_bin (0) = 76 nanosec
BENCH: gt_write_bin (0) = 17747 nanosec
BENCH: gt_read_bin (0) = 35975 nanosec
BENCH: gt_size_bin (1) = 269147 nanosec
BENCH: gt_write_bin (1) = 281477 nanosec
BENCH: gt_read_bin (1) = 540315 nanosec
BENCH: gt_is_valid = 6292718 nanosec
** Arithmetic:
BENCH: gt_mul = 94194 nanosec
BENCH: gt_sqr = 71366 nanosec
BENCH: gt_inv = 971 nanosec
BENCH: gt_exp = 12049332 nanosec
BENCH: gt_exp_gen = 11921140 nanosec
BENCH: gt_exp_sim = 27561129 nanosec
BENCH: gt_exp_dig = 3630160 nanosec
-- Pairing:
** Arithmetic:
BENCH: pc_map = 24259299 nanosec
BENCH: pc_exp = 10329882 nanosec
BENCH: pc_map_sim (2) = 33122336 nanosec
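To compare the two backends directly, the following Python sketch computes the ratio gmp / arm-asm-254 for a few of the heavy operations. The numbers are copied from the two PC module runs above, and the helper name `speedup` is made up for this illustration.

```python
# (arm-asm-254 ns, gmp ns) pairs copied from the two PC module runs above.
timings = {
    "g1_mul": (4769538, 5623237),
    "g2_mul": (5917218, 8131156),
    "gt_exp": (8869437, 12049332),
    "pc_map": (18494192, 24259299),
}


def speedup(asm_ns, gmp_ns):
    """How many times faster the ARM assembly backend is than gmp."""
    return gmp_ns / asm_ns


for name, (asm_ns, gmp_ns) in timings.items():
    print(
        f"{name:8s} asm {asm_ns / 1e6:7.2f} ms  gmp {gmp_ns / 1e6:7.2f} ms  "
        f"x{speedup(asm_ns, gmp_ns):.2f}"
    )
```

For the pairing itself this works out to the assembly backend being roughly 1.3 times faster than gmp in these runs.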