Trying Armv8.7 FEAT_AFP on Apple M4, or: The Differences in Floating-Point Arithmetic Between x86 and Arm

Discussion

pikapika

I happened to have a Windows on Arm machine on hand (a Snapdragon X Elite), so I tried it out.
The x64 binary appears to be emulated faithfully, down to its floating-point behavior.

Results on Arm (FPCR.AH=0):

C:\Users\pika> arm-altfp2_arm64.exe
Not setting FPCR.AH
=== underflow ===
0x1.0000001000000p-1022 * 0x1.ffffffe000000p-1, which yields 0x1.0000000000000p-1022, raises UNDERFLOW; underflow is detected before rounding
=== FMA ===
FP_FAST_FMA is not defined
fma(0, INFINITY, NAN) = nan, raises INVALID
=== flush to zero ===
FE_UNDERFLOW is set.
0x0.0000000000000p+0
FE_UNDERFLOW is not set.
0x1.0000000000000p-1022
=== bit pattern of NaN ===
0.0 / 0.0: 0x7fc00000
=== propagation of NaN ===
qNaN(1234) + sNaN(cafe): 0x7fc0cafe
=== FNEG, FABS of NaN ===
qNaN: 0xffc01234
-qNaN: 0x7fc01234, INVALID=0
fabsf(qNaN): 0x7fc01234, INVALID=0
sNaN: 0xff80cafe
-sNaN: 0x7f80cafe, INVALID=0
fabsf(sNaN): 0x7f80cafe, INVALID=0
=== FMIN ===
fmin(0.0, -0.0) = -0 (0x80000000), INVALID=0
fmin(-0.0, 0.0) = -0 (0x80000000), INVALID=0
fmin(3.0, qNaN) = nan (0x7fc01234), INVALID=0
fmin(qNaN, 3.0) = nan (0x7fc01234), INVALID=0
fmin(qNaN(1234), sNaN(cafe)) = nan (0x7fc0cafe), INVALID=1
fmin(sNaN(cafe), qNaN(1234)) = nan (0x7fc0cafe), INVALID=1

Results on Arm (FPCR.AH=1):

C:\Users\pika> arm-altfp2_arm64.exe AH
FPCR.AH set.
=== underflow ===
0x1.0000001000000p-1022 * 0x1.ffffffe000000p-1, which yields 0x1.0000000000000p-1022, does not raise UNDERFLOW; underflow is detected after rounding
=== FMA ===
FP_FAST_FMA is not defined
fma(0, INFINITY, NAN) = nan, does not raise INVALID
=== flush to zero ===
FE_UNDERFLOW is set.
0x0.0000000000000p+0
FE_UNDERFLOW is set.
0x0.0000000000000p+0
=== bit pattern of NaN ===
0.0 / 0.0: 0xffc00000
=== propagation of NaN ===
qNaN(1234) + sNaN(cafe): 0x7fc01234
=== FNEG, FABS of NaN ===
qNaN: 0xffc01234
-qNaN: 0xffc01234, INVALID=0
fabsf(qNaN): 0xffc01234, INVALID=0
sNaN: 0xff80cafe
-sNaN: 0xff80cafe, INVALID=0
fabsf(sNaN): 0xff80cafe, INVALID=0
=== FMIN ===
fmin(0.0, -0.0) = -0 (0x80000000), INVALID=0
fmin(-0.0, 0.0) = 0 (0x00000000), INVALID=0
fmin(3.0, qNaN) = nan (0x7fc01234), INVALID=1
fmin(qNaN, 3.0) = 3 (0x40400000), INVALID=1
fmin(qNaN(1234), sNaN(cafe)) = nan (0x7f80cafe), INVALID=1
fmin(sNaN(cafe), qNaN(1234)) = nan (0x7fc01234), INVALID=1

Results for x86_64:

C:\Users\pika> arm-altfp2_x64.exe
Not AArch64
=== underflow ===
0x1.0000001000000p-1022 * 0x1.ffffffe000000p-1, which yields 0x1.0000000000000p-1022, does not raise UNDERFLOW; underflow is detected after rounding
=== FMA ===
FP_FAST_FMA is not defined
fma(0, INFINITY, NAN) = nan, does not raise INVALID
=== flush to zero ===
FE_UNDERFLOW is set.
0x0.0000000000000p+0
FE_UNDERFLOW is set.
0x0.0000000000000p+0
=== bit pattern of NaN ===
0.0 / 0.0: 0xffc00000
=== propagation of NaN ===
qNaN(1234) + sNaN(cafe): 0x7fc01234
=== FNEG, FABS of NaN ===
qNaN: 0xffc01234
-qNaN: 0x7fc01234, INVALID=0
fabsf(qNaN): 0x7fc01234, INVALID=0
sNaN: 0xff80cafe
-sNaN: 0x7f80cafe, INVALID=0
fabsf(sNaN): 0x7f80cafe, INVALID=0
=== FMIN ===
fmin(0.0, -0.0) = -0 (0x80000000), INVALID=0
fmin(-0.0, 0.0) = 0 (0x00000000), INVALID=0
fmin(3.0, qNaN) = nan (0x7fc01234), INVALID=1
fmin(qNaN, 3.0) = 3 (0x40400000), INVALID=1
fmin(qNaN(1234), sNaN(cafe)) = nan (0x7f80cafe), INVALID=1
fmin(sNaN(cafe), qNaN(1234)) = nan (0x7fc01234), INVALID=1
だめぽだめぽ

Thank you for running the experiment and reporting back! So the Snapdragon X Elite supports FEAT_AFP, and the Windows emulator is (probably) making use of it.

I'm curious how this behaves on a CPU without FEAT_AFP, but if the Snapdragon X Elite supports it, Windows on Arm machines that lack FEAT_AFP may be hard to come by these days.

pikapika

Sorry for the sloppy reply that left out a lot of detail, and thank you for responding.

For what it's worth, the Microsoft SQ series (Surface Pro X) and the Snapdragon 8cx (ThinkPad X13s Gen1 Snapdragon) do not seem to support FEAT_AFP, but buying one just to confirm that isn't realistic, so it is hard to check.

Prism reportedly runs on these models too under 24H2, so I assume it emulates this behavior some other way...