Accuracy of FSIN and other x87 trigonometric instructions on AMD processors

On Intel processors, x87 trigonometric instructions such as FSIN have limited accuracy because argument reduction uses a 66-bit approximation of pi, even though the computation itself is otherwise accurate to the full 64-bit mantissa of an 80-bit extended-precision floating-point value. (Full accuracy for all valid inputs requires a 128-bit approximation of pi.) Intel's documentation originally omitted this limitation and was corrected after the issue was brought to their attention.

However, I cannot find similarly detailed information about the accuracy of AMD's implementation of x87 trigonometric instructions beyond this mention in the AMD64 Architecture Programmer's Manual, Volume 1:

6.4.5.1 Accuracy of Transcendental Results

x87 computations are carried out in double-extended-precision format, so that the transcendental functions provide results accurate to within one unit in the last place (ulp) for each of the floating-point data types.

Is AMD's implementation of x87 trigonometric instructions actually fully accurate to within one ULP in extended-precision format for all valid inputs, including a 128-bit or better approximation of pi? An answer that pertains to the Zen and Zen 2 architectures (Ryzen and EPYC) would be ideal.

Colonize answered 17/2, 2020 at 16:49 Comment(4)
For older AMD CPUs, I provided an answer here. Note that this question in its current form is about "general computing hardware" and thus technically off-topic here. - Whopping
There are a number of other questions about floating-point accuracy, and they do pertain to programming (x87 floating-point) on specific processors, so AIUI this question is on-topic. - Colonize
Test 6381956970095103•2^798, a.k.a. +0x1.6AC5B262CA1FFp850. That is the IEEE-754 binary64 value greater than 4 that is closest to a multiple of π. Its sine is around −2^−60 (closest binary64 is -0x1.14AE72E6BA22Fp-60). If argument reduction is being done with less than a 900-bit value for π, that should reveal it. If AMD has a smaller supported range for FSIN than the entire binary64 finite range, let me know and I will search for the worst case inside it. (FYI, macOS sin returns -0x1.14AE72E6BA22Fp-60.) - Undressed
I just ran the utility here on a laptop with a Ryzen 7 2700U, and it looks like AMD Zen indeed uses 66-bit pi: the output of fpuaccuracy examples for the "sin near pi" case is the same as the example output supplied with the program, which was generated on an Intel Core i7-2600 (Sandy Bridge). I'm presently away from my Zen 2-based desktop (Ryzen 9 3950X) but will test it when I get a chance. I doubt the result is going to be any different, though. - Colonize

I found a program located at http://notabs.org/fpuaccuracy/ (direct download link; GPLv3) designed to test the accuracy of x87 trigonometric instructions. The reference output for fpuaccuracy examples supplied with the program, generated using an Intel Core i7-2600 (Sandy Bridge), is as follows:

sin with smallest failing argument
argument   4000 C10A 7DC0 DC46 D753   (decimal 3.0162653335001840718)
actual     3FFB FFFF BBF1 3588 24AF   (decimal 0.1249994929300478145)
x87 fpu    3FFB FFFF BBF1 3588 24AE   (decimal 0.12499949293004781449)
error      -1.0002171407788819287 ulp

sin near pi
argument   4000 C90F DAA2 2168 C235   (decimal 3.1415926535897932385)
actual     BFBE ECE6 75D1 FC8F 8CBB   (decimal -5.0165576126683320235E-20)
x87 fpu    BFBF 8000 0000 0000 0000   (decimal -5.42101086242752217E-20)
error      -1376283091369227076.6 ulp

sin with large argument
argument   403D FFFF FFFF 2D2A 9042   (decimal 9223372035086174241)
actual     BFDF E730 CF55 1180 63F3   (decimal -4.2053336735954077951E-10)
x87 fpu    BFF8 C28B 4641 7452 B463   (decimal -0.011874025925697012908)
error      -4.7037861121081250351E+26 ulp

cos with smallest failing argument
argument   3FFF C10E 8AC0 BFEB 5E80   (decimal 1.5082562867317745453)
actual     3FFA FFFF 3EA3 D2D7 355B   (decimal 0.062499279677629184442)
x87 fpu    3FFA FFFF 3EA3 D2D7 355A   (decimal 0.062499279677629184438)
error      -1.005468872258621479 ulp

cos near pi/2
argument   3FFF C90F DAA2 2168 C235   (decimal 1.5707963267948966193)
actual     BFBD ECE6 75D1 FC8F 8CBB   (decimal -2.5082788063341660117E-20)
x87 fpu    BFBE 8000 0000 0000 0000   (decimal -2.710505431213761085E-20)
error      -1376283091369227076.6 ulp

cos with large argument
argument   403D FFFF FFFF 6CE1 B432   (decimal 9223372035620657689)
actual     3FDD DFD2 E369 AE25 7E4A   (decimal 1.0178327217734091432E-10)
x87 fpu    BFF8 C28B 45B2 1490 D117   (decimal -0.011874025404105249357)
error      -1.8815144449581111989E+27 ulp

tan with smallest failing argument
argument   3FFF B8B5 07B4 294A BD53   (decimal 1.4430245999997931928)
actual     4001 F915 0EE5 BAC8 446C   (decimal 7.7838205801874740721)
x87 fpu    4001 F915 0EE5 BAC8 446D   (decimal 7.7838205801874740726)
error      1.0017725812707024772 ulp

tan near pi/2
argument   3FFF C90F DAA2 2168 C235   (decimal 1.5707963267948966193)
actual     C040 8A51 E04D AABD A35F   (decimal -39867976298117107068)
x87 fpu    C040 8000 0000 0000 0000   (decimal -36893488147419103232)
error      743622037674500958.81 ulp

tan with large argument
argument   403D FFFF FFFF DCF6 FE38   (decimal 9223372036560879388)
actual     4005 A86C 499C 14EA BD4A   (decimal 84.211499097398127292)
x87 fpu    401F C10C D618 50D5 E957   (decimal 6477687856.6315280604)
error      9.3353319161898434351E+26 ulp

When run on a laptop with an AMD Ryzen 7 2700U (Zen), I get the following:

sin with smallest failing argument
argument   4000 C10A 7DC0 DC46 D753   (decimal 3.0162653335001840718)
actual     3FFB FFFF BBF1 3588 24AF   (decimal 0.1249994929300478145)
x87 fpu    3FFB FFFF BBF1 3588 24AE   (decimal 0.12499949293004781449)
error      -1.0002171407788819287 ulp

sin near pi
argument   4000 C90F DAA2 2168 C235   (decimal 3.1415926535897932385)
actual     BFBE ECE6 75D1 FC8F 8CBB   (decimal -5.0165576126683320235E-20)
x87 fpu    BFBF 8000 0000 0000 0000   (decimal -5.42101086242752217E-20)
error      -1376283091369227076.6 ulp

sin with large argument
argument   403D FFFF FFFF 2D2A 9042   (decimal 9223372035086174241)
actual     BFDF E730 CF55 1180 63F3   (decimal -4.2053336735954077951E-10)
x87 fpu    BFF8 C28B 4641 7452 B463   (decimal -0.011874025925697012908)
error      -4.7037861121081250351E+26 ulp

cos with smallest failing argument
argument   3FFF C10E 8AC0 BFEB 5E80   (decimal 1.5082562867317745453)
actual     3FFA FFFF 3EA3 D2D7 355B   (decimal 0.062499279677629184442)
x87 fpu    3FFA FFFF 3EA3 D2D7 355A   (decimal 0.062499279677629184438)
error      -1.005468872258621479 ulp

cos near pi/2
argument   3FFF C90F DAA2 2168 C235   (decimal 1.5707963267948966193)
actual     BFBD ECE6 75D1 FC8F 8CBB   (decimal -2.5082788063341660117E-20)
x87 fpu    BFBE 8000 0000 0000 0000   (decimal -2.710505431213761085E-20)
error      -1376283091369227076.6 ulp

cos with large argument
argument   403D FFFF FFFF 6CE1 B432   (decimal 9223372035620657689)
actual     3FDD DFD2 E369 AE25 7E4A   (decimal 1.0178327217734091432E-10)
x87 fpu    BFF8 C28B 45B2 1490 D117   (decimal -0.011874025404105249357)
error      -1.8815144449581111989E+27 ulp

tan with smallest failing argument
argument   3FFF B8B5 07B4 294A BD53   (decimal 1.4430245999997931928)
actual     4001 F915 0EE5 BAC8 446C   (decimal 7.7838205801874740721)
x87 fpu    4001 F915 0EE5 BAC8 446C   (decimal 7.7838205801874740721)
error      0.0017725812707024772387 ulp

tan near pi/2
argument   3FFF C90F DAA2 2168 C235   (decimal 1.5707963267948966193)
actual     C040 8A51 E04D AABD A35F   (decimal -39867976298117107068)
x87 fpu    C040 8000 0000 0000 0000   (decimal -36893488147419103232)
error      743622037674500958.81 ulp

tan with large argument
argument   403D FFFF FFFF DCF6 FE38   (decimal 9223372036560879388)
actual     4005 A86C 499C 14EA BD4A   (decimal 84.211499097398127292)
x87 fpu    401F C10C D618 50D5 E957   (decimal 6477687856.6315280604)
error      9.3353319161898434351E+26 ulp

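With one exception, the results are identical: for tan with the smallest failing argument, the Zen result matches the reference value (an error of about 0.0018 ulp instead of 1.0018 ulp). I also tested on my Ryzen 9 3950X (Zen 2) and got the same results.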

In conclusion, recent AMD processors, including those based on the Zen and Zen 2 architectures, appear to use the same 66-bit approximation of pi and produce the same kinds of inaccuracies as modern Intel processors for x87 trigonometric instructions when given certain arguments.

Colonize answered 17/2, 2020 at 18:48 Comment(0)
