What is the error of trigonometric instructions on x86?

Asked 20/2, 2014 at 13:15 Answered 13/1, 2017 at 5:48

Solved math x86 floating-point trigonometry x87

Where can I find the information about error ranges for trigonometric function instructions on x86 processors, like fsincos?

Stalky answered 20/2, 2014 at 13:15 Comment(3)

I expect it to be 1 ulp as required by IEEE 754. – Frederick 20/2, 2014 at 13:42

@lhf: IEEE-754 imposes no requirements on trigonometric functions (and if it did, the requirement wouldn’t be 1 ulp; operations standardized by IEEE-754 are generally required to be correctly rounded, which corresponds roughly to a 0.5 ulp tolerance). – Acrolein 20/2, 2014 at 14:6

Related: randomascii.wordpress.com/2014/10/09/… - Intel Underestimates Error Bounds by 1.3 quintillion (in their previous docs for fsin) – Australia 15/3, 2021 at 4:31

What you ask is rarely an interesting question, and most likely you really want to know something different. So let me answer different questions first:

How to calculate trigonometric function to a certain accuracy?

Just use a wider data type. On x86, if you need the result to double accuracy, do the calculation in 80-bit extended precision, round the result to double, and you are on the safe side.

How to get platform-independent accuracy?

You need a specialized software solution for this, like MPFR.

That said, let me come back to your original question. Short answer: for small operands the result should typically be within 1 ulp. For larger operands the error grows. The only way to find out for sure is to test this for yourself, like this guy did. There is no reliable information from the processor vendors.

Massasoit answered 20/2, 2014 at 14:5 Comment(1)

Thanks. I was actually interested in error of transcendental instructions, but was also interested in how to get more accuracy and you answered it as well. – Stalky 20/2, 2014 at 15:6

For Intel CPUs the accuracy of the built-in transcendental instructions is documented in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, section 8.3.10 Transcendental Instruction Accuracy:

With the Pentium processor and later IA-32 processors, the worst case error on transcendental functions is less than 1 ulp when rounding to the nearest (even) and less than 1.5 ulps when rounding in other modes.

It should be noted that the error bound of 1 ulp applies to the 80-bit extended-precision format, as all transcendental function instructions deliver extended-precision results. The issue noted by Stephen Cannon in an earlier comment regarding a loss of accuracy, relative to a mathematical reference, for the trigonometric function instructions FSIN, FCOS, FSINCOS, FPTAN, due to argument reduction with a 66-bit machine PI, is acknowledged by Intel. Guidance is provided as follows:

Regardless of the target precision (single, double, or double-extended), it is safe to reduce the argument to a value smaller in absolute value than about 3π/4 for FSIN, and smaller than about 3π/8 for FCOS, FSINCOS, and FPTAN. [...] For example, accuracy measurements show that the double-extended precision result of FSIN will not have errors larger than 0.72 ulp for |x| < 2.82 [...] Likewise, the double-extended precision result of FCOS will not have errors larger than 0.82 ulp for |x| < 1.31 [...]
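To see why the precision of π used for argument reduction matters, here is a small sketch in C. It does not reproduce what the hardware does (the instructions use a 66-bit π); it only contrasts a 53-bit π with a roughly 106-bit π split into two doubles. The names `PI_HI`, `PI_LO`, `reduce_short_pi` and `reduce_long_pi` are invented for this illustration.

```c
/* pi split into two doubles: PI_HI is the double nearest pi,
   PI_LO is the double nearest (pi - PI_HI). */
static const double PI_HI = 3.141592653589793;
static const double PI_LO = 1.2246467991473532e-16;

/* Reduce an argument near pi using only the 53-bit PI_HI.
   For x == PI_HI the result is exactly 0, so every bit of the
   true remainder is lost to cancellation. */
double reduce_short_pi(double x)
{
    return PI_HI - x;
}

/* Same reduction with about 106 bits of pi. The subtraction PI_HI - x
   is exact for x close to pi, and PI_LO restores the bits that the
   short constant is missing. */
double reduce_long_pi(double x)
{
    return (PI_HI - x) + PI_LO;
}
```

Since sin(x) = sin(π − x) and sin(r) ≈ r for tiny r, the true value of sin at `PI_HI` is about 1.2246e-16. The short reduction returns 0 for that input, the long one returns `PI_LO`. A 66-bit π sits between these two cases: it recovers some, but not all, of the significant bits of the remainder.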

It is further acknowledged that the error bound of 1 ulp for the logarithmic function instructions FYL2X and FYL2XP1 holds only when y = 1 (this was not clear in some of Intel's older documentation):

The instructions FYL2X and FYL2XP1 are two operand instructions and are guaranteed to be within 1 ulp only when y equals 1. When y is not equal to 1, the maximum ulp error is always within 1.35.

Using a multi-precision library, it is straightforward to put Intel's claims to a test. To collect the following data, I used Richard Brent's MP library as a reference, and ran 2³¹ random test cases in the intervals indicated:

Intel Xeon CPU E3-1270 v2 "IvyBridge", Intel64 Family 6 Model 58 Stepping 9, GenuineIntel

2xm1 [-1,1]        max. ulp = 0.898306 at x = -1.8920e-001 (BFFC C1BED062 C071D472)
sin [-2.82,+2.82]  max. ulp = 0.706783 at x =  5.1323e-001 (3FFE 8362D6B1 FC93DFA0)
cos [-1.41,+1.41]  max. ulp = 0.821634 at x = -1.3201e+000 (BFFF A8F8486E 591A59D7)
tan [-1.41,+1.41]  max. ulp = 0.990388 at x =  1.3179e+000 (3FFF A8B0CAB9 0039C790)
atan [-1,1]        max. ulp = 0.747328 at x =  1.2252e-002 (3FF8 C8BB9E06 B9EB4DF8), y =  3.9204e-001 (3FFD C8B8DC94 AA6655B4)
yl2x [0.5,2.0]     max. ulp = 0.994396 at x =  1.0218e+000 (3FFF 82C95B56 8A70EB2D), y =  1.0000e+000 (3FFF 80000000 00000000)
yl2x [1.0,1.2]     max. ulp = 1.202769 at x =  1.0915e+000 (3FFF 8BB70F1B C5F7E103), y = -9.8934e-001 (BFFE FD453A23 AC926478)
yl2xp1 [-0.7,1.44] max. ulp = 0.990469 at x =  2.1709e-002 (3FF9 B1D61A98 BF349080), y =  1.0000e+000 (3FFF 80000000 00000000)
yl2xp1 [-1, 1]     max. ulp = 1.206979 at x =  9.1169e-002 (3FFB BAB69127 C1D5C158), y = -9.9281e-001 (BFFE FE28A91F 132F0C35)

While such non-exhaustive testing cannot prove error bounds, the maximum errors found appear to confirm Intel's documentation.
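The core of such a test is a function that expresses the difference between a computed result and a higher-precision reference in ulps. The following C sketch is not the harness used for the data above: it measures double-precision results against a `long double` reference, which only makes sense where `long double` is wider than `double`, and it ignores subnormal references. Measuring the 80-bit results of the x87 instructions themselves needs a multi-precision reference, as described. The name `ulp_error` is made up for this example.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Error of a double-precision result, in units of the last place (ulp)
   of a double at the magnitude of the reference value.
   Assumptions: reference is finite, nonzero, and in the normal range
   of double. */
double ulp_error(double computed, long double reference)
{
    assert(isfinite(computed) && reference != 0.0L);

    int e;
    frexpl(reference, &e);          /* |reference| lies in [2^(e-1), 2^e) */

    /* Spacing of adjacent doubles in that binade: 2^(e - 53). */
    long double ulp = ldexpl(1.0L, e - DBL_MANT_DIG);

    return (double)(fabsl((long double)computed - reference) / ulp);
}
```

A test loop would then draw random arguments `x` in the interval of interest, call something like `ulp_error(sin(x), sinl(x))`, and keep track of the maximum and the argument at which it occurred.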

I do not have any modern AMD processors to test, but do have test data for an old 32-bit Athlon CPU. Full disclosure: I designed the algorithms for the transcendental function instructions used in 32-bit Athlon processors. My accuracy target was less than 1 ulp for all the instructions; however, the same caveat about argument reduction by 66-bit machine PI for trigonometric functions already mentioned above applies.

Athlon XP-2100 "Palomino", x86 Family 6 Model 6 Stepping 2, AuthenticAMD

2xm1 [-1,1]        max. ulp = 0.720006 at x =  5.6271e-001 (3FFE 900D9E90 A533535D)
sin [-2.82, +2.82] max. ulp = 0.663069 at x = -2.8200e+000 (C000 B47A7BB2 305631FE)
cos [-1.41, +1.41] max. ulp = 0.671089 at x = -1.3189e+000 (BFFF A8D0CF9E DC0BCA43)
tan [-1.41, +1.41] max. ulp = 0.783821 at x = -1.3225e+000 (BFFF A947067E E3F4C39C)
atan [-1,1]        max. ulp = 0.665893 at x =  5.5333e-001 (3FFE 8DA6B606 C58B206A) y =  5.5169e-001 (3FFE 8D3B9DC8 5EA87546)
yl2x [0.4,2.5]     max. ulp = 0.716276 at x =  6.9826e-001 (3FFE B2C128C3 0EF1EC00) y = -1.2062e-001 (BFFB F7064049 BC362838)
yl2xp1 [-1,4]      max. ulp = 0.691403 at x =  1.9090e-001 (3FFC C37C0397 F8184934) y = -2.4796e-001 (BFFC FDE93CA9 980BF78C)

The AMD64 Architecture Programmer’s Manual, Vol. 1, in section 6.4.5.1 Accuracy of Transcendental Results, documents the error bounds as follows:

x87 computations are carried out in double-extended-precision format, so that the transcendental functions provide results accurate to within one unit in the last place (ulp) for each of the floating-point data types.

Deadbeat answered 13/1, 2017 at 5:48 Comment(0)

You can read the Intel® 64 and IA-32 Architectures Software Developer's Manual, Vol. 1, section 8.3.10, Transcendental Instruction Accuracy. There is a precise formula, but also the more accessible statement:

With the Pentium processor and later IA-32 processors, the worst case error on transcendental functions is less than 1 ulp when rounding to the nearest (even) and less than 1.5 ulps when rounding in other modes.

Tiedeman answered 20/2, 2014 at 14:1 Comment(3)

When considering the trig functions specifically, it is vital to keep in mind that the accuracy bounds are computed against a reference function that uses a 66-bit approximation to pi (see 8.3.8 in the same document). If you compare the results to the mathematically exact functions (what most people would naively want to do), the error can be quite a bit larger than 1 ulp (once you are outside the fundamental domain of the function the error grows very quickly). – Acrolein 20/2, 2014 at 14:15

Intel has since corrected that documentation after Bruce Dawson pointed out how wrong it was when range-reduction led to catastrophic cancellation for fsin inputs near +Pi: Intel Underestimates Error Bounds by 1.3 quintillion – Australia 15/3, 2021 at 4:35

Downvoting per @PeterCordes's pointing out that the quoted passage is wrong and the documentation no longer says that. However, I looked at the "corrected" Transcendental Instruction Accuracy section and it's alarmingly unsatisfying: in fact, it does still say that, but a subsequent paragraph has been added saying that the previous paragraph isn't true (so it's now important to avoid quoting it out of context), and, further, seeming to imply that it's not possible/feasible to do correct arg reduction, which isn't true. What a mess, and a good answer would need to point all this out. – Gerrygerrymander 1/7 at 1:48
