I had the same issue, and so I benchmarked it. The result was that, depending upon what you are doing, your compiler may be better at optimising than you are, even if it doesn't use the intrinsic `sincos` function.
I wrote a small test program to compare three approaches: the `sincos` intrinsic, `std::sin` with `std::cos`, and `std::sin` with cos calculated from `sqrt(1-sin*sin)`.
The test involved generating 1e8 random numbers from 0 to `2*M_PI`. Each test calculated sin and cos for each random number, summing the values and then outputting the sum to stdout - this ensured the whole program wasn't optimised away. I compiled with `O3` and `fp:fast`.
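For reference, the harness looked roughly like this (a minimal sketch under my own naming; the timing code and the other two variants are elided):

```cpp
#define _USE_MATH_DEFINES // for M_PI on MSVC
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const std::size_t N = 100000000; // 1e8 samples
    std::vector<double> inputs(N);

    // Fill with uniform random angles in [0, 2*pi).
    std::mt19937_64 rng(42);
    std::uniform_real_distribution<double> dist(0.0, 2.0 * M_PI);
    for (auto& x : inputs) x = dist(rng);

    // Naive version: sum sin and cos for every input.
    double sum = 0.0;
    for (std::size_t i = 0; i < N; ++i)
        sum += std::sin(inputs[i]) + std::cos(inputs[i]);

    std::printf("%f\n", sum); // print the sum to defeat dead-code elimination
}
```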
Using `sqrt(1-sin*sin)` was by far the slowest. This was because I needed an `if` statement to check the sign of the result, which meant the loop could not be vectorised.
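The branchy version looked something like this (a sketch; `inputs` and `N` as in the harness above). It is the sign fix-up that defeats vectorisation:

```cpp
double sum1 = 0.0;
for (std::size_t i = 0; i < N; ++i) {
    double s = std::sin(inputs[i]);
    double c = std::sqrt(1.0 - s * s);  // gives |cos| only; sqrt is never negative
    if (inputs[i] > M_PI / 2.0 && inputs[i] < 3.0 * M_PI / 2.0)
        c = -c;                         // cos < 0 in quadrants 2 and 3
    sum1 += s + c;
}
```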
The other options were similar speeds. Initially I had created a `fastSinCos` function that accepted 4 doubles and returned 4 doubles, and I then added the 4 doubles to the sum. This was slower than just using `sum += std::sin(inputs[i]) + std::cos(inputs[i])`. It turned out the compiler had vectorised the sum in the naive implementation, so it was beating me this way.
When I modified my code to create a `fastSinCosSum` function where the sums were vectorised, I managed to beat the naive version, but only by 10%.
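The vectorised-sum version looked roughly like this (a sketch of the idea, not the exact code; it assumes an SVML-style `_mm256_sincos_pd`, which Intel's compiler provides, and other toolchains would need their own 4-wide sincos):

```cpp
#include <immintrin.h>

// Sketch: process 4 doubles at a time and keep 4 partial sums in a
// ymm register. _mm256_sincos_pd returns the 4 sines and writes the
// 4 cosines through the pointer.
inline void fastSinCosSum(const double* in, __m256d& sum) {
    __m256d x = _mm256_loadu_pd(in);      // load 4 angles
    __m256d c;
    __m256d s = _mm256_sincos_pd(&c, x);  // 4 sines and 4 cosines at once
    sum = _mm256_add_pd(sum, _mm256_add_pd(s, c));
}
```

In the loop it was called as `fastSinCosSum(&inputs[i], r_sum3)` with `i` stepping by 4, as in the first listing below; the four partial sums are only reduced to a scalar once, after the loop.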
If I restricted the range of the inputs to `M_PI/2.0` to `3.0*M_PI/2.0`, so that I knew the result of cos was always negative, then the speed was identical to the naive version.
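With the inputs so restricted, the sign fix-up disappears and the loop body becomes the straight-line code the compiler can vectorise (this is the `sum2a` line in the third listing below):

```cpp
for (std::size_t i = 0; i < N; ++i) {
    double sin = std::sin(inputs[i]);        // shadows ::sin, as in the listing
    sum2a += sin - std::sqrt(1 - sin * sin); // cos known negative, so no branch
}
```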
As 1e8 doubles is bigger than my cache, I suspect cache misses could be the actual bottleneck. However, even then, the test only took around a second to run, so it seems daft to worry about it.
So in the end, unless a 10% gain in the most idealised setting matters to you, I suspect you are better off ensuring the compiler can vectorise rather than trying to use the intrinsic functions. The generated assembly for the three options is shown below.
```
fastSinCosSum(&inputs[i], r_sum3); // pass in the address of the first of the 4 elements to use and a register to store 4 sums
00007FF7FC321460 vmovupd ymm0,ymmword ptr [r14+rdi*8]
00007FF7FC321466 call __vdecl_sincos4 (07FF7FC322AD0h)
00007FF7FC32146B vaddpd ymm0,ymm0,ymmword ptr [r_sum3]
00007FF7FC321470 vaddpd ymm0,ymm0,ymm1
00007FF7FC321474 vmovupd ymmword ptr [r_sum3],ymm0
```
```
sum2 += std::sin(inputs[i]) + std::cos(inputs[i]); // just calculate naively
00007FF7FC321305 vmovupd ymm0,ymmword ptr [r14+rbx*8]
00007FF7FC32130B call __vdecl_cos4 (07FF7FC322A80h)
00007FF7FC321310 vmovupd ymmword ptr [rbp+60h],ymm0
00007FF7FC321315 vmovupd ymm0,ymmword ptr [r14+rbx*8]
00007FF7FC32131B call __vdecl_sin4 (07FF7FC322AA0h)
00007FF7FC321320 vaddpd ymm1,ymm0,ymmword ptr [rbp+60h]
00007FF7FC321325 vaddpd ymm0,ymm1,ymmword ptr [rbp+20h]
00007FF7FC32132A vmovupd ymmword ptr [rbp+20h],ymm0
```
```
double sin = std::sin(inputs[i]);
00007FF607CD1363 vmovupd ymm0,ymmword ptr [r14+rbx*8]
00007FF607CD1369 call __vdecl_sin4 (07FF607CD2A70h)
sum2a += sin - std::sqrt(1 - sin * sin); // calculate cos using sqrt. The angles are limited so we know the sign of the result is negative
00007FF607CD136E vmovupd ymm1,ymmword ptr [__ymm@3ff00000000000003ff00000000000003ff00000000000003ff0000000000000 (07FF607CD7480h)]
00007FF607CD1376 vfnmadd231pd ymm1,ymm0,ymm0
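; fused negative multiply-add: ymm1 = ymm1 - ymm0*ymm0, i.e. 1.0 - sin*sin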
00007FF607CD137B vsqrtpd ymm1,ymm1
00007FF607CD137F vsubpd ymm0,ymm0,ymm1
00007FF607CD1383 vaddpd ymm1,ymm0,ymmword ptr [rbp+60h]
00007FF607CD1388 vmovupd ymmword ptr [rbp+60h],ymm1
```
Comments:

Calculating `sin()` and then `cos()` via `sqrt(1-sin^2)`, as OP proposes, is numerically more stable than calculating `cos()` and then `sin()` via `sqrt(1-cos^2)`. Good that it is not suggested. For small angles the difference is apparent. – Rhapsodic

For small x, doing Sine(x) as `y=sin(x)` and Cosine(x) as `sqrt(1-y*y)` yields sin, cos of `x` and `1.0`. Doing Cosine(x) as `y=cos(x)` and Sine(x) as `sqrt(1-y*y)` yields sin, cos of **0.0** and `1.0`, a total loss of precision in sine. As x grows the issue becomes less, until x is about 1.0. Number theory could follow, but this deserves its own question. – Rhapsodic

`sqrt(1-sin^2)` is a poor approximation to `cos` near pi/2. In gnuplot: `plot [0:1e-7] sqrt(1-sin(pi/2-x)**2), cos(pi/2-x)`. No? You seem to be asserting otherwise, but I didn't follow your argument. – Sorosis

You said `sqrt(1-sin^2)` isn't a good way to compute cos near pi/2, and I have that same impression. So I made this plot, which seems to confirm our impression that `sqrt(1-sin^2)` isn't very good for arguments near pi/2, and I'm asking you if you agree. – Sorosis

`1-pow(sin(pi/2-x),2)` suffers severe loss of precision for `x` near pi/2. Certainly OK for coarse answers, yet not of high quality. – Rhapsodic

`sin(near pi)` problems are like `cos(near pi/2)`. – Rhapsodic

Comparing `sin` with `sqrt(1-sin^2)` vs `cos` with `sqrt(1-cos^2)`: if I understand correctly, what TQP proposed earlier was exactly correct: the former is better when closer to zero, and the latter is better when closer to pi/2. – Sorosis
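To make the precision point concrete, here is a quick check (my own sketch, not from the thread) comparing `sqrt(1 - sin*sin)` with `std::cos` near 0 and near pi/2:

```cpp
#define _USE_MATH_DEFINES // for M_PI on MSVC
#include <cmath>
#include <cstdio>

int main() {
    const double xs[] = { 1e-8, M_PI / 2.0 - 1e-8 };
    for (double x : xs) {
        double s = std::sin(x);
        double viaSqrt = std::sqrt(1.0 - s * s); // cos recovered from sin
        // Near 0 this matches std::cos to full precision. Near pi/2,
        // sin(x) rounds to exactly 1.0, so 1 - s*s cancels to 0 and
        // every significant digit of cos is lost.
        std::printf("x=%.10g  cos=%.17g  sqrt(1-sin^2)=%.17g\n",
                    x, std::cos(x), viaSqrt);
    }
}
```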