What is the complexity / real cost of exp in cmath compared to a FLOP?

[I globally edited the question to be more "useful" and clear]

I was wondering about the complexity of the implementation of the function exp in cmath.

By complexity, I mean algorithmic complexity if possible, and otherwise the cost relative to a floating-point operation (an addition, for example).

The following lines:

double x = 3;
double y = std::exp(x);

compile to:

...
       movq    %rax, -40(%rbp)
       movsd   -40(%rbp), %xmm0
       call    exp
       movsd   %xmm0, -40(%rbp)
       movq    -40(%rbp), %rax
...

exp has to be dynamically loaded at runtime, but I can't find much information on the algorithmic complexity of the implementation. There seems to be no call to a special processor instruction (at least on my x86_64 platform with gcc), so there must be an implementation somewhere that I can't find. To my mind, the algorithm is likely to use the binary representation of the input to achieve a very low complexity, but I haven't been able to find a solid reference on this topic.

Maybe speaking of algorithmic complexity is not really possible in this case, and all we can do is test (cf. the answers below), but I don't know how to objectively quantify the difference between a floating-point operation and a call to exp.

Sapwood answered 20/10, 2010 at 16:7 Comment(9)
This is specific to the platform and compiler. Which ones are you using? – Marlowe
Are you asking how complex a specific implementation is, or what the algorithmic complexity of a specific implementation is? – Waterfowl
Again: the algorithmic complexity of which implementation? – Marlowe
Maybe the implementation of cmath? – Commandant
Which implementation of cmath? :) – Marlowe
Is the C++ implementation part of libstdc++ (on Linux)? Maybe I should look at the code to find more information about my implementation... – Sapwood
That call exp in the middle of your disassembly is just a function call to the C library function exp(). That could be implemented any number of ways. – Neutron
You can compare the complexity by the usual method: testing. – Sparerib
"speaking of algorithmical complexity is not really possible in this case": no, it isn't… would you mind changing the title of the question? "And I don't know how we can quantify the difference between a floating point operation and a call to exp": it can be quantified as the maximum number of clock cycles that exp can take. – Correspond

It seems that the complexity is actually constant, since the MSVC9 compiler does some bit magic involving specific tables, bitmasks, and biases. As there are few branches, the instruction pipeline should help a lot. Below is what it actually does.

unpcklpd    xmm0,xmm0 
movapd      xmm1,xmmword ptr [cv] 
movapd      xmm6,xmmword ptr [Shifter] 
movapd      xmm2,xmmword ptr [cv+10h] 
movapd      xmm3,xmmword ptr [cv+20h] 
pextrw      eax,xmm0,3 
and         eax,7FFFh 
mov         edx,408Fh 
sub         edx,eax 
sub         eax,3C90h 
or          edx,eax 
cmp         edx,80000000h 
jae         RETURN_ONE 
mulpd       xmm1,xmm0 
addpd       xmm1,xmm6 
movapd      xmm7,xmm1 
subpd       xmm1,xmm6 
mulpd       xmm2,xmm1 
movapd      xmm4,xmmword ptr [cv+30h] 
mulpd       xmm3,xmm1 
movapd      xmm5,xmmword ptr [cv+40h] 
subpd       xmm0,xmm2 
movd        eax,xmm7 
mov         ecx,eax 
and         ecx,3Fh 
shl         ecx,4 
sar         eax,6 
mov         edx,eax 
subpd       xmm0,xmm3 
movapd      xmm2,xmmword ptr Tbl_addr[ecx] 
mulpd       xmm4,xmm0 
movapd      xmm1,xmm0 
mulpd       xmm0,xmm0 
addpd       xmm5,xmm4 
mulsd       xmm0,xmm0 
addsd       xmm1,xmm2 
unpckhpd    xmm2,xmm2 
movdqa      xmm6,xmmword ptr [mmask] 
pand        xmm7,xmm6 
movdqa      xmm6,xmmword ptr [bias] 
paddq       xmm7,xmm6 
psllq       xmm7,2Eh 
mulpd       xmm0,xmm5 
addsd       xmm1,xmm0 
orpd        xmm2,xmm7 
unpckhpd    xmm0,xmm0 
addsd       xmm0,xmm1 
add         edx,37Eh 
cmp         edx,77Ch 
ja          ADJUST 
mulsd       xmm0,xmm2 
sub         esp,10h 
addsd       xmm0,xmm2 
movlpd      qword ptr [esp+4],xmm0 
fld         qword ptr [esp+4] 
add         esp,10h 
ret              
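
For readers who want the idea behind the bit magic: here is a minimal C++ sketch of the general table-driven scheme the disassembly suggests. This is my own reconstruction, not MSVC's actual code, and the name exp_sketch is hypothetical. It reduces the argument so that exp(x) = 2^k * 2^(j/64) * exp(r) with a tiny remainder r, looks 2^(j/64) up in a 64-entry table, approximates exp(r) with a short polynomial, and applies 2^k by adding k to the IEEE-754 exponent field, much like the paddq/psllq pair above.

#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical sketch of a table-driven exp: a fixed number of operations,
// with no loop whose trip count depends on the input.
double exp_sketch(double x) {
    const double INV_LN2_64 = 64.0 * 1.4426950408889634; // 64 / ln(2)
    const double LN2_64     = 0.6931471805599453 / 64.0; // ln(2) / 64
    // Range reduction: x = n * ln(2)/64 + r, with |r| <= ln(2)/128.
    long n = std::lround(x * INV_LN2_64);
    double r = x - n * LN2_64;
    long k = n >> 6;   // power-of-two part
    long j = n & 63;   // table index, 0..63
    // 64-entry table of 2^(j/64); a real library would hard-code the values.
    static double tbl[64];
    static bool ready = false;
    if (!ready) {
        for (int i = 0; i < 64; ++i) tbl[i] = std::pow(2.0, i / 64.0);
        ready = true;
    }
    // Short polynomial for exp(r) on the tiny reduced interval:
    // 1 + r + r^2/2 + r^3/6 + r^4/24.
    double p = 1.0 + r * (1.0 + r * (0.5 + r * (1.0 / 6.0 + r / 24.0)));
    double y = tbl[j] * p;
    // Scale by 2^k by adding k to the exponent field
    // (no overflow/underflow handling in this sketch).
    std::uint64_t bits;
    std::memcpy(&bits, &y, sizeof bits);
    bits += static_cast<std::uint64_t>(k) << 52;
    std::memcpy(&y, &bits, sizeof bits);
    return y;
}

The point is that the operation count is fixed: one rounding, one table lookup, a handful of multiplies and adds, and some integer bit work. That is why the cost is O(1) rather than a function of the argument.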
Husted answered 20/10, 2010 at 17:11 Comment(0)

Speaking generally, the computation for primitive types should be very fast. As the commenters mention, there are sometimes hardware instructions for it, and if not, there are well-known fast algorithms; Knuth has a good section on such things.

The usual implementation for integer exponentiation is square-and-multiply, which exploits the observation that you can break any exponentiation up into some number of squarings plus at most one more multiplication. The basic algorithm for n**k is given here and is O(lg k); a sketch follows below.
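
As an illustration (a minimal sketch, with a name of my own choosing), exponentiation by squaring in C++ looks like this:

// Exponentiation by squaring: O(lg k) multiplications for n**k.
double pow_by_squaring(double n, unsigned k) {
    double result = 1.0;
    while (k > 0) {
        if (k & 1u)        // low bit of k set: fold the current square in
            result *= n;
        n *= n;            // square for the next bit of k
        k >>= 1;
    }
    return result;
}

Each iteration consumes one bit of k, hence the O(lg k) bound. Note this applies to integer exponents n**k; exp(x) for a floating-point x is computed differently (see the table-driven answer above).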

Sparerib answered 20/10, 2010 at 16:27 Comment(5)
How can we compare calls to exp to additions and multiplications? – Commandant
Thanks for your answer. I think there must be two cases depending on the platform: either there is a specific hardware instruction for exp, or somewhere in the C++ runtime libraries there is a square-and-multiply-style algorithm. – Sapwood
But then exp(n) is O(log(n)). I believe this is only true for arbitrary precision. – Commandant
The number of bits is fixed, albeit implementation-dependent. And if the number of bits is fixed, then the complexity has an upper bound. – Gemagemara
@Wok, according to the source code of glibc, they use an O(1) computation for exp(x). – Mussel

Here one can find a fast exp implementation that uses SSE instructions.

Marten answered 20/10, 2010 at 16:53 Comment(0)

Are you interested in the time taken by exponentiation, as compared to the time taken by other floating-point operations? That will vary from implementation to implementation, and also from computer to computer (one may have a different math processor), so there's no one answer we can give.

The right approach, if you want to know, is to write test functions and time them. Loop through a million floating-point assignments and time it, then loop through a million floating-point assignments of exponentials and time that, then subtract. Watch out for the optimizer here: if you don't use the results of the assignments, it's allowed to remove the whole loop. You'll know that has happened when you see extremely fast runtimes that don't vary with the size of the loop.
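
A rough harness along those lines might look like this (my own sketch, not a rigorous benchmark; the volatile sink keeps the optimizer from deleting the loops):

#include <chrono>
#include <cmath>
#include <cstdio>

int main() {
    const int N = 1000000;
    volatile double sink = 0.0; // consuming the results defeats dead-code elimination

    auto t0 = std::chrono::steady_clock::now();
    double a = 0.0;
    for (int i = 0; i < N; ++i) a += 1.000001;           // baseline: one FP add per iteration
    sink = a;
    auto t1 = std::chrono::steady_clock::now();

    double b = 0.0;
    for (int i = 0; i < N; ++i) b += std::exp(i * 1e-6); // one exp (plus an add) per iteration
    sink = b;
    auto t2 = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::micro> add_us = t1 - t0;
    std::chrono::duration<double, std::micro> exp_us = t2 - t1;
    std::printf("adds: %.0f us, exp: %.0f us (sink=%f)\n",
                add_us.count(), exp_us.count(), (double)sink);
    return 0;
}

Subtracting the first time from the second gives a crude per-call cost for exp relative to a plain addition; compile with optimizations on and repeat a few times to average out noise.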

Raisin answered 20/10, 2010 at 17:11 Comment(0)
