Logarithm with SSE, or switch to FPU?

I'm doing some statistics calculations. I need them to be fast, so I rewrote most of the code to use SSE. I'm fairly new to SSE, so I was wondering what the right approach is here:

To my knowledge, there is no log2 or ln function in SSE, at least not up to 4.1, which is the latest version supported by the hardware I use.

Is it better to:

  1. extract 4 floats, and do FPU calculations on them to determine entropy - I won't need to load any of those values back into SSE registers, just sum them into another float
  2. find a function for SSE that does log2
Marcello answered 17/1, 2012 at 23:8 Comment(8)
What kind of range and accuracy do you need for your log2? - Argumentation
The same accuracy I get from the FPU would be desirable. - Marcello
There seem to be a few SSE log2 implementations around, e.g. jrfonseca.blogspot.com/2008/09/… - Argumentation
Neat, thanks! I'll try that and benchmark it. Extracting the floats to an array and then doing 4 consecutive log2's on that via the FPU was disappointingly slow. Instruments said it's wasting 95% of its time there. - Marcello
There is also the Intel Approximate Maths Library - it's old (2000) but it's SSE2 and it should still work reasonably well: intel.com/design/pentiumiii/devtools/AMaths.zip - Argumentation
Woah... I tried the implementation from the blog you linked, the one whose accuracy I can tune as closely as I'd like. It's FAST. It cut processing time down to about 10%. Thanks a LOT! - Marcello
OK - I'll put those two links in an answer for future reference. - Argumentation
Here is another link: http://gruntthepeon.free.fr/ssemath. It implements only the natural log function with SSE, but with one more instruction you'll get log2. - Anu
A
10

There seem to be a few SSE log2 implementations around, e.g. the one at jrfonseca.blogspot.com/2008/09/…

There is also the Intel Approximate Maths Library (intel.com/design/pentiumiii/devtools/AMaths.zip), which has a log2 function among others - it's old (2000) but it's SSE2 and it should still work reasonably well.

Argumentation answered 18/1, 2012 at 9:37 Comment(3)
Due to the method used on the blog, the function is now memory bound instead of CPU bound. I unrolled the loop a little to make use of some _mm_prefetch love, and it is still memory bound. Thanks for that awesome pointer! - Marcello
Glad it worked for you. You probably already know this, but if you're hitting a memory bandwidth bottleneck then try to combine other operations with your log2 so that you make more use of the data while it's in cache. - Argumentation
If you are updating your answer, you might want to mention libmvec, which is shipped with recent glibc. - Custodial
E
2

There is no SSE instruction that computes a logarithm. The closest thing in the x86 instruction set is the x87 FYL2X instruction, which computes y * log2(x) on the FPU stack one scalar value at a time, so it does not help with vectorized code. If you're thinking about using a logarithm function like log or log10 from the C standard library, it's worth taking a look at the implementation used in an open-source libc. You can roll your own logarithm approximation that operates on all elements of an SSE register at once.

Such a function is usually implemented as a polynomial approximation, such as a truncated Taylor series or a minimax polynomial, that meets some accuracy specification over a limited range of input arguments. You can then use the properties of logarithms to reduce an arbitrary input to that range: for a floating-point value x = m * 2^e, log2(x) = e + log2(m), so only the mantissa m needs the polynomial. In addition, you can parameterize the base of the logarithm by taking advantage of the property:

log_y(x) = log_a(x) / log_a(y)

Where a is the base of the logarithm routine that you created.
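
As a rough illustration of the approach described above, here is a minimal SSE2 sketch. It is not the code from any of the linked libraries. It assumes the inputs are positive, finite, normal floats (no handling of zero, negatives, denormals, infinities or NaN), splits each float into exponent and mantissa, and evaluates the series ln(m) = 2 * (t + t^3/3 + t^5/5 + t^7/7 + t^9/9) with t = (m - 1) / (m + 1), a Taylor series in t chosen here for simplicity rather than speed. The names `log2_ps_sketch` and `log_base_ps_sketch` are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 */

/* Approximate log2 of four positive, normal floats.
   Assumption: no special-case handling (0, negatives, denormals, inf, NaN). */
static __m128 log2_ps_sketch(__m128 x)
{
    const __m128i exp_mask  = _mm_set1_epi32(0x7F800000);
    const __m128i mant_mask = _mm_set1_epi32(0x007FFFFF);
    const __m128  one       = _mm_set1_ps(1.0f);

    __m128i bits = _mm_castps_si128(x);

    /* Unbiased exponent e, converted to float. */
    __m128i e_i = _mm_sub_epi32(
        _mm_srli_epi32(_mm_and_si128(bits, exp_mask), 23),
        _mm_set1_epi32(127));
    __m128 e = _mm_cvtepi32_ps(e_i);

    /* Mantissa m in [1, 2): keep the fraction bits, force the exponent of 1.0f. */
    __m128 m = _mm_or_ps(_mm_castsi128_ps(_mm_and_si128(bits, mant_mask)), one);

    /* t = (m - 1) / (m + 1), so 0 <= t < 1/3 and the series converges quickly. */
    __m128 t  = _mm_div_ps(_mm_sub_ps(m, one), _mm_add_ps(m, one));
    __m128 t2 = _mm_mul_ps(t, t);

    /* Horner evaluation of 1 + t2/3 + t2^2/5 + t2^3/7 + t2^4/9. */
    __m128 p = _mm_set1_ps(1.0f / 9.0f);
    p = _mm_add_ps(_mm_mul_ps(p, t2), _mm_set1_ps(1.0f / 7.0f));
    p = _mm_add_ps(_mm_mul_ps(p, t2), _mm_set1_ps(1.0f / 5.0f));
    p = _mm_add_ps(_mm_mul_ps(p, t2), _mm_set1_ps(1.0f / 3.0f));
    p = _mm_add_ps(_mm_mul_ps(p, t2), one);

    /* ln(m) = 2 * t * p, and log2(m) = ln(m) * log2(e); fold 2 * log2(e). */
    const __m128 two_log2e = _mm_set1_ps(2.8853900817779268f);
    __m128 log2_m = _mm_mul_ps(_mm_mul_ps(t, p), two_log2e);

    return _mm_add_ps(e, log2_m);   /* log2(x) = e + log2(m) */
}

/* Change of base: log_y(x) = log2(x) / log2(y). The caller passes the
   precomputed reciprocal 1 / log2(y), so the change costs one multiply. */
static __m128 log_base_ps_sketch(__m128 x, float inv_log2_base)
{
    return _mm_mul_ps(log2_ps_sketch(x), _mm_set1_ps(inv_log2_base));
}
```

For example, `log_base_ps_sketch(x, 0.69314718f)` gives an approximate natural log, since 1 / log2(e) = ln(2). A production version would typically replace the division and the series with a tuned minimax polynomial and handle the special cases.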

Exhalation answered 17/1, 2012 at 23:23 Comment(0)
