pow for SSE types

Asked 19/9, 2014 at 14:14 Answered 29/3, 2017 at 11:12

I do some explicitly vectorised computations using SSE types, such as __m128 (defined in xmmintrin.h etc), but now I need to raise all elements of the vector to some (same) power, i.e. ideally I would want something like __m128 _mm_pow_ps(__m128, float), which unfortunately doesn't exist.

What is the best way around this? I could store the vector, call std::pow on each element, and then reload it. Is this the best I can do? How do compilers implement a call to std::pow when auto-vectorising code that otherwise is well vectorisable? Are there any libraries that provide something useful?
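For reference, the store/compute/reload fallback mentioned above could look roughly like the sketch below. The function name pow_ps_scalar is made up for illustration and is not an existing intrinsic; it only assumes SSE and the standard library.

```cpp
#include <cmath>
#include <xmmintrin.h>

// Fallback sketch: spill the four lanes to memory, call std::pow on each
// lane, then reload. pow_ps_scalar is a hypothetical name, not an intrinsic.
inline __m128 pow_ps_scalar(__m128 x, float p)
{
    alignas(16) float lane[4];
    _mm_store_ps(lane, x);        // vector -> 16-byte aligned memory
    for (float& v : lane)
        v = std::pow(v, p);       // scalar pow, one lane at a time
    return _mm_load_ps(lane);     // memory -> vector
}
```

This keeps the accuracy of std::pow but gives up the SIMD speed-up for the pow step itself, which is why the answers below look for vectorised exp and log instead.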

(Note that this question is related but not a duplicate, and certainly doesn't have a useful answer.)

Irfan answered 19/9, 2014 at 14:14 Comment(3)

I've used gruntthepeon.free.fr/ssemath for exp/log and written pow(x,k) as exp(k*log(x)) when auto-vectorisation was not an option. Not sure how it compares with auto-vectorised code. – Espalier 19/9, 2014 at 14:47

You could use Agner Fog's vector class. He has SIMD math functions (including pow, exp, log, sin,...) for SSE, AVX, and AVX512 for single and float and ints. I don't see any good reason to use Intel's SVML or AMD's libm anymore. – Career 20/9, 2014 at 15:1

@Zboson, Is there a good C library for exp() with SSE4 support? – Venereal 30/10, 2017 at 19:0

Use the formula exp(y*log(x)) for pow(x, y) and a library with SSE implementations of exp() and log().

Edit by @Royi: The above holds only when both x and y are positive. Otherwise more careful math is needed. See https://math.stackexchange.com/questions/2089690.
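A minimal sketch of the exp(y*log(x)) approach with SSE types follows. To keep it self-contained, log_ps_standin and exp_ps_standin are made-up placeholders that work lane by lane through the standard library; in real code they would be swapped for the vectorised log and exp of whichever library is used. The name pow_ps_exp_log is likewise hypothetical.

```cpp
#include <cmath>
#include <xmmintrin.h>

// Helper for the stand-ins: apply a scalar function to each of the 4 lanes.
template <typename F>
inline __m128 map_lanes(__m128 v, F f)
{
    alignas(16) float a[4];
    _mm_store_ps(a, v);
    for (float& e : a)
        e = f(e);
    return _mm_load_ps(a);
}

// Placeholders (assumption): replace with a library's vectorised log/exp.
inline __m128 log_ps_standin(__m128 x)
{
    return map_lanes(x, [](float e) { return std::log(e); });
}
inline __m128 exp_ps_standin(__m128 x)
{
    return map_lanes(x, [](float e) { return std::exp(e); });
}

// pow(x, y) computed as exp(y * log(x)); meaningful only for lanes with x > 0.
inline __m128 pow_ps_exp_log(__m128 x, float y)
{
    const __m128 vy = _mm_set1_ps(y);   // broadcast the scalar exponent
    return exp_ps_standin(_mm_mul_ps(vy, log_ps_standin(x)));
}
```

Because log of a negative number is NaN, lanes with x < 0 come out as NaN even in cases where pow is well defined (for example an integer exponent), so those lanes need separate handling.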

Colubrid answered 19/9, 2014 at 14:46 Comment(6)

I had a look at that library. It looks restricted to gcc, only knows about SSE2, and the documentation in the code is poor. I would also want it for the AVX types __m256 and __m256d. – Irfan 19/9, 2014 at 15:11

@Irfan works fine with MSVC (note benchmarks with VS2010 at the bottom of the link), and code becomes more clear when looking at cephes library which seems to be the main inspiration. – Espalier 20/9, 2014 at 0:21

I need it to work with gcc, icc, clang. The original cephes library is great! If there is nothing better, I can at least implement my own log and exp along the lines of these libraries. – Irfan 20/9, 2014 at 9:1

@Irfan Did you even try to compile the library I've linked? There are about 5 compiler-specific lines, and they should work with all the compilers you've mentioned. – Colubrid 23/9, 2014 at 15:13

Sorry, had other things to do in the last couple of days. But this library does not support all the vector types I need. – Irfan 24/9, 2014 at 16:2

Using this formula, how does one handle negative numbers? – Dickson 22/9, 2017 at 13:31

I really recommend the Intel Short Vector Math Library (SVML) for these types of operations. The library is bundled with the Intel compiler, which you mention in the list of compilers to support. I doubt it would be useful for gcc and clang, but it could serve as a reference point for benchmarking whatever pow implementation you come up with.

https://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-DEB8B19C-E7A2-432A-85E4-D5648250188E.htm

Benedic answered 22/9, 2014 at 10:4 Comment(4)

SVML can be useful with gcc. gcc -mveclibabi=svml will even let the vectorizer create calls to vmlsPow4 and such. – Rostock 11/4, 2015 at 6:27

@MarcGlisse, Does gcc include Intel SVML built in? – Venereal 30/10, 2017 at 18:30

gcc does not include SVML, it only has the knowledge of how to generate calls to it, if you promise that you will have it available for linking. – Rostock 30/10, 2017 at 18:33
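To illustrate the comments above, here is the kind of plain loop the gcc vectoriser could, under the right flags, turn into SVML calls. The build line in the comment is an assumption based on the gcc documentation for -mveclibabi (which requires -ftree-vectorize and -funsafe-math-optimizations, plus an SVML-compatible library at link time); whether calls such as vmlsPow4 are actually emitted depends on the gcc version and target. pow_array is a made-up name.

```cpp
#include <cmath>
#include <cstddef>

// Plain element-wise pow over an array; no intrinsics involved.
// Assumed gcc build line (an SVML-compatible library must be linked):
//   g++ -O2 -ftree-vectorize -funsafe-math-optimizations -mveclibabi=svml ...
// Without those flags the loop still works, just with scalar pow calls.
inline void pow_array(float* __restrict out, const float* __restrict in,
                      float p, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::pow(in[i], p);
}
```

Inspecting the generated assembly is the only reliable way to check whether vector calls were produced for a given compiler and flag set.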

Your link is no longer working. – Kobe 27/2 at 19:24

An AVX version of the ssemath library is now available: http://software-lisc.fbk.eu/avx_mathfun/

With the library you can use:

exp256_ps(y*log256_ps(x)); // for pow(x, y)

Hernandes answered 29/3, 2017 at 11:12 Comment(2)

Yes, these provide the log, exp, sin, cos, and a sincos function for 8 floats using AVX. Unfortunately, the corresponding double versions are still outstanding (I actually need those more at the moment). – Irfan 29/3, 2017 at 19:50

You could try the Intel SPMD compiler: ispc.github.io/ispc.html the documentation says it supports pow and AVX – Hernandes 31/3, 2017 at 7:35


Make a vector out of the float.

 _mm_pow_ps(v, _mm_set1_ps(f))

Polyneuritis answered 10/4, 2015 at 5:48 Comment(2)

There is no _mm_pow_ps(), I'm afraid. Otherwise, I had not asked. – Irfan 10/4, 2015 at 11:47

Ah, I misunderstood the question. Taylor series is traditional, as noted earlier gruntthepeon.free.fr/ssemath is a good resource. Depending on how important accuracy is you can make the number of terms much lower. – Sierrasiesser 12/4, 2015 at 23:18
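As a rough scalar illustration of the "fewer terms for less accuracy" remark, the sketch below approximates exp with a truncated Taylor series after range reduction. It is not taken from the ssemath library, which uses its own polynomial; exp_taylor and its terms parameter are made up for this example, and overflow or special values are not handled.

```cpp
#include <cmath>

// Approximate exp(x) with `terms` Taylor terms after range reduction.
// Hypothetical helper for illustration only; no overflow/NaN handling.
inline float exp_taylor(float x, int terms)
{
    const float ln2 = 0.69314718056f;
    // Write x = n*ln2 + r with |r| <= ln2/2, so exp(x) = 2^n * exp(r).
    const float n = std::nearbyint(x / ln2);
    const float r = x - n * ln2;
    // Nested evaluation of 1 + r + r^2/2! + ... + r^(terms-1)/(terms-1)!.
    float sum = 1.0f;
    for (int k = terms - 1; k >= 1; --k)
        sum = 1.0f + sum * r / static_cast<float>(k);
    return std::ldexp(sum, static_cast<int>(n));   // scale by 2^n
}
```

Because the reduced argument r is small, the truncation error shrinks quickly as terms grows, so a handful of terms may be enough when only a few digits are needed. The same structure carries over to SIMD by replacing each scalar operation with its vector counterpart.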
