SSE2 instruction to load integers in reverse order


Asked 16/5, 2013 at 10:4 Answered 16/5, 2013 at 10:9

Is there any SSE2 instruction to load a 128-bit integer vector register from an int buffer in reverse order?

Mingy asked 16/5, 2013 at 10:4

It's quite easy to reverse 32-bit int elements after a normal load:

__m128i v = _mm_load_si128((const __m128i *)buff);   // MOVDQA  - buff must be 16-byte aligned
v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));   // PSHUFD  - mask = 00 01 10 11 = 0x1b
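A minimal self-contained sketch of the 32-bit case, assuming a 16-byte-aligned `int32_t` buffer. The names `load_reverse_epi32` and `reverse4_epi32` are hypothetical helpers, not existing intrinsics. Note the cast: `_mm_load_si128` takes a `const __m128i *`, so an `int` pointer has to be cast before the call.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics (also provides _MM_SHUFFLE) */
#include <stdint.h>

/* Hypothetical helper: load four 32-bit ints and return them reversed,
   i.e. lanes become buf[3], buf[2], buf[1], buf[0]. */
static inline __m128i load_reverse_epi32(const int32_t *buf)
{
    assert(((uintptr_t)buf & 15u) == 0);                  /* MOVDQA faults on misaligned addresses */
    __m128i v = _mm_load_si128((const __m128i *)buf);     /* MOVDQA */
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3)); /* PSHUFD 0x1b: lane i <- lane 3-i */
}

/* Hypothetical wrapper: store the reversed vector so it can be inspected. */
void reverse4_epi32(const int32_t *in, int32_t *out)
{
    _mm_storeu_si128((__m128i *)out, load_reverse_epi32(in)); /* unaligned store: out may be any address */
}
```

With an input of `{1, 2, 3, 4}` in a `_Alignas(16)` array, `out` ends up as `{4, 3, 2, 1}`.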

You can do the same thing for 16-bit short elements, but it takes more instructions:

__m128i v = _mm_load_si128((const __m128i *)buff);   // MOVDQA  - buff must be 16-byte aligned
v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));   // PSHUFD  - mask = 00 01 10 11 = 0x1b
v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1)); // PSHUFLW - mask = 10 11 00 01 = 0xb1
v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1)); // PSHUFHW - mask = 10 11 00 01 = 0xb1
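To see why three shuffles are needed, here is a sketch that traces the eight shorts `s0..s7` through each step, assuming a 16-byte-aligned `int16_t` buffer. The function names are hypothetical. PSHUFD reverses the four 32-bit pairs, then PSHUFLW and PSHUFHW swap the two shorts inside each pair.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 only */
#include <stdint.h>

/* Hypothetical helper: reverse eight 16-bit values using SSE2 only. */
static inline __m128i load_reverse_epi16_sse2(const int16_t *buf)
{
    assert(((uintptr_t)buf & 15u) == 0);                 /* MOVDQA needs 16-byte alignment */
    __m128i v = _mm_load_si128((const __m128i *)buf);    /* s0 s1 s2 s3 s4 s5 s6 s7 */
    v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));   /* s6 s7 s4 s5 s2 s3 s0 s1 */
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1)); /* s7 s6 s5 s4 s2 s3 s0 s1 */
    v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1)); /* s7 s6 s5 s4 s3 s2 s1 s0 */
    return v;
}

/* Hypothetical wrapper: write the reversed shorts to an output array. */
void reverse8_epi16(const int16_t *in, int16_t *out)
{
    _mm_storeu_si128((__m128i *)out, load_reverse_epi16_sse2(in));
}
```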

Note that you can do this with fewer instructions using _mm_shuffle_epi8 (PSHUFB), if SSSE3 is available:

const __m128i vm = _mm_setr_epi8(14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1);
                                     // initialise vector mask for use with PSHUFB
                                     // NB: do this once, outside any processing loop
// ...
__m128i v = _mm_load_si128((const __m128i *)buff);    // MOVDQA - buff must be 16-byte aligned
v = _mm_shuffle_epi8(v, vm);         // PSHUFB
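A self-contained version of the PSHUFB approach, as a sketch. It assumes GCC or Clang: `__attribute__((target("ssse3")))` enables SSSE3 for this one function so the file builds without `-mssse3`, but the CPU must still support SSSE3 when the function runs. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 (also includes SSE2) */

/* GCC/Clang-specific: compile this function for SSSE3 only. */
__attribute__((target("ssse3")))
void reverse8_epi16_ssse3(const int16_t *in, int16_t *out)
{
    /* Byte i of the result is taken from byte vm[i] of the source, so the
       mask starts with bytes 14,15 (short 7) and ends with 0,1 (short 0). */
    const __m128i vm = _mm_setr_epi8(14, 15, 12, 13, 10, 11, 8, 9,
                                     6, 7, 4, 5, 2, 3, 0, 1);
    assert(((uintptr_t)in & 15u) == 0);              /* MOVDQA needs 16-byte alignment */
    __m128i v = _mm_load_si128((const __m128i *)in); /* MOVDQA */
    v = _mm_shuffle_epi8(v, vm);                     /* PSHUFB: one shuffle instead of three */
    _mm_storeu_si128((__m128i *)out, v);
}
```

Inside a real processing loop, the mask would normally be set up once outside the loop, as the comment in the answer suggests.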

Gray answered 16/5, 2013 at 10:9

Thanks Paul. Your logic is working fine, but I couldn't understand the usage of the second parameter, "0x1B". Is it some sort of mask? Another question: is it possible to do the same operation on shorts? – Mingy 16/5, 2013 at 10:40

I've added a second example for loading and reversing shorts. The mask is covered in the Intel docs but I've added comments to show how it is constructed. – Gray 16/5, 2013 at 11:47
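The immediate used by PSHUFD packs four 2-bit source-lane indices, with destination lane 0 in the lowest two bits; `_MM_SHUFFLE(z, y, x, w)` builds that byte as `(z << 6) | (y << 4) | (x << 2) | w`. Below is a scalar model of `_mm_shuffle_epi32`, for illustration only; `pshufd_model` is a hypothetical name, not an intrinsic.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* provides the real _MM_SHUFFLE macro */

/* Scalar model of PSHUFD: destination lane i takes source lane
   ((imm >> (2*i)) & 3). So 0x1b = 00 01 10 11 selects lanes 3, 2, 1, 0. */
void pshufd_model(const int32_t src[4], int32_t dst[4], unsigned imm)
{
    assert(imm <= 0xffu);  /* PSHUFD takes an 8-bit immediate */
    for (int i = 0; i < 4; i++)
        dst[i] = src[(imm >> (2 * i)) & 3u];
}
```

For example, `_MM_SHUFFLE(0, 1, 2, 3)` evaluates to `0x1b` and `_MM_SHUFFLE(2, 3, 0, 1)` to `0xb1`, the two masks used in the answer.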

P.S. I highly recommend downloading the Intel Intrinsics Guide - a very useful tool for WIN/Mac OS X/Linux which documents all the SSE/AVX instructions and intrinsics in a very accessible way. – Gray 16/5, 2013 at 11:53

I would use PSHUFB for reversing a vector of shorts, unless SSSE3 isn’t available. – Ferity 21/5, 2013 at 13:52

Sure, but the OP specifically asked for SSE2 solutions. I'll add a note to the answer though. – Gray 21/5, 2013 at 15:33

SSE3, SSSE3, SSE4.1 and SSE4.2 are all supported. As far as _mm_shuffle_epi8 (PSHUFB) is concerned, I can't quite figure out how the mask works. Can someone please explain? – Mingy 27/5, 2013 at 7:57

OK - I've added a PSHUFB example above for reversing the order of 16 bit ints in a vector. – Gray 27/5, 2013 at 8:28
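The PSHUFB mask is a per-byte table: for each destination byte i, if bit 7 of `mask[i]` is set the byte is zeroed, otherwise it is copied from source byte `mask[i] & 0x0f`. The sketch below models that in scalar C and adds a hypothetical helper, `make_reverse_mask`, that builds the mask for reversing elements of a given width; for 2-byte elements it reproduces the mask in the answer.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PSHUFB (_mm_shuffle_epi8), for illustration only. */
void pshufb_model(const uint8_t src[16], const uint8_t mask[16], uint8_t dst[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0f];
}

/* Hypothetical helper: build a mask that reverses the order of the
   elem_size-byte elements (1, 2, 4 or 8) in a 16-byte vector, keeping
   the byte order inside each element unchanged. */
void make_reverse_mask(unsigned elem_size, uint8_t mask[16])
{
    assert(elem_size == 1 || elem_size == 2 || elem_size == 4 || elem_size == 8);
    unsigned n = 16 / elem_size;                 /* number of elements */
    for (unsigned e = 0; e < n; e++)             /* destination element e ... */
        for (unsigned b = 0; b < elem_size; b++) /* ... takes source element n-1-e */
            mask[e * elem_size + b] = (uint8_t)((n - 1 - e) * elem_size + b);
}
```

`make_reverse_mask(2, m)` yields `14, 15, 12, 13, ..., 2, 3, 0, 1`, the same values passed to `_mm_setr_epi8` in the answer.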

Thanks. _mm_shuffle_epi8 now makes sense to me. I am a novice in Intel intrinsic programming (although I have worked with NEON intrinsics), and initially it seemed to me that there were no straightforward instructions in SSE to accomplish certain operations. But now it looks like most operations are possible with the provided instruction sets combined with the right logic :-) – Mingy 27/5, 2013 at 9:13

Yes, that's true - there are quite a few "tricks of the trade" that you need to learn to get the best out of SIMD in general, and SSE in particular. – Gray 27/5, 2013 at 15:18

@Paul, are there any tutorials or papers that can help me learn a few of these "tricks of the trade" as well? Please suggest some. – Mingy 28/5, 2013 at 7:33

Unfortunately there is not much out there - the best thing you can do is read and understand any existing code that you can find, e.g. in open source codebases, and also of course by writing your own optimised SIMD routines. – Gray 28/5, 2013 at 8:17


EDIT: (The following is for single-precision floating-point values; I'm leaving it here just in case.)

The closest (and handiest) match is the _mm_loadr_ps intrinsic. Be aware that the address must be 16-byte aligned.

Note, though, that this intrinsic translates to more than one instruction (MOVAPS plus a shuffle).
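A minimal sketch of `_mm_loadr_ps` on a float buffer, assuming 16-byte alignment; `reverse4_ps` is a hypothetical name. As the comments below point out, this applies to floats, not to the integer vectors the question asks about.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_loadr_ps, _mm_storeu_ps */

/* Load four floats in reverse order and store them to out. */
void reverse4_ps(const float *in, float *out)
{
    assert(((uintptr_t)in & 15u) == 0);  /* aligned load: faults if misaligned */
    __m128 v = _mm_loadr_ps(in);         /* lanes: in[3], in[2], in[1], in[0] */
    _mm_storeu_ps(out, v);               /* out may be unaligned */
}
```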

Predestinarian answered 16/5, 2013 at 10:7

Thanks for the reply, but this intrinsic loads four single-precision floating-point values in reverse order. I am looking for the same operation for integers, but I guess there is no support for that. – Mingy 16/5, 2013 at 10:13

Yes, I didn't notice you were talking about integer values (I should have re-read your title). Paul R's answer is what you need. – Predestinarian 16/5, 2013 at 10:50

Yes. Just curious, can the same operation be done with short values? – Mingy 16/5, 2013 at 11:12
