Is there any SSE2 instruction to load a 128-bit int vector register from an int buffer, in reverse order?
SSE2 instruction to load integers in reverse order
It's quite easy to reverse 32-bit int elements after a normal load:
__m128i v = _mm_load_si128((const __m128i *)buff); // MOVDQA
v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3)); // PSHUFD - mask = 00 01 10 11 = 0x1b
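As a minimal sketch of the load-and-reverse step above, the two intrinsics can be wrapped in a small function. The helper name reverse4_epi32 and the src/dst parameters are assumptions made for illustration and are not part of the original answer; the store uses an unaligned write so that only the source buffer needs 16-byte alignment.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics (also provides _MM_SHUFFLE) */
#include <stdint.h>

/* Hypothetical helper: read four 32-bit ints from src and write them to
   dst in reverse order. src must be 16-byte aligned because
   _mm_load_si128 maps to MOVDQA. */
static void reverse4_epi32(const int32_t *src, int32_t *dst)
{
    assert(((uintptr_t)src & 15u) == 0);

    __m128i v = _mm_load_si128((const __m128i *)src);

    /* _MM_SHUFFLE(z, y, x, w) packs four 2-bit source indices as
       (z << 6) | (y << 4) | (x << 2) | w. The last argument selects the
       source for result element 0, so (0, 1, 2, 3) means
       result[0] = v[3], result[1] = v[2], result[2] = v[1], result[3] = v[0]. */
    v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));

    _mm_storeu_si128((__m128i *)dst, v);
}
```

Called on an aligned buffer holding 10, 20, 30, 40, this should leave 40, 30, 20, 10 in dst.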
You can do the same thing for 16-bit short elements, but it takes more instructions:
__m128i v = _mm_load_si128((const __m128i *)buff); // MOVDQA
v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3)); // PSHUFD - mask = 00 01 10 11 = 0x1b
v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1)); // PSHUFLW - mask = 10 11 00 01 = 0xb1
v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1)); // PSHUFHW - mask = 10 11 00 01 = 0xb1
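To see why three shuffles are needed, it may help to trace the element order after each step. The sketch below does this in comments, where s0..s7 denote the eight 16-bit input elements; the helper name reverse8_epi16 and its parameters are assumptions for illustration only.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: read eight 16-bit values from src and write them
   to dst in reverse order, using SSE2 only. src must be 16-byte aligned. */
static void reverse8_epi16(const int16_t *src, int16_t *dst)
{
    assert(((uintptr_t)src & 15u) == 0);

    /* Element order, lowest element first:          s0 s1 s2 s3 s4 s5 s6 s7 */
    __m128i v = _mm_load_si128((const __m128i *)src);

    /* Reverse the four 32-bit pairs:                s6 s7 s4 s5 s2 s3 s0 s1 */
    v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));

    /* Swap within each pair of the low 64 bits:     s7 s6 s5 s4 s2 s3 s0 s1 */
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));

    /* Swap within each pair of the high 64 bits:    s7 s6 s5 s4 s3 s2 s1 s0 */
    v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));

    _mm_storeu_si128((__m128i *)dst, v);
}
```

PSHUFD can only move whole 32-bit elements, which is why the two 16-bit halves of each pair still have to be swapped separately by PSHUFLW and PSHUFHW.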
Note that you can do this with fewer instructions using _mm_shuffle_epi8 (PSHUFB), if SSSE3 is available:
const __m128i vm = _mm_setr_epi8(14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1);
// initialise vector mask for use with PSHUFB
// NB: do this once, outside any processing loop
...
__m128i v = _mm_load_si128((const __m128i *)buff); // MOVDQA
v = _mm_shuffle_epi8(v, vm); // PSHUFB
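For anyone unsure how the PSHUFB mask is interpreted, here is a scalar model of the 128-bit instruction in plain C. It is a sketch for understanding only, not a replacement for the intrinsic: each mask byte picks the source byte that lands in that position, or zeroes the position if the mask byte's top bit is set. The function names pshufb_model and reverse8_epi16_model are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of 128-bit PSHUFB: for each result byte i, the low four
   bits of mask[i] select a source byte; if bit 7 of mask[i] is set, the
   result byte is zero instead. dst must not alias src. */
static void pshufb_model(const uint8_t src[16], const uint8_t mask[16],
                         uint8_t dst[16])
{
    assert(dst != src);
    for (int i = 0; i < 16; i++) {
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    }
}

/* Hypothetical helper: apply the mask from the answer to eight 16-bit
   values. Bytes 14,15 (element 7) go to bytes 0,1 (element 0), bytes
   12,13 (element 6) go to bytes 2,3 (element 1), and so on. */
static void reverse8_epi16_model(const int16_t *src, int16_t *dst)
{
    static const uint8_t vm[16] = {14, 15, 12, 13, 10, 11, 8, 9,
                                   6, 7, 4, 5, 2, 3, 0, 1};
    uint8_t in[16], out[16];

    memcpy(in, src, sizeof in);
    pshufb_model(in, vm, out);
    memcpy(dst, out, sizeof out);
}
```

Because the mask keeps the two bytes of each 16-bit element in their original order, whole elements are reversed while each value is left unchanged.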
Thanks Paul. Your logic is working fine, but I couldn't understand the usage of the second parameter "0x1B". Is it some sort of mask? Also, is it possible to do the same operation on shorts? –
Mingy
I've added a second example for loading and reversing shorts. The mask is covered in the Intel docs but I've added comments to show how it is constructed. –
Gray
P.S. I highly recommend downloading the Intel Intrinsics Guide, a very useful tool for Windows/Mac OS X/Linux which documents all the SSE/AVX instructions and intrinsics in a very accessible way. –
Gray
I would use PSHUFB for reversing a vector of shorts, unless SSSE3 isn’t available. –
Ferity
Sure, but the OP specifically asked for SSE2 solutions. I'll add a note to the answer though. –
Gray
SSE3, SSSE3, SSE4.1 and SSE4.2 are all supported. As for _mm_shuffle_epi8 (PSHUFB), I can't quite figure out how the mask is used. Can someone please explain? –
Mingy
OK - I've added a PSHUFB example above for reversing the order of 16-bit ints in a vector. –
Gray, thanks. _mm_shuffle_epi8 now makes sense to me. I am a novice in Intel intrinsic programming (although I have worked with NEON intrinsics), and initially it seemed to me that there were no straightforward instructions in SSE to accomplish certain operations. But now it looks like most operations are possible with the provided instruction sets combined with the correct logic :-) –
Mingy
Yes, that's true - there are quite a few "tricks of the trade" that you need to learn to get the best out of SIMD in general, and SSE in particular. –
Gray
@Paul: Are there any tutorials or papers that could help me learn a few of these "tricks of the trade"? Please suggest. –
Mingy
Unfortunately there is not much out there - the best thing you can do is read and understand any existing code that you can find, e.g. in open source codebases, and also of course by writing your own optimised SIMD routines. –
Gray
EDIT: (The following is for single-precision floating-point values; I'm leaving it here just in case.)
The closest (and handiest) match is the _mm_loadr_ps intrinsic. Be aware that the address must be 16-byte aligned.
Note, though, that this intrinsic translates to more than one instruction (MOVAPS + shuffling).
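A minimal sketch of how _mm_loadr_ps might be used for floats is shown below. The helper name reverse4_ps and its parameters are assumptions for illustration; the alignment check mirrors the 16-byte requirement mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: read four floats from src and write them to dst
   in reverse order. src must be 16-byte aligned for _mm_loadr_ps. */
static void reverse4_ps(const float *src, float *dst)
{
    assert(((uintptr_t)src & 15u) == 0);

    __m128 v = _mm_loadr_ps(src);  /* aligned load followed by a shuffle */
    _mm_storeu_ps(dst, v);         /* unaligned store, so dst has no alignment requirement */
}
```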
Thanks for the reply, but this intrinsic loads four single-precision floating-point values in reverse order. I am looking for the same operation for integers, but I guess there is no support for that. –
Mingy
Yes, I didn't notice you were talking about integer values (I should have re-read your title). Paul R's answer is what you need. –
Predestinarian
Yes. Just curious, can the same operation be done with short values? –
Mingy