I'm working on a port of SSE2 code to NEON. The port is at an early stage and it's producing incorrect results. Part of the reason for the incorrect results is _mm_shuffle_epi32 and the NEON instructions I selected to replace it.
The documentation for _mm_shuffle_epi32 is on the lean side from Microsoft. The Intel documentation is better, but it's not clear to me what some of the pseudo-code is doing.
SELECT4(src, control)
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
RETURN tmp[31:0]
dst[31:0] := SELECT4(a[127:0], imm8[1:0])
dst[63:32] := SELECT4(a[127:0], imm8[3:2])
dst[95:64] := SELECT4(a[127:0], imm8[5:4])
dst[127:96] := SELECT4(a[127:0], imm8[7:6])
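For reference, here is a minimal scalar sketch of how the pseudo-code above can be read in plain C. It assumes the 128-bit value is viewed as an array of four 32-bit lanes, with lane 0 holding bits [31:0]. The names select4 and shuffle_epi32_model are hypothetical helpers for illustration only, not part of any intrinsics header.

```c
#include <stdint.h>

/* Scalar model of SELECT4: the low two bits of control pick one of
 * the four 32-bit lanes of src (lane 0 is bits [31:0]). */
static uint32_t select4(const uint32_t src[4], unsigned control)
{
    return src[control & 3u];
}

/* Scalar model of the pseudo-code for _mm_shuffle_epi32.
 * Each 2-bit field of imm8 chooses the source lane for one
 * destination lane. Hypothetical helper, for illustration only. */
static void shuffle_epi32_model(uint32_t dst[4], const uint32_t a[4],
                                unsigned imm8)
{
    uint32_t tmp[4];

    tmp[0] = select4(a, imm8 >> 0); /* dst[31:0]   <- imm8[1:0] */
    tmp[1] = select4(a, imm8 >> 2); /* dst[63:32]  <- imm8[3:2] */
    tmp[2] = select4(a, imm8 >> 4); /* dst[95:64]  <- imm8[5:4] */
    tmp[3] = select4(a, imm8 >> 6); /* dst[127:96] <- imm8[7:6] */

    /* Copy through a temporary so dst and a may be the same array. */
    dst[0] = tmp[0];
    dst[1] = tmp[1];
    dst[2] = tmp[2];
    dst[3] = tmp[3];
}
```

Read this way, an immediate of 0xE4 (binary 11 10 01 00) leaves the lanes in place, and 0x1B (binary 00 01 10 11) reverses them.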
I need help envisioning what _mm_shuffle_epi32 does. Or more correctly, the permutation that the immediate applies to the value. I guess I need to see it as basic C with shifts, ANDs and ORs.
Given C statements and macros like:
v2 = _mm_shuffle_epi32(v1, _MM_SHUFFLE(i1,i2,i3,i4));
What does the resulting C expression look like when it's unrolled into basic C statements?
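To make the question concrete, this is the kind of unrolled form I have in mind. It is a sketch under two assumptions: that _MM_SHUFFLE(z, y, x, w) packs its arguments as (z << 6) | (y << 4) | (x << 2) | w, and that the vectors can be treated as arrays of four 32-bit lanes with lane 0 lowest. MODEL_MM_SHUFFLE and shuffle_unrolled are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Assumed to mirror the bit layout of _MM_SHUFFLE(z, y, x, w);
 * renamed so it cannot clash with the real macro. */
#define MODEL_MM_SHUFFLE(z, y, x, w) \
    (((z) << 6) | ((y) << 4) | ((x) << 2) | (w))

/* Unrolled scalar equivalent of
 *   v2 = _mm_shuffle_epi32(v1, _MM_SHUFFLE(i1, i2, i3, i4));
 * Each index i1..i4 must be in the range 0..3. */
static void shuffle_unrolled(uint32_t v2[4], const uint32_t v1[4],
                             int i1, int i2, int i3, int i4)
{
    unsigned imm8 = (unsigned)MODEL_MM_SHUFFLE(i1, i2, i3, i4);

    /* Shift and mask each 2-bit field back out of the immediate. */
    uint32_t t0 = v1[(imm8 >> 0) & 3u]; /* same as v1[i4] */
    uint32_t t1 = v1[(imm8 >> 2) & 3u]; /* same as v1[i3] */
    uint32_t t2 = v1[(imm8 >> 4) & 3u]; /* same as v1[i2] */
    uint32_t t3 = v1[(imm8 >> 6) & 3u]; /* same as v1[i1] */

    v2[0] = t0;
    v2[1] = t1;
    v2[2] = t2;
    v2[3] = t3;
}
```

Under those assumptions the last macro argument selects the lowest destination lane and the first argument selects the highest, so the arguments read from high lane to low lane.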
, opposite of _mm_setr. – Ketti