Is it possible to compare more than a pair of numbers in one instruction using SSE4?
The Intel reference says the following about PCMPGTQ:
PCMPGTQ — Compare Packed Data for Greater Than
Performs an SIMD compare for the packed quadwords in the destination operand (first operand) and the source operand (second operand). If the data element in the first (destination) operand is greater than the corresponding element in the second (source) operand, the corresponding data element in the destination is set to all 1s; otherwise, it is set to 0s.
This is not really what I want, because I want to be able to tell which integers in the vector are greater and which are smaller.
For example, if I need to compare
32 with 45
13 with 78
44 with 12
99 with 66
I was planning to put [32, 13, 44, 99] in one vector and [45, 78, 12, 66] in another vector, compare them using SSE4 in one instruction, and get [0, 0, 1, 1] as the result (0 = less, 1 = greater).
But it seems this is not what PCMPGTQ does. Any suggestions on how to use parallelism at this level to speed up this comparison?
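For reference, here is a minimal sketch of the comparison described above, written with C intrinsics. It rests on a few assumptions: the values fit in signed 32-bit integers, so it uses `_mm_cmpgt_epi32` (PCMPGTD, SSE2) rather than PCMPGTQ, because a 128-bit XMM register holds four doublewords but only two quadwords. The helper names `compare_gt4` and `compare_gt4_bits` are made up for illustration. The compare yields all 1s per lane where the first operand is greater; a logical right shift by 31 turns that into 1 or 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics; _mm_cmpgt_epi32 maps to PCMPGTD */

/* Hypothetical helper: out[i] = 1 if a[i] > b[i] (signed compare), else 0. */
void compare_gt4(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    assert(a != NULL && b != NULL && out != NULL);

    /* Unaligned loads, so the arrays need no special alignment. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* Each 32-bit lane becomes 0xFFFFFFFF where a > b, otherwise 0. */
    __m128i mask = _mm_cmpgt_epi32(va, vb);

    /* Logical shift right by 31 maps 0xFFFFFFFF to 1 and 0 to 0. */
    __m128i flags = _mm_srli_epi32(mask, 31);

    _mm_storeu_si128((__m128i *)out, flags);
}

/* Hypothetical helper: returns a 4-bit value, bit i set if a[i] > b[i]. */
int compare_gt4_bits(const int32_t a[4], const int32_t b[4])
{
    assert(a != NULL && b != NULL);

    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i mask = _mm_cmpgt_epi32(va, vb);

    /* Collect the top bit of each 32-bit lane into an integer. */
    return _mm_movemask_ps(_mm_castsi128_ps(mask));
}
```

With a = [32, 13, 44, 99] and b = [45, 78, 12, 66], `compare_gt4` should fill `out` with [0, 0, 1, 1]. The bitmask variant may be handier when the result feeds a branch or a table lookup instead of further vector arithmetic.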
PCMPGTQ by [1, 1, 1, 1]. Or am I missing something? – Publicist