How to compare two vectors using SIMD and get a strncmp like result?
Asked Answered
B

1

5

I want to achieve something like strncmp result but not that complicated I tried to read https://code.woboq.org/userspace/glibc/sysdeps/x86_64/multiarch/strcmp-avx2.S.html source code but I failed to understand it

suppose we have to 256 bit vector how can I compare these two based on 8 bit comparison to achieve result like strncmp

I know there is a library but I want to understand the basics.

how it return -1,0,1 result with _mm256_cmpeq_epi8 and _mm256_min_epu8

Brooking answered 8/2, 2022 at 14:18 Comment(2)
I can't answer the SIMD part, but how it return -1,0,1 result is not required. strncmp does not need to return -1, 0, 1 it just needs to return <0, 0, >0Dun
thats true. but the problem is: the first bit of a negative number must be 1 and if we use _mm256_movemask_epi8, a string maybe be less than the other string but the first byte maybe equal(thus resulting in a positive number)Brooking
A
6

I would do it like that.

inline int compareBytes( __m256i a, __m256i b )
{
    // Compare for both a <= b and a >= b
    __m256i min = _mm256_min_epu8( a, b );
    __m256i le = _mm256_cmpeq_epi8( a, min );
    __m256i ge = _mm256_cmpeq_epi8( b, min );

    // Reverse bytes within 16-byte lanes
    const __m128i rev16 = _mm_set_epi8( 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 );
    const __m256i rev32 = _mm256_broadcastsi128_si256( rev16 );
    le = _mm256_shuffle_epi8( le, rev32 );
    ge = _mm256_shuffle_epi8( ge, rev32 );

    // Move the masks to scalar registers
    uint32_t lessMask = (uint32_t)_mm256_movemask_epi8( le );
    uint32_t greaterMask = (uint32_t)_mm256_movemask_epi8( ge );

    // Flip high/low 16-bit pieces in the masks.
    // Apparently, modern compilers are smart enough to emit ROR instructions for that code
    lessMask = ( lessMask >> 16 ) | ( lessMask << 16 );
    greaterMask = ( greaterMask >> 16 ) | ( greaterMask << 16 );

    // Produce the desired result
    if( lessMask > greaterMask )
        return -1;
    else if( lessMask < greaterMask )
        return +1;
    else
        return 0;
}

The reason that method works, integer comparison is essentially searching for the most significant bit which differs, and comparison result is equal to the difference in that most significant different bit. Because we reversed order of the bytes being tested, the first byte in the vectors corresponds to the most significant bit in the masks. For this reason, ( lessMask > greaterMask ) expression evaluates to true when for the first different byte in the source vectors ( a < b ) evaluated to true.

Arguelles answered 8/2, 2022 at 16:1 Comment(3)
Oh right, good edit, scalar ror wins again over SIMD lane-swap; funny it came up twice in the past few days. Too bad x86-64 never added an instruction like ARM's rbit. As a fixed bit-shuffle, it'd be cheap to implement, just wires.Mannikin
@PeterCordes Yeah, I’ve checked and both gcc and clang indeed generating ror from my C++ bit shifts. Too bad none of them cared to emit vbroadcasti128 for the shuffling constant, as written in the source code.Arguelles
Yeah, compilers are garbage at loading constants efficiently, often shooting themselves in the foot or just constant-propagating into bloated 32-byte constants. It's like they don't realize that broadcast loads are as cheap as normal loads, or their optimizers aren't designed to be be able to stop there.Mannikin

© 2022 - 2024 — McMap. All rights reserved.