How to compare two vectors using SIMD and get a strncmp like result? - McMap

About

How to compare two vectors using SIMD and get a strncmp like result?

Asked 8/2, 2022 at 14:18 Answered 8/2, 2022 at 16:1

Solved c simd avx avx2

B

1

5

I want to achieve something like strncmp result but not that complicated I tried to read https://code.woboq.org/userspace/glibc/sysdeps/x86_64/multiarch/strcmp-avx2.S.html source code but I failed to understand it

suppose we have to 256 bit vector how can I compare these two based on 8 bit comparison to achieve result like strncmp

I know there is a library but I want to understand the basics.

how it return -1,0,1 result with _mm256_cmpeq_epi8 and _mm256_min_epu8

Brooking answered 8/2, 2022 at 14:18 Comment(2)

I can't answer the SIMD part, but how it return -1,0,1 result is not required. strncmp does not need to return -1, 0, 1 it just needs to return <0, 0, >0 – Dun 8/2, 2022 at 14:30

thats true. but the problem is: the first bit of a negative number must be 1 and if we use _mm256_movemask_epi8, a string maybe be less than the other string but the first byte maybe equal(thus resulting in a positive number) – Brooking 8/2, 2022 at 15:47

A

6

I would do it like that.

inline int compareBytes( __m256i a, __m256i b )
{
    // Compare for both a <= b and a >= b
    __m256i min = _mm256_min_epu8( a, b );
    __m256i le = _mm256_cmpeq_epi8( a, min );
    __m256i ge = _mm256_cmpeq_epi8( b, min );

    // Reverse bytes within 16-byte lanes
    const __m128i rev16 = _mm_set_epi8( 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 );
    const __m256i rev32 = _mm256_broadcastsi128_si256( rev16 );
    le = _mm256_shuffle_epi8( le, rev32 );
    ge = _mm256_shuffle_epi8( ge, rev32 );

    // Move the masks to scalar registers
    uint32_t lessMask = (uint32_t)_mm256_movemask_epi8( le );
    uint32_t greaterMask = (uint32_t)_mm256_movemask_epi8( ge );

    // Flip high/low 16-bit pieces in the masks.
    // Apparently, modern compilers are smart enough to emit ROR instructions for that code
    lessMask = ( lessMask >> 16 ) | ( lessMask << 16 );
    greaterMask = ( greaterMask >> 16 ) | ( greaterMask << 16 );

    // Produce the desired result
    if( lessMask > greaterMask )
        return -1;
    else if( lessMask < greaterMask )
        return +1;
    else
        return 0;
}

The reason that method works, integer comparison is essentially searching for the most significant bit which differs, and comparison result is equal to the difference in that most significant different bit. Because we reversed order of the bytes being tested, the first byte in the vectors corresponds to the most significant bit in the masks. For this reason, ( lessMask > greaterMask ) expression evaluates to true when for the first different byte in the source vectors ( a < b ) evaluated to true.

Arguelles answered 8/2, 2022 at 16:1 Comment(3)

Oh right, good edit, scalar ror wins again over SIMD lane-swap; funny it came up twice in the past few days. Too bad x86-64 never added an instruction like ARM's rbit. As a fixed bit-shuffle, it'd be cheap to implement, just wires. – Mannikin 8/2, 2022 at 16:38

@PeterCordes Yeah, I’ve checked and both gcc and clang indeed generating ror from my C++ bit shifts. Too bad none of them cared to emit vbroadcasti128 for the shuffling constant, as written in the source code. – Arguelles 8/2, 2022 at 16:45

Yeah, compilers are garbage at loading constants efficiently, often shooting themselves in the foot or just constant-propagating into bloated 32-byte constants. It's like they don't realize that broadcast loads are as cheap as normal loads, or their optimizers aren't designed to be be able to stop there. – Mannikin 8/2, 2022 at 16:47

Recommended topics

#Godot #Unity #Godot 4.X #Mongodb

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

© 2022 - 2024 — McMap. All rights reserved.