Computing Hamming distances to several strings with SSE
Asked Answered
D

1

13

I have n (8 bit) character strings all of them of the same length (say m), and another string s of the same length. I need to compute Hamming distances from s to each of the others strings. In plain C, something like:

unsigned char strings[n][m];
unsigned char s[m];
int distances[n];

for(i=0; i<n; i++) {
  int distances[i] = 0;
  for(j=0; j<m; j++) {
    if(strings[i][j] != s[j])
      distances[i]++;
  }
}

I would like to use SIMD instructions with gcc to perform such computations more efficiently. I have read that PcmpIstrI in SSE 4.2 can be useful and my target computer supports that instruction set, so I would prefer a solution using SSE 4.2.

EDIT:

I wrote following function to compute Hamming distance between two strings:

static inline int popcnt128(__m128i n) {
  const __m128i n_hi = _mm_unpackhi_epi64(n, n);
  return _mm_popcnt_u64(_mm_cvtsi128_si64(n)) + _mm_popcnt_u64(_mm_cvtsi128_si64(n_hi));
}

int HammingDist(const unsigned char *p1, unsigned const char *p2, const int len) {
#define MODE (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH | _SIDD_BIT_MASK | _SIDD_NEGATIVE_POLARITY)
  __m128i smm1 = _mm_loadu_si128 ((__m128i*) p1);
  __m128i smm2 = _mm_loadu_si128 ((__m128i*) p2);
  __m128i ResultMask;

  int iters = len / 16;
  int diffs = 0;
  int i;

  for(i=0; i<iters; i++) {
    ResultMask = _mm_cmpestrm (smm1,16,smm2,16,MODE); 

    diffs += popcnt128(ResultMask);
    p1 = p1+16;
    p2 = p2+16;
    smm1 = _mm_loadu_si128 ((__m128i*)p1);
    smm2 =_mm_loadu_si128 ((__m128i*)p2);
  }

  int mod = len % 16;
  if(mod>0) {
     ResultMask = _mm_cmpestrm (smm1,mod,smm2,mod,MODE); 
     diffs += popcnt128(ResultMask);
  }

  return diffs;
} 

So I can solve my problem by means of:

for(i=0; i<n; i++) {
  int distances[i] = HammingDist(s, strings[i], m);
}

Is this the best I can do or can I use the fact that one of the strings compared is always the same? In addition, should I do some alignment on my arrays to improve performance?

ANOTHER ATTEMPT

Following Harold's recomendation, I have written following code:

void _SSE_hammingDistances(const ByteP str, const ByteP strings, int *ds, const int n, const int m) {
    int iters = m / 16;

    __m128i *smm1, *smm2, diffs;

    for(int j=0; j<n; j++) {
        smm1 = (__m128i*)  str;
        smm2 = (__m128i*)  &strings[j*(m+1)]; // m+1, as strings are '\0' terminated

        diffs =  _mm_setzero_si128();

        for (int i = 0; i < iters; i++) {
            diffs = _mm_add_epi8(diffs, _mm_cmpeq_epi8(*smm1, *smm2));
            smm1 += 1;
            smm2 += 1;
        }

        int s = m;
        signed char *ptr = (signed char *) &diffs;
        for(int p=0; p<16; p++) {
            s += *ptr;
            ptr++;
        }

        *ds = s;
        ds++;
    }
}

but I am not able to do the final addition of bytes in __m128i by using psadbw. Can anyone please help me with that?

Diamante answered 28/7, 2013 at 21:11 Comment(5)
What exactly is your question?Graehme
Actually pcmpistri isn't useful at all, in this situation all you need is a plain old pcmpeqb. And you don't need any of that popcnt-stuff either, just subtract the result of the comparison from the count (because the result is -1 where they are different), and psadbw them in the end (or if your strings are 4K long or longer, psadbw just before processing 4K bytes)Cathey
Thanks harold, I have posted my attempt although I was not able to use psadbw.Diamante
If one of the operands of psadbw is zero, it just sums the bytes of the other operand. You can use that instead of the loop that does the horizontal sum. psadbw sums by blocks of 8 though so you still need to extract 2 words and sum them. Btw I notice that you add the results of the comparison, you can do that but that will give you a negative result (and with psadbw, that's bad - it treats the bytes as unsigned instead of sign-extending them)Cathey
Remember that your i loop must not have more than 255 iterations, because otherwise you'll get byte overflow. At the very end, you may try also to unroll i loop 2 or 4 times to see if it gives profit.Concrescence
F
4

Here's an improved version of your latest routine, which uses PSADBW (_mm_sad_epu8) to eliminate the scalar code:

void hammingDistances_SSE(const uint8_t * str, const uint8_t * strings, int * const ds, const int n, const int m)
{
    const int iters = m / 16;

    const __m128i smm1 = _mm_loadu_si128((__m128i*)str);

    assert((m & 15) == 0);      // m must be a multiple of 16

    for (int j = 0; j < n; j++)
    {
        __m128i smm2 = _mm_loadu_si128((__m128i*)&strings[j*(m+1)]); // m+1, as strings are '\0' terminated

        __m128i diffs = _mm_setzero_si128();

        for (int i = 0; i < iters; i++)
        {
            diffs = _mm_sub_epi8(diffs, _mm_cmpeq_epi8(smm1, smm2));
        }

        diffs = _mm_sad_epu8(diffs, _mm_setzero_si128());
        ds[j] = m - (_mm_extract_epi16(diffs, 0) + _mm_extract_epi16(diffs, 4));
    }
}
Fetus answered 19/10, 2015 at 8:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.