Query points on the vertices of a Hamming cube

I have N points that lie only on the vertices of a cube, of dimension D, where D is something like 3.

A vertex may not contain any point. So every point has coordinates in {0, 1}^D. I am only interested in query time, as long as the memory cost is reasonable ( not exponential in N for example :) ).

Given a query that lies on one of the cube's vertices and an input parameter r, find all the vertices (thus points) that have hamming distance <= r with the query.

What's the way to go in a c++ environment?

I am thinking of a kd-tree, but I am not sure and want help, any input, even approximative, would be appreciated! Since hamming distance comes into play, bitwise manipulations should help (e.g. XOR).

There is a nice bithack to go from one bitmask with k bits set to the lexicographically next permutation, which means it's fairly simple to loop through all masks with k bits set. XORing these masks with an initial value gives all the values at hamming distance exactly k away from it.

So for D dimensions, where D is less than 32 (otherwise change the types),

uint32_t limit = (1u << D) - 1;
for (int k = 1; k <= r; k++) {
    uint32_t diff = (1u << k) - 1;
    while (diff <= limit) {
        // v is the input vertex
        uint32_t vertex = v ^ diff;
        // use it
        diff = nextBitPermutation(diff);
    }
}

Where nextBitPermutation may be implemented in C++ as something like (if you have __builtin_ctz)

uint32_t nextBitPermutation(uint32_t v) {
    // see https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation
    uint32_t t = v | (v - 1);
    return (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
}

Or for MSVC (not tested)

uint32_t nextBitPermutation(uint32_t v) {
    // see https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation
    uint32_t t = v | (v - 1);
    unsigned long tzc;
    _BitScanForward(&tzc, v); // v != 0 so the return value doesn't matter
    return (t + 1) | (((~t & -~t) - 1) >> (tzc + 1));
}

If D is really low, 4 or lower, the old popcnt-with-pshufb works really well and generally everything just lines up well, like this:

uint16_t query(int vertex, int r, int8_t* validmask)
{
    // validmask should be array of 16 int8_t's,
    // 0 for a vertex that doesn't exist, -1 if it does
    __m128i valid = _mm_loadu_si128((__m128i*)validmask);
    __m128i t0 = _mm_set1_epi8(vertex);
    __m128i r0 = _mm_set1_epi8(r + 1);
    __m128i all =        _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
    __m128i popcnt_lut = _mm_setr_epi8(0, 1, 1, 2, 1, 2, 2, 3, 1, 2,  2,  3,  2,  3,  3,  4);
    __m128i dist = _mm_shuffle_epi8(popcnt_lut, _mm_xor_si128(t0, all));
    __m128i close_enough = _mm_cmpgt_epi8(r0, dist);
    __m128i result = _mm_and_si128(close_enough, valid);
    return _mm_movemask_epi8(result);
}

This should be fairly fast; fast compared to the bithack above (nextBitPermutation, which is fairly heavy, is used a lot there) and also compared to looping over all vertices and testing whether they are in range (even with builtin popcnt, that automatically takes at least 16 cycles and the above shouldn't, assuming everything is cached or even permanently in a register). The downside is the result is annoying to work with, since it's a mask of which vertices both exist and are in range of the queried point, not a list of them. It would combine well with doing some processing on data associated with the points though.

This also scales down to D=3 of course, just make none of the points >= 8 valid. D>4 can be done similarly but it takes more code then, and since this is really a brute force solution that is only fast due to parallelism it fundamentally gets slower exponentially in D.

Recommended topics

Hot tags