I need a fast, simple hash function that creates a unique identifier for a pair of uint32_t
values - so the same hash value for (2,7)
and (7,2)
.
Any idea?
I need a fast, simple hash function that creates a unique identifier for a pair of uint32_t
values - so the same hash value for (2,7)
and (7,2)
.
Any idea?
To answer my own question, the solution is:
uint64_t hash(uint32_t x, uint32_t y)
{
const uint64_t a = static_cast<uint64_t>(x);
const uint64_t b = static_cast<uint64_t>(y);
if (x < y) return (b << 32) | a;
else return (a << 32) | b;
}
Which can be improved to the branchless version
uint64_t hash(uint32_t x, uint32_t y)
{
const uint64_t a = static_cast<uint64_t>(x);
const uint64_t b = static_cast<uint64_t>(y);
const uint64_t h0 = (b << 32) | a;
const uint64_t h1 = (a << 32) | b;
return (x < y) ? h0 : h1; // conditional move (CMOV) instruction
}
These methods are perfect hash functions - they guarantee zero collisions. However, they have the disadvantage that you cannot hash values above 2^32 - 1
.
if (x>y)
when I had the exact same need as yours, but now that I think about it, hash(x) ^ hash(y)
would have done as well (with the issue that all pairs of identical elements (x, x)
have the same hash. That can be a problem for some applications) –
Keary max << 32
invokes undefined behaviour (because you are shifting by an amount >= datatype size). Only after the |
the result is guaranteed to be cast to uint64_t
. In practice your compiler is doing 64bit computation anyway, but this cannot be assumed for sure. –
Monteux constexpr uint32_t hash_max = ...;
constexpr uint32_t commutative_hash(uint32_t i, uint32_t j) {
return (i*j + (i*i)*(j*j) + (i*i*i)*(j*j*j)) % hash_max;
};
Extra parentheses are for compiler - it will be easier to optimize this expression.
Do not use any conditional instruction (or std::max
/std::min
)
which breaks CPU pipeline (and is slow) if you want to make a fast function.
© 2022 - 2024 — McMap. All rights reserved.