From the information theoretic view, you have 2^64
values to map into 2^63-1
values.
As such, mapping is trivial with the modulus operator, since it always has a non-negative result:
y = 1 + x % 0x7fffffffffffffff; // the constant is 2^63-1
This could be pretty expensive, so what else is possible?
The simple math 2^64 = 2 * (2^63 - 1) + 2
says we will have two source values mapping to one target value except in two special cases, where three will go to one. Think of these as two special 64-bit values, call them x1
and x2
, that each share a target with two other source values. In the mod
expression above, this occurs by "wrapping". The target values y=2^31-2
and y=2^31-3
have three mappings. All others have two. Since we have to use something more complex than mod
anyway, let's seek a way to map the special values wherever we like at low cost
For illustration let's work with mapping a 4-bit signed int x
in [-8..7] to y
in [1..7], rather than the 64-bit space.
An easy course is to have x
values in [1..7] map to themselves, then the problem reduces to mapping x
in [-8..0] to y
in [1..7]. Note there are 9 source values here and only 7 targets as discussed above.
There are obviously many strategies. At this point you can probably see a gazzilion. I'll describe only one that's particularly simple.
Let y = 1 - x
for all values except special cases x1 == -8
and x2 == -7
. The whole hash function thus becomes
y = x <= -7 ? S(x) : x <= 0 ? 1 - x : x;
Here S(x)
is a simple function that says where x1
and x2
are mapped. Choose S
based on what you know about the data. For example if you think high target values are unlikely, map them to 6 and 7 with S(x) = -1 - x
.
The final mapping is:
-8: 7 -7: 6 -6: 7 -5: 6 -4: 5 -3: 4 -2: 3 -1: 2
0: 1 1: 1 2: 2 3: 3 4: 4 5: 5 6: 6 7: 7
Taking this logic up to the 64-bit space, you'd have
y = (x <= Long.MIN_VALUE + 1) ? -1 - x : x <= 0 ? 1 - x : x;
Many other kinds of tuning are possible within this framework.
return (abs(x) == 0) ? 42 : abs(x)
– Hookah