While the template outlined in Jon Skeet's answer works well in general as a hash function family, the choice of the constants is important and the seed of 17
and factor of 31
as noted in the answer do not work well at all for common use cases. In most use cases, the hashed values are much closer to zero than int.MaxValue
, and the number of items being jointly hashed are a few dozen or less.
For hashing an integer tuple {x, y}
where -1000 <= x <= 1000
and -1000 <= y <= 1000
, it has an abysmal collision rate of almost 98.5%. For example, {1, 0} -> {0, 31}
, {1, 1} -> {0, 32}
, etc. If we expand the coverage to also include n-tuples where 3 <= n <= 25
, it does less terrible with a collision rate of about 38%. But we can do much better.
public static int CustomHash(int seed, int factor, params int[] vals)
{
int hash = seed;
foreach (int i in vals)
{
hash = (hash * factor) + i;
}
return hash;
}
I wrote a Monte Carlo sampling search loop that tested the method above with various values for seed and factor over various random n-tuples of random integers i
. Allowed ranges were 2 <= n <= 25
(where n
was random but biased toward the lower end of the range) and -1000 <= i <= 1000
. At least 12 million unique collision tests were performed for each seed and factor pair.
After about 7 hours running, the best pair found (where the seed and factor were both limited to 4 digits or less) was: seed = 1009
, factor = 9176
, with a collision rate of 0.1131%. In the 5- and 6-digit areas, even better options exist. But I selected the top 4-digit performer for brevity, and it peforms quite well in all common int
and char
hashing scenarios. It also seems to work fine with integers of much greater magnitudes.
It is worth noting that "being prime" did not seem to be a general prerequisite for good performance as a seed and/or factor although it likely helps. 1009
noted above is in fact prime, but 9176
is not. I explicitly tested variations on this where I changed factor
to various primes near 9176
(while leaving seed = 1009
) and they all performed worse than the above solution.
Lastly, I also compared against the generic ReSharper recommendation function family of hash = (hash * factor) ^ i;
and the original CustomHash()
as noted above seriously outperforms it. The ReSharper XOR style seems to have collision rates in the 20-30% range for common use case assumptions and should not be used in my opinion.