Will this hash function collide unusually frequently?
Asked Answered
R

3

5

I had the following code to generate a hash of an object:

public int GetHashCode(MyType obj)
{
   return (obj.Prop1.GetHashCode() + obj.Prop2.GetHashCode() + obj.Prop3.GetHashCode()).GetHashCode();
}

I.e. I add all the properties' hash codes and then take the hash of this.

In review, a coworker suggested that this will collide too frequently. I'm not sure that this is true because:

  1. Given that hash codes are chosen with equal frequency among positive and negative numbers and they wrap around, I don't think there's any additional information we gain about the likelihood of these numbers' sum as opposed to the numbers themselves
  2. To the extent that their sum is non-random, hash codes are designed to make numbers that are "close together" become "far apart", so feeding a non-uniformly-distributed value into the function shouldn't be an issue

Who is correct?

It is in C#, in case the answer is language-specific.

Roseleeroselia answered 8/6, 2011 at 21:57 Comment(1)
What was your coworker's reason?Swearingen
S
6

Yes.

Just suppose Prop1, Prop2 etc are of type int. Usually only the lower range of integers is used. Your sum approach will collide more often than necessary.

The HasCode of 7 is 7, which makes perfect sense when hashing int by it self. But with your code the tuples <7, 3>, <3, 7> and <8, 2> will all have the same Hash. The same with simple XOR instead of Addition.

The common approach is to add some (prime) numbers and shifting:

public int GetHashCode(MyType obj)
{
  int hash = 0;
  unchecked
  {         
     hash += 19 * obj.Prop1.GetHashCode();
     hash += 31 * obj.Prop2.GetHashCode();
     hash += 37 * obj.Prop3.GetHashCode();
  }
  return hash;
}

The numbers 19, 31, 37 are not too critical. And if you prefer you can use OR or XOR instead of + .

Spadiceous answered 8/6, 2011 at 22:1 Comment(1)
Prime numbers are good and are preferable to shifting, since a simple binning algorithm may well just take the lower N bits of the HashCode; if the properties are shifted, they may end up completely ignored.Advocation
R
2

XORing would be better:

public int GetHashCode(MyType obj)
{
   return obj.Prop1.GetHashCode() ^ 
          obj.Prop2.GetHashCode() ^ 
          obj.Prop3.GetHashCode();
}
Rayraya answered 8/6, 2011 at 22:1 Comment(1)
See Henk Holterman's reasoning. Mixing with shifts should provide better distribution if GetHashCode for some of the properties does not use whole range...Moskow
S
0

You can use a modified FNV HashCode generator, a very similar question has been answered (by me) here

Sizemore answered 28/6, 2011 at 2:24 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.