What rules should I follow to ensure the GetHashCode() method returns a unique value for an object?

What rules should I follow to ensure the GetHashCode() method returns a unique value for an object?

For example:

  • Should I include some private members in the calculation?
  • Should I multiply instead of sum?
  • Can I be sure that I am generating a unique hash code for a particular object graph? etc.
Suilmann answered 5/9, 2011 at 20:10 Comment(2)
Stop. How many instances of String are there? How many instances of int are there? What is the return type of String.GetHashCode? Do you see the problem?Grit
Based on your previous question you should leave GetHashCode() et al. alone. First ask when to mess with it. And don't post a new question; there are plenty here already.Judaize
R
6

You shouldn't even aim for GetHashCode() returning a unique value for each object. That's not the point of GetHashCode().

Eric Lippert has a great post about hash codes which you should read thoroughly. Basically you want to end up with something which will always return the same value for two equal objects (and you need to work out what you mean by equal) and is likely to return different values for two non-equal objects.
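
As a rough sketch of why uniqueness isn't required: the made-up CollidingKey type below (not from the article, purely for illustration) returns the same hash code for every instance, which is legal, and a Dictionary still finds the right entry because Equals settles the collision. It just degrades to a linear search.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type whose hash code is deliberately terrible:
// every instance collides, yet equality still decides lookups.
public sealed class CollidingKey : IEquatable<CollidingKey>
{
    public string Name { get; }

    public CollidingKey(string name) => Name = name;

    public bool Equals(CollidingKey other) =>
        other != null && Name == other.Name;

    public override bool Equals(object obj) => Equals(obj as CollidingKey);

    // Legal but slow: equal objects trivially share a hash code,
    // and so do all unequal ones.
    public override int GetHashCode() => 42;
}

public static class CollisionDemo
{
    // Builds a small dictionary and looks up a fresh, equal key.
    public static int Lookup(string name)
    {
        var map = new Dictionary<CollidingKey, int>
        {
            [new CollidingKey("a")] = 1,
            [new CollidingKey("b")] = 2,
        };
        return map[new CollidingKey(name)];
    }
}
```

Here CollisionDemo.Lookup("b") still returns the value stored under "b", even though both keys land in the same bucket.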

Personally I tend to use an implementation like this:

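public override int GetHashCode()
{
    // unchecked: let the arithmetic wrap on overflow rather than
    // throw when compiled in a checked context
    unchecked
    {
        int hash = 17;
        hash = hash * 31 + field1.GetHashCode();
        hash = hash * 31 + field2.GetHashCode();
        hash = hash * 31 + field3.GetHashCode();
        ...
        return hash;
    }
}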
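
For a fuller picture, here is a minimal sketch of that pattern applied to a complete type. The Person class and its Name and Age members are hypothetical, chosen only for illustration; the point is that Equals and GetHashCode are built from the same fields, so equal objects always hash alike.

```csharp
using System;

// Hypothetical immutable type used only for illustration.
public sealed class Person : IEquatable<Person>
{
    public string Name { get; }
    public int Age { get; }

    public Person(string name, int age)
    {
        Name = name;
        Age = age;
    }

    // Equality is defined over exactly the fields that feed the hash.
    public bool Equals(Person other) =>
        other != null && Name == other.Name && Age == other.Age;

    public override bool Equals(object obj) => Equals(obj as Person);

    public override int GetHashCode()
    {
        // unchecked: overflow wraps around instead of throwing.
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (Name == null ? 0 : Name.GetHashCode());
            hash = hash * 31 + Age.GetHashCode();
            return hash;
        }
    }
}
```

Two separately constructed instances with the same Name and Age compare equal and return the same hash code within a process. Nothing guarantees that unequal instances differ, and nothing needs to.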

Things to watch out for:

  • If you have mutable objects, be careful! You shouldn't mutate an object in a way that changes its hash code after using it as the key in a hash map.
  • If your fields can be null, you need to check for that while calculating your hash. For example:

    hash = hash * 31 + (field2 == null ? 0 : field2.GetHashCode());
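
To make the mutability warning concrete, here is a small sketch using a made-up MutableKey type (an assumption for illustration only). Changing a field that feeds GetHashCode() while the object sits in a HashSet leaves it filed under its old hash code, so the set can no longer find it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable key, for illustration only.
public sealed class MutableKey
{
    public int Id { get; set; }

    public override bool Equals(object obj) =>
        obj is MutableKey other && Id == other.Id;

    public override int GetHashCode() => Id;
}

public static class MutationDemo
{
    // Returns whether the set can still find the key after it was mutated.
    public static bool FoundAfterMutation()
    {
        var key = new MutableKey { Id = 1 };
        var set = new HashSet<MutableKey> { key };

        key.Id = 2; // hash code changes while the key sits in the set

        // The entry was stored under hash code 1, but the lookup now uses 2.
        return set.Contains(key);
    }
}
```

MutationDemo.FoundAfterMutation() returns false even though the very same object is still inside the set, which is why keys are best kept immutable, or at least left untouched while they are in a hashed collection.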
    
Resource answered 5/9, 2011 at 20:12 Comment(6)
I will definitely read the article, but as far as I can see the idea is to feed the logic with prime numbers. Maybe a very stupid question, but do you know if there is a way to calculate the safest prime number candidate for such a scenario? Thanks!Suilmann
@pencilCake: Safest? Safest against what? If two objects create the same hash code, everything will still work - it'll just take slightly longer to differentiate between the keys.Resource
I will read the article first; seems I am not on the same page with you. Thanks Jon!Suilmann
@pencilCake: If you are worried about an attacker taking advantage of a weakness in your hash code then no prime number is the right prime number. What you have to do in that case is design a hashing algorithm that changes its own parameters in such a way that the attacker cannot craft a data set that causes many collisions. That's for advanced players only; if this is a situation you actually face then hire a professional who deals with such scenarios and knows how to do it right.Strip
@Eric Lippert: By the way, your article that Jon mentioned was really useful for me! Thanks for that!Suilmann
@pencilCake: Glad it helped! To address your question about prime numbers: there is a bit of a black art to choosing good hash functions. My advice is to pick a hash function, generate a large body of realistic data, start making dictionaries of reasonable size, and see if the performance is acceptable. If not, try varying the algorithm parameters. No point in tuning a hash algorithm that is already good enough.Strip
S
1

You don't necessarily need a foolproof hash code because you also need to override Equals for comparison. Usually what I do is take the values I know are different across objects, concatenate them into a string and return the hash for that.
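
A quick sketch of that approach, with one caveat worth knowing. The ConcatHash helper and the separator character are my own illustration, not a standard API. Plain concatenation loses the field boundaries, so ("ab", "c") and ("a", "bc") always collide; putting a separator between the fields avoids that particular case. It also allocates a temporary string on every call.

```csharp
using System;

public static class ConcatHash
{
    // Naive: "ab" + "c" and "a" + "bc" both become "abc",
    // so the two pairs are guaranteed to share a hash code.
    public static int Naive(string a, string b) => (a + b).GetHashCode();

    // A separator unlikely to appear in the data (here the ASCII unit
    // separator, an arbitrary choice) keeps field boundaries in the string.
    public static int Separated(string a, string b) =>
        (a + "\u001F" + b).GetHashCode();
}
```

Either version is still a valid hash as long as Equals compares the same fields; the separator only cuts down on avoidable collisions.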

Sugar answered 5/9, 2011 at 20:13 Comment(2)
I've always been curious about the performance characteristics of this form of hash generation vs the sort that Jon mentioned in his answer.Tractate
It really depends; you would have to test it. I find this a "trivial" or easy way to do it if you just need a generic hash function that is probably going to be relatively close to unique. So I tend to just have a line like ("" + member_a + member_b + member_c).GetHashCode(); and I've never had issues with it, but this is a quick and dirty way of doing it :PSugar
A
0

I think your answer is here: see Jon Skeet's answer, which is generally a pretty reliable way to calculate it. Proven by time :)

What is the best algorithm for an overridden System.Object.GetHashCode?

Andria answered 5/9, 2011 at 20:13 Comment(0)
