Overriding GetHashCode()
Asked Answered
S

3

20

In this article, Jon Skeet mentioned that he usually uses this kind of algorithm for overriding GetHashCode().

public override int GetHashCode()
{
  unchecked // Overflow is fine, just wrap
  {
    int hash = 17;
    // Suitable nullity checks etc, of course :)
    hash = hash * 23 + Id.GetHashCode();
    return hash;
  }
}

Now, I've tried using this, but Resharper tells me that the method GetHashCode() should be hashing using only read-only fields (it compiles fine, though). What would be a good practice, because right now I can't really have my fields to be read-only?

I tried generating this method by Resharper, here's the result.

public override int GetHashCode()
{
  return base.GetHashCode();
}

This doesn't contribute much, to be honest...

Sarilda answered 14/7, 2012 at 5:27 Comment(4)
may be duplicate of: Overriding GetHashCodeMagnanimous
@Magnanimous Nowhere in this question people talk about read-only fields.Sarilda
You may want to refer to this article on Eric Lippert's blog. In particular, note the section "Rule: the integer returned by GetHashCode must never change while the object is contained in a data structure that depends on the hash code remaining stable." He writes that "It is permissible, though dangerous, to make an object whose hash code value can mutate as the fields of the object mutate."Enloe
Possible duplicate of What is the best algorithm for an overridden System.Object.GetHashCode?Collywobbles
W
18

If all your fields are mutable and you have to implement GetHashCode method, I am afraid this is the implementation you would need to have.

public override int GetHashCode() 
{ 
    return 1; 
} 

Yes, this is inefficient but this is at least correct.

The problem is that GetHashCode is being used by Dictionary and HashSet collections to place each item in a bucket. If hashcode is calculated based on some mutable fields and the fields are really changed after the object is placed into the HashSet or Dictionary, the object can no longer be found from the HashSet or Dictionary.

Note that with all the objects returning the same HashCode 1, this basically means all the objects are being put in the same bucket in the HashSet or Dictionary. So, there is always only one single bucket in the HashSet or Dictionary. When trying to lookup the object, it will do a equality check on each of the objects inside the only bucket. This is like doing a search in a linked list.

Somebody may argue that implementing the hashcode based on mutable fields can be fine if we can make sure fields are never changed after the objects added to HashCode or Dictionary collection. My personal view is that this is error-prone. Somebody taking over your code two years later might not be aware of this and breaks the code accidentally.

Whiteeye answered 14/7, 2012 at 6:18 Comment(9)
What about strucs? Do the fields have to be immutable also? I find it hard to change their fields while store in a Dictionary.Benoni
There is no need to override GetHashCode at all in this case, the default implementation will work much better. Although the hash will not depend on the object's contents, at least it will be different for different objects.Martguerita
@AntonTykhyy If we are using the default GetHashCode(), two objects with the same field Id=1 will have different HashCode and therefore look up from different bucket. I assume in this case, we want them to have the same Hashcode.Whiteeye
@ja72 struct is the same. You have to generate the Hashcode from immutable field.Whiteeye
@HarveyKwok: well, that really depends on whether the objects are equatable (where equality is based on equality of mutable fields) or not. If yes, then you are right, the default GetHashCode is not suitable. If not, the object's identity does not depend on the values of its (mutable) fields and the default GetHashCode is ok. However mutable equatable objects are not a good idea, for the same reason that calculating GetHashCode based on mutable fields is not a good idea.Martguerita
@ja72 Most people agree that a struct should be immutable. Therefore all its fields will be readonly (or at least private and certain not to be changed once the struct is created), and the hash algorithm should use all fields (which are relevant for Equals) in that case.Highball
This answer is slightly ridiculous. There are two relevant requirements: that equality implies hash equality, and that the hash code doesn't change while the object is in a hash-based collection (i.e. is a key in a Dictionary, or an item in a HashSet). Yes, you can completely destroy the whole point of a hash-based collection and return a constant hashcode. But you could also simply not mutate the mutable object while it's being used as a key. Immutability isn't required, simply temporary non-mutation.Ribwort
@Eamon: I don't think this is a good answer either. I personally don't do that in my project if look up performance is a concern. The problem is there is no easy way in C# to warrant the field is immutable while it's used in the key. The closest one that I know of is Freezable object in WPF. You may need to have something similar or something even more flexible than this. You don't want to say programmer will make sure the fields are immutable when they are being used. The code will be very error-prone.Whiteeye
It'd be nice to be practically able to enforce immutability, but the mere fact that it's not doesn't mean it's reasonable to throw up your hands and assume everything is mutated. There are lots of unprovable code-contracts you must satisfy to write a valid C# application, and a (fairly simply) contract to "never change after construction" isn't unreasonable. People make huge apps in entirely dynamic languages; lack of static verification is problematic, not fatal. The C#-esque solution here is to accept that immutability isn't practically enforcable in C# and get on with it.Ribwort
I
5

Please note that your GetHashCode must go hand in hand with your Equals method. And if you can just use reference equality (when you'd never have two different instances of your class that can be equal) then you can safely use Equals and GetHashCode that are inherited from Object. This would work much better than simply return 1 from GetHashCode.

Instructive answered 14/7, 2012 at 15:39 Comment(0)
N
-2

I personally tend to return a different numeric value for each implementation of GetHashCode() in a class which has no immutable fields. This means if I have a dictionary containing different implementing types, there is a chance the different instances of different types will be put in different buckets.

For example

public class A
{
    // TODO Equals override

    public override int GetHashCode()
    {
        return 21313;
    } 
}

public class B
{
    // TODO Equals override

    public override int GetHashCode()
    {
        return 35507;
    } 
}

Then if I have a Dictionary<object, TValue> containing instances of A, B and other types , the performance of the lookup will be better than if all the implementations of GetHashCode returned the same numeric value.

It should also be noted that I make use of prime numbers to get a better distribution.

As per the comments I have provided a LINQPad sample here which demonstrates the performance difference between using return 1 for different types and returning a different value for each type.

Neanderthal answered 29/10, 2013 at 9:54 Comment(11)
Can you expand on your last sentence?Andrus
Thinking back on this answer, that last sentence is not really applicable for this situation. When implementing a GetHashCode() method with readonly fields however, the use of prime numbers ensures you are more likely to achieve better distribution over the buckets of a dictionary or hashset. There are many articles online but this SO question should get you started.Neanderthal
An explanation of the downvote would be nice, I feel this is a reasonable approach to take when dealing with different types.Neanderthal
@Neanderthal Sorry, I usually do comment - forgot to this time!Ribwort
@Lukazoid: the issue with this solution is that it encourages O(N^2) behavior. The lack of clashes between different classes hardly helps, and it only helps if the code is mixing types and undermining the type system - i.e. code you really don't want to bother helping.Ribwort
@EamonNerbonne In what way does this encourage O(N^2) behavior? Mixing types in no way undermines the type system, consider a HashSet<IDoSomethingUnique>, that contain instances of many different types all of which implement IDoSomethingUnique. If every implementation of IDoSomethingUnique had no immutable fields, my implementation of GetHashCode() would ensure duplicates are found quicker than if I returned 1 for every implementation.Neanderthal
If your collection contains multiple objects of the same type - which is by far and away the most common scenario - you'll get O(N^2) behavior in N, the number of objects of that type. Even if there are multiple implementations, if any one implementation occurs several times, you'll get the same behavior for that implementation. Frankly, the issue here isn't whether your answer works, it's that it's a trap - anybody that can use this solution probably should not.Ribwort
Let me put it this way: this implementation is not significantly better than simply using a List<...> with the simple brute-force search implementations such as Remove and Contains. And that solution is likely faster and simpler for most cases, since it doesn't have the overhead of tracking the (pointless) hashcode, and doesn't require overriding GetHashCode at all.Ribwort
I'm still not sure where you are getting O(N^2) from, there is nothing here which would make it exponential. If the collection contained multiple objects of the same type, we would get O(N) performance as it checked each item for equality (same as when using return 1).Neanderthal
@EamonNerbonne here is a LINQPad sample, run it and look at the elapsed time, then uncomment the #define at the top of the file and run again. If it all goes to plan, the first run should have been quicker. This shows that returning values other than 1 can have a positive impact. I do agree with you about using a List<T> as it will be better, however this question was after an implementation of GetHashCode(), not advice on which type of collection to use.Neanderthal
Let us continue this discussion in chat.Ribwort

© 2022 - 2024 — McMap. All rights reserved.