Should GetHashCode be implemented for IEquatable<T> on mutable types?
Asked Answered
T

3

8

I'm implementing IEquatable<T>, and I am having difficulty finding consensus on the GetHashCode override on a mutable class.

The following resources all provide an implementation where GetHashCode would return different values during the object's lifetime if the object changes:

However, this link states that GetHashCode should not be implemented for mutable types for the reason that it could cause undesirable behaviour if the object is part of a collection (and this has always been my understanding also).

Interestingly, the MSDN example implements the GetHashCode using only immutable properties which is in line with my understanding. But I'm confused as to why the other resources don't cover this. Are they simply wrong?

And if a type has no immutable properties at all, the compiler warns that GetHashCode is missing when I override Equals(object). In this case, should I implement it and just call base.GetHashCode() or just disable the compiler warning, or have I missed something and GetHashCode should always be overridden and implemented? In fact, if the advice is that GetHashCode should not be implemented for mutable types, why bother implementing for immutable types? Is it simply to reduce collisions compared to the default GetHashCode implementation, or does it actually add more tangible functionality?

To summarise my Question, my dilemma is that using GetHashCode on mutable objects means it can return different values during the lifetime of the object if properties on it change. But not using it means that the benefit of comparing objects that might be equivalent is lost because it will always return a unique value and thus collections will always fall back to using Equals for its operations.

Having typed this Question out, another Question popped up in the 'Similar Questions' box that seems to address the same topic. The answer there seems to be quite explicit in that only immutable properties should be used in a GetHashCode implementation. If there are none, then simply don't write one. Dictionary<TKey, TValue> will still function correctly albeit not at O(1) performance.

Tisman answered 1/3, 2018 at 17:43 Comment(4)
It is hogwash. Only an issue if the object is used as the key in a dictionary or stored in a hashset. If the object mutates after storage then GetHashCode should not return a different value. Maybe something you'd have to enforce by actually providing a custom GetHashCode :)Overlie
@HansPassant I agree, but the custom GetHashCode must use immutable properties only. Otherwise, use the default one, which as per the last paragraph in my Question, would still allow a Dictionary<TKey, TValue> to function correctly, but without the O(1) performance it would normally have.Tisman
@Neo, no, with the default implementaton of GetHashCode() your dictionary will not function correctly. See this exampleEyespot
@GianPaolo Then either this is considered desired behaviour where the type has no immutable properties, or there is a solution to override GetHashCode. How would you implement GetHashCode in this example?Tisman
E
2

Mutable classes work quite bad with Dictionaries and other classes that relies on GetHashCode and Equals.

In the scenario you are describing, with mutable object, I suggest one of the following:

class ConstantHasCode: IEquatable<ConstantHasCode>
{
    public int SomeVariable;
    public virtual Equals(ConstantHasCode other)
    {
        return other.SomeVariable == SomeVariable;
    }

    public override int GetHashCode()
    {
        return 0;
    }
}

or

class ThrowHasCode: IEquatable<ThrowHasCode>
{
    public int SomeVariable;
    public virtual Equals(ThrowHasCode other)
    {
        return other.SomeVariable == SomeVariable;
    }

    public override int GetHashCode()
    {
        throw new ApplicationException("this class does not support GetHashCode and should not be used as a key for a dictionary");
    }
}

With the first, Dictionary works (almost) as expected, with performance penalty in lookup and insertion: in both cases, Equals will be called for every element already in the dictionary until a comparison return true. You are actually reverting to performance of a List

The second is a way to tell the programmers will use your class "no, you cannot use this within a dictionary". Unfortunately, as far as I know there is no method to detect it at compile time, but this will fail the first time the code adds an element to the dictionary, very likely quite early while developping, not the kind of bug happening only in production environment with an unpredicted set of input.

Last but not least, ignore the "mutable" problem and implement GetHashCode using member variables: now you have to be aware that you are not free to modify the class when it's used withing a Dictionary. In some scenario this can be acceptable, in other it's not

Eyespot answered 1/3, 2018 at 22:1 Comment(3)
Someone downvoted this answer. I upvoted, because I think it addresses my Question best. I found a very good summary here. And I particularly like this comment and the Answer it's commenting on. That made the penny drop for me. GetHashCode should still be used on types with no immutable properties, but it either shouldn't be used as a key, or it should not be changed once it is used as a key.Tisman
Throwing an exception in GetHashCode generally seems like a horrible idea, unless you have a very, very good reason to not allow the object to be contained in a dictionary. It's better to just return 0Pohai
@VapidLinus, I think there is no "perfect" solution for mutable objects: you will have to give up to at least one of the "expected" behaviours of a Dictionary. Returning a constant is a "legal" implementation for GetHashCode, but it hides to your caller that a Dictonary/Hashtable/HashSet/Whatever will not be as fast as he can expect. You need to make a choice depending on your current scenario, and throwing an exception is an option to consider in some case. By the way, returning 0 it's actually something I use quite often.Eyespot
T
0

It all depends of what kind of collection type you are talking about. For my answer I will assume you are talking about Hash Table based collections and in particular I will address it for .NET Dictionary and Key calculation.

So best way to identify what will happen if you modify key( given your key is a class which does custom HashCode calculation) is to look at the .NET source. From .NET source we can see that your key value pair is now wrapped into Entry struct which carries hashcode which was calculated on addition of your value. Meaning that if you change HashCode value after that time of your key was added, it will no longer be able to find a value in dictionary.

Code to prove it:

    static void Main()
    {
        var myKey = new MyKey { MyBusinessKey = "Ohai" };
        var dic = new Dictionary<MyKey, int>();
        dic.Add(myKey, 1);
        Console.WriteLine(dic[myKey]);
        myKey.MyBusinessKey = "Changing value";
        Console.WriteLine(dic[myKey]); // Key Not Found Exception.
    }

    public class MyKey
    {
        public string MyBusinessKey { get; set; }
        public override int GetHashCode()
        {
            return MyBusinessKey.GetHashCode();
        }
    }

.NET source reference.

So to answer your question. You want to have immutable values for which you base your hashcode calculation on.

Another point, hashcode for custom class if you do not override GetHashCode will be based on reference of the object. So concern of returning same hashcode for different object which are identical in underlying values could be mitigated by overriding GetHashCode method and calculating your HashCode depending on your business keys. For example you would have two string properties, to calculate hashcode you would concat strings and call base string GetHashCode method. This will guarantee that you will get same hashcode for same underlying values of the object.

Tenancy answered 1/3, 2018 at 18:13 Comment(3)
Thanks for your answer. I agree with you mostly, but I don't agree that you should add immutable properties purely for use in GetHashCode. In this case, you might as well just use base.GetHashCode which will always return the same value, and does not then require an override.Tisman
@Tisman Please take a look at addition to last part of the answer. By default GetHashCode on class is based on the reference of the object. If you override it you are in control what underlying values actually determines hashcode of the object. Let me know if last part directly addresses your question.Tenancy
I replied having read that last paragraph. I agree with your last comment, but that doesn't address what I said. I don't think you should just add immutable properties purely for the sake of GetHashCode. The default implementation will achieve the same end. You get a unique value.Tisman
T
-3

After much discussion and reading other SO answers on the topic, it was eventually this ReSharper help page that summarised it very well for me:

MSDN documentation of the GetHashCode() method does not explicitly require that your override of this method returns a value that never changes during the object's lifetime. Specifically, it says:

The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method.

On the other hand, it says that the hash code should not change at least when your object is in a collection:

*You can override GetHashCode for immutable reference types. In general, for mutable reference types, you should override GetHashCode only if:

  • You can compute the hash code from fields that are not mutable; or
  • You can ensure that the hash code of a mutable object does not change while the object is contained in a collection that relies on its hash code.*

But why do you need to override GetHashCode() in the first place? Normally, you will do it if your object is going to be used in a Hashtable, as a key in a dictionary, etc., and it's quite hard to predict when your object will be added to a collection and how long it will be kept there.

With all that said, if you want to be on the safe side make sure that your override of GetHashCode() returns the same value during the object's lifetime. ReSharper will help you here by pointing at each non-readonly field or non-get-only property in your implementation of GetHashCode(). If possible, ReSharper will also suggest quick-fixes to make these members read-only/get-only.

Of course, it doesn't suggest what to do if the quick-fixes are not possible. However, it does indicate that those quick-fixes should only be used "if possible" which implies that the the inspection could be suppressed. Gian Paolo's answer on this suggests to throw an exception which will prevent the class from being used as a key and would present itself early in development if it was inadvertently used as a key.

However, GetHashCode is used in other circumstances such as when an instance of your object is passed as a parameter to a mock method setup. Therefore, the only viable option is to implement GetHashCode using the mutable values and put the onus on the rest of the code to ensure the object is not mutated while it is being used as a key, or to not use it as a key at all.

Tisman answered 2/3, 2018 at 11:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.