What to return when overriding Object.GetHashCode() in classes with no immutable fields?
Asked Answered
H

5

13

Ok, before you get all mad because there are hundreds of similar sounding questions posted on the internet, I can assure you that I have just spent the last few hours reading all of them and have not found the answer to my question.

Background:

Basically, one of my large scale applications had been suffering from a situation where some Bindings on the ListBox.SelectedItem property would stop working or the program would crash after an edit had been made to the currently selected item. I initially asked the 'An item with the same key has already been added' Exception on selecting a ListBoxItem from code question here, but got no answers.

I hadn't had time to address that problem until this week, when I was given a number of days to sort it out. Now to cut a long story short, I found out the reason for the problem. It was because my data type classes had overridden the Equals method and therefore the GetHashCode method as well.

Now for those of you that are unaware of this issue, I discovered that you can only implement the GetHashCode method using immutable fields/properties. Using a excerpt from Harvey Kwok's answer to the Overriding GetHashCode() post to explain this:

The problem is that GetHashCode is being used by Dictionary and HashSet collections to place each item in a bucket. If hashcode is calculated based on some mutable fields and the fields are really changed after the object is placed into the HashSet or Dictionary, the object can no longer be found from the HashSet or Dictionary.

So the actual problem was caused because I had used mutable properties in the GetHashCode methods. When users changed these property values in the UI, the associated hash code values of the objects changed and then items could no longer be found in their collections.

Question:

So, my question is what is the best way of handling the situation where I need to implement the GetHashCode method in classes with no immutable fields? Sorry, let me be more specific, as that question has been asked before.

The answers in the Overriding GetHashCode() post suggest that in these situations, it is better to simply return a constant value... some suggest to return the value 1, while other suggest returning a prime number. Personally, I can't see any difference between these suggestions because I would have thought that there would only be one bucket used for either of them.

Furthermore, the Guidelines and rules for GetHashCode article in Eric Lippert's Blog has a section titled Guideline: the distribution of hash codes must be "random" which highlights the pitfalls of using an algorithm that results in not enough buckets being used. He warns of algorithms that decrease the number of buckets used and cause a performance problem when the bucket gets really big. Surely, returning a constant falls into this category.

I had an idea of adding an extra Guid field to all of my data type classes (just in C#, not the database) specifically to be used in and only in the GetHashCode method. So I suppose at the end of this long intro, my actual question is which implementation is better? To summarise:

Summary:

When overriding Object.GetHashCode() in classes with no immutable fields, is it better to return a constant from the GetHashCode method, or to create an additional readonly field for each class, solely to be used in the GetHashCode method? If I should add a new field, what type should it be and shouldn't I then include it in the Equals method?

While I am happy to receive answers from anyone, I am really hoping to receive answers from advanced developers with a sound knowledge on this subject.

Hoxha answered 31/10, 2013 at 15:4 Comment(12)
If you have a copy of Effective C# to hand, item number 7 is all about this, if you haven't read it already.Impotent
A simple workaround if the use is limited to one location is to wrap the type in another class that provides a unique immutable value, likely a Guid, and use that for getting a hash code. I would personally try not to add a Guid to a type just for use in a dictionary. Again alternatively, you could use something like an identity map to key for objects based on a detached immutable ID (again likely a Guid so in effect the same result). Or again alternatively, don't amend items in the dictionary. Key them out, remove them, amend them, re-add them.Carcanet
Could you not just ass another private field to the class and make that mutable (eg by using a guid as Adam suggests) and use that for your hash code?Retroflexion
Thanks guys. @JMK, I don't happen to have that book... care to enlighten me? Adam, one problem that I have is that I use WPF where all of my objects are data bound. To add code to every view model that would remove the selected item for editing and then re-insert it would take forever. David, I suggested that in my question... well, for a property anyway, but I meant field... I'll update the question.Hoxha
@Hoxha If I tried to enlighten you I would confuse you because it starts off with "This is the only item in this book dedicated to one function that you should avoid writing" and even though I've read it multiple times, I don't fully understand it. I'm gonna wait until somebody smarter comes along and then upvote!Impotent
Thanks @JMK, I'll see if I can find a copy of that book.... would that be the same book that I just found here?Hoxha
I found it on Google books however so you could give it a read there. I highly recommend buying a copy, it's a small book but packed with detail.Impotent
Ah, I found your starting quote @JMK, so I guess that it is the same book. Thanks again.Hoxha
Pretty unclear why overriding these methods was necessary at all. A good starting point is to completely remove the Equals and GetHashCode overrides, the default implementations inherited from Object are excellent and guarantee object uniqueness. You'll never get a "duplicate key" error from them.Chaudoin
@HansPassant, I also use the Equals methods to provide change notification in the application. By change notification I mean that I have a HasChanges property that uses it like this: return originalState != null && !this.Equals(originalState);. If I were to remove the Equals and GetHashCode implementations, then wouldn't I have to implement IEqualityComparer<T> for every data type class? I have around 100 or more, so maybe I could do that in my next project.Hoxha
Well, it doesn't have to be named Equals() does it? You can call it anything you like, Equals() only need to be overridden when it matters to .NET Framework code. It does matter here, WPF doesn't much care for the Equals() return value changing while the item is bound.Chaudoin
@HansPassant, how can I remove the Equals and GetHashCode overrides? On the IEquatable<T> Interface page on MSDN it says It should be implemented for any object that might be stored in a generic collection and then on the IEquatable<T>.Equals Method page it says If you implement Equals, you should also override the base class implementations of Object.Equals(Object) and GetHashCode so that their behavior is consistent with that of the IEquatable<T>.Hoxha
P
17

Go back to basics. You read my article; read it again. The two ironclad rules that are relevant to your situation are:

  • if x equals y then the hash code of x must equal the hash code of y. Equivalently: if the hash code of x does not equal the hash code of y then x and y must be unequal.
  • the hash code of x must remain stable while x is in a hash table.

Those are requirements for correctness. If you can't guarantee those two simple things then your program will not be correct.

You propose two solutions.

Your first solution is that you always return a constant. That meets the requirement of both rules, but you are then reduced to linear searches in your hash table. You might as well use a list.

The other solution you propose is to somehow produce a hash code for each object and store it in the object. That is perfectly legal provided that equal items have equal hash codes. If you do that then you are restricted such that x equals y must be false if the hash codes differ. This seems to make value equality basically impossible. Since you wouldn't be overriding Equals in the first place if you wanted reference equality, this seems like a really bad idea, but it is legal provided that equals is consistent.

I propose a third solution, which is: never put your object in a hash table, because a hash table is the wrong data structure in the first place. The point of a hash table is to quickly answer the question "is this given value in this set of immutable values?" and you don't have a set of immutable values, so don't use a hash table. Use the right tool for the job. Use a list, and live with the pain of doing linear searches.

A fourth solution is: hash on the mutable fields used for equality, remove the object from all hash tables it is in just before every time you mutate it, and put it back in afterwards. This meets both requirements: the hash code agrees with equality, and hashes of objects in hash tables are stable, and you still get fast lookups.

Philhellene answered 1/11, 2013 at 5:41 Comment(1)
+1 Thanks for taking the time to provide a solid understandable answer. However, I'd like to address some things that you said... I'm not using any Dictionary or HashTable anywhere in my code. I use WPF and I can only assume that the HashTable or Dictionary in question is used internally in the Framework. Looking at the StackTrace from the linked previous question, I see call to System.Windows.DependencyObject.OnPropertyChanged and later System.Windows.Controls.ListBoxItem.OnSelected. So you see, I have no control over that.Hoxha
W
4

I would either create an additional readonly field or else throw NotSupportedException. In my view the other option is meaningless. Let's see why.

Distinct (fixed) hash codes

Providing distinct hash codes is easy, e.g.:

class Sample
{
    private static int counter;
    private readonly int hashCode;

    public Sample() { this.hashCode = counter++; }

    public override int GetHashCode()
    {
        return this.hashCode;
    }

    public override bool Equals(object other)
    {
        return object.ReferenceEquals(this, other);
    }
}

Technically you have to look out for creating too many objects and overflowing the counter here, but in practice I think that's not going to be an issue for anyone.

The problem with this approach is that instances will never compare equal. However, that's perfectly fine if you only want to use instances of Sample as indexes into a collection of some other type.

Constant hash codes

If there is any scenario in which distinct instances should compare equal then at first glance you have no other choice than returning a constant. But where does that leave you?

Locating an instance inside a container will always degenerate to the equivalent of a linear search. So in effect by returning a constant you allow the user to make a keyed container for your class, but that container will exhibit the performance characteristics of a LinkedList<T>. This might be obvious to someone familiar with your class, but personally I see it as letting people shoot themselves in the foot. If you know from beforehand that a Dictionary won't behave as one might expect, then why let the user create one? In my view, better to throw NotSupportedException.

But throwing is what you must not do!

Some people will disagree with the above, and when those people are smarter than oneself then one should pay attention. First of all, this code analysis warning states that GetHashCode should not throw. That's something to think about, but let's not be dogmatic. Sometimes you have to break the rules for a reason.

However, that is not all. In his blog post on the subject, Eric Lippert says that if you throw from inside GetHashCode then

your object cannot be a result in many LINQ-to-objects queries that use hash tables internally for performance reasons.

Losing LINQ is certainly a bummer, but fortunately the road does not end here. Many (all?) LINQ methods that use hash tables have overloads that accept an IEqualityComparer<T> to be used when hashing. So you can in fact use LINQ, but it's going to be less convenient.

In the end you will have to weigh the options yourself. My opinion is that it's better to operate with a whitelist strategy (provide an IEqualityComparer<T> whenever needed) as long as it is technically feasible because that makes the code explicit: if someone tries to use the class naively they get an exception that helpfully tells them what's going on and the equality comparer is visible in the code wherever it is used, making the extraordinary behavior of the class immediately clear.

Weingarten answered 31/10, 2013 at 15:45 Comment(6)
There we go! +1 I was tempted to try and answer myself but thought better of it!Impotent
+1 Seconded! Thanks for taking the time to write such a complete answer @Jon. If I were to go with the 'add a field' idea, would returning (the equivalent value of) Guid.NewGuid().GetHashCode() be ok? Shouldn't I then use that field in the Equals method also?Hoxha
@Sheridan: It would be fine but perhaps overkill. Technically you would use that to compare inside Equals too, but since you know that it's designed to provide reference equality semantics you can also do that directly.Weingarten
Hi @Jon, after reading up further about the IEqualityComparer<T> interface, I'm a little confused... wouldn't implementing that interface, or extending the EqualityComparer<T> class as is recommended on MSDN, result in the same problem as overriding Equals when property values change for objects in a collection?Hoxha
@Sheridan: It would, there's no magic that will allow having the cake and eating it too. But the process of seeing that the class doesn't work as-is, creating a custom equality comparer and using it should be more than enough indication of "yeah, I know returning a constant will be bad for performance, but do it already". The idea is to make the user of your class opt-in explicitly rather than allowing them to blissfully "wander into a trap".Weingarten
Using ReferenceEquals and a never-equal hash code is no better than what object already does for you, so in that case you might as well not have implemented these at all. The question only makes sense in contexts where you don't want Equals to be reference equality, in which case the hash codes cannot be purely unique, as they're required to be equal if Equals returns true.Close
C
1

Where I want to override Equals, but there is no sensible immutable "key" for an object (and for whatever reason it doesn't make sense to make the whole object immutable), in my opinion there is only one "correct" choice:

  • Implement GetHashCode to hash the same fields as Equals uses. (This might be all the fields.)
  • Document that these fields must not be altered while in a dictionary.
  • Trust that users either don't put these objects in dictionaries, or obey the second rule.

(Returning a constant value compromises dictionary performance. Throwing an exception disallows too many useful cases where objects are cached but not modified. Any other implementation for GetHashCode would be wrong.)

Where this runs the user into trouble anyway, it's probably their fault. (Specifically: using a dictionary where they shouldn't, or using a model type in a context where they should be using a view-model type that uses reference equality instead.)

Or perhaps I shouldn't be overriding Equals in the first place.

Close answered 5/4, 2019 at 5:43 Comment(2)
But we want to avoid any users getting into trouble, regardless of fault.Hoxha
Users only get into trouble if they don't follow the rules. And I don't see any valid alternative, other than making the type immutable -- which is also a valid choice, but has other consequences that may not be desirable in some cases.Close
T
0

If the classes truly contain nothing constant on which a hash value can be calculated then I would use something simpler than a GUID. Just use a random number persisted in the class (or in a wrapper class).

Trawl answered 31/10, 2013 at 15:13 Comment(2)
Thanks @Dweeberly. can I ask what makes you think that that would be an improvement over a Guid?Hoxha
It's smaller and simpler to generate. Most hash usage mods the value with a prime to get the bucket address. GetHashCode returns a 32 bit int, so anything bigger than that is overkill.Trawl
K
0

A simple approach is to store the hashCode in a private member and generate it on the first use. If your entity doesn't change often, and you're not going to be using two different objects that are Equal (where your Equals method returns true) as keys in your dictionary, then this should be fine:

private int? _hashCode;

public override int GetHashCode() {
   if (!_hashCode.HasValue)
      _hashCode = Property1.GetHashCode() ^ Property2.GetHashCode() etc... based on whatever you use in your equals method
   return _hashCode.Value;
}

However, if you have, say, object a and object b, where a.Equals(b) == true, and you store an entry in your dictionary using a as the key (dictionary[a] = value).
If a does not change, then dictionary[b] will return value, however, if you change a after storing the entry in the dictionary, then dictionary[b] will most likely fail. The only workaround to this is to rehash the dictionary when any of the keys change.

Kuwait answered 31/10, 2013 at 15:23 Comment(4)
Thanks @MartinErnst. The problem that I have with this method, is that all of the class properties are used in the Equals method and they are all mutable. At first I thought that I could use the 'Id' and DateCreated properties in GetHashCode because they never change, but then I realised that even those properties are mutable when adding a new item.Hoxha
It seems like the keys you are using are entities - the simplest thing would be to use the id as the key in your dictionary, and don't add new entries until they have an id assigned if this is possible. Alternatively, you can use the above method and generate the hashcode based off the id property alone if the entity is not a new entity, and if it is a new entity (say the id is 0), then use base.GetHashCode(). If you are using new entities as keys in a dictionary that outlives the datacontext/session then you will need to rehash the dictionary when the entity is assigned a new id.Kuwait
Thanks for getting back to me on this... can you please explain what you mean by rehash the dictionary? Also, will that be possible for a collection that is data bound to a WPF ListBox?Hoxha
Basically remove the item from the dictionary and add it againKuwait

© 2022 - 2024 — McMap. All rights reserved.