Overriding GetHashCode for mutable objects?
Asked Answered
R

5

62

I've read about 10 different questions on when and how to override GetHashCode but there's still something I don't quite get. Most implementations of GetHashCode are based on the hash codes of the fields of the object, but it's been cited that the value of GetHashCode should never change over the lifetime of the object. How does that work if the fields that it's based on are mutable? Also what if I do want dictionary lookups etc to be based on reference equality not my overridden Equals?

I'm primarily overriding Equals for the ease of unit testing my serialization code which I assume serializing and deserializing (to XML in my case) kills the reference equality so I want to make sure at least it's correct by value equality. Is this bad practice to override Equals in this case? Basically in most of the executing code I want reference equality and I always use == and I'm not overriding that. Should I just create a new method ValueEquals or something instead of overriding Equals? I used to assume that the framework always uses == and not Equals to compare things and so I thought it was safe to override Equals since it seemed to me like its purpose was for if you want to have a 2nd definition of equality that's different from the == operator. From reading several other questions though it seems that's not the case.

EDIT:

It seems my intentions were unclear, what I mean is that 99% of the time I want plain old reference equality, default behavior, no surprises. For very rare cases I want to have value equality, and I want to explicitly request value equality by using .Equals instead of ==.

When I do this the compiler recommends I override GetHashCode as well, and that's how this question came up. It seemed like there's contradicting goals for GetHashCode when applied to mutable objects, those being:

  1. If a.Equals(b) then a.GetHashCode() should == b.GetHashCode().
  2. The value of a.GetHashCode() should never change for the lifetime of a.

These seem naturally contradicting when a mutable object, because if the state of the object changes, we expect the value of .Equals() to change, which means that GetHashCode should change to match the change in .Equals(), but GetHashCode should not change.

Why does there seem to be this contradiction? Are these recommendations not meant to apply to mutable objects? Probably assumed, but might be worth mentioning I'm referring to classes not structs.

Resolution:

I'm marking JaredPar as accepted, but mainly for the comments interaction. To sum up what I've learned from this is that the only way to achieve all goals and to avoid possible quirky behavior in edge cases is to only override Equals and GetHashCode based on immutable fields, or implement IEquatable. This kind of seems to diminish the usefulness of overriding Equals for reference types, as from what I've seen most reference types usually have no immutable fields unless they're stored in a relational database to identify them with their primary keys.

Roseberry answered 17/5, 2009 at 0:53 Comment(1)
And if your class doesn't have any immutable fields, and you're not using referential equality, then the hashcode should be ...gasp... a constant! Will that break the O(1) nature of Dictionary lookups? - most certainly. Will that at least be correct in dictionaries? - Yes.Erotic
R
24

How does that work if the fields that it's based on are mutable?

It doesn't in the sense that the hash code will change as the object changes. That is a problem for all of the reasons listed in the articles you read. Unfortunately this is the type of problem that typically only show up in corner cases. So developers tend to get away with the bad behavior.

Also what if I do want dictionary lookups etc to be based on reference equality not my overridden Equals?

As long as you implement an interface like IEquatable<T> this shouldn't be a problem. Most dictionary implementations will choose an equality comparer in a way that will use IEquatable<T> over Object.ReferenceEquals. Even without IEquatable<T>, most will default to calling Object.Equals() which will then go into your implementation.

Basically in most of the executing code I want reference equality and I always use == and I'm not overriding that.

If you expect your objects to behave with value equality you should override == and != to enforce value equality for all comparisons. Users can still use Object.ReferenceEquals if they actually want reference equality.

I used to assume that the framework always uses == and not Equals to compare things

What the BCL uses has changed a bit over time. Now most cases which use equality will take an IEqualityComparer<T> instance and use it for equality. In the cases where one is not specified they will use EqualityComparer<T>.Default to find one. At worst case this will default to calling Object.Equals

Renowned answered 17/5, 2009 at 1:7 Comment(6)
"If you expect your objects to behave with value equality you should override == and != to enforce value equality for all comparisons. Users can still use Object.ReferenceEquals if they actually want reference equality." But that's the thing, I don't expect them to behave with value equality, I expect them to behaving with reference equality except when I explicitly want them not to by using object.Equals, in every other circumstance I expect reference equality like most other classes.Roseberry
Edited the question to clarify the real question more.Roseberry
"Unfortunately this is the type of problem that typically only show up in corner cases. So developers tend to get away with the bad behavior." That mentions that people get away with the bad bahavior? How do you do it properly then? There's some suggestions for basing the hashcode on immutable fields, but what if there are no immutable fields?Roseberry
@Davy8, the best way is to use immutable fields. That way it works 100% of the time. If there are no immutable fields you may want to rethink your design a bit. Either make a subset of the fields immutable or do not implement equality via .Equals. Otherwise you will only have a mostly working solution.Renowned
So to sum up, best practice is to only override Equals (and GetHashCode) base solely on immutable fields, and mutable equality is desired then to implement a new method for comparing such equality?Roseberry
@Roseberry yes. That is the only way to reliably provide equality.Renowned
L
8

If you have a mutable object, there isn't much point in overriding the GetHashCode method, as you can't really use it. It's used for example by the Dictionary and HashSet collections to place each item in a bucket. If you change the object while it's used as a key in the collection, the hash code no longer matches the bucket that the object is in, so the collection doesn't work properly and you may never find the object again.

If you want the lookup not to use the GetHashCode or Equals method of the class, you can always provide your own IEqualityComparer implementation to use instead when you create the Dictionary.

The Equals method is intended for value equality, so it's not wrong to implement it that way.

Luciferin answered 17/5, 2009 at 1:21 Comment(13)
Maybe I've just been misusing mutable objects as keys then. I've always used them and expected them to use reference equality, like var a = new object(); var b = new object(); dict[a] = "hello"; dict[b] = "world";Roseberry
When an object is used as key in a hashed collection, the GetHashCode and Equals method are used. If they are not overridden, the default implementation (by Object) is to use reference equality.Luciferin
Right, and I guess it would be the path of least surprise that they always be based on == which is something which is recommended not be overridden rather than Equals which is more acceptable to overwrite. All of this being opinion of course.Roseberry
Well, basing it on == would produce rather weird results if you use primitive objects (such as Strings), because "a"+"b" != "ab", which would rather mess up lookups.Diandiana
One fundamental weakness in Java, which follows through in .NET, is that there is no distinction between a variable which is used to encapsulate the identity of a mutable object of type Foo, versus one which is used to encapsulate the state of an instance of Foo which will never be exposed to code that could mutate it. The Equals and GethashCode semantics for the latter should be different from the former, but there's no mechanism for Foo to expose different sets of methods for the two purposes.Circum
@supercat: I don't follow what you mean. There is no type that can be either a value type or reference type, every type is always one or the other.Luciferin
@Guffa: Suppose the only field in class Foo is int[] Arr. There exist two instances of Foo, each with a reference to a different 500-item array, but both arrays hold the same 500 values. Should the instances of Foo be considered equivalent or not? If Arr is used to identify an array which is used in outside code, and whose values Foo is supposed to increment every time its IncrementValues method is called, the instances of Foo are clearly not equivalent. If it is used to hold an unchanging set of 500 values, the two instances are clearly equivalent.Circum
@Guffa: If the two arrays will never be accessed by anything outside Foo, and if Foo can mutate them but only on request, then two instances of Foo which are held by code which will never ask it to mutate the arrays may be considered equivalent, but instances which are owned by code which could mutate them should be considered distinct. If I were writing the specs for methods Equivalent and EqualState, I would specify that x.Equivalent(y) should return true if an object has reason to believe that replacing any arbitrary references to x with references to y...Circum
...would not affect the behavior of any members of the type; x.EqualState(y) should return true if simultaneously replacing all references to x with references to y and vice versa would not affect the behavior of any members of the type. Object.Equivalent() would generally be expected to test reference identity for mutable types, or match Object.EqualState() for immutable ones.Circum
@supercat: In your example, neither case is not always clearly not equivalent, or always clearly equivalent. Even if you have two instances that reference the same array, you can't assume that they should always be considered equivalent in every possible situation. Having two different methods for comparison doesn't help. You can specify a dozen different methods for comparing objects in different aspects, but they will still not cover all possible situations.Luciferin
@Guffa: It may not cover absolutely all situations, but it will help with the fact that .NET has no concept of an immutable instance of a mutable type. Code which constructs a Thingie and never exposes it to any code that would mutate it can know that even though Thingie is mutable, that particular instance can never change [i.e. there is no execution sequence that would make it do so]. Equivalence should mean different things for mutable and immutable instances, but .NET provides no means of making a distinction.Circum
@supercat: A problem with introducing equality concepts like that is that there is no way to enforce the limitations that they rely on, so it would open up several problems that may be as severe as the problems that it tries to solve. A mutable object can't be made immutable other than being encapsulated, and there is no way to ensure that a comparison for "immutable instances" can't be used on instances that can be changed.Luciferin
@Guffa: There's no way the system can really "enforce" much of anything having to do with equality, but enough classes follow the rules to make classes which rely upon the rules (e.g. Dictionary), usable. As you note, making an immutable instance of a mutable type requires encapsulation, but encapsulation is a very common way to make effectively-immutable instances of mutable types. The question is whether the encapsulating class should be required to do the hashing and comparison itself, or should have some means of asking the encapsulated object to do it. I'd favor the later.Circum
D
5

Wow, that's actually several questions in one :-). So one after the other:

it's been cited that the value of GetHashCode should never change over the lifetime of the object. How does that work if the fields that it's based on are mutable?

This common advice is meant for the case where you want to use your object as a key in a HashTable/dictionary etc. . HashTables usually require the hash not to change, because they use it to decide how to store & retrieve the key. If the hash changes, the HashTable will probably no longer find your object.

To cite the docs of Java's Map interface:

Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.

In general it's a bad idea to use any kind of mutable object as a key in a hash table: It's not even clear what should happen if a key changes after it's been added to the hash table. Should the hash table return the stored object via the old key, or via the new key, or via both?

So the real advice is: Only use immutable objects as keys, and make sure their hashcode never changes either (which is usually automatic if the object is immutable).

Also what if I do want dictionary lookups etc to be based on reference equality not my overridden Equals?

Well, find a dictionary implementation that works like that. But the standard library dictionaries use the hashcode&Equals, and there's no way to change that.

I'm primarily overriding Equals for the ease of unit testing my serialization code which I assume serializing and deserializing (to XML in my case) kills the reference equality so I want to make sure at least it's correct by value equality. Is this bad practice to override Equals in this case?

No, I'd find that perfectly acceptable. However, you should not use such objects as keys in a dictionary/hashtable, as they're mutable. See above.

Diandiana answered 17/5, 2009 at 1:13 Comment(1)
So what you're saying is that the point about hashcodes not changing isn't so important because you shouldn't be using mutable objects are dictionary keys in the first place?Roseberry
A
2

I don't know about C#, being a relative noob to it, but in Java, if you override equals() you need to also override hashCode() to maintain the contract between them (and vice-versa)... And java also has the same catch 22; basically forcing you use immutable fields... But this is an issue only for classes which are used as a hash-key, and Java has alternate implementations for all hash-based collections... which maybe not as fast, but they do effecitely allow you to use a mutable object as a key... it's just (usually) frowned up as a "poor design".

And I feel the urge to point out that this fundamental problem is timeless... It's been around since Adam was a lad.

I've worked on fortran code which is older than I am (I'm 36) which breaks when a username is changed (like when a girl gets married, or divorced ;-) ... Thus is engineering, The adopted solution was: The GetHashCode "method" remembers the previously calculated hashCode, recalculates the hashCode (i.e. a virtual isDirty marker) and if the keyfields have changed it returns null. This causes the cache to delete the "dirty" user (by calling another GetPreviousHashCode) and then the cache returns null, causing the user to re-read from the database. An interesting and worthwhile hack; even if I do say so myself ;-)

I'll trade-off mutability (only desirable in corner cases) for O(1) access (desirable in all cases). Welcome to engineering; the land of the informed compromise.

Cheers. Keith.

Avant answered 17/5, 2009 at 6:57 Comment(1)
What do you mean by classe which "allow you to use a mutable object as a key"? Java does allow that, it's just that it doesn't (i.e. cannot) work if the key actually changes, because it's simply not clear what it should do.Diandiana
S
1

The underlying topic here is how to best uniquely identify objects. You mention serialization/deserialization which is important because referential integrity is lost in that process.

The short answer, Is that objects should be uniquely identified by the smallest set of immutable fields that can be used to do so. These are the fields you should use when overrideing GetHashCode and Equals.

For testing it's perfectly reasonable to define whatever assertions you need, usually these are not defined on the type itself but rather as utility methods in the test suite. Maybe a TestSuite.AssertEquals(MyClass, MyClass) ?

Note that GetHashCode and Equals should work together. GetHashCode should return the same value for two objects if they are equal. Equals should return true if and only if two objects have the same hash code. (Note that it's possible that two object may not be equal but may return the same hash code). There are plenty of webpage that tackle this topic head-on, just google away.

Shipp answered 17/5, 2009 at 1:35 Comment(3)
And what if the object has zero immutable fields?Roseberry
I've googled and searched SO and found many answers, that don't seem to directly address my problem. See my edit for more info.Roseberry
If you don't have any immutable fields you can always create a synthetic one, this could be either a sequence from a DB or a globally unique ID (GUID) if you do not have a DB. Override hash/equals to use this guid value and make sure you always include the synthetic ID when you serialize/deserialize. Also when you create new objects of this type you may need to run a full scan across existing objects of that type calling Equals on each and see if you already have the exact same object with and ID, if so the new object should have the same ID as the object it matched...Shipp

© 2022 - 2024 — McMap. All rights reserved.