When Should a .NET Class Override Equals()? When Should it Not?
M

4

16

The VS2005 documentation Guidelines for Overloading Equals() and Operator == (C# Programming Guide) states, in part:

Overriding operator == in non-immutable types is not recommended.

The newer .NET Framework 4 documentation Guidelines for Implementing Equals and the Equality Operator (==) omits that statement, although one post in Community Content repeats the assertion and references the older documentation.

It seems that it is reasonable to override Equals() at least for some trivial mutable classes, such as

public class ImaginaryNumber
{
    public double RealPart { get; set; }
    public double ImaginaryPart { get; set; }
}

In math, two imaginary numbers that have the same real part and the same imaginary part are in fact equal at the point in time that equality is tested. It is incorrect to assert that they are not equal, yet that is exactly what would happen for separate objects with the same RealPart and ImaginaryPart if Equals() were not overridden.

On the other hand, if one overrides Equals() one should also override GetHashCode(). If an ImaginaryNumber that overrides Equals() and GetHashCode() is placed in a HashSet, and a mutable instance changes its value, that object would no longer be found in the HashSet.
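The lookup failure described above can be reproduced with a minimal sketch. The Equals()/GetHashCode() bodies and the HashSetMutationDemo class are assumptions added for illustration; the question does not show an implementation.

```csharp
using System;
using System.Collections.Generic;

// The question's mutable class, with value-based equality added (illustrative).
public class ImaginaryNumber
{
    public double RealPart { get; set; }
    public double ImaginaryPart { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as ImaginaryNumber;
        return other != null
            && RealPart.Equals(other.RealPart)
            && ImaginaryPart.Equals(other.ImaginaryPart);
    }

    public override int GetHashCode()
    {
        // The hash depends on mutable state; this is what makes mutation risky.
        unchecked { return (RealPart.GetHashCode() * 397) ^ ImaginaryPart.GetHashCode(); }
    }
}

public static class HashSetMutationDemo
{
    public static void Run()
    {
        var n = new ImaginaryNumber { RealPart = 1, ImaginaryPart = 2 };
        var set = new HashSet<ImaginaryNumber> { n };

        Console.WriteLine(set.Contains(n)); // True

        n.RealPart = 5;                     // mutate while the object is in the set

        Console.WriteLine(set.Contains(n)); // False: the set stored the old hash code
        Console.WriteLine(set.Count);       // 1: the object is still in the set
    }
}
```

Calling HashSetMutationDemo.Run() from a console program shows that the set still holds the instance but can no longer find it.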

Was MSDN incorrect to remove the guideline about not overriding Equals() and operator== for non-immutable types?

Is it reasonable to override Equals() for mutable types where "in the real world" equivalence of all properties means that the objects themselves are equal (as with ImaginaryNumber)?

If it is reasonable, how does one best deal with potential mutability while an object instance is participating in a HashSet or something else that relies on GetHashCode() not changing?

UPDATE

Just came across this in MSDN

Typically, you implement value equality when objects of the type are expected to be added to a collection of some sort, or when their primary purpose is to store a set of fields or properties. You can base your definition of value equality on a comparison of all the fields and properties in the type, or you can base the definition on a subset. But in either case, and in both classes and structs, your implementation should follow the five guarantees of equivalence:

Mackay answered 14/3, 2012 at 19:46 Comment(12)
I don't think it's reasonable for ImaginaryNumber to be a mutable type. – Diseased
In reality, you should probably implement ImaginaryNumber as an immutable value type (struct). – Affricative
Really? Why? All other numbers are mutable. – Mackay
int, long, double etc... are not mutable. A 5 is a 5 - you can't change what it is. – Starnes
Really? When you add 1 to 1, does 1 suddenly equal 2? – Diseased
So... int i = 0; i = 42; will not mutate the value of i? – Mackay
@Oded: Certainly numbers themselves are not mutable, but variables that hold numbers (of type int, double, float) are mutable. Instances of the types are mutable, whereas the suggestion from Damien is that instances of ImaginaryNumber should not be mutable. – Mackay
No. The value of i is not mutated. Instead, i is assigned a new value (which may be based on the previous value of i). – Diseased
You have a misunderstanding here. i is a variable. 0 is a literal - it is immutable. When you reassign 42 to the variable i, the variable now has a different value. 42 and 0 have not changed. – Starnes
@EricJ. The value of the variable i is changed, but the Int32 0 is still 0. – Pauperism
No, variables of the type can hold different instances of the type (for immutable types). I agree that an ImaginaryNumber type should be modelled as an immutable type. – Starnes
An example of why a ComplexNumber should be immutable: consider "ComplexNumber a, b, c;". The expectation for numbers is that a, b, c are INDEPENDENT. After "a = new ComplexNumber(1, 2); b = a;", the statement "a.RealPart = -1;" will also change b, unless ComplexNumber is defined as a struct. Alternatively, make RealPart read-only so that setting a.RealPart is disallowed. A change to a is then done by "a = new ComplexNumber(-1, a.ImaginaryPart);". You could add a convenience method public ComplexNumber SetReal(double value) { return new ComplexNumber(value, this.ImaginaryPart); } and call it as "a = a.SetReal(-1);". – Lothario
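The immutable value type suggested in the comments above might look like the following sketch. The ComplexNumber and SetReal names come from the last comment; the field names, the IEquatable<T> implementation, and the hash formula are assumptions, not code from the thread.

```csharp
using System;

// Immutable value type: once constructed, an instance never changes,
// so its hash code is stable and value equality is safe in hashed collections.
public struct ComplexNumber : IEquatable<ComplexNumber>
{
    private readonly double real;
    private readonly double imaginary;

    public ComplexNumber(double real, double imaginary)
    {
        this.real = real;
        this.imaginary = imaginary;
    }

    public double RealPart { get { return real; } }
    public double ImaginaryPart { get { return imaginary; } }

    // "Setter" that returns a new value instead of mutating this one.
    public ComplexNumber SetReal(double value)
    {
        return new ComplexNumber(value, imaginary);
    }

    public bool Equals(ComplexNumber other)
    {
        return real.Equals(other.real) && imaginary.Equals(other.imaginary);
    }

    public override bool Equals(object obj)
    {
        return obj is ComplexNumber && Equals((ComplexNumber)obj);
    }

    public override int GetHashCode()
    {
        unchecked { return (real.GetHashCode() * 397) ^ imaginary.GetHashCode(); }
    }

    public static bool operator ==(ComplexNumber a, ComplexNumber b) { return a.Equals(b); }
    public static bool operator !=(ComplexNumber a, ComplexNumber b) { return !a.Equals(b); }
}
```

With this shape, "b = a; a = a.SetReal(-1);" leaves b untouched, which is the independence the comment asks for.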
M
16

I came to realize that I wanted Equals to mean two different things, depending on the context. After weighing the input here as well as here, I have settled on the following for my particular situation:

I'm not overriding Equals() and GetHashCode(), but rather preserving the common but by no means ubiquitous convention that Equals() means identity equality for classes, and that Equals() means value equality for structs. The largest driver of this decision is the behavior of objects in hashed collections (Dictionary<T,U>, HashSet<T>, ...) if I stray from this convention.

That decision left me still missing the concept of value equality (as discussed on MSDN)

When you define a class or struct, you decide whether it makes sense to create a custom definition of value equality (or equivalence) for the type. Typically, you implement value equality when objects of the type are expected to be added to a collection of some sort, or when their primary purpose is to store a set of fields or properties.

A typical case for desiring the concept of value equality (or as I'm terming it "equivalence") is in unit tests.

Given

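public class A
{
    // Public so that the test (and other instances) can read and set them.
    public int P1 { get; set; }
    public int P2 { get; set; }
}

[TestMethod()]
public void ATest()
{
    // An object initializer must name the properties it sets.
    A expected = new A { P1 = 42, P2 = 99 };
    A actual = SomeMethodThatReturnsAnA();
    Assert.AreEqual(expected, actual);
}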

the test will fail because Equals() is testing reference equality.

The unit test certainly could be modified to test each property individually, but that moves the concept of equivalence out of the class into the test code for the class.

To keep that knowledge encapsulated in the class, and to provide a consistent framework for testing equivalence, I defined an interface that my objects implement

public interface IEquivalence<T>
{
    bool IsEquivalentTo(T other);
}

The implementation typically follows this pattern:

public bool IsEquivalentTo(A other)
{
    // Same instance: trivially equivalent.
    if (object.ReferenceEquals(this, other)) return true;

    if (other == null) return false;

    // Applies only when A derives from a base class (SBase here) that also implements IsEquivalentTo().
    bool baseEquivalent = base.IsEquivalentTo((SBase)other);

    return (baseEquivalent && this.P1 == other.P1 && this.P2 == other.P2);
}

Certainly, if I had enough classes with enough properties, I could write a helper that builds an expression tree via reflection to implement IsEquivalentTo().
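A rough sketch of such a helper follows, using plain reflection rather than a compiled expression tree (a simpler but slower technique than the one mentioned above). The ReflectionEquivalence class and PublicPropertiesEqual method are hypothetical names, and comparing property values with object.Equals() is an assumption about what "equivalent" should mean.

```csharp
using System.Reflection;

public static class ReflectionEquivalence
{
    // Compares every readable, non-indexed public instance property of T.
    // Property values themselves are compared with object.Equals().
    public static bool PublicPropertiesEqual<T>(T a, T b) where T : class
    {
        if (object.ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;

        PropertyInfo[] properties =
            typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);

        foreach (PropertyInfo p in properties)
        {
            // Skip write-only properties and indexers.
            if (!p.CanRead || p.GetIndexParameters().Length > 0) continue;

            if (!object.Equals(p.GetValue(a, null), p.GetValue(b, null))) return false;
        }
        return true;
    }
}
```

An IsEquivalentTo() implementation could then delegate to ReflectionEquivalence.PublicPropertiesEqual(this, other). Reflection on every call is slow, so caching the PropertyInfo array per type, or compiling an expression tree once, would be the next step if it were used outside tests.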

Finally, I implemented an extension method that tests the equivalence of two IEnumerable<T>:

static public bool IsEquivalentTo<T>
    (this IEnumerable<T> first, IEnumerable<T> second)

If T implements IEquivalence<T>, that interface is used to compare the elements of the two sequences; otherwise Equals() is used. Allowing the fallback to Equals() lets it work with, e.g., ObservableCollection<string> in addition to my business objects.
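The answer shows only the signature of this extension method, so the body below is a guess at one possible implementation, not the author's code. It assumes an order-sensitive, element-by-element comparison; the EquivalenceExtensions class and ElementsEquivalent helper are hypothetical names. The interface is repeated so the sketch compiles on its own.

```csharp
using System.Collections.Generic;

public interface IEquivalence<T>
{
    bool IsEquivalentTo(T other);
}

public static class EquivalenceExtensions
{
    // Walks both sequences in lockstep; they are equivalent when they have
    // the same length and each pair of elements is equivalent.
    public static bool IsEquivalentTo<T>(this IEnumerable<T> first, IEnumerable<T> second)
    {
        if (object.ReferenceEquals(first, second)) return true;
        if (first == null || second == null) return false;

        using (IEnumerator<T> e1 = first.GetEnumerator())
        using (IEnumerator<T> e2 = second.GetEnumerator())
        {
            while (true)
            {
                bool has1 = e1.MoveNext();
                bool has2 = e2.MoveNext();
                if (has1 != has2) return false;   // different lengths
                if (!has1) return true;           // both exhausted
                if (!ElementsEquivalent(e1.Current, e2.Current)) return false;
            }
        }
    }

    // Prefers IEquivalence<T> when the element implements it, else falls back to Equals().
    private static bool ElementsEquivalent<T>(T a, T b)
    {
        if (a == null || b == null) return a == null && b == null;

        var equivalence = a as IEquivalence<T>;
        return equivalence != null ? equivalence.IsEquivalentTo(b) : a.Equals(b);
    }
}
```

If ordering should not matter, the comparison would need a different strategy, such as matching each element of one sequence against the unmatched elements of the other.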

Now, the assertion in my unit test is

Assert.IsTrue(expected.IsEquivalentTo(actual));
Mackay answered 17/3, 2012 at 17:30 Comment(5)
This is so much lazier than thinking through hashcode generation and worrying about collisions and side effects and blah blah blah. And that's a good thing :) – Major
I wish .NET had defined two sets of virtual equality tests (or included a parameter to indicate what type of equality should be tested). Having an object encapsulate data by holding a mutable-type reference to an object instance that will never be exposed to anything that might mutate it is a very common pattern; the object to which the reference is held should supply an equality test which would be meaningful in that scenario. – Airsickness
@Airsickness and EricJ I agree. Seems to be a missing feature in .NET. – Lothario
@ToolmakerSteve: It took me a while to figure out a good definition for the two equality tests which was limited to traits universal to all objects. What I finally settled upon was X.Identical(Y) should mean replacing SOME references to X with refs to Y should have no semantic effect, while X.EquivState(Y) would mean that simultaneously swapping all refs to X with refs to Y and vice versa would have no semantic effect other than possibly changing the values of X.IdentityHash() and Y.IdentityHash(). – Airsickness
I've found that objects fall neatly into two categories: entities (mutable, reference equality), and values (immutable, structural equality, implements IEquatable). What do you need a mutable type with structural equality for? That's a smell to me. – Dagostino
C
12

The MSDN documentation about not overloading == for mutable types is wrong. There is absolutely nothing wrong with mutable types implementing equality semantics. Two items can be equal now even if they will change in the future.

The dangers around mutable types and equality generally show up when the objects are used as keys in a hash table, or when mutable members are allowed to participate in the GetHashCode function.

Caleb answered 14/3, 2012 at 19:51 Comment(2)
Because == actually represents two different operators in C#, it's debatable whether any class type should really overload it, since it may not always be clear whether it represents a reference equality test or a call to an == overload. I would further suggest that the only notion of equality which is universally applicable to all objects is equivalence, and distinct class objects can only be equivalent if there's no means by which they might ever differ. While some "mutable" types meet that criterion, things which can be independently mutated do not. – Airsickness
Re "Two items can be equal now even if they will change in the future." The problem is that if you change == for a mutable type, no one can safely use an object of that type as a dictionary key. This is a BIG problem. It's bitten me, even for types I wrote myself, where I should have known better. This is a serious design flaw by Microsoft. The only safe answer for mutable types, as EricJ and supercat discuss in the other answer, is to leave Equals alone and define your own equality function with a different name. Annoying, because now you have to use a different function when comparing your objects. – Lothario
A
7

Check out the Guidelines and rules for GetHashCode by Eric Lippert.

Rule: the integer returned by GetHashCode must never change while the object is contained in a data structure that depends on the hash code remaining stable

It is permissible, though dangerous, to make an object whose hash code value can mutate as the fields of the object mutate.

Agape answered 14/3, 2012 at 20:46 Comment(1)
I had actually read that before, and it's well worth reading for those who have not. Actually I think this line answers my question best: "If you have such an object and you put it in a hash table then the code which mutates the object and the code which maintains the hash table are required to have some agreed-upon protocol that ensures that the object is not mutated while it is in the hash table. What that protocol looks like is up to you." – Mackay
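One possible protocol, sketched under assumptions: take the object out of the set, mutate it, then put it back. The Point type and the HashSetProtocol.Mutate helper are hypothetical and exist only to make the example self-contained; nothing in the thread prescribes this particular approach.

```csharp
using System;
using System.Collections.Generic;

// A small mutable type with value equality (illustrative only).
public sealed class Point
{
    public int X { get; set; }
    public int Y { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as Point;
        return other != null && X == other.X && Y == other.Y;
    }

    public override int GetHashCode()
    {
        unchecked { return (X * 397) ^ Y; }
    }
}

public static class HashSetProtocol
{
    // Removes the item while its hash code still matches the stored one,
    // applies the mutation, then re-adds it under its new hash code.
    public static void Mutate<T>(HashSet<T> set, T item, Action<T> mutation)
    {
        bool wasPresent = set.Remove(item);
        mutation(item);
        if (wasPresent) set.Add(item);
    }
}
```

For example, HashSetProtocol.Mutate(set, p, q => q.X = 5) keeps set.Contains(p) true afterwards. Two caveats: if the mutated value equals an element already in the set, Add() returns false and the item is silently dropped, and the helper only works if every piece of code that mutates the object goes through it.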
D
0

I don't understand your concerns about GetHashCode with regard to HashSet. GetHashCode just returns a number that helps HashSet internally store and look up values. If the hash code for an object changes the object doesn't get removed from HashSet, it just won't be stored in the most optimal position.

EDIT

Thanks to @EricJ I see the point.

HashSet<T> is built for fast lookups, and to achieve that performance it relies completely on GetHashCode staying constant for as long as the object is in the collection. If you want this performance then you need to follow these rules. If you can't, then you'll have to switch to something else, like List<T>.

Displace answered 14/3, 2012 at 20:14 Comment(1)
If you change the hash code of an object after it was added to the HashSet, myHashSet.Contains(myMutatedObject) returns false. However, the HashSet itself still contains the (now mutated) object. So, allowing the hash value to change breaks the HashSet contract (Contains breaks). – Mackay
