Why isn't .Except (LINQ) comparing things properly? (using IEquatable)
P

5

31

I have two collections of my own reference-type objects that I wrote my own IEquatable.Equals method for, and I want to be able to use LINQ methods on them.

So,

List<CandyType> candy = dataSource.GetListOfCandy();
List<CandyType> lollyPops = dataSource.GetListOfLollyPops();
var candyOtherThanLollyPops = candy.Except( lollyPops );

According to the documentation of .Except, not passing an IEqualityComparer<T> should result in EqualityComparer<T>.Default being used to compare objects. And the documentation for the Default comparer is this:

"The Default property checks whether type T implements the System.IEquatable<T> generic interface and if so returns an EqualityComparer<T> that uses that implementation. Otherwise it returns an EqualityComparer<T> that uses the overrides of Object.Equals and Object.GetHashCode provided by T."

So, because I implement IEquatable<T> for my object, it should use that and work. But it doesn't. It doesn't work until I override GetHashCode. In fact, if I set a breakpoint, my IEquatable<T>.Equals method never gets executed. This makes me think that it's going with plan B according to its documentation. I understand that overriding GetHashCode is a good idea anyway, and I can get this working, but I am upset that it is behaving in a way that isn't in line with what its own documentation states.

Why isn't it doing what it said it would? Thank you.

Palmitate answered 29/10, 2009 at 19:7 Comment(2)
Try using EqualityComparer<T>.Default directly and see whether the mismatch is in that implementation or in the LINQ method, for starters. Then open Reflector, check the source, and add a comment to the MSDN docs?Omeara
As a side note, things not behaving as documented are considered bugs, so I encourage you to submit it as such at Microsoft Connect. For the record, I've had smaller documentation bugs submitted via that channel fixed in the past.Govern
P
26

After investigation, it turns out things aren't quite as bad as I thought. When everything is implemented properly (GetHashCode, etc.), the documentation is correct and so is the behavior. But if you implement IEquatable<T> all by itself, your Equals method never gets called. This seems to be because GetHashCode isn't overridden: objects that should be equal get different hash codes, so Except never needs to ask Equals about them. So, while the documentation is technically wrong, it's only wrong in a fringe situation that you'd never want to be in. If this investigation has taught me anything, it's that IEquatable<T> is part of a set of methods you should implement together (by convention, not by rule, unfortunately).
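
A minimal sketch of the situation described above. The original CandyType is not shown, so the class here is a hypothetical stand-in with a single Name property; it implements IEquatable<T>.Equals, deliberately leaves GetHashCode alone, and counts how often Equals runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's CandyType (Name is an assumption).
// It implements IEquatable<T>.Equals but does NOT override GetHashCode.
public sealed class CandyType : IEquatable<CandyType>
{
    public static int EqualsCalls;   // counts calls to Equals(CandyType)

    public string Name { get; }

    public CandyType(string name) { Name = name; }

    public bool Equals(CandyType other)
    {
        EqualsCalls++;
        return other != null && Name == other.Name;
    }
}

public static class ExceptDemo
{
    // Asks the default comparer directly, bypassing any hashing.
    public static bool DirectComparison()
    {
        return EqualityComparer<CandyType>.Default.Equals(
            new CandyType("lollipop"), new CandyType("lollipop"));
    }

    // Runs Except and reports how many items survive.
    public static int RemainingAfterExcept()
    {
        var candy = new List<CandyType> { new CandyType("lollipop"), new CandyType("fudge") };
        var lollyPops = new List<CandyType> { new CandyType("lollipop") };
        return candy.Except(lollyPops).Count();
    }

    public static void Main()
    {
        // The default comparer does use IEquatable<T>.Equals when asked directly.
        Console.WriteLine(DirectComparison()); // True

        // Except hashes items first and only calls Equals when hash codes match.
        CandyType.EqualsCalls = 0;
        Console.WriteLine(RemainingAfterExcept());
        Console.WriteLine(CandyType.EqualsCalls);
    }
}
```

With the inherited Object.GetHashCode, two distinct instances almost never share a hash code, so in practice Except keeps both items and the call counter stays at zero, which matches the breakpoint that is never hit.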

Good sources on this are:

Palmitate answered 1/11, 2009 at 19:59 Comment(1)
The Except method first uses GetHashCode to decide whether two objects could be the same; that is why it fails if you implement IEquatable<T> without also overriding GetHashCode.Cull
R
12

The interface IEqualityComparer<T> has these two methods:

bool Equals(T x, T y);
int GetHashCode(T obj);

A good implementation of this interface would thus implement both. The LINQ extension method Except relies on the hash code so that it can use a dictionary or set lookup internally to figure out which objects to skip, and thus requires a proper GetHashCode implementation.

Unfortunately, when you use EqualityComparer<T>.Default, that class does not provide a good GetHashCode implementation by itself, and relies on the object in question, the type T, to provide that part, when it detects that the object implements IEquatable<T>.

The problem here is that IEquatable<T> does not in fact declare GetHashCode so it's much easier to forget to implement that method properly, contrasted with the Equals method that it does declare.

So you have two choices:

  • Provide a proper IEqualityComparer<T> implementation that implements both Equals and GetHashCode
  • Make sure that, in addition to implementing IEquatable<T> on your object, you override GetHashCode properly as well
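
The second choice could look roughly like the sketch below. CandyType and its Name property are assumptions made for illustration, since the question does not show the class; the point is only that Equals(T), Equals(object) and GetHashCode all agree on the same field.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical CandyType; the Name property is an assumption for illustration.
public sealed class CandyType : IEquatable<CandyType>
{
    public string Name { get; }

    public CandyType(string name) { Name = name; }

    // Strongly typed equality, picked up by EqualityComparer<CandyType>.Default.
    public bool Equals(CandyType other)
    {
        return other != null && Name == other.Name;
    }

    // Keep the untyped overload consistent with the typed one.
    public override bool Equals(object obj)
    {
        return Equals(obj as CandyType);
    }

    // Equal objects must return equal hash codes, so hash the field Equals compares.
    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
}

public static class Program
{
    public static List<string> NamesOtherThanLollyPops()
    {
        var candy = new List<CandyType> { new CandyType("lollipop"), new CandyType("fudge") };
        var lollyPops = new List<CandyType> { new CandyType("lollipop") };

        // No comparer argument needed: the default comparer now hashes and compares by Name.
        return candy.Except(lollyPops).Select(c => c.Name).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", NamesOtherThanLollyPops())); // fudge
    }
}
```

Because the type itself carries the hash code, nothing extra has to be passed to Except, Distinct, Intersect and similar methods.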
Righthand answered 29/10, 2009 at 19:34 Comment(1)
To be clear, I did not implement IEqualityComparer<T> at all - only IEquatable<T>. As for choice 1, my understanding is that if I created my own comparer that implements IEqualityComparer<T>, I would then have to pass that comparer into every LINQ method I ever use, all over the code. That is not attractive. I think everything you've said is true, but it doesn't address why things don't work like the documentation says (specifically, my IEquatable<T>.Equals method never gets called).Palmitate
M
0

Hazarding a guess, are these different classes? I think by default IEquatable<T> only works with the same class. So it might be falling back to the Object.Equals method.

Muna answered 29/10, 2009 at 19:15 Comment(1)
What are different classes? I edited the question to be more explicit about the types of objects in the collections. Hopefully that helps.Palmitate
M
0

I wrote a GenericEqualityComparer to be used on the fly for these types of methods: Distinct, Except, Intersect, etc.

Use as follows:

var results = list1.Except(list2, new GenericEqualityComparer<MYTYPE>(
    (a, b) => a.Id == b.Id)); // or any other comparison that returns a bool

Here's the class:

public class GenericEqualityComparer<T> : EqualityComparer<T>
{
    public Func<T, int> HashCodeFunc { get; set; }

    public Func<T, T, Boolean> EqualityFunc { get; set; }

    public GenericEqualityComparer(Func<T, T, Boolean> equalityFunc)
    {
        EqualityFunc = equalityFunc;
        HashCodeFunc = null;
    }

    public GenericEqualityComparer(Func<T, T, Boolean> equalityFunc, Func<T, int> hashCodeFunc) : this(equalityFunc)
    {
        HashCodeFunc = hashCodeFunc;
    }

    public override bool Equals(T x, T y)
    {
        return EqualityFunc(x, y);
    }

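    public override int GetHashCode(T obj)
    {
        // Without a hash function, return a constant so every item lands in the
        // same bucket and EqualityFunc always decides. Correct, but slow on large lists.
        return HashCodeFunc == null ? 1 : HashCodeFunc(obj);
    }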
}
Measurable answered 17/5, 2019 at 13:46 Comment(0)
A
0

I ran into this same problem, and debugging led me to a different answer than most. Most people point out that GetHashCode() must be implemented.

However, in my case - which was LINQ's SequenceEqual() - GetHashCode() was never called. And, despite the fact that every object involved was typed to a specific type T, the underlying problem was that SequenceEqual() called T.Equals(object other), which I had forgotten to implement, rather than calling the expected T.Equals(T other).
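
One way this can happen, sketched under an assumption the answer does not confirm: the type defines Equals(T) but does not list IEquatable<T> as an interface. The two classes below are hypothetical and differ only in that declaration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration. Both define Equals(T), but only
// Declared lists IEquatable<T> in its base list.
public sealed class Undeclared
{
    public int Id { get; }
    public Undeclared(int id) { Id = id; }
    public bool Equals(Undeclared other) { return other != null && Id == other.Id; }
}

public sealed class Declared : IEquatable<Declared>
{
    public int Id { get; }
    public Declared(int id) { Id = id; }
    public bool Equals(Declared other) { return other != null && Id == other.Id; }
}

public static class SequenceEqualDemo
{
    public static bool CompareUndeclared()
    {
        var a = new List<Undeclared> { new Undeclared(1) };
        var b = new List<Undeclared> { new Undeclared(1) };
        // The default comparer does not see Equals(Undeclared), so it falls back
        // to Object.Equals, which is reference equality here.
        return a.SequenceEqual(b);
    }

    public static bool CompareDeclared()
    {
        var a = new List<Declared> { new Declared(1) };
        var b = new List<Declared> { new Declared(1) };
        // The default comparer finds IEquatable<Declared> and calls Equals(Declared).
        return a.SequenceEqual(b);
    }

    public static void Main()
    {
        Console.WriteLine(CompareUndeclared()); // False
        Console.WriteLine(CompareDeclared());   // True
    }
}
```

SequenceEqual compares elements pairwise and does not build a hash set, which is consistent with GetHashCode never being called in this case.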

Alar answered 12/10, 2020 at 6:55 Comment(0)
