Intersect with a custom IEqualityComparer using Linq
Asked Answered
R

2

14

Long story short: I have 2 collections of objects. One contains good values (Let's call it "Good"), the other default values (Mr. "Default"). I want the Intersect of the Union between Good and Default, and Default. In other words: Intersect(Union(Good, Default), Default). One might think it resolves as Default, but here is where it gets tricky : I use a custom IEqualityComparer.

I got the following classes :

class MyClass
{
    public string MyString1;
    public string MyString2;
    public string MyString3;
}

class MyEqualityComparer : IEqualityComparer<MyClass>
{
    public bool Equals(MyClass item1, MyClass item2)
    {
        if(item1 == null && item2 == null)
            return true;
        else if((item1 != null && item2 == null) ||
                (item1 == null && item2 != null))
            return false;

        return item1.MyString1.Equals(item2.MyString1) &&
               item1.MyString2.Equals(item2.MyString2);
    }

    public int GetHashCode(MyClass item)
    {
        return new { item.MyString1, item.MyString2 }.GetHashCode();
    }
}

Here are the characteristic of my collections Good and Default collections :

Default : It's a large set, containing all the wanted { MyString1, MyString2 } pairs, but the MyString3 values are, as you can guess, default values.

Good : It's a smaller set, containing mostly items which are in the Default set, but with some good MyString3 values. It also has some { MyString1, MyString2 } that are outside of the wanted set.

What I want to do is this : Take only the items from Good that are in Default, but add the other items in Default to that.

Here is, what I think is, my best try :

HalfWantedResult = Good.Union(Default, new MyEqualityComparer());
WantedResult= HalfWantedResult.Intersect(Good, new MyEqualityComparer());

I taught it should have worked, but the result I get is basically only the good { MyString1, MyString2 } pairs set, but all coming from the Default set, so I have the default value all across. I also tried switching the Default and Good of the last Intersect, but I get the same result.

Risa answered 2/12, 2010 at 21:44 Comment(2)
Your Equals implementation is really bad. There will be hash collisions which shouldn't be there. Why not use the same projection (new { item.MyString1, item.MyString2 }) but call Equals?Pulsimeter
I should look into this, it might be part of the problem. The Union uses GetHashCode, and the Intersects uses the Equals. I haven't really put any taught into that part. ashamedRisa
C
21

First of all this is wrong:

public bool Equals(MyClass item1, MyClass item2)
{
    return GetHashCode(item1) == GetHashCode(item2);
}

If the hashcode's are different for sure the corresponding 2 items are different, but if they're equal is not guaranteed that the corresponding 2 items are equal.

So this is the correct Equals implementation:

public bool Equals(MyClass item1, MyClass item2)
{
    if(object.ReferenceEquals(item1, item2))
        return true;
    if(item1 == null || item2 == null)
        return false;
    return item1.MyString1.Equals(item2.MyString1) &&
           item1.MyString2.Equals(item2.MyString2);
}

As Slacks suggested (anticipating me) the code is the following:

var Default = new List<MyClass>
{
    new MyClass{MyString1="A",MyString2="A",MyString3="-"},
    new MyClass{MyString1="B",MyString2="B",MyString3="-"},
    new MyClass{MyString1="X",MyString2="X",MyString3="-"},
    new MyClass{MyString1="Y",MyString2="Y",MyString3="-"},
    new MyClass{MyString1="Z",MyString2="Z",MyString3="-"},

};
var Good = new List<MyClass>
{
    new MyClass{MyString1="A",MyString2="A",MyString3="+"},
    new MyClass{MyString1="B",MyString2="B",MyString3="+"},
    new MyClass{MyString1="C",MyString2="C",MyString3="+"},
    new MyClass{MyString1="D",MyString2="D",MyString3="+"},
    new MyClass{MyString1="E",MyString2="E",MyString3="+"},
};
var wantedResult = Good.Intersect(Default, new MyEqualityComparer())
                       .Union(Default, new MyEqualityComparer());

// wantedResult:
// A A +
// B B +
// X X -
// Y Y -
// Z Z -
Colotomy answered 2/12, 2010 at 21:48 Comment(8)
+1 for helping me a lot with my Equals, but not accepted since it doesn't solve the problem. (I wish I could +2! :-P )Risa
No prob. I was writing the right solution for your intersection problem, but Slacks anticipated me. His is the answer ;-)Colotomy
I feel so stupid... I misunderstood the tooltip about the intersect method, and there was an error in my underlying data so no match could occur. Since I consider this post to be the most helpful, with the implementation of equals and the examples, I accepted it. Thanks again.Risa
You're welcome ;-). By the way, if your Comparer is called a very big amount of times, and performace really matters for you, you should think to rewrite also your GetHashCode() to avoid the creation of a new instance of an anonymous type for each invocation. Look at this answer to get a fast and effective implementation --> #263900Colotomy
I will do that. Not that the performance will really be an issue with my project (only checking a million times or so, once a day) but I strive for better programming! (I'm not an actual programmer :-P )Risa
Actually, using anonymous type implementation to calculate HasCode for multiple fields is a really smart idea, and makes your code short and clean. But obviously, it's not bad to know what happend behind the scenes, and the code in the answer I posted above is the same used internally by anonymous types' GetHashCode method ;-)Colotomy
That implemention of Equals is suboptimal. The preamble is very hard to read. Instead consider if (object.ReferenceEquals(item1,item2)) return true; for the first line. That handles the case of both being null, speeds up the case of both being the same reference. Then the rest of the null check is simply if(item1==null||item2==null) return false; which is shorter and easier to read, and is correct since if they are both null, you already returned true.Dilettantism
@KevinCathcart: Yes, thanks you're right. Edited. I probably wrote that without thinking too much to performances...Colotomy
M
10

You need to check for actual equality, not just hashcode equality.

GetHashCode() is not (and cannot be) collision free, which is why the Equals method is required in the first place.

Also, you can do this much more simply by writing

WantedResult = Good.Concat(Default).Distinct();

The Distinct method will return the first item of each pair of duplicates, so this will return the desired result.

EDIT: That should be

WantedResult = Good.Intersect(Default, new MyEqualityComparer())
                   .Union(Default, new MyEqualityComparer());
Mimamsa answered 2/12, 2010 at 21:49 Comment(4)
If I use Concat & Distinct, I'll have the result I want, but I'll also have some items from the Good set which weren't in the wanted scope (Default).Risa
The Union and Intersect operations are commutative, so it doesn't change the result, but yes, it avoids using a middle-man variable.Risa
Oh, wait, Union and Intersect are not commutative: (ABC u ABD) n AE = A while (ABC n AE) u ABD = ABDColotomy
Oh, sorry, I was thinking about my case only: (ABC u ABD) n ABC = ABC and ABC u (ABD n ABC) = ABC. But yes, you're right, they are not. And in my case, (ABC u ABD) != (ABD u ABC), since it's not a true union.Risa

© 2022 - 2024 — McMap. All rights reserved.