Wrap a delegate in an IEqualityComparer
P

15

131

Several Linq.Enumerable functions take an IEqualityComparer<T>. Is there a convenient wrapper class that adapts a delegate(T,T)=>bool to implement IEqualityComparer<T>? It's easy enough to write one (if you ignore problems with defining a correct hash code), but I'd like to know if there is an out-of-the-box solution.

Specifically, I want to do set operations on Dictionaries, using only the Keys to define membership (while retaining the values according to different rules).

Poor answered 18/9, 2008 at 23:34 Comment(0)
K
46

Ordinarily, I'd get this resolved by commenting @Sam on the answer (I've done some editing on the original post to clean it up a bit without altering the behavior.)

The following is my riff of @Sam's answer, with a [IMNSHO] critical fix to the default hashing policy:-

class FuncEqualityComparer<T> : IEqualityComparer<T>
{
    readonly Func<T, T, bool> _comparer;
    readonly Func<T, int> _hash;

    public FuncEqualityComparer( Func<T, T, bool> comparer )
        : this( comparer, t => 0 ) // NB Cannot assume anything about how e.g., t.GetHashCode() interacts with the comparer's behavior
    {
    }

    public FuncEqualityComparer( Func<T, T, bool> comparer, Func<T, int> hash )
    {
        _comparer = comparer;
        _hash = hash;
    }

    public bool Equals( T x, T y )
    {
        return _comparer( x, y );
    }

    public int GetHashCode( T obj )
    {
        return _hash( obj );
    }
}
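
Since a usage example involving the hash was asked for in the comments, here is a minimal sketch rather than a definitive recipe. It repeats a compact copy of the class so it compiles on its own; FuncEqualityComparerDemo and CountDistinctIgnoringCase are hypothetical names made up for illustration. The point it shows: a supplied hash has to follow the same policy as the equality delegate, and the t => 0 default gives the same result, only more slowly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compact copy of the comparer above so the sketch compiles on its own.
class FuncEqualityComparer<T> : IEqualityComparer<T>
{
    readonly Func<T, T, bool> _comparer;
    readonly Func<T, int> _hash;

    public FuncEqualityComparer(Func<T, T, bool> comparer)
        : this(comparer, t => 0) { }

    public FuncEqualityComparer(Func<T, T, bool> comparer, Func<T, int> hash)
    {
        _comparer = comparer;
        _hash = hash;
    }

    public bool Equals(T x, T y) { return _comparer(x, y); }
    public int GetHashCode(T obj) { return _hash(obj); }
}

static class FuncEqualityComparerDemo
{
    // Hypothetical helper: counts the names that remain after a
    // case-insensitive Distinct, with or without an explicit hash.
    public static int CountDistinctIgnoringCase(IEnumerable<string> names, bool supplyHash)
    {
        Func<string, string, bool> equals =
            (x, y) => string.Equals(x, y, StringComparison.OrdinalIgnoreCase);

        var comparer = supplyHash
            // The hash follows the same case-insensitive policy as the equality.
            ? new FuncEqualityComparer<string>(
                  equals, s => StringComparer.OrdinalIgnoreCase.GetHashCode(s))
            // Default t => 0: everything shares one bucket; slow but still correct.
            : new FuncEqualityComparer<string>(equals);

        return names.Distinct(comparer).Count();
    }
}
```

Both variants should agree on the count; the difference only shows up in a profiler once the input gets large.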
Kamalakamaria answered 15/9, 2010 at 16:13 Comment(21)
As far as I'm concerned this is the correct answer. Any IEqualityComparer<T> that leaves GetHashCode out is just straight-up broken.Aurlie
@Dan Tao: I might be missing something here, but the implementation that you gave in your answer delegates to T.GetHashCode, and this is not guaranteed to be a good hash code. So are the other implementations more correct by providing an explicit hash function? Also, why do you need the equality comparison at all, since if you had a hash function, you could test equality by comparing the hashes and not the object keys themselves?Ferryboat
@Joshua Frank: It's not valid to use hash equality to imply equality - only the inverse is true. In short, @Dan Tao is completely correct in what he says, and this answer is simply the application of this fact to a previously incomplete answerKamalakamaria
@Ruben Bartelink: Thanks for clarifying. But I still don't understand your hashing policy of t => 0. If all objects always hash to the same thing (zero), then isn't that even more broken than using obj.GetHashCode, per @Dan Tao's point? Why not always force the caller to provide a good hash function?Ferryboat
@Joshua Frank: Firstly, the chances are you're understanding things quite well. Nobody has found a hole in the computer science around hashing functions. The point was that @Sam's response used t=>t.GetHashCode() as the hashing function. The only time that makes sense is if every single time Equals<T> returns true, the underlying values are guaranteed to return the same hash code. If this is not the case, the miscomputed hash code results in the search failing incorrectly.Kamalakamaria
Thus it is not reasonable to assume that an arbitrary algorithm in a Func that has been supplied cannot possibly return true despite the hash codes being different. Your point that returning zero all the time is just not hashing is true. That's why there's an overload that takes the hashing Func for when the profiler tells us searches are not sufficiently efficient. The only point in all of this is that if you're going to have a default hashing algorithm, it should be one that works 100% of the time and doesn't have dangerous superficially correct behavior. And then we can work on the performance!Kamalakamaria
Anyway, I suspect that your point is "well that's not a good hashing function". If it is, my response is that you're correct, but quick and wrong doesn't beat slow and correct, esp. on an uber-trusted resource like SO :P If you actually feel that I'm incorrect in my correctness (not efficiency) claims, I suggest having another read of @Dan Tao's fantastic answer.Kamalakamaria
Re-reading your last comment, it's clear you do understand it just fine. The word broken threw me. Broken describes the overload in @Sam's answer - couldn't possibly work. For me, the situation where someone wants to use an arbitrary Func just isn't the case where they are in a position or have a desire to come up with a correct associated hashing function that is tailored to be guaranteed to include every subtlety of the inputs to outputs transformation involved in the Func. Clearly there are order of magnitude efficiency differences with large data sets, but the profiler will show them.Kamalakamaria
@Ruben: Thanks for your valuable thoughts on this while I've been silent (read: lazy)! @Joshua: Here's the thing. The OP is asking about creating a custom IEqualityComparer<T> implementation using a single delegate to use for the Equals method, right? There are actually two flavors of response here, which I think may have gotten you confused. One is that this idea needs to be supplemented with a custom GetHashCode function; that is, no IEqualityComparer<T> implementation is truly functional without Equals and GetHashCode, for reasons explained already. (continued)Aurlie
@Joshua: The other type of response being offered is that there's a similar but different way to approach this problem altogether: in many cases, rather than specify Equals and GetHashCode separately, it makes sense to simply supply a "key" which will be used to determine uniqueness instead. This is the concept behind methods such as GroupBy and ToDictionary: in this case, what is needed is simply a Func<T, TKey> to select the key, which should be a type that already implements Equals and GetHashCode in a manner appropriate for use as a key, e.g., string, int. (continued)Aurlie
@Joshua: So really, these are two separate ideas. But the fact is that you're right to have spotted that something is missing: really, in my (original) answer, I should have suggested an overload for the KeyEqualityComparer<T, TKey> constructor that would accept an IEqualityComparer<TKey>. This would allow a developer to combine both flavors of response offered to this question. But in fact, I think the reasoning behind the KeyEqualityComparer<T, TKey> idea is that, if you're using it, you're going to use something reasonable for TKey like string or int, so you should be fine.Aurlie
In other words, since you are using a custom comparer it has nothing to do with the object's default hash code related to the default comparer, thus you cannot use it.Clementius
@Peet Brits: Extremely well summarised - 2000 of my words in 20!Kamalakamaria
Loved the thread... any example of use especially with regards to hash?Quiescent
@Kevin In what context? Something like Assert.Equal( expectedResult, actualResult, new FuncEqualityComparer<MyResult>( (x,y) => string.Equals( x.Name, y.Name, StringComparison.OrdinalIgnoreCase))) ? It often makes more sense for a list (see the SequenceEqual extensions in @Sam's answer), or you might have a Factory Method with an Intention Revealing Name return your comparison. BTW these days when implementing Test Specific Equality, I generally reach for Ploeh.SemanticComparison's Likeness stuff, e.g., https://mcmap.net/q/48652/-applying-autofixture-semanticcomparison-oflikeness-to-sequences-collections-arrays-ienumerableKamalakamaria
@RubenBartelink: Sorry it took so long, but I've finally gotten around to reviewing the massive amount of answer/comment activity on this question, and decided that I agree with DanTao: this is the correct answer.Poor
@MarceloCantos: I'm honoured. Have to hand it to @DanTao though - he's covered stuff pretty completely. BTW since this answer, I've found I tend to mostly apply @orip's answer in practice i.e., a .Select(x=>Tuple.Create(x...,x....) and lean on the Equals (and of course the GetHashCode) of Tuple. And if anyone else has read down this far, I can highly recommend the last module of @Mark Seemann's excellent Advanced Unit Testing PluralSight course (and SemanticComparison library) with regard to thinking about and implementing Test Specific Equality, which I guess is why people arrive at this questionKamalakamaria
Woha! So many comments! :P Anyhow... see my update of the implementation to .Net4.0 as an answer to a similar question here - in short the comparer doesn't need to be generic anymore thanks to contravariance on the interface! (Which simplifies usage, adds the non-generic IEqualityComparer naturally and saves a tiny bit of memory.)Langelo
@Langelo Interesting; I wasn't aware of the other question (I've no idea if I've abused the above for reference equality on the fly (likely enough I guess!)Kamalakamaria
Even better, I'd say, instead of t => 0, write t => { throw new NotImplementedException(); }. I wouldn't want any unforeseen usage of this 'default hash function' to go unnoticed.Uterine
@Uterine Not so sure - that may technically be correct, however for many common cases people just want something quick and not wrong for a non-perf-critical scenario. Bottom line is that one either needs to read Dan's answer or use some mechanism leaning on Structural Equality (I rarely end up using this construct on a permanent basis)Kamalakamaria
A
178

On the importance of GetHashCode

Others have already commented on the fact that any custom IEqualityComparer<T> implementation should really include a GetHashCode method; but nobody's bothered to explain why in any detail.

Here's why. Your question specifically mentions the LINQ extension methods; nearly all of these rely on hash codes to work properly, because they utilize hash tables internally for efficiency.

Take Distinct, for example. Consider the implications of this extension method if all it utilized were an Equals method. How do you determine whether an item's already been scanned in a sequence if you only have Equals? You enumerate over the entire collection of values you've already looked at and check for a match. This would result in Distinct using a worst-case O(N²) algorithm instead of an O(N) one!
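
To make that cost concrete, here is a rough sketch of what an Equals-only de-duplication would have to do. NaiveDistinctSketch and DistinctByEqualsOnly are hypothetical names, and this is not how LINQ's Distinct is written; it only illustrates the nested scan described above.

```csharp
using System;
using System.Collections.Generic;

static class NaiveDistinctSketch
{
    // Hypothetical Equals-only de-duplication: every incoming item is
    // compared against every item kept so far, hence the quadratic worst case.
    public static List<T> DistinctByEqualsOnly<T>(IEnumerable<T> source, Func<T, T, bool> equals)
    {
        var seen = new List<T>();
        foreach (T item in source)
        {
            bool alreadySeen = false;
            foreach (T earlier in seen)
            {
                if (equals(earlier, item))
                {
                    alreadySeen = true;
                    break;
                }
            }

            if (!alreadySeen)
                seen.Add(item);
        }
        return seen;
    }
}
```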

Fortunately, this isn't the case. Distinct doesn't just use Equals; it uses GetHashCode as well. In fact, it absolutely does not work properly without an IEqualityComparer<T> that supplies a proper GetHashCode. Below is a contrived example illustrating this.

Say I have the following type:

class Value
{
    public string Name { get; private set; }
    public int Number { get; private set; }

    public Value(string name, int number)
    {
        Name = name;
        Number = number;
    }

    public override string ToString()
    {
        return string.Format("{0}: {1}", Name, Number);
    }
}

Now say I have a List<Value> and I want to find all of the elements with a distinct name. This is a perfect use case for Distinct using a custom equality comparer. So let's use the Comparer<T> class from Aku's answer:

var comparer = new Comparer<Value>((x, y) => x.Name == y.Name);

Now, if we have a bunch of Value elements with the same Name property, they should all collapse into one value returned by Distinct, right? Let's see...

var values = new List<Value>();

var random = new Random();
for (int i = 0; i < 10; ++i)
{
    values.Add(new Value("x", random.Next()));
}

var distinct = values.Distinct(comparer);

foreach (Value x in distinct)
{
    Console.WriteLine(x);
}

Output:

x: 1346013431
x: 1388845717
x: 1576754134
x: 1104067189
x: 1144789201
x: 1862076501
x: 1573781440
x: 646797592
x: 655632802
x: 1206819377

Hmm, that didn't work, did it?

What about GroupBy? Let's try that:

var grouped = values.GroupBy(x => x, comparer);

foreach (IGrouping<Value, Value> g in grouped)
{
    Console.WriteLine("[KEY = '{0}']", g.Key);
    foreach (Value x in g)
    {
        Console.WriteLine(x);
    }
}

Output:

[KEY = 'x: 1346013431']
x: 1346013431
[KEY = 'x: 1388845717']
x: 1388845717
[KEY = 'x: 1576754134']
x: 1576754134
[KEY = 'x: 1104067189']
x: 1104067189
[KEY = 'x: 1144789201']
x: 1144789201
[KEY = 'x: 1862076501']
x: 1862076501
[KEY = 'x: 1573781440']
x: 1573781440
[KEY = 'x: 646797592']
x: 646797592
[KEY = 'x: 655632802']
x: 655632802
[KEY = 'x: 1206819377']
x: 1206819377

Again: didn't work.

If you think about it, it would make sense for Distinct to use a HashSet<T> (or equivalent) internally, and for GroupBy to use something like a Dictionary<TKey, List<T>> internally. Could this explain why these methods don't work? Let's try this:

var uniqueValues = new HashSet<Value>(values, comparer);

foreach (Value x in uniqueValues)
{
    Console.WriteLine(x);
}

Output:

x: 1346013431
x: 1388845717
x: 1576754134
x: 1104067189
x: 1144789201
x: 1862076501
x: 1573781440
x: 646797592
x: 655632802
x: 1206819377

Yeah... starting to make sense?

Hopefully from these examples it's clear why including an appropriate GetHashCode in any IEqualityComparer<T> implementation is so important.
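
For contrast, here is a small sketch of a comparer whose GetHashCode agrees with its Equals. NameComparer and HashAwareDemo are hypothetical names introduced for this illustration, and Value is a trimmed copy of the type above with fixed numbers in place of random ones. With both members looking only at Name, the ten items should collapse into one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Value
{
    public string Name { get; private set; }
    public int Number { get; private set; }

    public Value(string name, int number)
    {
        Name = name;
        Number = number;
    }
}

// Hypothetical comparer: equality and hashing both look only at Name.
class NameComparer : IEqualityComparer<Value>
{
    public bool Equals(Value x, Value y) { return x.Name == y.Name; }
    public int GetHashCode(Value obj) { return obj.Name.GetHashCode(); }
}

static class HashAwareDemo
{
    static List<Value> MakeValues()
    {
        var values = new List<Value>();
        for (int i = 0; i < 10; ++i)
            values.Add(new Value("x", i)); // same Name, different Number
        return values;
    }

    // Equal items now hash alike, so Distinct keeps a single element.
    public static int CountDistinctNames()
    {
        return MakeValues().Distinct(new NameComparer()).Count();
    }

    // The same comparer gives GroupBy a single group as well.
    public static int CountGroups()
    {
        return MakeValues().GroupBy(v => v, new NameComparer()).Count();
    }
}
```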


Original answer

Expanding on orip's answer:

There are a couple of improvements that can be made here.

  1. First, I'd take a Func<T, TKey> instead of Func<T, object>; this will prevent boxing of value type keys in the actual keyExtractor itself.
  2. Second, I'd actually add a where TKey : IEquatable<TKey> constraint; this will prevent boxing in the Equals call (object.Equals takes an object parameter; you need an IEquatable<TKey> implementation to take a TKey parameter without boxing it). Clearly this may pose too severe a restriction, so you could make a base class without the constraint and a derived class with it.

Here's what the resulting code might look like:

public class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    protected readonly Func<T, TKey> keyExtractor;

    public KeyEqualityComparer(Func<T, TKey> keyExtractor)
    {
        this.keyExtractor = keyExtractor;
    }

    public virtual bool Equals(T x, T y)
    {
        return this.keyExtractor(x).Equals(this.keyExtractor(y));
    }

    public int GetHashCode(T obj)
    {
        return this.keyExtractor(obj).GetHashCode();
    }
}

public class StrictKeyEqualityComparer<T, TKey> : KeyEqualityComparer<T, TKey>
    where TKey : IEquatable<TKey>
{
    public StrictKeyEqualityComparer(Func<T, TKey> keyExtractor)
        : base(keyExtractor)
    { }

    public override bool Equals(T x, T y)
    {
        // This will use the overload that accepts a TKey parameter
        // instead of an object parameter.
        return this.keyExtractor(x).Equals(this.keyExtractor(y));
    }
}
Aurlie answered 15/9, 2010 at 16:39 Comment(8)
Your StrictKeyEqualityComparer.Equals method appears to be the same as KeyEqualityComparer.Equals. Does the TKey : IEquatable<TKey> constraint make TKey.Equals work differently?Circumjacent
@JustinMorgan: Yes--in the first case, since TKey may be any arbitrary type, the compiler will use the virtual method Object.Equals which will require boxing of value type parameters, e.g., int. In the latter case, however, since TKey is constrained to implement IEquatable<TKey>, the TKey.Equals method will be used which will not require any boxing.Aurlie
Very interesting, thanks for the info. I had no idea GetHashCode had these LINQ implications until seeing these answers. Great to know for future use.Circumjacent
It's sad that such beautiful constructs (as this answer) can't be typed without specifying the types (in methods as arguments). May be encapsulating the new IEqualityComparer<> expression in a static method would help it a bit.Closeknit
@DanTao: Wouldn't it make sense to use return EqualityComparer<TKey>.Default.Equals(keyExtractor(x), keyExtractor(y)); inside the Equals method? I mean, to handle null values and such?Ravin
@JohannesH: Probably! Would have eliminated the need for StrictKeyEqualityComparer<T, TKey> too.Aurlie
+1 @DanTao: Belated thanks for a great exposition of why one should never ignore hash codes when defining equality in .Net.Poor
Thank you for this good answer! I've changed it in my code and resolved the inheritance; i. e. made two separate classes, one KeyEqualityComparer<T> (where yours would be used as <T, object>) and another KeyEqualityComparer<T, TKey> which can now have the same name.Nike
G
120

When you want to customize equality checking, 99% of the time you're interested in defining the keys to compare by, not the comparison itself.

This could be an elegant solution (concept from Python's list sort method).

Usage:

var foo = new List<string> { "abc", "de", "DE" };

// case-insensitive distinct
var distinct = foo.Distinct(new KeyEqualityComparer<string>( x => x.ToLower() ) );

The KeyEqualityComparer class:

public class KeyEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, object> keyExtractor;

    public KeyEqualityComparer(Func<T,object> keyExtractor)
    {
        this.keyExtractor = keyExtractor;
    }

    public bool Equals(T x, T y)
    {
        return this.keyExtractor(x).Equals(this.keyExtractor(y));
    }

    public int GetHashCode(T obj)
    {
        return this.keyExtractor(obj).GetHashCode();
    }
}
Gilud answered 6/8, 2009 at 14:41 Comment(9)
This is much better than aku's answer.Jarlath
Definitely the right approach. There are a couple improvements that can be made, in my opinion, which I've mentioned in my own answer.Aurlie
This is very elegant code, but it doesn't answer the question, which is why I accepted @aku's answer instead. I wanted a wrapper for Func<T, T, bool> and I have no requirement to extract a key, since the key is already separated out in my Dictionary.Poor
@Marcelo: That's fine, you can do that; but be aware that if you're going to take @aku's approach, you really should add a Func<T, int> to supply the hash code for a T value (as has been suggested in, e.g., Ruben's answer). Otherwise the IEqualityComparer<T> implementation you're left with is quite broken, especially with regards to its usefulness in LINQ extension methods. See my answer for a discussion on why this is.Aurlie
This is nice but if the key being selected was a value type there would be unnecessary boxing. Perhaps would be better to have a TKey for defining the key.Almond
@Jarlath not only that this is better, the other is just wrong. Blame OP for accepting that as answer!Closeknit
Why not generic Tkey instead of boxed object?Closeknit
@Closeknit you're right, of course, and Dan Tao's answer fixes that.Gilud
I cannot understand how to use this in the case of a dictionary eg ``` select new { key = pair.First().UrlDecode(), value = pair.Length > 1 ? pair.Second().UrlDecode() : "" }).Distinct(//?).ToDictionary(k => k.key, v => v.value); ```Karttikeya
B
47

I'm afraid there is no such wrapper out-of-box. However it's not hard to create one:

class Comparer<T>: IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _comparer;

    public Comparer(Func<T, T, bool> comparer)
    {
        if (comparer == null)
            throw new ArgumentNullException("comparer");

        _comparer = comparer;
    }

    public bool Equals(T x, T y)
    {
        return _comparer(x, y);
    }

    public int GetHashCode(T obj)
    {
        return obj.ToString().ToLower().GetHashCode();
    }
}

...

Func<int, int, bool> f = (x, y) => x == y;
var comparer = new Comparer<int>(f);
Console.WriteLine(comparer.Equals(1, 1));
Console.WriteLine(comparer.Equals(1, 2));
Backblocks answered 18/9, 2008 at 23:52 Comment(6)
Thats absolutely fantastic, cleans so much code up its ridiculous!Chibcha
However, be careful with that implementation of GetHashCode. If you're actually going to be using it in some sort of hash table you'll want something a bit more robust.Skipper
this code has a serious problem! it is easy to come up with a class that has two objects that are equal in terms of this comparer but have different hash codes.Rescue
To remedy this, the class needs another member private readonly Func<T, int> _hashCodeResolver that must also be passed in the constructor and be used in the GetHashCode(...) method.Blender
I'm curious: Why are you using obj.ToString().ToLower().GetHashCode() instead of obj.GetHashCode()?Circumjacent
The places in the framework that take an IEqualityComparer<T> invariably use hashing behind the scenes (e.g., LINQ's GroupBy, Distinct, Except, Join, etc) and the MS contract regarding hashing is broken in this implementation. Here's MS's documentation excerpt: "Implementations are required to ensure that if the Equals method returns true for two objects x and y, then the value returned by the GetHashCode method for x must equal the value returned for y." See: msdn.microsoft.com/en-us/library/ms132155Clathrate
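
The contract quoted above can be spot-checked in a unit test. This is only a sketch under assumptions: HashContractCheck, HonorsHashContract and MismatchedComparer are hypothetical names, and passing the check on a handful of samples does not prove a comparer correct, it merely catches obvious violations.

```csharp
using System;
using System.Collections.Generic;

static class HashContractCheck
{
    // Hypothetical test helper: returns false as soon as two samples are
    // reported equal by the comparer but hash to different values.
    public static bool HonorsHashContract<T>(IEqualityComparer<T> comparer, IList<T> samples)
    {
        for (int i = 0; i < samples.Count; i++)
        {
            for (int j = i + 1; j < samples.Count; j++)
            {
                if (comparer.Equals(samples[i], samples[j]) &&
                    comparer.GetHashCode(samples[i]) != comparer.GetHashCode(samples[j]))
                {
                    return false;
                }
            }
        }
        return true;
    }
}

// Hypothetical comparer that breaks the contract on purpose:
// case-insensitive Equals paired with a case-sensitive hash.
class MismatchedComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(x, y, StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(string obj)
    {
        int sum = 0;
        foreach (char c in obj)
            sum += c; // 'a' and 'A' contribute different values
        return sum;
    }
}
```

Feeding it StringComparer.OrdinalIgnoreCase with samples such as "a", "A", "b" should pass, while MismatchedComparer should fail on the same samples.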
I
26

Same as Dan Tao's answer, but with a few improvements:

  1. Relies on EqualityComparer<>.Default to do the actual comparing, so it avoids boxing for value types (structs) that implement IEquatable<>.

  2. Since EqualityComparer<>.Default is used, it doesn't explode on null.Equals(something).

  3. Provides a static wrapper around IEqualityComparer<> with a static method to create the comparer instance, which eases calling. Compare

     Equality<Person>.CreateComparer(p => p.ID);

     with

     new KeyEqualityComparer<Person, int>(p => p.ID);

  4. Adds an overload to specify an IEqualityComparer<> for the key.

The class:

public static class Equality<T>
{
    public static IEqualityComparer<T> CreateComparer<V>(Func<T, V> keySelector)
    {
        return CreateComparer(keySelector, null);
    }

    public static IEqualityComparer<T> CreateComparer<V>(Func<T, V> keySelector, 
                                                         IEqualityComparer<V> comparer)
    {
        return new KeyEqualityComparer<V>(keySelector, comparer);
    }

    class KeyEqualityComparer<V> : IEqualityComparer<T>
    {
        readonly Func<T, V> keySelector;
        readonly IEqualityComparer<V> comparer;

        public KeyEqualityComparer(Func<T, V> keySelector, 
                                   IEqualityComparer<V> comparer)
        {
            if (keySelector == null)
                throw new ArgumentNullException(nameof(keySelector));

            this.keySelector = keySelector;
            this.comparer = comparer ?? EqualityComparer<V>.Default;
        }

        public bool Equals(T x, T y)
        {
            return comparer.Equals(keySelector(x), keySelector(y));
        }

        public int GetHashCode(T obj)
        {
            return comparer.GetHashCode(keySelector(obj));
        }
    }
}

You may use it like this:

var comparer1 = Equality<Person>.CreateComparer(p => p.ID);
var comparer2 = Equality<Person>.CreateComparer(p => p.Name);
var comparer3 = Equality<Person>.CreateComparer(p => p.Birthday.Year);
var comparer4 = Equality<Person>.CreateComparer(p => p.Name, StringComparer.CurrentCultureIgnoreCase);

Person is a simple class:

class Person
{
    public int ID { get; set; }
    public string Name { get; set; }
    public DateTime Birthday { get; set; }
}
Irisirisa answered 2/8, 2011 at 14:25 Comment(2)
+1 for providing an implementation that lets you provide a comparer for the key. Besides giving more flexibility this also avoids boxing value types for both the comparisons and also the hashing.Clathrate
This is the most fleshed out answer here. I added a null check as well. Complete.Closeknit
S
11
public class FuncEqualityComparer<T> : IEqualityComparer<T>
{
    readonly Func<T, T, bool> _comparer;
    readonly Func<T, int> _hash;

    public FuncEqualityComparer( Func<T, T, bool> comparer )
        : this( comparer, t => t.GetHashCode())
    {
    }

    public FuncEqualityComparer( Func<T, T, bool> comparer, Func<T, int> hash )
    {
        _comparer = comparer;
        _hash = hash;
    }

    public bool Equals( T x, T y )
    {
        return _comparer( x, y );
    }

    public int GetHashCode( T obj )
    {
        return _hash( obj );
    }
}

With extensions :-

public static class SequenceExtensions
{
    public static bool SequenceEqual<T>( this IEnumerable<T> first, IEnumerable<T> second, Func<T, T, bool> comparer )
    {
        return first.SequenceEqual( second, new FuncEqualityComparer<T>( comparer ) );
    }

    public static bool SequenceEqual<T>( this IEnumerable<T> first, IEnumerable<T> second, Func<T, T, bool> comparer, Func<T, int> hash )
    {
        return first.SequenceEqual( second, new FuncEqualityComparer<T>( comparer, hash ) );
    }
}
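
A caveat about the single-argument constructor, which falls back to t.GetHashCode(): hash-based operators such as Distinct only call Equals on elements whose hash codes match, so a hash that disagrees with the delegate can silently produce wrong results. The sketch below is one way to see this, using a case-insensitive string comparison. DelegateComparer and HashCaveatDemo are hypothetical names made up for the illustration, and DelegateComparer is only a condensed stand-in for the class above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Condensed stand-in for FuncEqualityComparer; the hash delegate is always explicit here.
public sealed class DelegateComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _equals;
    private readonly Func<T, int> _hash;

    public DelegateComparer(Func<T, T, bool> equals, Func<T, int> hash)
    {
        _equals = equals;
        _hash = hash;
    }

    public bool Equals(T x, T y) => _equals(x, y);
    public int GetHashCode(T obj) => _hash(obj);
}

public static class HashCaveatDemo
{
    private static readonly Func<string, string, bool> IgnoreCase =
        (a, b) => string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    // Hash consistent with the comparison: strings that compare equal get equal hashes.
    public static int CountWithMatchingHash(IEnumerable<string> words) =>
        words.Distinct(new DelegateComparer<string>(
            IgnoreCase, s => StringComparer.OrdinalIgnoreCase.GetHashCode(s))).Count();

    // Constant hash: always correct, but every element collides, so lookups degrade to linear scans.
    public static int CountWithConstantHash(IEnumerable<string> words) =>
        words.Distinct(new DelegateComparer<string>(IgnoreCase, s => 0)).Count();

    // Default hash: "a" and "A" will usually hash differently, so Equals may never be asked about them.
    public static int CountWithDefaultHash(IEnumerable<string> words) =>
        words.Distinct(new DelegateComparer<string>(IgnoreCase, s => s.GetHashCode())).Count();
}
```

For the input { "a", "A", "b" }, the first two methods return 2. The third will typically return 3, because the case variants land on different hash codes and are never compared.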
Stalingrad answered 6/11, 2008 at 20:44 Comment(1)
@Stalingrad (who no longer exists as of this comment): Cleaned up code without adjusting behavior (and +1'd). Added Riff at #98533Kamalakamaria
S
6

orip's answer is great.

Here is a little extension method to make it even easier:

public static IEnumerable<T> Distinct<T>(this IEnumerable<T> list, Func<T, object> keyExtractor)
{
    return list.Distinct(new KeyEqualityComparer<T>(keyExtractor));
}
var distinct = foo.Distinct(x => x.ToLower());
Settling answered 27/5, 2011 at 9:44 Comment(0)
P
2

I'm going to answer my own question. To treat Dictionaries as sets, the simplest method seems to be to apply set operations to dict.Keys, then convert back to Dictionaries with Enumerable.ToDictionary(...).
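
A minimal sketch of that approach, under some assumptions: the DictionarySetOps class and its method names are made up for illustration, and the rule that values are taken from the left dictionary is just one possible choice for "retaining the values according to different rules".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not a framework type.
public static class DictionarySetOps
{
    // Entries of `left` whose keys also appear in `right`; values come from `left`.
    public static Dictionary<TKey, TValue> IntersectByKey<TKey, TValue>(
        Dictionary<TKey, TValue> left,
        Dictionary<TKey, TValue> right)
    {
        // Reuse left's key comparer so the set operation agrees with the dictionary's own lookups.
        return left.Keys
                   .Intersect(right.Keys, left.Comparer)
                   .ToDictionary(key => key, key => left[key], left.Comparer);
    }

    // Union of both key sets; when a key occurs in both, the value from `left` wins.
    public static Dictionary<TKey, TValue> UnionByKey<TKey, TValue>(
        Dictionary<TKey, TValue> left,
        Dictionary<TKey, TValue> right)
    {
        return left.Keys
                   .Union(right.Keys, left.Comparer)
                   .ToDictionary(
                       key => key,
                       key => left.TryGetValue(key, out var value) ? value : right[key],
                       left.Comparer);
    }
}
```

Since the set operations run on the keys alone, no custom IEqualityComparer over KeyValuePair is needed. The value rule lives entirely in the element selector passed to ToDictionary.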

Poor answered 19/9, 2008 at 0:9 Comment(0)
D
2

The implementation at (German text) "Implementing IEqualityCompare with lambda expression" handles null values and uses extension methods to generate an IEqualityComparer.

To create an IEqualityComparer in a Linq union you just have to write

persons1.Union(persons2, person => person.LastName)

The comparer:

public class LambdaEqualityComparer<TSource, TComparable> : IEqualityComparer<TSource>
{
  Func<TSource, TComparable> _keyGetter;
    
  public LambdaEqualityComparer(Func<TSource, TComparable> keyGetter)
  {
    _keyGetter = keyGetter;
  }
    
  public bool Equals(TSource x, TSource y)
  {
    if (x == null || y == null) return (x == null && y == null);
    return object.Equals(_keyGetter(x), _keyGetter(y));
  }
   
  public int GetHashCode(TSource obj)
  {
    if (obj == null) return int.MinValue;
    var k = _keyGetter(obj);
    if (k == null) return int.MaxValue;
    return k.GetHashCode();
  }
}

You also need to add an extension method to support type inference

public static class LambdaEqualityComparer
{
    // source1.Union(source2, lambda)
    public static IEnumerable<TSource> Union<TSource, TComparable>(
        this IEnumerable<TSource> source1,
        IEnumerable<TSource> source2,
        Func<TSource, TComparable> keySelector)
    {
        return source1.Union(source2,
            new LambdaEqualityComparer<TSource, TComparable>(keySelector));
    }
}
Diligence answered 29/6, 2013 at 11:26 Comment(0)
P
1

Just one optimization: We can use the out-of-the-box EqualityComparer for value comparisons, rather than delegating it.

This would also make the implementation cleaner, as the actual comparison logic now stays in GetHashCode() and Equals(), which you may have already overridden.

Here is the code:

public class MyComparer<T> : IEqualityComparer<T> 
{ 
  public bool Equals(T x, T y) 
  { 
    return EqualityComparer<T>.Default.Equals(x, y); 
  } 

  public int GetHashCode(T obj) 
  { 
    return obj.GetHashCode(); 
  } 
} 

Don't forget to override the GetHashCode() and Equals() methods on your object.

This post helped me: c# compare two generic values

Sushil

Pierpont answered 29/6, 2010 at 15:17 Comment(2)
NB same issue as identified in comment on #98533 - CAN'T assume obj.GetHashCode() makes senseKamalakamaria
I don't get the purpose of this one. You created an equality comparer that's equivalent to the default equality comparer. So why don't you use it directly?Fatidic
P
1

orip's answer is great. Expanding on orip's answer:

I think the key to the solution is to use an extension method to carry the anonymous type.

    public static class Comparer 
    {
      public static IEqualityComparer<T> CreateComparerForElements<T>(this IEnumerable<T> enumerable, Func<T, object> keyExtractor)
      {
        return new KeyEqualityComparer<T>(keyExtractor);
      }
    }

Usage:

var n = ItemList.Select(s => new { s.Vchr, s.Id, s.Ctr, s.Vendor, s.Description, s.Invoice }).ToList();
n.AddRange(OtherList.Select(s => new { s.Vchr, s.Id, s.Ctr, s.Vendor, s.Description, s.Invoice }).ToList());
n = n.Distinct(x => new { Vchr = x.Vchr, Id = x.Id }).ToList();
Parochial answered 6/4, 2012 at 6:52 Comment(0)
G
0
public static Dictionary<TKey, TValue> Distinct<TKey, TValue>(this IEnumerable<TValue> items, Func<TValue, TKey> selector)
{
    // Pre-size the dictionary when the source exposes its count (non-generic ICollection from System.Collections).
    Dictionary<TKey, TValue> result;
    ICollection collection = items as ICollection;
    if (collection != null)
        result = new Dictionary<TKey, TValue>(collection.Count);
    else
        result = new Dictionary<TKey, TValue>();

    // Later items overwrite earlier ones that share the same key.
    foreach (TValue item in items)
        result[selector(item)] = item;
    return result;
}

This makes it possible to select a property with lambda like this: .Select(y => y.Article).Distinct(x => x.ArticleID);

Gera answered 30/5, 2011 at 13:10 Comment(0)
U
0
public class DelegateEqualityComparer<T>: IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _equalsDelegate;
    private readonly Func<T, int>     _getHashCodeDelegate;

    public DelegateEqualityComparer(Func<T, T, bool> equalsDelegate, Func<T, int> getHashCodeDelegate)
    {
        _equalsDelegate      = equalsDelegate      ?? ((tx, ty) => object.Equals(tx, ty));
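        // NB: GetSafeHashCode() is not a framework method; it is presumably a null-safe extension defined elsewhere by the author.
        _getHashCodeDelegate = getHashCodeDelegate ?? (t => t.GetSafeHashCode());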
    }

    public bool Equals(T x, T y) => _equalsDelegate(x, y);

    public int GetHashCode(T obj) => _getHashCodeDelegate(obj);
}
Unsnap answered 6/10, 2022 at 20:52 Comment(1)
Please don't post code-only answers. The main audience, future readers, will be grateful to see explained why this answers the question instead of having to infer it from the code. Also, since this is an old question, please explain how it complements the other answers.Casefy
E
0

Adding nullable reference type annotations to ldp615's answer.

public static class Equality<T> where T : class
{
    public static IEqualityComparer<T> CreateComparer<V>(Func<T, V> keySelector) where V : notnull
    {
        return CreateComparer(keySelector, EqualityComparer<V>.Default);
    }

    public static IEqualityComparer<T> CreateComparer<V>(
        Func<T, V> keySelector,
        IEqualityComparer<V> comparer)
        where V : notnull
    {
        return new KeyEqualityComparer<V>(keySelector, comparer);
    }

    class KeyEqualityComparer<V> : IEqualityComparer<T> where V : notnull
    {
        readonly Func<T, V> keySelector;
        readonly IEqualityComparer<V> comparer;

        public KeyEqualityComparer(
            Func<T, V> keySelector,
            IEqualityComparer<V> comparer)
        {
            if (keySelector == null)
                throw new ArgumentNullException(nameof(keySelector));
            if (comparer == null)
                throw new ArgumentNullException(nameof(comparer));

            this.keySelector = keySelector;
            this.comparer = comparer;
        }

        public bool Equals(T? x, T? y)
        {
            // If either side is null, they are equal only when both are null.
            if (x == null || y == null)
                return x == null && y == null;

            return comparer.Equals(keySelector(x), keySelector(y));
        }

        public int GetHashCode(T obj)
        {
            return comparer.GetHashCode(keySelector(obj));
        }
    }
}
Exorcist answered 19/11, 2023 at 13:18 Comment(0)
D
-2

I don't know of an existing class but something like:

public class MyComparer<T> : IEqualityComparer<T>
{
  private readonly Func<T, T, bool> _compare;

  public MyComparer(Func<T, T, bool> compare)
  {
    _compare = compare;
  }

  public bool Equals(T x, T y)
  {
    return _compare(x, y);
  }

  public int GetHashCode(T obj)
  {
    return obj.GetHashCode();
  }
}

Note: I haven't actually compiled and run this yet, so there might be a typo or other bug.

Delectation answered 18/9, 2008 at 23:55 Comment(1)
NB same issue as identified in comment on #98533 - CAN'T assume obj.GetHashCode() makes senseKamalakamaria
