Value vs. Reference equality in generic List<T>.Contains()
Asked Answered
B

2

6

Attempt #3 to simplify this question:

A generic List<T> can contain any type - value or reference. When checking to see if a list contains an object, .Contains() uses the default EqualityComparer<T> for type T, and calls .Equals() (is my understanding). If no EqualityComparer has been defined, the default comparer will call .Equals(). By default, .Equals() calls .ReferenceEquals(), so .Contains() will only return true if the list contains the exact same object.

Until you need to override .Equals() to implement value equality, at which point the default comparer says two objects are the same if they have the same values. I can't think of a single case where that would be desirable for a reference type.

What I'm hearing from @Enigmativity is that implementing IEqualityComparer<StagingDataRow> will give my typed DataRow a default equality comparer that will be used instead of the default comparer for Object – allowing me to implement value equality logic in StagingDataRow.Equals().

Questions:

  1. Am I understanding that correctly?
  2. Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()?
  3. What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()?
  4. What is passed to IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj)? The object I'm looking for or the object in the list? Both? It would be strange to have an instance method accept itself as a parameter...

In general, how does one separate value equality from reference equality when overriding .Equals()?



The original line of code spurring this question:

//  For each ID, a collection of matching rows
Dictionary<string, List<StagingDataRow>> stagingTableDictionary;


StagingTableMatches.AddRange(stagingTableDictionary[perNr].Where(row => !StagingTableMatches.Contains(row)));

.

Binnings answered 15/5, 2016 at 21:11 Comment(14)
Your question is really hard to understand. Are you planning to inherit from DataRow and override the equality methods? Or do you plan to create a custom IEqualityComparer? Where are you changing the data in the DataRow? Where is the code that needs to to compare DataRows based on values?Thrombosis
It really just sounds like you need to create your own IEqualityComparer<DataRow> instance and use that in your queries.Refill
Yeah, I know, but I can't win... if I try to simplify questions, I get slammed for not having enough info. But I would have preferred to skip everything in the middle and just ask the question. I'll try to summarize it up front.Binnings
@YacoubMassad, I did my best to summarize the question up front. If it's still not clear, ask away. Thanks for looking at it.Binnings
It's still pretty unclear what you're asking here. I think @Refill is probably correct - you don't need to (or want to) override Equals here. As Yacoub asked, where is the code that compares these? It doesn't seem any of the code you've included is relevant to your problem. A minimal reproducible example would be useful.Sphacelus
I'm not sure how I would use a custom IEqualityComparer in a linq query. The code that compares these would just be an if(row.equals(otherrow)). The issue is the .Where(match => !DatafeedRow.StagingTableMatches.Contains(match)). If linq would use the custom IEqualityComparer for that, then I could override .Equals(). But that seems to be asking for trouble - I'd have to guarantee that any code written against those DataRows would use my custom comparer.Binnings
I did update the question to make it clearer I'm working with a typed DataRow - i.e. yes, I'm inheriting from DataRow.Binnings
@JamesKing, if there are multiple ways of comparing two data rows, then you cannot put these two ways into the same DataRowSubClass class. You have to put them into two different IEqualityComparers. Or you can put one way of comparing into DataRowSubClass and another one into its own IEqualityComparer. It is a rule that GetHashCode should return the same value for two objects that Equals return true for.Thrombosis
I have completely rewritten this question... I think I understand what @Refill is driving at, but need to be sure.Binnings
Why not use the Enumerable.Contains( that takes in a equality comparer, then it does not matter what the default comparer is.Connaught
I added the line of code that started this question (the original version had a very long explanation of what I need to do and why). The short answer (if I understand your question) is that doing so would break encapsulation. If everyone needs to know (and remember) to always use a custom equality comparer, and that custom comparer is already defined/provided/not interchangeable, it should just be part of my object, true?Binnings
@JamesKing to your last point; if that definition of "equality" should be the default, then override Equals and GetHashCode and implement IEquatable , not IEqualityComparer.Rector
@JamesKing - With regard to your "Attempt #3" there is one glaring issues that I want to point out. You cannot override .Equals without also overriding .GetHashCode - if you don't you will break equality. However, the value of GetHashCode mustn't change throughout the usage of the object if it is used in any structure or query that calls .GetHashCode. Therefore, you can only safely use immutable values to compute the hash code. So your only safe choice is to implement your own IEqualityComparer<DataRow> to handle this situation.Refill
As an aside, it looks like StagingTableMatches should be a HashSet<T> rather than a list. You can specify an equality comparer if required.Sphacelus
J
4

Ok, let's handle a few misconceptions first:

By default, .Equals() calls .ReferenceEquals(), so .Contains() will only return true if the list contains the exact same object.

This is true, but only for reference types. Value types will implement a very slow reflection-based Equals function by default, so it's in your best interest to override that.

I can't think of a single case where that would be desirable for a reference type.

Oh I'm sure you can... String is a reference type for instance :)

What I'm hearing from @Enigmativity is that implementing IEqualityComparer<StagingDataRow> will give my typed DataRow a default equality comparer that will be used instead of the default comparer for Object – allowing me to implement value equality logic in StagingDataRow.Equals().

Err... No.

IEqualityComaprer<T> is an interface which lets you delegate equality comparison to a different object. If you want a different default behavior for your class, you implement IEquatable<T>, and also delegate object.Equals to that for consistency. Actually, overriding object.Equals and object.GetHashCode is sufficient to change the default equality comparison behavior, but also implementing IEquatable<T> has additional benefits:

  • It makes it more obvious that your type has custom equality comparison logic - think self documenting code.
  • It improves performance for value types, since it avoids unnecessary boxing (which happens with object.Equals)

So, for your actual questions:

Am I understanding that correctly?

You still seem a bit confused about this, but don't worry :)

Enigmativity actually suggested that you create a different type which implements IEqualityComparer<T>. Looks like you misunderstood that part.

Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()

By default, the (properly written) framework data structures will delegate equality comparison to EqualityComparer<StagingDataRow>.Default, which will in turn delegate to StagingDataRow.Equals.

What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()

Not necessarily. It should be self-consistent: if myEqualitycomaprer.Equals(a, b) then you must ensure that myEqualitycomaprer.GetHashCode(a) == myEqualitycomaprer.GetHashCode(b).

It can be the same implementation than StagingDataRow.GetHashCode, but not necessarily.

What is passed to IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj)? The object I'm looking for or the object in the list? Both? It would be strange to have an instance method accept itself as a parameter...

Well, by now I hope you've understood that the object which implements IEqualityComparer<T> is a different object, so this should make sense.


Please read my answer on Using of IEqualityComparer interface and EqualityComparer class in C# for more in-depth information.

Jackdaw answered 16/5, 2016 at 18:35 Comment(4)
Thanks for the further clarification... I get what Enigmativity was saying now, that IEqualityComparer belongs to a separate class (and is therefore not the tool I'm looking for). I'm not worried about value types... in fact, this issue is specific to reference types (string doesn't count, I can't inherit from it :) ). But even if I implement/override IEquatable in my class, I can only pick one "equals" - value or reference. The assumption that these are interchangeable seems like a flawed design to me.Binnings
You have to decide which semantics to give to your reference type. I agree that most of the time you should use reference equality semantics (principle of least astonishment), but for other cases value type semantics may make more sense (as for string for instance). Remember that you can always force reference semantics by explicitly using ReferenceEquals, and if you need to handle several different equality comparison types (such as canse sensitive/insensitive string comparison), you'll have to use an IEqualityComparer<T> so you can tell which one you want.Jackdaw
Yeah, where I'm coming down in all this is that I have to leave the default .Equals() in place because the framework expects reference equality by default. Therefore my value equality tests have to be in a separate method. Which works, just... poop. Thanks.Binnings
The framework makes no assumptions whatsoever from any custom type. You don't have to leave reference equality, there's no such guideline. If the rest of your code assumes so, then it's a different matter of course, but the framework just... doesn't care.Jackdaw
R
2

Am I understanding that correctly?

Partially - the "default" IEqualityComparer will use either (in order):

  1. The implementation of IEquatable<T>
  2. An overridden Equals(object)
  3. the base object.Equals(object), which is reference equality for reference types.

I think you are confusing two different methods of defining "equality" in a custom type. One is by implementing IEquatable<T> Which allows an instance of a type to determine if it's "equal" to another instance of the same type.

The other is IEqualityComparer<T> which is an independent interface that determines if two instance of that type are equal.

So if your definition of Equals should apply whenever you are comparing two instances, then implement IEquatable, as well as overriding Equals (which is usually trivial after implementing IEquatable) and GetHashCode.

If your definition of "equal" only applies in a particular use case, then create a different class that implements IEqualityComparer<T>, then pass an instance of it to whatever class or method you want that definition to apply to.

Am I guaranteed that everything in the .NET framework will call EqualityComparer<StagingDataRow>.Equals() instead of StagingDataRow.Equals()?

No - only types and methods that accept an instance of IEqualityComparer as a parameter will use it.

What should IEqualityComparer<StagingDataRow>.GetHashCode(StagingDataRow obj) hash against, and should it return the same value as StagingDataRow.GetHashCode()?

It will compute the hash code for the object that's passed in. It doesn't "compare" the hash code to anything. It does not necessarily have to return the same value as the overridden GetHashCode, but it must follow the rules for GetHashCode, particularly that two "equal" objects must return the same hash code.

It would be strange to have an instance method accept itself as a parameter...

Which is why IEqualityComparer is generally implemented on a different class. Note that IEquatable<T> doesn't have a GetHashCode() method, because it doesn't need one. It assumes that GetHashCode is overridden to match the override of object.Equals, which should match the strongly-typed implementation of IEquatable<T>

Bottom Line

If you want your definition of "equal" to be the default for that type, implement IEquatable<T> and override Equals and GetHashCode. If you want a definition of "equal" that is just for a specific use case, then create a different class that implements IEqualityComparer<T> and pass an instance of it to whatever types or methods need to use that definition.

Also, I would note that you very rarely call these methods directly (except Equals). They are usually called by the methods that use them (like Contains) to determine if two objects are "equal" or to get the hash code for an item.

Rector answered 16/5, 2016 at 18:30 Comment(8)
Got it. Or at least half of it. I feel like I'm futilely flopping around trying to explain the issue, which is really just that overriding .Equals() to give me value equality breaks everything that assumes .Equals() means the same object. Two instances of a class can have value equality and be in the same List<T>, but .Contains() should not consider them as equal. I implement IEquatable, but I don't want IEquatable.Equals() to be used for lookups that should be matching the instances themselves - not their values.Binnings
"I don't want IEquatable.Equals() to be used for lookups that should be matching the instances themselves" - Then don't override Equals or implement IEquatable. Let reference equality be the default, and create a class that implements IEqualityComparer and pass that in when you want to consider value equality.Rector
Then everyone who uses my class needs to know they must use my custom comparer everywhere comparisons are required, and if they forget, will have a subtle and intermittent bug to pin down. I'm deliberately building unintended side effects into the design.Binnings
But you just said you don't want to change the default since it will break existing code! One has to be the default - you can override Equals to be value equality, and have a custom comparer that goes back to reference equality, or you can leave reference equality as the default and have a custom comparer that uses value equality. There's no way to have both as the "default".Rector
I get what you're saying, and probably 12 hours ago I was ready to say 'screw it' and just implement an 'Equals' function with a different name for value equality, but I was really hoping there was a cleaner solution... I can't think of a single case where I would want to override .Equals() and want to use value equality to retrieve different instances from a collection. I have to be missing something major - this can't be an uncommon scenario.Binnings
@JamesKing That's where I'm confused - in your original question you say you do want to compare objects by their values, so apparently you do have a valid use case. The bottom line is you can either change the default or implement it in a separate class. Pick your poison...Rector
Yeah, that's what I'm getting - one or the other :( It's hard to be clear without a code dump... When I'm adding objects to a collection, I'm comparing by matching values (user records in more than one datafeed). But I have three unique identifiers, none of which are required, and all of which can change during a user's employment... so I have to say "add to collection where the row doesn't already exist" (because the same row can match on more than one ID). I'm basically in a straight jacket with one foot stapled to the back of my head. : | Can repost a bigger code snippet if you want.Binnings
It really sounds like you just want a custom IEqualityComparer for that scenario - I'm not sure what the the concern is. If you want to make it more obvious to possible users you can make it a nested class so it's inseparable from the class definition.Rector

© 2022 - 2024 — McMap. All rights reserved.