Implement GetHashCode on a class that has wildcard Equatability
Asked Answered
U

9

9

Suppose I want to be able to compare 2 lists of ints and treat one particular value as a wild card.

e.g. If -1 is a wild card, then

{1,2,3,4} == {1,2,-1,4} //returns true

And I'm writing a class to wrap all this logic, so it implements IEquatable and has the relevant logic in public override bool Equals()

But I have always thought that you more-or-less had to implement GetHashCode if you were overriding .Equals(). Granted it's not enforced by the compiler, but I've been under the impression that if you don't then you're doing it wrong.

Except I don't see how I can implement .GetHashCode() without either breaking its contract (objects that are Equal have different hashes), or just having the implementation be return 1.

Thoughts?

Unthinkable answered 22/7, 2014 at 15:37 Comment(6)
You break the concept of equality, it is not transitive any longer. In your case, A==B and B==C does not mean that A==C.Override
Maybe you need IEqualityComparer?Can
@Override that should be an answer!Liquorish
I think you mean "objects that are Equal have equal hashes".Overcapitalize
@Liquorish If you insist :). Thanks!Override
@Overcapitalize No, I meant what I said - I was specifying the manner in which the GetHashCode contract was being broken :)Unthinkable
B
10

This implementation of Equals is already invalid, as it is not transitive. You should probably leave Equals with the default implementation, and write a new method like WildcardEquals (as suggested in the other answers here).

In general, whenever you have changed Equals, you must implement GetHashCode if you want to be able to store the objects in a hashtable (e.g. a Dictionary<TKey, TValue>) and have it work correctly. If you know for certain that the objects will never end up in a hashtable, then it is in theory optional (but it would be safer and clearer in that case to override it to throw a "NotSupportedException" or always return 0).

The general contract is to always implement GetHashCode if you override Equals, as you can't always be sure in advance that later users won't put your objects in hashtables.

Basilisk answered 22/7, 2014 at 15:49 Comment(2)
If you override object.Equals(object) and do not think hash codes will be needed, either do public override int GetHashCode() { throw new NotSupportedException(); } or do public override int GetHashCode() { return 0; }. Not overriding at all (despite compiler warning), or returning base.GetHashCode() will lead to strange behavior that will confuse you if, some day, the class is actually used in a way where the hash code matters.Antenatal
Thanks Rich - this is what I've ended up doing. I had two scenarios: - one scenario where I would never have wildcards, but needed IEquatable and GetHashCode (for use in GroupBy) - and another scenario where I might have wildcards, but I didn't need IEquatable, I just needed to get a bool back to indicate whether the two were equal. So I've implemented IEquatable with a proper Equivalence test, and have left the wild cards to be handled by a WildCardEquals methodUnthinkable
S
6

In this case, I would create a new or extension method, WildcardEquals(other), instead of using the operators.

I wouldn't recommend hiding this kind of complexity.

Stopper answered 22/7, 2014 at 15:43 Comment(0)
O
4

From a logical point of view, we break the concept of equality. It is not transitive any longer. So in case of wildcards, A==B and B==C does not mean that A==C.

From a technical pount of view, returning the same value from GetHashCode() is not somenting unforgivable.

Override answered 22/7, 2014 at 15:51 Comment(0)
P
3

The only possible idea I see is to exploit at least the length, e.g.:

public override int GetHashCode()
{
    return this.Length.GetHashCode()
}
Pettis answered 22/7, 2014 at 15:41 Comment(3)
That seems weird.... For example - adding two of these lists to a HashSet would collide, if they had the same length, with completely different elements.Stopper
@DaveBish yes, but collisions only affect performance, not correctness. The answer is quite safe for this kind of thing (if ever OP wants to put this collection in hash tables).Liquorish
This is a good idea - if I'd ended up needing a HashCode implementation, I would have used this! But as per chosen answer the correct design was in fact not to do it like this at all.Unthinkable
C
2

It's recommended, but not mandatory at all. If you don't need that custom implementation of GetHashCode, just don't do it.

Chane answered 22/7, 2014 at 15:41 Comment(0)
T
2

GetHashCode is generally only important if you're going to be storing elements of your class in some kind of collection, such as a set. If that's the case here then I don't think you're going to be able to achieve consistent semantics since as @AlexD points out equality is no longer transitive.

For example, (using string globs rather than integer lists) if you add the strings "A", "B", and "*" to a set, your set will end up with either one or two elements depending on the order you add them in.

If that's not what you want then I'd recommend putting the wildcard matching into a new method (e.g. EquivalentTo()) rather than overloading equality.

Tract answered 22/7, 2014 at 15:48 Comment(0)
I
2

Having GetHashCode() always return a constant is the only 'legal' way of fulfilling the equals/hashcode constraint.

It'll potentially be inefficient if you put it in a hashmap, or similar, but that might be fine (non-equal hashcodes imply non-equality, but equal hashcodes imply nothing).

I think this is the only possible valid option there. Hashcodes essentially exist as keys to look things up by quickly, and since your wildcard must match every item, its key for lookup must equal every item's key, so they must all be the same.

As others have noted though, this isn't what equals is normally for, and breaks assumptions that many other things may use for equals (such as transitivity - EDIT: turns out this is actually contractual requirement, so no-go), so it's definitely worth at least considering comparing these manually, or with an explicitly separate equality comparer.

Isomorphism answered 22/7, 2014 at 15:49 Comment(2)
"breaks assumptions" -- it breaks the "equals" contract, so all bets are offBasilisk
Ah indeed, hadn't realised Equals actually specified that explicitly.Isomorphism
U
1

Since you've changed what "equals" means (adding in wildcards changes things dramatically) then you're already outside the scope of the normal use of Equals and GetHashCode. It's just a recommendation and in your case it seems like it doesn't fit. So don't worry about it.

That said, make sure you're not using your class in places that might use GetHashCode. That can get you in a load of trouble and be hard to debug if you're not watching for it.

Uncovered answered 22/7, 2014 at 15:43 Comment(0)
D
1

It is generally expected that Equals(Object) and IEquatable<T>.Equals(T) should implement equivalence relations, such that if X is observed to be equal to Y, and Y is observed to be equal to Z, and none of the items have been modified, X may be assumed to be equal to Z; additionally, if X is equal to Y and Y does not equal Z, then X may be assumed not to equal Z either. Wild-card and fuzzy comparison methods are do not implement equivalence relations, and thus Equals should generally not be implemented with such semantics.

Many collections will kinda-sorta work with objects that implement Equals in a way that doesn't implement an equivalence relation, provided that any two objects that might compare equal to each other always return the same hash code. Doing this will often require that many things that would compare unequal to return the same hash code, though depending upon what types of wildcard are supported it may be possible to separate items to some degree.

For example, if the only wildcard which a particular string supports represents "arbitrary string of one or more digits", one could hash the string by converting all sequences of consecutive digits and/or string-of-digit wildcard characters into a single "string of digits" wildcard character. If # represents any digit, then the strings abc123, abc#, abc456, and abc#93#22#7 would all be hashed to the same value as abc#, but abc#b, abc123b, etc. could hash to a different value. Depending upon the distribution of strings, such distinctions may or may not yield better performance than returning a constant value.

Note that even if one implements GetHashCode in such a fashion that equal objects yield equal hashes, some collections may still get behave oddly if the equality method doesn't implement an equivalence relation. For example, if a collection foo contains items with keys "abc1" and "abc2", attempts to access foo["abc#"] might arbitrarily return the first item or the second. Attempts to delete the key "abc#" may arbitrarily remove one or both items, or may fail after deleting one item (its expected post-condition wouldn't be met, since abc# would be in the collection even after deletion).

Rather than trying to jinx Equals to compare hash-code equality, an alternative approach is to have a dictionary which holds for each possible wildcard string that would match at least one main-collection string a list of the strings it might possibly match. Thus, if there are many strings which would match abc#, they could all have different hash codes; if a user enters "abc#" as a search request, the system would look up "abc#" in the wild-card dictionary and receive a list of all strings matching that pattern, which could then be looked up individually in the main dictionary.

Dither answered 22/7, 2014 at 20:41 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.