C# .NET GetHashCode function question
Asked Answered
C

4

10

Hi I have a class with 6 string properties. A unique object will have different values for atleast one of these fields

To implement IEqualityComparer's GetHashCode function, I am concatenating all 6 properties and calling the GetHashCode on the resultant string.

I had the following doubts:

  1. Is it necessary to call the GetHashcode on a unique value?
  2. Will the concatenation operation on the six properties make the comparison slow?
  3. Should I use some other approach?
Colander answered 5/9, 2011 at 14:8 Comment(2)
Are you planning comparing your objects somewhere such as sorting them in an array or some such? That will change wether or not you need to implement GetHashCodeSweetsop
hi mydogisbox, I am using it for the List.Contains method and passing the comparer object to it. I have already implemented Equals and dont know the right approach for GetHashcodeColander
V
3

GetHashCode does not need to return unequal values for "unequal" objects. It only needs to return equal values for equal objects (it also must return the same value for the lifetime of the object).

This means that:

  1. If two objects compare as equal with Equals, then their GetHashCode must return the same value.
  2. If some of the 6 string properties are not strictly read-only, they cannot take part in the GetHashCode implementation.

If you cannot satisfy both points at the same time, you should re-evaluate your design because anything else will leave the door open for bugs.

Finally, you could probably make GetHashCode faster by calling GetHashCode on each of the 6 strings and then integrating all 6 results in one value using some bitwise operations.

Vahe answered 5/9, 2011 at 14:14 Comment(9)
Hi Jon, none of my properties are readonly. However, They all have private setters though so can be modified only from the constructor. Will this affect their usage in GetHashCode?Colander
@ganeshran: Then they are effectively read-only over the lifetime of the object (i.e. they could be implemented with readonly backing fields if you wanted to), which is sufficient. You 'll be OK.Vahe
@Vahe Reading your conditions for the GetHashCode() implementation, I see that it is a problem with conflicting requirements. In fact, you're only partly correct. The real requirements for GetHashCode() method are: 1. It should return same values for equal objects. 2. It should return the same value for the same object while it isn't modified. 3. The GetHashCode() should be fast. Agree, that there are some other recommendations for it. For example this one: For the best performance, a hash function should generate a random distribution for all input.Normally
This sentence answered it for me "GetHashCode does not need to return unequal values for "unequal" objects. It only needs to return equal values for equal objects (it also must return the same value for the lifetime of the object)." I missed it initially.Colander
@ganeshran: Note that it is desirable to return unequal values for unequal objects; it's just not required (and indeed, it is impossible to achieve mathematically). return 42; is a "correct" implementation in the technical sense, but it will make your performance suffer if you put these objects into dictionaries.Vahe
@Vahe - Yes now I understand this after reading the responses. My initial misconception was that it was a mandatory requirement to return unequal values for unequal objects. Thanks!Colander
@Jon: If the GetHashCode() would force us to use the read-only fields then it could be impossible for some classes. For example imagine a class Node<T> which is used in the LinkedList<T> class. The only fields it contains are: reference to the next node and the stored value. Both of these fields are mutable (non read-only). The Equals() method is defined on list class. It simply compares all stored items of two lists and returns true if each pair of items is the same. So, how can we write the GetHashCode() for the list if we don't have any immutable field?Normally
@keykeeper: It would depend on how you define "equality" when we 're talking about lists. In this particular case I would simply suggest to not implement IEquatable for the list class; the most common notion of collection equality is already implemented in LINQ asEnumerable.SequenceEquals.Vahe
@Jon: Well, I didn't know about the Enumerable.SequenceEquals which can be applied in the described case perfectly. But there is still no way to implement the GetHashCode for my imaginary LinkedList. And to be honest, your idea about evasion of IEquatable interface implementation doesn't look great for me: The Enumerable.SequenceEquals isn't well known for my nearest colleagues in contrast to Equals.Normally
K
4

If your string fields are named a-f and known not to be null, this is ReSharper's proposal for your GetHashCode()

public override int GetHashCode() {
  unchecked {
    int result=a.GetHashCode();
    result=(result*397)^b.GetHashCode();
    result=(result*397)^c.GetHashCode();
    result=(result*397)^d.GetHashCode();
    result=(result*397)^e.GetHashCode();
    result=(result*397)^f.GetHashCode();
    return result;
  }
}
Kovacs answered 5/9, 2011 at 14:30 Comment(0)
G
3

GetHashCode() should return the same hash code for all objects that return true if you call Equals() on those objects. This means, for example, that you can return zero as the hash code regardless of what the field values are. But that would make your object very inefficient when stored in data structures such as hash tables.

Combining the strings is one option, but note that you could for example combine just two of the stringsfor the hash code (while still comparing all the strings in equals!).

You can also combine the hashes of the six separate strings, rather than computing a single hash for a combined string. See for example Quick and Simple Hash Code Combinations

I'm not sure if this will be significantly faster than concatenating the string.

Groveman answered 5/9, 2011 at 14:13 Comment(4)
Thanks Anders, I am using it only for Contains method comparison. If I combine only two strings for the hashcode, wouldnt the hashcodes be same if the two values in objects are same? Would this mess up the comparisons, or does the GetHashCode have no affect on the comparison itself, and only impacts the performanceColander
Others have already made this point, but I'd like to address it in a different way because it seems your intuition is stuck. Observe that GetHashCode() returns an int, which can take on only 2^32 different values. Your object, comprising 6 strings of arbitrary length, can obviously take on a far large number of values. Through this example we can easily see that GetHashCode() cannot possibly be a unique value for all possible values of your object. It must only satisfy this property: "if a.Equals(b) then a.GetHashCode()==b.GetHashCode()"; and notice that the "if" does NOT go both ways.Kovacs
Practical considerations for GetHashCode() are that it be "good" and "fast". To make it "fast" I would try to avoid all memory allocation and string copying; to make it "good" is a subject of some nuance but in practice it suffices to swizzle together the GetHashCode() values of the sub-objects as @Vahe suggests. I'll post ReSharper's suggestion as an "answer" so I can get code formatting.Kovacs
Thanks Corey, I get it now. Sorry If my responses seemed noobish - as I didnt really get the actual way the GetHashCode works and is used in comparisons. I got it now.Colander
V
3

GetHashCode does not need to return unequal values for "unequal" objects. It only needs to return equal values for equal objects (it also must return the same value for the lifetime of the object).

This means that:

  1. If two objects compare as equal with Equals, then their GetHashCode must return the same value.
  2. If some of the 6 string properties are not strictly read-only, they cannot take part in the GetHashCode implementation.

If you cannot satisfy both points at the same time, you should re-evaluate your design because anything else will leave the door open for bugs.

Finally, you could probably make GetHashCode faster by calling GetHashCode on each of the 6 strings and then integrating all 6 results in one value using some bitwise operations.

Vahe answered 5/9, 2011 at 14:14 Comment(9)
Hi Jon, none of my properties are readonly. However, They all have private setters though so can be modified only from the constructor. Will this affect their usage in GetHashCode?Colander
@ganeshran: Then they are effectively read-only over the lifetime of the object (i.e. they could be implemented with readonly backing fields if you wanted to), which is sufficient. You 'll be OK.Vahe
@Vahe Reading your conditions for the GetHashCode() implementation, I see that it is a problem with conflicting requirements. In fact, you're only partly correct. The real requirements for GetHashCode() method are: 1. It should return same values for equal objects. 2. It should return the same value for the same object while it isn't modified. 3. The GetHashCode() should be fast. Agree, that there are some other recommendations for it. For example this one: For the best performance, a hash function should generate a random distribution for all input.Normally
This sentence answered it for me "GetHashCode does not need to return unequal values for "unequal" objects. It only needs to return equal values for equal objects (it also must return the same value for the lifetime of the object)." I missed it initially.Colander
@ganeshran: Note that it is desirable to return unequal values for unequal objects; it's just not required (and indeed, it is impossible to achieve mathematically). return 42; is a "correct" implementation in the technical sense, but it will make your performance suffer if you put these objects into dictionaries.Vahe
@Vahe - Yes now I understand this after reading the responses. My initial misconception was that it was a mandatory requirement to return unequal values for unequal objects. Thanks!Colander
@Jon: If the GetHashCode() would force us to use the read-only fields then it could be impossible for some classes. For example imagine a class Node<T> which is used in the LinkedList<T> class. The only fields it contains are: reference to the next node and the stored value. Both of these fields are mutable (non read-only). The Equals() method is defined on list class. It simply compares all stored items of two lists and returns true if each pair of items is the same. So, how can we write the GetHashCode() for the list if we don't have any immutable field?Normally
@keykeeper: It would depend on how you define "equality" when we 're talking about lists. In this particular case I would simply suggest to not implement IEquatable for the list class; the most common notion of collection equality is already implemented in LINQ asEnumerable.SequenceEquals.Vahe
@Jon: Well, I didn't know about the Enumerable.SequenceEquals which can be applied in the described case perfectly. But there is still no way to implement the GetHashCode for my imaginary LinkedList. And to be honest, your idea about evasion of IEquatable interface implementation doesn't look great for me: The Enumerable.SequenceEquals isn't well known for my nearest colleagues in contrast to Equals.Normally
C
0

You can use the behavior from:

http://moh-abed.com/2011/07/13/entities-and-value-objects/

Chowchow answered 5/9, 2011 at 20:20 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.