Why use GetHashCode() over Equals()?
Asked Answered
B

5

11

HashSet<T>.Add first compares the results of GetHashCode. If those are equal, it calls Equals.

Now, my understanding is in order to implement GetHashCode, something must be done with the fields of an object. A simple example implementation can be found at What is the best algorithm for an overridden System.Object.GetHashCode?.

In my test comparing both on 1.000.000 pairs of objects filled with random data, performance is more or less equal between the two. GetHashCode is implemented as in the linked example, Equals simply calls Equals on all fields. So why would one want to use GetHashCode over Equals ?

Bismuthinite answered 10/6, 2011 at 10:53 Comment(2)
For a large object with many readonly fields, the hash code may be computed in the constructor and stored as an additional field in the object. In this case the Equals method would use the hash code field as an optimization, before comparing other field values. If the hash code is different, we can be sure the remaining fields are different as well. This works whether used in a hash table, or simply used in equality comparisons. GetHashCode returns the pre-computed hash code field. If some fields are also immutable objects, this can extend down many levels and speed up comparisons.Ballata
+1 for a good question. This is a good variant of infinite similar questions on the topic of GetHashCode on SO.Fernferna
R
20

For some types, an Equals test may be relatively expensive. It typically has to compare every field of the class. In other words, it takes linear time in the size of the class. Bigger classes are more expensive to compare for equality.

Now, what happens if you need to need to compare one object against 1000 others? Calling Equals 1000 times might get expensive. You need to do N*2000 field accesses, if N is the size of the class

GetHashCode instead generates a "mostly unique" integer based on the contents of the class. In other words, the class fields are accessed once. And once you have that, you can compare this integer to the 1000 integers that make up the other objects' hash codes.

Even in such a naive use case, we now only need N*1000 field accesses.

But what if we store the hash code? When we insert an object into a hash set, its hash code is computed once. Now, any time we want to do a lookup in the hash set, we just need to compute one hash code (that of the external object), and then you just have to compare simple integers. So N class field accesses (for the new object whose hash code we need to compute), plus a number of integer comparisons, which vary, depending on the algorithm, but are 1) relatively few, and 2) cheap.

Rebbeccarebe answered 10/6, 2011 at 11:12 Comment(4)
I hadn't thought of the posibility to store the hash. Clear explanation.Bismuthinite
@Stijn: in reality, hash tables are quite a bit more complex than this, but yeah, the basic idea is to avoid recomputing the hashRebbeccarebe
actually, you need to do N*2*1000 field access only if all the 1001 objects are Equals; getting the hash code, conversely, will surely hit every field in every object. In the end, you're gaining a factor 2 wrt to field comparisons in the worst case, loosing some time to get those hashes.Lesseps
Not to forget that if Equals is expensive, it's exact equivalent GetHashCode will be equally or more expensive. The real advantage starts when you compute it only once. Good point +1.Fernferna
K
8

Because if an algorithm wants to test if 1 object is already in a set of 1.000.000 objects, it has to call Equals 1.000.000 times, but GetHashCode() just once (and a few calls to Equals to eliminate objects which are different though having the same hash code).

Kamchatka answered 10/6, 2011 at 10:58 Comment(0)
L
2

GetHashCode lets you put thing into buckets - multiple objects can have the same hash code. Equals is then used to find matches within a bucket. This let's you find things in large collections very quickly

Lukash answered 10/6, 2011 at 10:57 Comment(1)
great answer :-). simple and the best oneGriffey
P
1

GetHashCode() gets you an integral value that you can use for hashtables. That hash code is one reason why hashtables are so performant. However there can be more than one objects with the same hashcode. That's why Equals() is called. If the objects are not equal, they can go into the same bucket, if they are equal, then it is already in the hashtable and does not need to be added.

Pickax answered 10/6, 2011 at 10:57 Comment(0)
M
1

The essential aspect of GetHashCode is that an observation that two objects' hashcodes differ constitutes not only an observation that the objects are different, but an observation of something much more powerful: if the hashcodes of all the items in one set have a property lacked by those of all the objects in another, then the sets have no items in common.

For example, if one puts into one set all objects where GetHashCode returns an even number, and into a different set all objects where GetHashCode returns an odd number, then is then given an object to look for, calling GetHashCode will allow one to instantly eliminate from consideration all the objects in one of the sets. If instead of using two sets, one has used twenty, one would be able to eliminate everything from nineteen sets. If 256 sets, one can eliminate 255. In many cases, if one adjusts the number of sets based upon the number of items one has, it will be possible to eliminate all but a handful of objects without having to look at any of them.

Looking at two objects' hashcodes to see if they might be equal will seldom be faster than merely checking the objects directly for equality. On the other hand, being able to know that one object isn't equal to 999,990 others without looking at them is apt to be a lot faster than looking at the, no matter how fast the equality comparison would otherwise be.

Moss answered 18/12, 2013 at 5:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.