There is one case where using hash codes as a shortcut on equality comparisons makes sense.
Consider the case where you are building a hashtable or hashset. In fact, let's just consider hashsets (hashtables extend that by also holding a value, but that isn't relevant).
There are various approaches one can take, but in all of them you have a small number of slots into which the hashed values can be placed, and we take either the open or the closed approach (just for fun, some people use the opposite jargon to others). If two different objects collide on the same slot, we can either store them both in that slot (with a linked list or similar holding the objects actually stored there) or re-probe to pick a different slot (there are various strategies for this).
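As a rough sketch of the two approaches (the types and names below are purely illustrative, not taken from any particular library):

// Closed addressing / chaining: each slot holds the head of a linked
// list of entries that hashed to that slot.
sealed class ChainedEntry<TKey>
{
    public TKey Key;
    public ChainedEntry<TKey> Next; // next entry sharing this slot, or null
}

// Open addressing: on a collision we pick another slot instead.
// Linear probing is the simplest re-probing strategy; quadratic
// probing and double hashing are common alternatives.
static int NextSlot(int slot, int slotCount)
{
    return (slot + 1) % slotCount;
}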
Now, with either approach we're moving away from the O(1) complexity we want from a hashtable and towards O(n). The risk of this is inversely proportional to the number of slots available, so after a certain size we resize the hashtable (even if everything were ideal, we'd eventually have to do this once the number of items stored exceeded the number of slots).
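A minimal sketch of that resize trigger, assuming hypothetical _count and _slots fields, a hypothetical Resize method, and a purely illustrative load-factor threshold:

private const double LoadFactor = 0.75; // illustrative threshold, not from any particular implementation

private void EnsureCapacity()
{
    // Grow (and re-insert everything) once the table is getting full,
    // to keep collisions rare and lookups close to O(1).
    if (_count >= _slots.Length * LoadFactor)
        Resize(_slots.Length * 2);
}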
Re-inserting the items on a resize will obviously depend on the hash codes. Because of this, while it rarely makes sense to memoise GetHashCode() in an object (it just isn't called often enough on most objects), it certainly does make sense to memoise it within the hash table itself (or perhaps to memoise a derived result, such as the output of re-hashing with a Wang/Jenkins-style mix to reduce the damage caused by bad GetHashCode() implementations).
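By way of illustration, a slot might store the memoised hash next to the key, with the stored value being the raw GetHashCode() result passed through a mixing function. The Entry struct and Remix method below are hypothetical sketches; the mix used here is Thomas Wang's well-known 32-bit integer hash.

// Hypothetical slot layout: the hash is computed once, stored next to
// the key, and re-used for every comparison and every resize.
struct Entry<TKey>
{
    public bool Occupied;
    public int Hash;  // memoised (possibly re-mixed) hash code
    public TKey Key;
}

static class HashMixer
{
    // A Wang-style 32-bit mix applied to whatever GetHashCode() returns,
    // to spread out the bits of poor implementations.
    public static int Remix(int hash)
    {
        unchecked
        {
            uint key = (uint)hash;
            key = ~key + (key << 15);
            key ^= key >> 12;
            key += key << 2;
            key ^= key >> 4;
            key *= 2057;
            key ^= key >> 16;
            return (int)key;
        }
    }
}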
Now, when we come to insert, our logic is going to be something like the following (a rough code sketch follows the list):
1. Get the hash code for the object.
2. Get the slot for the object.
3. If the slot is empty, place the object in it and return.
4. If the slot contains an equal object, we're done for a hashset, and for a hashtable we have the position at which to replace the value. Do this, and return.
5. Try the next slot according to the collision strategy, and return to item 3 (perhaps resizing if we loop this too often).
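Putting those steps together, a linear-probing insert might look something like the sketch below. The field names (_entries, _count, _cmp), the Entry struct and Remix mixer from the earlier sketch, and the Resize call are all illustrative only; IsMatch is the comparison for item 4, shown just after this.

// Sketch of the insertion steps above for a linear-probing hashset.
private Entry<TKey>[] _entries;                 // slots, each memoising its hash
private int _count;
private readonly IEqualityComparer<TKey> _cmp;

public bool Add(TKey item)
{
    int hash = HashMixer.Remix(_cmp.GetHashCode(item));    // 1. get (re-mixed) hash code
    int slot = (hash & int.MaxValue) % _entries.Length;    // 2. get slot
    for (int probes = 0; ; ++probes)
    {
        if (!_entries[slot].Occupied)                       // 3. empty: claim the slot
        {
            _entries[slot] = new Entry<TKey> { Occupied = true, Hash = hash, Key = item };
            ++_count;
            return true;
        }
        if (IsMatch(item, _entries[slot].Key, hash, _entries[slot].Hash))
            return false;                                   // 4. equal object already present
        slot = (slot + 1) % _entries.Length;                // 5. collide: try the next slot
        if (probes == _entries.Length)                      // probed every slot: resize and retry
        {
            Resize(_entries.Length * 2);
            return Add(item);
        }
    }
}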
So, in this case we have to get the hash code before we compare for equality. We also have the hash code for existing objects already pre-computed to allow for resize. The combination of these two facts means that it makes sense to implement our comparison for item 4 as:
private bool IsMatch(KeyType newItem, KeyType storedItem, int newHash, int oldHash)
{
    return ReferenceEquals(newItem, storedItem) // fast; false negatives but no false positives (only applicable to reference types)
        ||
        (
            newHash == oldHash // fast; false positives but no false negatives
            &&
            _cmp.Equals(newItem, storedItem) // slow for some types, but always gives the correct result
        );
}
Obviously, the advantage of this depends on the complexity of _cmp.Equals. If our key type were int then this would be a total waste. If our key type were string and we were using case-insensitive, Unicode-normalised equality comparisons (so it can't even shortcut on length) then the saving could well be worth it.
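For example, a comparer of that expensive kind might look roughly like the following. This is a sketch only; it normalises both strings on every call, which is exactly the sort of work the cheap hash comparison lets us skip.

using System;
using System.Collections.Generic;
using System.Text;

// Unicode-normalised, case-insensitive string equality: because
// normalisation can change a string's length, Equals can't even
// shortcut on length, so each call is relatively expensive.
sealed class NormalisedIgnoreCaseComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return string.Equals(
            x.Normalize(NormalizationForm.FormC),
            y.Normalize(NormalizationForm.FormC),
            StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(string obj)
    {
        if (obj == null) return 0;
        return StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Normalize(NormalizationForm.FormC));
    }
}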
Generally memoising hash codes doesn't make sense because they aren't used often enough to be a performance win, but storing them in the hashset or hashtable itself can make sense.