Since I couldn't find an answer that explains why we should override `GetHashCode` and `Equals` for custom structs, and why the default implementation "is not likely to be suitable for use as a key in a hash table", I'll leave a link to this blog post, which explains why with a real-world example of a problem that actually happened.

I recommend reading the whole post, but here is a summary (emphasis and clarifications added).
Why the default hash for structs is slow and not very good:
> The way the CLR is designed, every call to a member defined in `System.ValueType` or `System.Enum` types [may] cause a boxing allocation [...]
>
> An implementer of a hash function faces a dilemma: make a good distribution of the hash function or to make it fast. In some cases, it's possible to achieve them both, but it is hard to do this generically in `ValueType.GetHashCode`.
>
> The canonical hash function of a struct "combines" hash codes of all the fields. But the only way to get a hash code of a field in a `ValueType` method is to use reflection. So, the CLR authors decided to trade speed over the distribution and the default `GetHashCode` version just returns a hash code of a first non-null field and "munges" it with a type id [...] This is a reasonable behavior unless it's not. For instance, if you're unlucky enough and the first field of your struct has the same value for most instances, then a hash function will provide the same result all the time. And, as you may imagine, this will cause a drastic performance impact if these instances are stored in a hash set or a hash table.
>
> [...] Reflection-based implementation is slow. Very slow.
>
> [...] Both `ValueType.Equals` and `ValueType.GetHashCode` have a special optimization. If a type does not have "pointers" and is properly packed [...] then more optimal versions are used: `GetHashCode` iterates over an instance and XORs blocks of 4 bytes, and the `Equals` method compares two instances using `memcmp`. [...] But the optimization is very tricky. First, it is hard to know when the optimization is enabled [...] Second, a memory comparison will not necessarily give you the right results. Here is a simple example: [...] `-0.0` and `+0.0` are equal but have different binary representations.
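To make those two quoted behaviors concrete, here is a minimal sketch. The types and values are my own (not from the post), and the exact output depends on the runtime version:

```csharp
using System;

struct Point
{
    public string Label; // first field: the same value for most instances
    public int X;
    public int Y;
}

struct Pair
{
    public double A;
    public double B;
}

class Demo
{
    static void Main()
    {
        // On runtimes where the default ValueType.GetHashCode hashes only the
        // first non-null field, these two distinct points collide, because
        // only Label is considered.
        var p1 = new Point { Label = "pt", X = 1, Y = 2 };
        var p2 = new Point { Label = "pt", X = 42, Y = 99 };
        Console.WriteLine(p1.GetHashCode() == p2.GetHashCode()); // likely True: a collision

        // The memcmp caveat: -0.0 == +0.0 numerically, but their bit patterns
        // differ, so a bitwise fast path would report them as unequal.
        // (Recent runtimes exclude floating-point fields from that fast path.)
        var q1 = new Pair { A = -0.0, B = 1.0 };
        var q2 = new Pair { A = +0.0, B = 1.0 };
        Console.WriteLine(q1.Equals(q2)); // True on current .NET; may be False where memcmp is used
    }
}
```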
Real-world issue described in the post:
```csharp
private readonly HashSet<(ErrorLocation, int)> _locationsWithHitCount;

readonly struct ErrorLocation
{
    // Empty almost all the time
    public string OptionalDescription { get; }
    public string Path { get; }
    public int Position { get; }
}
```
> We used a tuple that contained a custom struct with default equality implementation. And unfortunately, the struct had an optional first field that was almost always equal to [empty string]. The performance was OK until the number of elements in the set increased significantly, causing a real performance issue, taking minutes to initialize a collection with tens of thousands of items.
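A rough reconstruction of how that degrades (my sketch, not the post's code): when the first field is always `""` and the hit count repeats, every tuple can land in the same hash bucket, so each `Add` re-runs the slow reflection-based `Equals` against the whole bucket, and building the set becomes roughly quadratic:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

readonly struct ErrorLocation
{
    public ErrorLocation(string optionalDescription, string path, int position)
    {
        OptionalDescription = optionalDescription;
        Path = path;
        Position = position;
    }

    public string OptionalDescription { get; } // "" almost all the time
    public string Path { get; }
    public int Position { get; }
}

class Repro
{
    static void Main()
    {
        var set = new HashSet<(ErrorLocation, int)>();
        var sw = Stopwatch.StartNew();

        // All elements are distinct, but with the default struct equality they
        // may all share one hash bucket; expect this loop to slow down sharply
        // as the set grows (minutes for tens of thousands of items).
        for (int i = 0; i < 30_000; i++)
            set.Add((new ErrorLocation("", $"file{i}.cs", i), 1));

        Console.WriteLine(sw.Elapsed);
    }
}
```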
So, to answer the question "in what cases should I pack my own and in what cases can I safely rely on the default implementation": at least in the case of structs, you should override `Equals` and `GetHashCode` whenever your custom struct might be used as a key in a hash table or a `Dictionary`.
I would also recommend implementing `IEquatable<T>` in this case, to avoid boxing.
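For example, a fixed version of the struct above could look like this (my sketch; `HashCode.Combine` exists on .NET Core 2.1+/.NET Standard 2.1, otherwise combine the fields manually):

```csharp
using System;

readonly struct ErrorLocation : IEquatable<ErrorLocation>
{
    public ErrorLocation(string optionalDescription, string path, int position)
    {
        OptionalDescription = optionalDescription;
        Path = path;
        Position = position;
    }

    public string OptionalDescription { get; }
    public string Path { get; }
    public int Position { get; }

    // IEquatable<T> lets HashSet/Dictionary compare values without boxing.
    public bool Equals(ErrorLocation other) =>
        OptionalDescription == other.OptionalDescription &&
        Path == other.Path &&
        Position == other.Position;

    public override bool Equals(object obj) =>
        obj is ErrorLocation other && Equals(other);

    // Combine all fields so the hash stays well distributed even when
    // OptionalDescription is almost always the same.
    public override int GetHashCode() =>
        HashCode.Combine(OptionalDescription, Path, Position);
}
```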
As the other answers said, if you're writing a class, the default hash using reference equality is usually fine, so I wouldn't bother in this case, unless you need to override `Equals` (then you would have to override `GetHashCode` accordingly).
You can get the original reference-based hash code (even when `GetHashCode()` has been overridden) by using `System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode(obj)`. – Burch
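For illustration, a small sketch of what that call does (the strings are my own example): `RuntimeHelpers.GetHashCode` returns the identity-based hash, ignoring any override:

```csharp
using System;
using System.Runtime.CompilerServices;

class IdentityHashDemo
{
    static void Main()
    {
        string a = "hello";
        string b = new string("hello".ToCharArray()); // distinct instance, same contents

        // string overrides GetHashCode, so equal contents give equal hashes.
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True

        // RuntimeHelpers.GetHashCode ignores the override and hashes by identity,
        // so two distinct instances almost certainly differ.
        Console.WriteLine(RuntimeHelpers.GetHashCode(a) == RuntimeHelpers.GetHashCode(b)); // False (almost always)
    }
}
```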