Default implementation for Object.GetHashCode()

How does the default implementation for GetHashCode() work? And does it handle structures, classes, arrays, etc. efficiently and well enough?

I am trying to decide in which cases I should roll my own and in which cases I can safely rely on the default implementation. I don't want to reinvent the wheel if I can avoid it.

Subjectivism answered 6/4, 2009 at 3:25 Comment(5)

Have a look at the comment I left on the article: https://mcmap.net/q/49271/-gethashcode-extension-method – Shaum 11/7, 2009 at 10:6

See also https://mcmap.net/q/49272/-object-gethashcode – Colier 17/7, 2009 at 0:46

Aside: you can obtain the default hashcode (even when GetHashCode() has been overridden) by using System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode(obj) – Burch 2/11, 2011 at 12:10

@MarcGravell thank you for contributing this, I was searching for exactly this answer. – Tranche 24/7, 2013 at 23:58

@MarcGravell But how would I do this with other method? – Conation 7/3, 2014 at 17:50
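The aside in the comments above about System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode(obj) can be sketched as follows. This is a minimal illustration, not part of the original thread; AlwaysSeven, DefaultHashDemo, and IdentityHash are hypothetical names invented for the example.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type for illustration only: its override reports the same
// hash code for every instance.
public sealed class AlwaysSeven
{
    public override int GetHashCode() => 7;
}

public static class DefaultHashDemo
{
    // Returns the identity-based hash code, ignoring any GetHashCode() override.
    public static int IdentityHash(object obj) => RuntimeHelpers.GetHashCode(obj);

    public static void Run()
    {
        var a = new AlwaysSeven();
        Console.WriteLine(a.GetHashCode());                    // prints 7 (the override)
        Console.WriteLine(IdentityHash(a) == IdentityHash(a)); // prints True (stable per instance)
        Console.WriteLine(IdentityHash(a));                    // runtime-chosen value, varies between runs
    }
}
```

Two distinct instances will usually, but not necessarily, get different identity hashes, so the value should not be treated as a unique object identifier.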

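using System.Runtime.CompilerServices; // MethodImpl, MethodImplOptions

namespace System {
    public class Object {
        // Implemented natively inside the CLR rather than in managed code.
        [MethodImpl(MethodImplOptions.InternalCall)]
        internal static extern int InternalGetHashCode(object obj);

        // The default simply forwards to the runtime's native implementation.
        public virtual int GetHashCode() {
            return InternalGetHashCode(this);
        }
    }
}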

InternalGetHashCode is mapped to an ObjectNative::GetHashCode function in the CLR, which looks like this:

FCIMPL1(INT32, ObjectNative::GetHashCode, Object* obj) {  
    CONTRACTL  
    {  
        THROWS;  
        DISABLED(GC_NOTRIGGER);  
        INJECT_FAULT(FCThrow(kOutOfMemoryException););  
        MODE_COOPERATIVE;  
        SO_TOLERANT;  
    }  
    CONTRACTL_END;  

    VALIDATEOBJECTREF(obj);  

    DWORD idx = 0;  

    if (obj == 0)  
        return 0;  

    OBJECTREF objRef(obj);  

    HELPER_METHOD_FRAME_BEGIN_RET_1(objRef);        // Set up a frame  

    idx = GetHashCodeEx(OBJECTREFToObject(objRef));  

    HELPER_METHOD_FRAME_END();  

    return idx;  
}  
FCIMPLEND

The full implementation of GetHashCodeEx is fairly large, so it's easier to just link to the C++ source code.

Greenfield answered 6/4, 2009 at 3:43 Comment(9)

That documentation quote must have come from a very early version. It is no longer written like this in current MSDN articles, probably because it is quite wrong. – Brumbaugh 21/7, 2010 at 18:43

They changed the wording, yes, but it still says basically the same thing: "Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes." – Greenfield 21/7, 2010 at 23:31

Why does the documentation claim that the implementation is not particularly useful for hashing? If an object is equal to itself and nothing else, then a hash code method that always returns the same value for a given instance, and generally returns different values for different instances, seems fine. What's the problem? – Elviselvish 4/1, 2013 at 23:33

@Elviselvish Also, two objects that represent the same value have the same hash code only if they are the exact same object. Consider if you used this hash for strings (disregard string interning for the moment): new string('x', 5).GetHashCode() != new string('x', 5).GetHashCode() because these two strings are not the exact same object. Same value, different object. If you put one of them into a hash set (e.g. as a Dictionary key) you'd never be able to look them up again unless by coincidence: using the hashcode, myDictionary["xxxxx"] would likely look in the wrong hash bucket. – Emmanuelemmeline 25/4, 2013 at 1:28

@ta.speot.is: If what you want is to determine whether a particular instance has already been added into a dictionary, reference equality is perfect. With strings, as you note, one is usually more interested in whether a string containing the same sequence of characters has already been added. That's why string overrides GetHashCode. On the other hand, suppose you want to keep a count of how many times various controls process Paint events. You could use a Dictionary<Object, int[]> (every int[] stored would hold exactly one item). – Elviselvish 25/4, 2013 at 14:58

@ta.speot.is: When you get a Paint event from one of the controls you're watching, you could use MyControlCounts[Sender][0]++; (or some variation with TryGetValue). Even if the controls happened to define some form of value equality, that wouldn't be what you were interested in. You'd want to use reference equality along with the default (reference-based) hash code. – Elviselvish 25/4, 2013 at 15:2

@Elviselvish If the reference is the value you want to track, I imagine the default implementation of GetHashCode is fine. – Emmanuelemmeline 26/4, 2013 at 4:5

@It'sNotALie. Then thank Archive.org for having a copy ;-) – Cabrales 6/11, 2013 at 22:43

matrix transform vector – Zela 14/6, 2014 at 20:2
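The reference-equality use case discussed in the comments above (counting events per control instance) might look roughly like this. It is a sketch under assumptions: Widget and PaintCounter are hypothetical names, and a plain int count is used instead of the one-element int[] mentioned in the comment.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class that does not override Equals or GetHashCode, so it
// keeps the default reference-based behaviour.
public sealed class Widget
{
    public string Name { get; }
    public Widget(string name) { Name = name; }
}

public static class PaintCounter
{
    private static readonly Dictionary<object, int> Counts = new Dictionary<object, int>();

    // Records one event for the given sender and returns its running total.
    public static int Record(object sender)
    {
        Counts.TryGetValue(sender, out int current); // current stays 0 if the key is absent
        Counts[sender] = current + 1;
        return current + 1;
    }
}
```

Two Widget instances with the same Name are counted separately, because the dictionary falls back to the default Equals() and GetHashCode(). If the key type did define value equality, passing ReferenceEqualityComparer.Instance (available from .NET 5) to the dictionary would be one way to keep per-instance counts.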

For a class, the defaults are essentially reference equality, and that is usually fine. If you are writing a struct, it is more common to override equality (not least to avoid boxing), but it is very rare that you write a struct anyway!

When overriding equality, you should always have a matching Equals() and GetHashCode() (i.e. for two values, if Equals() returns true they must return the same hash-code, but the converse is not required) - and it is common to also provide ==/!= operators, and often to implement IEquatable<T> too.

These days, when generating a hash, the HashCode utility type is very useful; for example:

return HashCode.Combine(field1, field2); // multiple overloads available here
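Putting the pieces together, a struct that keeps Equals() and GetHashCode() in step might look like the sketch below. Point2D and its members are hypothetical names chosen for illustration; System.HashCode is built into .NET Core 2.1 and later, and older frameworks can get it from the Microsoft.Bcl.HashCode package.

```csharp
using System;

// Hypothetical value type for illustration; the member names are assumptions.
public readonly struct Point2D : IEquatable<Point2D>
{
    public int X { get; }
    public int Y { get; }

    public Point2D(int x, int y) { X = x; Y = y; }

    // Strongly typed equality: avoids boxing when called through IEquatable<T>.
    public bool Equals(Point2D other) => X == other.X && Y == other.Y;

    public override bool Equals(object obj) => obj is Point2D other && Equals(other);

    // Built from the same members as Equals, so equal values always
    // produce equal hash codes.
    public override int GetHashCode() => HashCode.Combine(X, Y);

    public static bool operator ==(Point2D left, Point2D right) => left.Equals(right);
    public static bool operator !=(Point2D left, Point2D right) => !left.Equals(right);
}
```

Note that HashCode.Combine is only stable within a single process, so the resulting values should not be persisted or compared across runs.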

When that isn't available:

For generating the hash code, it is common to use a factored sum, as this avoids the most obvious collisions on paired values - for example, for a basic 2 field hash:

unchecked // disable overflow, for the unlikely possibility that you
{         // are compiling with overflow-checking enabled
    int hash = 27;
    hash = (13 * hash) + field1.GetHashCode();
    hash = (13 * hash) + field2.GetHashCode();
    return hash;
}

This has the advantage that:

the hash of {1,2} is not the same as the hash of {2,1}
the hash of {1,1} is not the same as the hash of {2,2}

etc - collisions like these are common if you just use an unweighted sum, or xor (^), etc.
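To make the comparison concrete, here is a small sketch that puts the factored sum next to a plain xor. PairHashDemo and its method names are hypothetical; the constants 27 and 13 are the ones used in the answer.

```csharp
using System;

public static class PairHashDemo
{
    // Factored sum, using the same constants as the answer above.
    public static int FactoredSum(int a, int b)
    {
        unchecked // wrap silently on overflow instead of throwing
        {
            int hash = 27;
            hash = (13 * hash) + a.GetHashCode();
            hash = (13 * hash) + b.GetHashCode();
            return hash;
        }
    }

    // Naive alternative, shown only for contrast.
    public static int Xor(int a, int b) => a.GetHashCode() ^ b.GetHashCode();
}
```

Since int.GetHashCode() returns the value itself, FactoredSum(1, 2) is 4578 and FactoredSum(2, 1) is 4590, while Xor gives 3 for both; likewise FactoredSum(1, 1) is 4577 and FactoredSum(2, 2) is 4591, while Xor gives 0 for both. The factored sum is not collision-free (for instance, FactoredSum(1, 15) also comes out as 4591); it just avoids the symmetric cases that show up most often in practice.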

Burch answered 6/4, 2009 at 4:29 Comment(5)

Excellent point about the benefit of a factored-sum algorithm; something I had not realised before! – Programmer 24/7, 2013 at 1:58

Won't the factored sum (as written above) cause overflow exceptions occasionally? – Suzy 5/11, 2013 at 14:32

@Suzy yes, it should be performed unchecked. Fortunately, unchecked is the default in C#, but it would be better to make it explicit; edited – Burch 5/11, 2013 at 14:34

Could someone elaborate on the choice of 27 and 13? – Uttica 21/10, 2022 at 8:35

@Uttica pretty arbitrary, to be honest; these days, I would say "use HashCode.Combine instead"; in fact, I'll edit the answer – Burch 21/10, 2022 at 8:57
