A glance at the source code for string.GetHashCode
using Reflector reveals the following (for mscorlib.dll version 4.0):
public override unsafe int GetHashCode()
{
    fixed (char* str = ((char*) this))
    {
        char* chPtr = str;
        // Two running hashes, both seeded with the same constant.
        int num = 0x15051505;
        int num2 = num;
        // The character data is read as ints, i.e. two UTF-16 chars per read.
        int* numPtr = (int*) chPtr;
        for (int i = this.Length; i > 0; i -= 4)
        {
            num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
            if (i <= 2)
            {
                break;
            }
            num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1];
            numPtr += 2;
        }
        // Only locals are used; no instance field is read or written.
        return (num + (num2 * 0x5d588b65));
    }
}
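To make the reading of that loop concrete, here is a sketch that re-expresses it in safe C#. It assumes little-endian packing (two UTF-16 chars per int, as on x86) and models the '\0' terminator that the unsafe code reads for odd lengths as a zero char. LegacyStringHash and Compute are hypothetical names, and the result is not guaranteed to match string.GetHashCode on any particular runtime: other builds (for example 64-bit) and later runtimes, which randomize string hashing per process, compute it differently. The seed, (5381 << 16) + 5381, and the ((h << 5) + h) step look like a variant of Bernstein's djb2 hash (seed 5381, multiply by 33).

```csharp
using System;

// Safe re-expression of the 32-bit loop shown above (hypothetical helper).
// Assumes little-endian packing of two chars per int; index == Length
// stands in for the '\0' terminator that the unsafe version can read.
public static class LegacyStringHash
{
    // UTF-16 code unit at index, or 0 past the end (the terminator).
    private static int CharAt(string s, int index) =>
        index < s.Length ? (int)s[index] : 0;

    // Chars [index] and [index + 1] as one int, low char in the low 16 bits.
    private static int IntAt(string s, int index) =>
        CharAt(s, index) | (CharAt(s, index + 1) << 16);

    public static int Compute(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        unchecked
        {
            int num = 0x15051505;   // (5381 << 16) + 5381
            int num2 = num;
            int pos = 0;            // char offset that numPtr points at
            for (int i = s.Length; i > 0; i -= 4)
            {
                num = ((num << 5) + num + (num >> 27)) ^ IntAt(s, pos);
                if (i <= 2)
                {
                    break;
                }
                num2 = ((num2 << 5) + num2 + (num2 >> 27)) ^ IntAt(s, pos + 2);
                pos += 4;           // numPtr += 2 skips two ints = four chars
            }
            // Nothing is stored anywhere: every call repeats the whole loop.
            return num + num2 * 0x5d588b65;
        }
    }
}
```

In this sketch, Compute("a") and Compute("a\0") come out equal, because the odd-length case XORs in the terminator as a zero char.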
Now, I realize that the implementation of GetHashCode is not specified and is implementation-dependent, so the question "is GetHashCode implemented in the form of X or Y?" is not really answerable. I'm just curious about a few things:
- If Reflector has disassembled the DLL correctly and this is the implementation of GetHashCode (in my environment), am I correct in interpreting this code to indicate that a string object, based on this particular implementation, would not cache its hash code?
- Assuming the answer is yes, why would this be? It seems to me that the memory cost would be minimal (one more 32-bit integer, a drop in the pond compared to the size of the string itself) whereas the savings would be significant, especially in cases where, e.g., strings are used as keys in a hashtable-based collection like a Dictionary<string, [...]>. And since the string class is immutable, it isn't as if the value returned by GetHashCode could ever change.
What could I be missing?
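The caching proposed above (one extra 32-bit field holding the hash) can be sketched as a wrapper key type. CachedHashString is a hypothetical name, not a framework type, and using 0 as the "not yet computed" marker is an assumption of this sketch: a string whose hash happens to be 0 simply gets recomputed on every call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: a string plus one extra int that caches its hash.
public sealed class CachedHashString : IEquatable<CachedHashString>
{
    private readonly string _value;
    private int _hash; // 0 doubles as "not computed yet"

    public CachedHashString(string value)
    {
        _value = value ?? throw new ArgumentNullException(nameof(value));
    }

    public string Value => _value;

    public override int GetHashCode()
    {
        int h = _hash;
        if (h == 0)
        {
            // First call (or a hash that really is 0): compute and store.
            // An int write is atomic, so a racing thread at worst recomputes
            // the same value.
            h = _value.GetHashCode();
            _hash = h;
        }
        return h;
    }

    // Cached hashes give a cheap early "not equal" before comparing chars.
    public bool Equals(CachedHashString other) =>
        other != null &&
        (ReferenceEquals(this, other) ||
         (GetHashCode() == other.GetHashCode() && string.Equals(_value, other._value)));

    public override bool Equals(object obj) => Equals(obj as CachedHashString);

    public override string ToString() => _value;
}
```

The cache only pays off when the same wrapper instance is hashed repeatedly, such as a key stored in a Dictionary<CachedHashString, TValue>; a lookup through a freshly created wrapper still hashes the whole string once.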
UPDATE: In response to Andras Zoltan's closing remark:
"There's also the point made in Tim's answer (+1 there). If he's right, and I think he is, then there's no guarantee that a string is actually immutable after construction, therefore to cache the result would be wrong."
Whoa, whoa there! This is an interesting point to make (and yes, it's very true), but I really doubt that this was taken into consideration in the implementation of GetHashCode. The statement "therefore to cache the result would be wrong" implies to me that the framework's attitude regarding strings is "Well, they're supposed to be immutable, but really if developers want to get sneaky they're mutable, so we'll treat them as such." This is definitely not how the framework views strings. It fully relies on their immutability in so many ways (interning of string literals, assignment of all zero-length strings to string.Empty, etc.) that, basically, if you mutate a string, you're writing code whose behavior is entirely undefined and unpredictable.
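Literal interning shows how much rides on immutability: every occurrence of the same literal is one shared object, so mutating one occurrence through unsafe code would silently change all of them. A minimal sketch (InterningDemo and its methods are hypothetical names):

```csharp
using System;

public static class InterningDemo
{
    // Two occurrences of the same literal refer to one interned object.
    public static bool LiteralsShareOneInstance()
    {
        string a = "hello";
        string b = "hello";
        return ReferenceEquals(a, b);
    }

    // A non-empty string built at run time is a separate object...
    public static bool RuntimeCopyIsDistinct()
    {
        string built = new string("hello".ToCharArray());
        return !ReferenceEquals(built, "hello");
    }

    // ...until string.Intern hands back the pooled instance (the literal).
    public static bool InternReturnsLiteral()
    {
        string built = new string("hello".ToCharArray());
        return ReferenceEquals(string.Intern(built), "hello");
    }
}
```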
I guess my point is that for the author(s) of this implementation to worry, "What if this string instance is modified between calls, even though the class as it is publicly exposed is immutable?" would be like someone planning a casual outdoor BBQ thinking, "What if someone brings an atomic bomb to the party?" Look, if someone brings an atomic bomb, the party's over.
Dictionary<TKey, TValue>, for example, could leverage string interning. Yet it still needs to call GetHashCode to figure out which bucket to put a string in, right? And if it has to call GetHashCode whether the string is interned or not, it seems to me this calculation needs to be performed. But, as I have said, I feel like I'm missing something here.
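The point about interning can be checked with a comparer that counts hash requests. CountingStringComparer and InternedKeyDemo are hypothetical names for this sketch. Interning makes equal strings share one reference, but the dictionary still needs a hash to pick a bucket, so one would expect at least one GetHashCode call for the insert and another for the lookup, even when both use the same interned reference.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer that counts how often the dictionary asks for a hash.
public sealed class CountingStringComparer : IEqualityComparer<string>
{
    public int HashCalls { get; private set; }

    public bool Equals(string x, string y) => string.Equals(x, y);

    public int GetHashCode(string s)
    {
        HashCalls++;
        return s.GetHashCode();
    }
}

public static class InternedKeyDemo
{
    // Hash requests for one insert plus one lookup of an interned key.
    public static int HashCallsForInternedKey()
    {
        var comparer = new CountingStringComparer();
        var dict = new Dictionary<string, int>(comparer);
        string key = string.Intern(new string('k', 3));
        dict[key] = 1;                 // insert: needs a bucket, so a hash
        dict.TryGetValue(key, out _);  // lookup: same reference, hashed again
        return comparer.HashCalls;
    }
}
```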