string.GetHashCode() returns different values in debug vs release, how do I avoid this?
T

3

6

To my surprise, the following expression produces a different result in debug vs release:

int result = "test".GetHashCode();

Is there any way to avoid this?

I need a reliable way to hash a string and I need the value to be consistent in debug and release mode. I would like to avoid writing my own hashing function if possible.

Why does this happen?

FYI, reflector gives me:

[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail), SecuritySafeCritical]
public override unsafe int GetHashCode()
{
    fixed (char* str = ((char*) this))
    {
        char* chPtr = str;
        int num = 0x15051505;
        int num2 = num;
        int* numPtr = (int*) chPtr;
        for (int i = this.Length; i > 0; i -= 4)
        {
            num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
            if (i <= 2)
            {
                break;
            }
            num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1];
            numPtr += 2;
        }
        return (num + (num2 * 0x5d588b65));
    }
}
Tiemroth answered 23/9, 2011 at 19:59 Comment(3)
If you need the hash code to remain consistent, you're using it wrong. If I recall correctly, they explicitly force it to be inconsistent in debug mode so that nobody relies on it internally at Microsoft. - Gang
For more, see: Eric Lippert's post on guidelines for GetHashCode - Gang
Also GetHashCode returns different values on .NET 32bit vs .NET 64bit. - Brannen
B
9

GetHashCode() is not what you should be using to hash a string, almost 100% of the time. Without knowing what you're doing, I recommend that you use an actual hash algorithm, like SHA-1:

using (System.Security.Cryptography.SHA1Managed hp = new System.Security.Cryptography.SHA1Managed()) {
    // Pick one encoding (ASCII, UTF8, Unicode/UTF-16, UTF32, ...) and use it everywhere, or the hashes will not match.
    byte[] hash = hp.ComputeHash(System.Text.Encoding.UTF8.GetBytes(theString));
}

Update: For something a little bit faster, there's also SHA1Cng, which is significantly faster than SHA1Managed.

Berar answered 23/9, 2011 at 20:4 Comment(7)
I already have a lot of code expecting an int, and it's also performance critical, which is why I wanted to use the internal method. Can you create a fast hash that returns an int? I will package it into an extension method such as GetHashCodeStable(). - Tiemroth
@Joe: It's performance-critical? What exactly is your situation? If it just needs to be somewhat speedy, hashing functions are still pretty fast. Maybe try MD5. (Anyway, the result can easily be converted to an int, just take the last 4 bytes or something.) - Berar
Somewhat speedy is OK. I was always under the assumption that SHA1, MD5, etc. were slow relative to a simple loop like the decompiled GetHashCode. - Tiemroth
@Joe: It's essentially one more loop :) But you can also create your own method that hashes to an int if you test it and performance is unacceptable; there are several algorithms online. One I just found is the last post in: linuxquestions.org/questions/programming-9/… - Berar
I ended up using a modified version of the release GetHashCode implementation and called it GetHashCodeStable(). I'm marking your answer as correct because I think your solution is really the right way to go; I only used a different method because of performance requirements, although, as stated, this method is not very slow. - Tiemroth
What say you to the comments in #16840 to the effect that it is overkill to use a cryptographic hash? - Symmetrize
@JasonPlutext: That's a different context; in this case, it sounded like the OP wanted a hashing solution that's consistent across platforms. Of course, if performance matters and you want a consistent hash, by all means do what Joe ended up doing :) I just used a cryptographic hash for convenience. - Berar
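
The "take the last 4 bytes" idea from the comments above could be sketched roughly as follows. This is only an illustration, not the code the OP ended up with: the class name StableStringHash is hypothetical, GetHashCodeStable is the name the OP proposed, and the choice of UTF-8 and of big-endian byte order for the result are assumptions made here.

```csharp
using System.Security.Cryptography;
using System.Text;

public static class StableStringHash
{
    // Hypothetical extension method; it is not part of the .NET framework.
    public static int GetHashCodeStable(this string value)
    {
        if (value == null)
            return 0;

        using (SHA1 sha1 = SHA1.Create())
        {
            // Assumption: UTF-8. Any encoding works as long as it never changes.
            byte[] digest = sha1.ComputeHash(Encoding.UTF8.GetBytes(value));

            // Combine the last 4 of the 20 digest bytes by hand, so the
            // result does not depend on the endianness of the machine.
            int n = digest.Length;
            return (digest[n - 4] << 24) | (digest[n - 3] << 16)
                 | (digest[n - 2] << 8) | digest[n - 1];
        }
    }
}
```

With this in place, "test".GetHashCodeStable() should give the same int in debug and release builds and on 32-bit and 64-bit runtimes, because nothing in it depends on string.GetHashCode(). It allocates a byte array and a hash object per call, so it is worth measuring before using it on a hot path.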
R
3

Here's a better approach that is much faster than SHA, and you can replace the modified GetHashCode with it: C# fast hash murmur2

There are several implementations with different levels of "unmanaged" code, so if you need fully managed it's there and if you can use unsafe it's there too.

Rochellerochemont answered 1/8, 2012 at 17:55 Comment(0)
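
For reference, a fully managed 32-bit MurmurHash2 could look something like the sketch below. It is not the code from the linked implementation; the class name Murmur2Sketch, the default seed of 0, and hashing the UTF-8 bytes of the string are all assumptions made for this illustration.

```csharp
using System.Text;

public static class Murmur2Sketch
{
    // Hypothetical helper: 32-bit MurmurHash2 over the UTF-8 bytes of a string.
    public static uint Hash(string text, uint seed = 0)
    {
        byte[] data = Encoding.UTF8.GetBytes(text);
        const uint m = 0x5bd1e995;
        const int r = 24;

        unchecked // wrap-around arithmetic is intended, even in a /checked build
        {
            uint h = seed ^ (uint)data.Length;
            int i = 0;

            // Mix 4 bytes at a time, read as little-endian regardless of platform.
            for (; i + 4 <= data.Length; i += 4)
            {
                uint k = (uint)(data[i] | data[i + 1] << 8
                              | data[i + 2] << 16 | data[i + 3] << 24);
                k *= m; k ^= k >> r; k *= m;
                h *= m; h ^= k;
            }

            // Fold in the remaining 1 to 3 bytes, if any.
            switch (data.Length - i)
            {
                case 3: h ^= (uint)data[i + 2] << 16; goto case 2;
                case 2: h ^= (uint)data[i + 1] << 8; goto case 1;
                case 1: h ^= data[i]; h *= m; break;
            }

            // Final avalanche.
            h ^= h >> 13; h *= m; h ^= h >> 15;
            return h;
        }
    }
}
```

If existing code expects an int, the result can be reinterpreted with unchecked((int)Murmur2Sketch.Hash(s)). Reading the input byte by byte keeps the sketch free of unsafe code and independent of machine endianness, at some cost in speed compared with the pointer-based variants.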
K
0
    /// <summary>
    /// Default implementation of string.GetHashCode is not consistent on different platforms (x32/x64 which is our case) and frameworks. 
    /// FNV-1a - (Fowler/Noll/Vo) is a fast, consistent, non-cryptographic hash algorithm with good dispersion. (see http://isthe.com/chongo/tech/comp/fnv/#FNV-1a)
    /// </summary>
    private static int GetFNV1aHashCode(string str)
    {
        if (str == null)
            return 0;
        var length = str.Length;
        // original FNV-1a has 32 bit offset_basis = 2166136261 but length gives a bit better dispersion (2%) for our case where all the strings are equal length, for example: "3EC0FFFF01ECD9C4001B01E2A707"
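        int hash = length;
        unchecked // the multiplication is meant to wrap around, even if the project is compiled with /checked
        {
            for (int i = 0; i != length; ++i)
                hash = (hash ^ str[i]) * 16777619;
        }
        return hash;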
    }

I guess this implementation is slower than the unsafe one posted here, but it's much simpler and safe. It works well when top speed is not needed.

Kobold answered 21/8, 2013 at 8:6 Comment(0)
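
The answer above deliberately deviates from textbook FNV-1a: it seeds the hash with the string length and mixes in 16-bit chars rather than bytes. If values need to match other FNV-1a implementations, a standard 32-bit variant could be sketched like this. The class name Fnv1aSketch and the use of UTF-8 are assumptions for the illustration.

```csharp
using System.Text;

public static class Fnv1aSketch
{
    // Hypothetical helper: standard 32-bit FNV-1a over the UTF-8 bytes of a string.
    public static uint Hash32(string text)
    {
        const uint offsetBasis = 2166136261; // standard 32-bit offset basis
        const uint prime = 16777619;         // standard 32-bit FNV prime

        uint hash = offsetBasis;
        unchecked // the multiplication is meant to wrap around
        {
            foreach (byte b in Encoding.UTF8.GetBytes(text))
            {
                hash ^= b;     // FNV-1a: XOR the byte in first...
                hash *= prime; // ...then multiply by the prime
            }
        }
        return hash;
    }
}
```

For an empty string the loop never runs, so the result is the offset basis itself. As with the other approaches, unchecked((int)Fnv1aSketch.Hash32(s)) gives an int for code that expects one.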
