Bug in Array.IStructuralEquatable.GetHashCode?
Asked Answered
A

3

6

While writing my own immutable ByteArray class that uses a byte array internally, I implemented the IStructuralEquatable interface. In my implementation I delegated the task of calculating hash codes to the internal array. While testing it, to my great surprise, I found that my two different arrays had the same structural hash code, i.e. they returned the same value from GetHashCode. To reproduce:

IStructuralEquatable array11 = new int[] { 1, 1 };
IStructuralEquatable array12 = new int[] { 1, 2 };
IStructuralEquatable array22 = new int[] { 2, 2 };

var comparer = EqualityComparer<int>.Default;
Console.WriteLine(array11.GetHashCode(comparer));     // 32
Console.WriteLine(array12.GetHashCode(comparer));     // 32
Console.WriteLine(array22.GetHashCode(comparer));     // 64

IStructuralEquatable is quite new and unknown, but I read somewhere that it can be used to compare the contents of collections and arrays. Am I wrong, or is my .Net wrong?

Note that I am not talking about Object.GetHashCode!

Edit: So, I am apparently wrong as unequal objects may have equal hash codes. But isn't GetHashCode returning a somewhat randomly distributed set of values a requirement? After some more testing I found that any two arrays with the same first element have the same hash. I still think this is strange behavior.

Adahadaha answered 29/7, 2012 at 23:55 Comment(5)
In addition to answers which point to duplicate hashcodes as is documented behavior, some reasoning and reflection would also lead you to the same conclusion. Consider that there are only ~4.2 billion different hashcodes. Can you create more than this many different objects of the type on which GetHashCode is called? In this case it is easy to see the answer is "yes". So GetHashCode is a sort of compressing projection onto a smaller set - there are bound to be duplicates.Lorelle
What might also surprise you is that Array.IStructuralEquatable.GetHashCode only uses the last 8 elements to compute the hash code.Coniferous
Actually there is a bug. See my edit.Coniferous
@MichaelGraczyk - while this might be true for a given version of .Net on a certain platform, I'm compelled to issue the standard warning not to rely on the values of hashcodes or how they are computed, since it is not guaranteed to be the same across updates or platforms.Lorelle
@Lorelle Yes I will clarify that I am talking about MSFT .NET.Coniferous
C
15

What you have described is not a bug. GetHashCode() does not guarantee unique hashes for nonequal objects.

From MSDN:

If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.

EDIT

While the MSFT .NET implementation of GetHashCode() for Array.IStructuralEquatable obeys the principles in the above MSDN documentation, it appears that the authors did not implement it as intended.

Here is the code from "Array.cs":

    int IStructuralEquatable.GetHashCode(IEqualityComparer comparer) { 
        if (comparer == null)
            throw new ArgumentNullException("comparer"); 
        Contract.EndContractBlock();

        int ret = 0;

        for (int i = (this.Length >= 8 ? this.Length - 8 : 0); i < this.Length; i++) {
            ret = CombineHashCodes(ret, comparer.GetHashCode(GetValue(0))); 
        } 

        return ret; 
    }

Notice in particular this line:

ret = CombineHashCodes(ret, comparer.GetHashCode(GetValue(0)));

Unless I am mistaken, that 0 was intended to be i. Because of this, GetHashCode() always returns the same value for arrays with the same max(0, n-8th) element, where n is the length of the array. This isn't wrong (doesn't violate documentation), but it is clearly not as good as it would be if 0 were replaced with i. Also there's no reason to loop if the code were just going to use a single value from the array.

Coniferous answered 29/7, 2012 at 23:59 Comment(18)
Agreed, looks like a bug. Consider filing a Connect issue? Is this .Net 4.0?Lorelle
@Lorelle Yes, .NET 4.0. I don't have .NET 4.5 on this machine. Care to check if it has been fixed in 4.5 (if you have VS 2012)?Coniferous
This is why it's a good idea to write tests which compute the distribution of hashcodes for GetHashCode overrides.Lorelle
Looks like the code is the same for any updates which 4.5 apply to the runtime.Lorelle
Yep, it's already been filed, postponed, and removed from connect. Here's the connect link: connect.microsoft.com/VisualStudio/feedback/details/529636 and here is the cached copy: webcache.googleusercontent.com/…Coniferous
Postponed I'm used to, but removed? That's rare. WTF?!?Lorelle
You plan on making a new issue? If so, let me know; I'll vote it up.Lorelle
Yes I will file it tomorrow when I can repro on .NET 4.5.Coniferous
let us continue this discussion in chatLorelle
@MichaelGraczyk: Thanks for pointing out that my bug was removed from Connect. I filed the original issue, and received no notification that it was deleted. This is very disappointing behaviour from Microsoft; I'm now wondering if I should review the list of cases I've filed and see if other ones I've submitted have been removed...Lyle
@BradleyGrainger - my reaction would be to apply Hanlon's razor and chalk it up to a mistake. I had a popular SQL Server issue "disappear" a while back, but it turns out that it was just a permissions error they took years to correct, even after a good deal of followup by me.Lorelle
@MichaelGraczyk: Good advice. Now that you mention it, I seem to remember that I had a bunch that were "lost" during some internal bug DB migration, but they all came back eventually.Lyle
I refiled the bug just in case. You can find it here: connect.microsoft.com/VisualStudio/feedback/details/755995/…Coniferous
I posted a followup on chat; I don't think it notifies by default.Lorelle
bug still exists in 4.5.1.Mass
I'm not sure this bug can be fixed now as it would lead to different hashes for the same values. Since hashes must remain the same no matter what we're probably stuck with this.Muscovado
@CameronMacFarland Hashes are not guaranteed to remain the same between versions of the framework, even for identical objects. In fact, hashes are not even guaranteed to be the same across different instances of the .NET runtime. See msdn.microsoft.com/en-us/library/…Coniferous
Almost all days I end up with an unsolvable MS bug like this one :(Dean
D
5

This bug has been fixed, at least as of .NET 4.6.2. You can see it through Reference Source.

ret = CombineHashCodes(ret, comparer.GetHashCode(GetValue(i)));
Dorthydortmund answered 29/9, 2016 at 20:8 Comment(0)
S
0

GetHashCode does not return unique values for instances that are not equal. However, instances that are equal will always return the same hash code.

To quote from Object.GetHashCode method:

If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.

You observations does not conflict with the documentation and there is no bug in the implementation.

Subcommittee answered 29/7, 2012 at 23:58 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.