Guid & GetHashCode uniqueness
Asked Answered
T

5

35

Given the following key:

int key = Guid.NewGuid().GetHashCode();

Is this key unique as the uniqueness of Guid?

Term answered 6/9, 2011 at 21:52 Comment(0)
V
57

The pigeonhole principle says no. A GUID has 16 bytes of information* - 128 bits. An int has only 32 bits of information.

There are 2128 possible GUIDs, and 232 possible hash codes - so you can't possibly have a different hash code for each GUID.

There's more than that though - GetHashCode() is never meant to represent uniqueness. If it can, then that's great - but it doesn't have to, even when there are enough int values available to do so.

It would be entirely valid for int.GetHashCode() to return (say) the value divided by two... so -1, 0 and 1 would all get a hash code of 0; 3 and 4 would get a hash code of 2 etc. It wouldn't be good (and it would be slower than just returning the value) - but it would be a valid implementation. It would satisfy all the constraints of GetHashCode - namely that if you call it on two equal values, it will return the same hash code.

In fact, returning a constant for all values is a valid implementation - although a pretty useless one, in that it renders the normally-fast lookup of a hash table into an O(N) operation.


* - To clarify due to comments, the .NET GUID will allow these 128 bits to be set arbitrarily as far as I'm aware; randomly generated GUIDs follow a stricter pattern so there aren't 2128 different values which would be randomly generated. Still more than 232 though.

Venery answered 6/9, 2011 at 21:56 Comment(5)
@Joey: Randomly generated ones, sure - but I don't think there's anything stopping you from a 16-byte array into the Guid constructor with any values you want. Do you have any evidence to the contrary?Venery
@Joey: Well we're specifically talking about the .NET Guid type, and I still maintain that there are 2^128 possible values for that type.Venery
@Joey: Sure, and you don't normally end up with strings that have values unassigned within Unicode - but they're still values which can easily be created.Venery
What use is GetHashCode if two different instances can have the same value? It can't be used reliably either to confirm that 2 things are the same, or that 2 things are different!Pinebrook
@Jez: It can be used to confirm that two things are different - if they have different hash codes, they can't be equal (assuming a proper implementation). If they have the same hash code, they might be equal. The point is that if you have a million keys in a map, and you're trying to find one of them, you can very rapidly narrow it down to "just the keys that have the right hash code" and then you can call Equals on all of those candidates to find out which key is actually the right one.Venery
I
19

Just today I've noticed another problem of the Guid.GetHashCode(): in Microsoft .NET implementation, not every "byte" of the Guid is hashed: there are 6 bytes of the Guid that aren't hashed, so any change to one of them won't ever change the hash code.

We can see it in the reference source:

return _a ^ (((int)_b << 16) | (int)(ushort)_c) ^ (((int)_f << 24) | _k);

so the _d, _e, _g, _h, _i, _j bytes aren't hashed. This has an important impact with "sequential" Guids, like:

c482fbe1-9f16-4ae9-a05c-383478ec9d13
c482fbe1-9f16-4ae9-a05c-383478ec9d14
c482fbe1-9f16-4ae9-a05c-383478ec9d15
...
c482fbe1-9f16-4ae9-a05c-383478ec9dff
c482fbe1-9f16-4ae9-a05c-383478ec9e00
c482fbe1-9f16-4ae9-a05c-383478ec9e01

with Guid like these, the number of different hashes generated is very small (256 different values), because the 3478ec9d/3478ec9e won't be hashed.

Illegible answered 23/7, 2015 at 12:51 Comment(4)
Wow, very interesting observation.Not sure why MS wouldn't hash the whole GUID, but this is something to be aware of....Agripinaagrippa
For version 1 UUID's the fields included in GetHashCode() are the 60 bit timestamp and parts of the MAC address. For version 4 UUID's (which you get from Guid.NewGuid()) almost all bytes of the GUID are random. So in these cases the algorithm seems OK.Stockdale
I ran into this issue in production with GUIDs generated on an OracleDB. Can somebody explain why it does not hash all the values? Hard for me to believe that hashing additional 6 bytes would be a perf issue?Coreencorel
@PhilippAumayr Has been changed in .NET Core... Now all the bits are hased (see github.com/dotnet/runtime/blob/master/src/libraries/…) The commit is from 2016:Illegible
S
13

GetHashCode() returns an integer - it cannot be as unique as a Guid, so no - there might be collisions and uniqueness is not guaranteed.

The point of a hash code is that it should distribute evenly across the hash range so that collisions should be generally rare, you always have a chance of collision though and have to accommodate for that.

Sparrow answered 6/9, 2011 at 21:54 Comment(3)
it should also be noted that GUIDs are not guaranteed to be uniqueDeck
A duplicate GUID is rumored to be generated on December 21st, 2012.Cinerama
@HansPassant I'm sorry to disappoint you. The rumour was false.Segno
F
5

I was having exactly the problem xanatos describes in another answer. I have a class where two Guid values are used to distinguish different objects, and I found I was getting a horrible number of collisions (my Guids are not randomly generated). Here is the code I used to solve the problem. Guid1 and Guid2 are the properties of type Guid that distinguish the objects. The code follows the approach described by Jon Skeet here.

    public override int GetHashCode()
    {
        int hash = 173;
        foreach (Byte b in Guid1.ToByteArray().Concat(Guid2.ToByteArray()))
        {
            hash = hash * 983 + b;
        }
        return hash;
    }
Fonville answered 22/1, 2016 at 23:34 Comment(1)
This helped me. I was looking for a simple "good enough" algorithm that could translate a GUID to an integer with low-ish likelihood of collisions. The uniqueness of the value in my case doesn't need to be global -- it is unique in a very small context/dataset. Good find here which I've validated with some unit tests.Ionosphere
P
4

A Guid is a 128-bit number. An int is a 32-bit number, so it can't be "as unique" as the Guid.

Besides, GetHashCode returns ... a hash code, it's not meant to be unique in any way. See other discussions here on SO about why GetHashCode() exists.

Pleo answered 6/9, 2011 at 21:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.