How to handle hash collisions?
Asked Answered
Y

1

2

I am developing a game where every thing in the game world is represented by an global unique identifier.

Those ids each measure 64 bits and are generated by hashing together the time of creation, the machines network address and a random number. According to Wikipedia's article on the Birthday problem, the probability of a hash collision is 0.1% for two hundred million records.

Since it is unlikely that I'm going to get that much records, one could consider that no hash would ever collide. But I don't want to hope for that, but let my application handle the rare case of a id collision, thus hash collision.

Otherwise, the behavior would be very undesired because two independent things in the game world would have a connection, thus share their properties like position, movement, health points, and so on.

How can I handle hash collisions? How are they handled typically?

Yarn answered 29/8, 2013 at 0:25 Comment(6)
Actually, people usually do assume that GUIDs never collide.Toshikotoss
It sounds that what you want is not really a hash, but a unique identifier. Is there any reason not to use a 128-bit GUID?Bona
@Bona I would like to use C++ standard types like unsigned long long int instead of arrays to store the id. Moreover, I don't have that much records. But anyway, the question remains for any id length.Yarn
@Yarn Then I would go back to the question of "why use a hash?" What you really want for this is a unique id, unless there is a reason not to do so (say the ids are generated in a distributed way).Bona
@Bona I need ids, right. But since data can be loaded from different independent machines (savegame, modifications, patches, addons) this id must be globally unique. So it is kind of a hash, I guess, right?Yarn
Then you should go with 128-bits generated with a GUID it will save you a lot of pain down the road. Use an algorithm like RFC 4122 (your system should have the API already implemented (e.g. CoCreateGuid in Windows). And as Guffa says in his answer, there are other things that will break you before you get a collision.Bona
D
2

Typically hash collisions are handled in two ways:

  1. Use a larger hash, so that collisions are practically impossible.

  2. Consider hash codes to be non-unique, and use an equality comparer for the actual data to determine uniqueness.

A 128 bit GUID uses the first method. The HashSet<T> class in .NET is an example of the second method.

Decastro answered 29/8, 2013 at 0:33 Comment(5)
What if a 128 bit GUID collides?Yarn
@danijar: For 26,000 million records, the probability for that is 0.0000000000000001%. That is roughly the same probability as for a bit of the GUID to change spontaneously in the memory circuit.Decastro
I still cannot use 128 bits. That is for multiple reasons. For example stl containers like std::unordered_map do only accept hashes of std::size_t which measures 64 bit. Using larger ids would require hashing them again for smaller map hashes, which is useless since this is the main use case. Could you please elaborate on how to consider hash codes to be non-unique?Yarn
@danijar: The second option is just accepting that there may be hash collisions. For the rare case when hashes actually collide, you simply compare the actual data to see if it's the same value.Decastro
Okay and then I would have to scan through the loaded data and update all references to the collided hash to a new generated one. May actual question was about this process, but I think this is so implementation dependent that it would be too localized on SO.Yarn

© 2022 - 2024 — McMap. All rights reserved.