Randomly generated UUIDs contain duplicates

Asked 15/1, 2015 at 19:29 Answered 26/6, 2023 at 9:6

I'm using the call below to generate a UUID:

UUID.randomUUID().toString()

In production we have 50+ servers (application server - each is a JVM on its own) and for requests that land in these servers, as a first step we generate a UUID which essentially uniquely identifies a transaction.

What we are observing is that the UUIDs generated on Server 6 and Server 11 match for at least 10 to 15 messages per day. That is strange: given the load of about 1 million transactions a day, duplicate UUIDs within the same day should be extremely unlikely.

This is what we have done so far:

Verified the application logs: we didn't find anything fishy in there, and all logs look normal.
Tried replicating the issue in the test environment with a load similar to production and with 50+ servers, but it didn't happen there.
Checked the application logic: this doesn't seem to be the issue, because the other 48 servers, which run a copy of the same code base, work perfectly fine and generate unique UUIDs per transaction.

So far we haven't been able to trace the issue. My question is basically whether there is something at the JVM level that we are missing, or a UUID parameter that we need to set, for this one-off kind of issue.

Quadrisect answered 15/1, 2015 at 19:29 Comment(8)

This might be useless advice, but if you found a way to generate V1 (MAC + timestamp based) instead of V4 it may reduce the collisions, since they would have to happen at the same time, the same machine, and be very unlucky. – Amias 15/1, 2015 at 19:34

As a first step I would log every single UUID that comes out of UUID.randomUUID() on every machine into a local text file. I would then re-run the duplicate search on those logs. It could be that in your actual code, the UUIDs are getting mixed up at a later stage, e.g. due to a race condition somewhere in a higher-level layer. – Stoops 15/1, 2015 at 19:38

I don't have a solution, but I admire the problem. ^_^ If the numbers from the Wiki are correct this is almost impossible to happen. Especially multiple times. It seems this "random" really isn't o_O – Synclastic 15/1, 2015 at 19:39

I would also search your entire code base for any places where you might be seeding any random number generators. ;-) – Stoops 15/1, 2015 at 19:40

And make sure you really are creating and using a new UUID every time you should be. – Teter 15/1, 2015 at 20:0

You could write a SSCCE that just creates millions of UUIDs and see if there are any duplicates. This would reveal if it's a system or JVM issue, as opposed to your own code as @Stoops suggested. – Teter 15/1, 2015 at 20:3

You are generating pseudo random UUIDs at multiple locations. If you don't find other bugs, consider either generating all the pseudo random UUIDs at one location, or generate real random UUIDs. – Witting 15/1, 2015 at 20:57

Under normal operation, random numbers generated by virtual machines should not show that kind of collision. But say you generated a set of random numbers with virtual machine A, then took a snapshot of A. If you later stopped A, resumed from the snapshot, and continued generating random numbers, you may get some duplicates due to reuse of internal state values. I suspect that if you snapshotted A, then started both A and B from the snapshot, you may experience the same problem. – Devilry 15/1, 2015 at 21:50
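The SSCCE suggested in the comments above could look roughly like the sketch below. The class and method names are made up for illustration. It only tests a single JVM, so it would not catch state shared between servers (for example the snapshot scenario), but it does rule out the simplest explanations.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Generates many random UUIDs in one JVM and counts how many repeat. */
final class UuidDuplicateCheck {

    /** Returns how many of the generated UUIDs had already been seen. */
    static int countDuplicates(int count) {
        Set<UUID> seen = new HashSet<>(count * 2);
        int duplicates = 0;
        for (int i = 0; i < count; i++) {
            // Set.add returns false when the value is already present.
            if (!seen.add(UUID.randomUUID())) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        int count = 1_000_000; // roughly one day of traffic, per the question
        System.out.println("duplicates among " + count + " UUIDs: "
                + countDuplicates(count));
    }
}
```

If this reports duplicates on Server 6 or Server 11 but not elsewhere, the problem is in the JVM or OS setup of those machines rather than in the application code.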

Given time, I'm sure you'll find the culprit. In the meantime, there was a comment that I think deserves to be promoted to answer:

You are generating pseudo random UUIDs at multiple locations. If you don't find other bugs, consider either generating all the pseudo random UUIDs at one location, or generate real random UUIDs

So create a UUID server. It is just a process that churns out blocks of UUIDs. Each block consists of maybe 10,000 (or whatever is appropriate) UUIDs. The process writes each block to disk after verifying that the block contains no duplicates.

Create another process to distribute the blocks of UUIDs. Maybe it is just a web service that returns an unused block when it gets a request. The transaction server requests a block and then consumes those UUIDs as it creates transactions. When the server has used most of its assigned UUIDs, it requests another block.
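A minimal in-process sketch of that idea follows. The names UuidBlockSource and UuidBlockClient are hypothetical, the disk persistence and the web service are left out, and the source keeps every issued UUID in memory, which a real service would have to replace with durable storage. The client here also refills only once its block is empty, not when it is mostly used.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

/** Hypothetical central source that hands out blocks of verified-unique UUIDs. */
final class UuidBlockSource {
    private final int blockSize;
    private final Set<UUID> issued = new HashSet<>(); // every UUID handed out so far

    UuidBlockSource(int blockSize) {
        this.blockSize = blockSize;
    }

    /** Builds a block whose UUIDs appear in no earlier block. */
    synchronized List<UUID> nextBlock() {
        List<UUID> block = new ArrayList<>(blockSize);
        while (block.size() < blockSize) {
            UUID candidate = UUID.randomUUID();
            if (issued.add(candidate)) { // skip anything already issued
                block.add(candidate);
            }
        }
        return block;
    }
}

/** Hypothetical transaction-server side: consumes a block, then asks for another. */
final class UuidBlockClient {
    private final UuidBlockSource source;
    private final Deque<UUID> local = new ArrayDeque<>();

    UuidBlockClient(UuidBlockSource source) {
        this.source = source;
    }

    synchronized UUID next() {
        if (local.isEmpty()) {
            local.addAll(source.nextBlock()); // refill from the central source
        }
        return local.poll();
    }
}
```

Each transaction server would hold one UuidBlockClient and call next() per transaction, so uniqueness is decided in exactly one place.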

Tanyatanzania answered 15/1, 2015 at 22:29 Comment(0)

I wouldn't waste time wondering how UUID.randomUUID() is generating a few duplicate UUIDs per day. The odds of that happening by chance are infinitesimal. (Generating a whole series of duplicates is possible if the underlying RNG state is duplicated, but that doesn't seem to be the case.)
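To put a rough number on "infinitesimal", here is a sketch using the usual birthday-bound approximation, p ≈ n² / (2 · 2^122), where 122 is the number of random bits in a version-4 UUID. The class name is made up, and the 1 million figure is the daily load from the question.

```java
/** Rough birthday-bound estimate for random (version 4) UUID collisions. */
final class UuidCollisionOdds {

    /** Approximate probability of at least one collision among n random UUIDs. */
    static double approxCollisionProbability(double n) {
        double space = Math.pow(2, 122); // a version-4 UUID carries 122 random bits
        return (n * n) / (2 * space);    // valid while the result is much less than 1
    }

    public static void main(String[] args) {
        double perDay = 1_000_000; // daily transaction count from the question
        System.out.println("approx. collision probability per day: "
                + approxCollisionProbability(perDay));
    }
}
```

For a million UUIDs the estimate comes out on the order of 1e-25, so 10 to 15 collisions a day cannot be explained by chance.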

Instead, look for places where a UUID stored by one server could be clobbering one stored by another. Why does this only happen between 2 servers out of 50? That has something to do with the details of your environment and system that haven't been shared.

Lelea answered 26/4, 2019 at 15:6 Comment(0)

As stated above, the chances of a legitimate collision are impossibly small. A more likely possibility is that the values are being transferred between objects in an improper way.

Java passes object references by value, so two objects can end up holding the very same UUID instance. Consider the following scenario, where the step that regenerates the UUID is skipped or has not run yet:

saveObject1.setUUID(initObj.getUUID());
// initObj.setUUID(UUID.randomUUID()) should run here, but is skipped or delayed
saveObject2.setUUID(initObj.getUUID());

In this case saveObject1 and saveObject2 will have the same value, because both point to the same object (the UUID still held by initObj).

An issue like this seems more likely than an actual UUID collision, especially if you can reproduce it. Naturally, if it doesn't happen all the time, it's probably something more complex, like a rare race condition where initObj doesn't get reinitialized in time, causing saveObject1 and saveObject2 to share the same object reference.
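A small sketch of when the copied UUIDs end up equal: only when the regeneration step between the two copies is skipped or delayed. Holder is a hypothetical stand-in for the initObj and saveObject types, which the question does not show.

```java
import java.util.UUID;

final class SharedUuidDemo {

    /** Hypothetical stand-in for the initObj / saveObject classes. */
    static final class Holder {
        private UUID uuid;

        UUID getUUID() {
            return uuid;
        }

        void setUUID(UUID uuid) {
            this.uuid = uuid;
        }
    }

    /** Copies initObj's UUID into two holders and reports whether they match. */
    static boolean producesDuplicate(boolean regenerateBetween) {
        Holder initObj = new Holder();
        initObj.setUUID(UUID.randomUUID());

        Holder saveObject1 = new Holder();
        Holder saveObject2 = new Holder();

        saveObject1.setUUID(initObj.getUUID());
        if (regenerateBetween) {
            initObj.setUUID(UUID.randomUUID()); // the step a race could skip
        }
        saveObject2.setUUID(initObj.getUUID());

        return saveObject1.getUUID().equals(saveObject2.getUUID());
    }

    public static void main(String[] args) {
        System.out.println("without regeneration: " + producesDuplicate(false));
        System.out.println("with regeneration:    " + producesDuplicate(true));
    }
}
```

If a shared, mutable holder like initObj exists in the real code path, checking how it is reset between transactions would be a good place to start.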

Lillalillard answered 12/6, 2020 at 23:53 Comment(0)

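Your database column length can also cause duplicate UUIDs: if the column is shorter than the UUID string, the stored values may be truncated, and truncated values can collide.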
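Here is a sketch of that effect, assuming the database silently truncates the string to the column width (some databases reject the insert instead). The class name and the widths are made up for illustration. With only 4 hex characters kept there are 65,536 possible values, so 70,000 rows must contain duplicates.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

final class TruncatedUuidDemo {

    /** Counts duplicates after cutting each UUID string to columnWidth characters. */
    static int duplicatesAfterTruncation(int count, int columnWidth) {
        Set<String> stored = new HashSet<>();
        int duplicates = 0;
        for (int i = 0; i < count; i++) {
            String full = UUID.randomUUID().toString(); // 36 characters
            // Assumption: the column keeps only the leading characters.
            String kept = full.substring(0, Math.min(columnWidth, full.length()));
            if (!stored.add(kept)) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println("width 36: " + duplicatesAfterTruncation(70_000, 36));
        System.out.println("width 4:  " + duplicatesAfterTruncation(70_000, 4));
    }
}
```

Comparing the declared column length against the 36 characters of UUID.toString() is a quick way to rule this out.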

Crossbreed answered 26/6, 2023 at 9:6 Comment(0)
