Likelihood of collision using most significant bits of a UUID in Java

Asked 28/11, 2008 at 10:24 Answered 3/5, 2013 at 0:31

246

If I'm using Long uuid = UUID.randomUUID().getMostSignificantBits() how likely is it to get a collision. It cuts off the least significant bits, so there is a possibility that you run into a collision, right?

Aestivate answered 28/11, 2008 at 10:24 Comment(0)

223

According to the documentation, the static method UUID.randomUUID() generates a type 4 UUID.

This means that six bits are used for some type information and the remaining 122 bits are assigned randomly.

The six non-random bits are distributed with four in the most significant half of the UUID and two in the least significant half. So the most significant half of your UUID contains 60 bits of randomness, which means you on average need to generate 2^30 UUIDs to get a collision (compared to 2^61 for the full UUID).

So I would say that you are rather safe. Note, however that this is absolutely not true for other types of UUIDs, as Carl Seleborg mentions.

Incidentally, you would be slightly better off by using the least significant half of the UUID (or just generating a random long using SecureRandom).

Eucharist answered 28/11, 2008 at 10:37 Comment(2)

I'm not sure this is entirely correct - looking at the implementation, it is clear that the version / variant information is not stored in the most significant bits, but rather somewhere in the middle. – Stanwinn 3/12, 2014 at 19:31

@RasmusFaber The comment by Tom is correct: The Answer here is incorrect about the six most significant bits being type information. There are indeed six bits of non-random data but four bits identify the Version 4 and two other bits are reserved. The four and two bits are located in different positions near the middle of the 128-bit value. See the Wikipedia article. – Advisable 17/7, 2015 at 2:5

Raymond Chen has a really excellent blog post on this:

GUIDs are globally unique, but substrings of GUIDs aren't

Fumigate answered 28/11, 2008 at 10:26 Comment(2)

The link is not dead any more. – Rebatement 30/1, 2016 at 10:17

The link is dead once again. Here's a link to a web archive version. – Keratinize 21/5, 2019 at 5:19

I thinks this is the best example for using randomUUID :

http://www.javapractices.com/topic/TopicAction.do?Id=56

Gale answered 15/5, 2012 at 10:2 Comment(0)

You are better off just generating a random long value, then all the bits are random. In Java 6, new Random() uses the System.nanoTime() plus a counter as a seed.

There are different levels of uniqueness.

If you need uniqueness across many machines, you could have a central database table for allocating unique ids, or even batches of unique ids.

If you just need to have uniqueness in one app you can just have a counter (or a counter which starts from the currentTimeMillis()*1000 or nanoTime() depending on your requirements)

Tabb answered 13/3, 2009 at 21:36 Comment(0)

Use Time YYYYDDDD (Year + Day of Year) as prefix. This decreases database fragmentation in tables and indexes. This method returns byte[40]. I used it in a hybrid environment where the Active Directory SID (varbinary(85)) is the key for LDAP users and an application auto-generated ID is used for non-LDAP Users. Also the large number of transactions per day in transactional tables (Banking Industry) cannot use standard Int types for Keys

private static final DecimalFormat timeFormat4 = new DecimalFormat("0000;0000");

public static byte[] getSidWithCalendar() {
    Calendar cal = Calendar.getInstance();
    String val = String.valueOf(cal.get(Calendar.YEAR));
    val += timeFormat4.format(cal.get(Calendar.DAY_OF_YEAR));
    val += UUID.randomUUID().toString().replaceAll("-", "");
    return val.getBytes();
}

Britnibrito answered 3/5, 2013 at 0:31 Comment(1)

Why not use a standard V1 UUID instead? – Frustrated 25/2, 2014 at 17:7

Recommended topics

Hot tags