Cassandra: Generate a unique ID?

I'm working on a distributed database. I'm trying to generate a unique ID that will serve as a column family primary key in Cassandra.

I read some articles about doing this in Java using UUID, but it seems there is a probability of collision (even if it's very low).

I wonder if there is a way to generate a unique ID based on time?
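For reference, the Java approach mentioned above is `java.util.UUID.randomUUID()`, which returns a version 4 (random) UUID drawn from a cryptographically strong generator. A minimal sketch; the class and method names are only for illustration:

```java
import java.util.UUID;

public class RandomIdDemo {
    // Version 4 UUID: 122 random bits plus fixed version and variant bits.
    static UUID newId() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        UUID id = newId();
        // version() is 4 for random UUIDs; variant() is 2 for the RFC 4122 (Leach-Salz) layout.
        System.out.println(id + " version=" + id.version() + " variant=" + id.variant());
    }
}
```

Nothing in these bits encodes time, which is why this kind of ID carries a (tiny) collision probability rather than a guarantee.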

Zetes asked 18/4, 2013 at 13:39

You can use the TimeUUID type in Cassandra, which stores a Type 1 UUID. A Type 1 UUID combines the current time, the creator's MAC address, and a sequence number. If the TimeUUID is generated correctly, this can be done with zero collisions (you can use the CQL now() function or insert your own; the Java SDKs provide some thread-safe implementations). The main advantage of TimeUUIDs is that the IDs can be time-ordered. See http://wiki.apache.org/cassandra/TimeBaseUUIDNotes for more info.
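To show why a correctly generated Type 1 UUID cannot collide, here is a minimal Java sketch of a generator following the RFC 4122 version 1 layout. It is not Cassandra's or any driver's implementation (the DataStax Java driver 4.x, for instance, ships `Uuids.timeBased()`); the class name is hypothetical, and it assumes a random 48-bit node id with the multicast bit set instead of the MAC address, which RFC 4122 permits. Within one generator, uniqueness is guaranteed because the timestamp is forced to be strictly increasing; across generators it relies on the node ids (or clock sequences) differing, which is only probabilistic with random node ids.

```java
import java.security.SecureRandom;
import java.util.UUID;

public class TimeUuidGenerator {
    // Number of 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    private final long clockSeq;   // 14-bit clock sequence, fixed for this generator
    private final long node;       // 48-bit node id (random here, not a MAC address)
    private long lastTimestamp = Long.MIN_VALUE;

    public TimeUuidGenerator() {
        SecureRandom rnd = new SecureRandom();
        this.clockSeq = rnd.nextInt(1 << 14);
        // Set the multicast bit so the random id can never equal a real MAC address.
        this.node = (rnd.nextLong() & 0xFFFFFFFFFFFFL) | 0x010000000000L;
    }

    // synchronized so concurrent callers never receive the same timestamp.
    public synchronized UUID next() {
        long ts = System.currentTimeMillis() * 10_000L + UUID_EPOCH_OFFSET;
        // The millisecond clock leaves 10,000 free 100-ns ticks per ms; if the clock
        // has not advanced (or went backwards), bump by one tick to stay strictly increasing.
        if (ts <= lastTimestamp) {
            ts = lastTimestamp + 1;
        }
        lastTimestamp = ts;

        long timeLow = ts & 0xFFFFFFFFL;
        long timeMid = (ts >>> 32) & 0xFFFFL;
        long timeHi = (ts >>> 48) & 0x0FFFL;
        long msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi; // version 1
        long lsb = ((0x8000L | clockSeq) << 48) | node;                  // RFC 4122 variant
        return new UUID(msb, lsb);
    }
}
```

Because the timestamp counts 100-ns ticks, one such generator can issue up to 10 million values per second before it starts running ahead of the wall clock. `java.util.UUID.timestamp()` recovers the 60-bit timestamp from the result, which is the property that makes the IDs time-ordered.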

However, the time ordering is unlikely to be useful for partition (row) keys, because a hash partitioner does not preserve key order; it only becomes useful when the TimeUUID is a clustering key. Also, the complexity of generating a unique ID could be a source of bugs if you roll your own. Cassandra also supports Type 4 UUIDs through the uuid type. These are just random bits. There is a collision probability, but assuming uncorrelated random number sources (which you will have if you generate them in Java), it is extremely low: if you created 1 billion UUIDs per second for about 85 years, the probability of at least one collision would be about 50%. (See http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates for more details.)
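The collision estimate comes from the birthday approximation p ≈ 1 − exp(−n² / (2 · 2¹²²)), where n is the number of UUIDs and 122 is the number of random bits in a Type 4 UUID. A small Java sketch (class and method names are illustrative; a year is taken as 365.25 days) that evaluates it for 1 billion UUIDs per second; by this formula the 50% point falls at roughly 85 to 86 years, and after 100 years p is around 0.6:

```java
public class CollisionOdds {
    // Birthday approximation: probability of at least one collision among n values
    // drawn uniformly from 2^122 possibilities.
    static double collisionProbability(double n) {
        double space = Math.pow(2, 122);   // random bits in a version 4 UUID
        double x = n * n / (2 * space);
        return -Math.expm1(-x);            // 1 - e^(-x), accurate even for tiny x
    }

    static double uuidsAfterYears(double perSecond, double years) {
        double secondsPerYear = 365.25 * 24 * 3600;
        return perSecond * secondsPerYear * years;
    }

    public static void main(String[] args) {
        for (int years : new int[] {1, 85, 100}) {
            double n = uuidsAfterYears(1e9, years);
            System.out.printf("%3d years: %.3g UUIDs, p = %.4f%n", years, n, collisionProbability(n));
        }
    }
}
```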

Impute answered 18/4, 2013 at 14:3 (7 comments)
Thanks for your reply. If I use a Type 1 UUID, for example: CREATE TABLE timeline (rid uuid, PRIMARY KEY (rid)), how would I generate the rid and insert it into the CF? (I read about a function called now() in Cassandra, but I don't know how to use it.) - Zetes
You should use the timeuuid type if you're using a Type 1 UUID. This should work: CREATE TABLE timeline (rid timeuuid, PRIMARY KEY (rid)); INSERT INTO timeline (rid) VALUES (now()); - Impute
I got this error: missing EOF at ')'. It's as if it doesn't recognize the function now(). PS: I'm using Cassandra 1.2.0. - Zetes
It works for me in Cassandra 1.2.2. Possibly it was fixed as part of the general CQL3 fixes. You should probably upgrade to 1.2.4 (the latest version at the time); there have been many fixes. - Impute
Thank you so much! I think it's time for me to upgrade Cassandra :) - Zetes
@Zetes Perhaps you are not entering values for all of your columns; that can also cause this kind of error. - Brachy
So can I use TimeUUID in my case, where I have to generate, for example, over 400,000 unique IDs per second to persist new rows in a Cassandra database? I have to be sure that all my IDs are unique; otherwise I will have to check the uniqueness of my IDs somehow before storing them in Cassandra. - Least

You should investigate using Twitter Snowflake. From the project readme:

As we at Twitter move away from Mysql towards Cassandra, we've needed a new way to generate id numbers. There is no sequential id generation facility in Cassandra, nor should there be.

Snowflake uses a simple algorithm that generates 64-bit longs which are unique and roughly time-ordered. Since your database is distributed, this service should suit your needs well.
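A minimal in-process Java sketch of the Snowflake idea, assuming the bit layout described in the project's README: a 41-bit millisecond timestamp relative to a custom epoch, a 10-bit machine (worker) id, and a 12-bit per-millisecond sequence. Snowflake itself is a standalone service; the class name, the epoch value, and how worker ids get assigned are hypothetical here. Uniqueness holds only if every running generator has a distinct worker id.

```java
public class SnowflakeIdGenerator {
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_WORKER = (1L << WORKER_BITS) - 1;     // 1023
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1; // 4095

    private final long epochMillis;  // custom epoch chosen by the caller (hypothetical)
    private final long workerId;     // must be unique per generator across the cluster
    private long lastMillis = -1L;
    private long sequence = 0L;

    public SnowflakeIdGenerator(long epochMillis, long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("workerId must be in [0, " + MAX_WORKER + "]");
        }
        this.epochMillis = epochMillis;
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMillis) {
            // Refuse to issue ids while the clock is behind; otherwise ids could repeat.
            throw new IllegalStateException("Clock moved backwards");
        }
        if (now == lastMillis) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // 4096 ids already issued this millisecond: spin until the next one.
                while (now <= lastMillis) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        // [timestamp: 41 bits][worker: 10 bits][sequence: 12 bits]
        return ((now - epochMillis) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Because the timestamp occupies the high bits, ids from one worker are strictly increasing, and ids from different workers sort roughly by creation time. The 12-bit sequence allows up to 4,096 ids per millisecond per worker.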

Squishy answered 18/4, 2013 at 15:18

As Richard said, you can use TimeUUID, and generating a TimeUUID value is not a big deal. Just follow the Cassandra FAQ entry on timeuuid.

Summertree answered 18/4, 2013 at 15:18

Use the Cassandra function now() to generate a timeuuid value, and the uuid() function to generate a random (Type 4) uuid value.

Sullivan answered 27/11, 2014 at 9:54
