Distributed primary key - UUID, simple auto increment or custom sequential values?
Asked Answered
Z

1

7

I know this type of question has been asked before, but I could not find one that compared the options I have in mind. So I am going to post them here, please post links if there are duplicates.

This has ended up a rather long post, if you have time please read it through as the question is at the end

EDIT2: I've accepted an answer as I think that will be the best solution for now. But I thought I would like to two other questions that answer my query about concatenating numbers. They can be found here: Combine two integers to create a unique number & Concatenate integers in C#. If I was going to try encoding the number (as below like 51122222) I think this would be useful. Though maybe just using something like String.Format in c# would be fast enough for my small application.

I'm currently trying to find a way to setup distributed applications that use the same database schema and can synchronise with maybe one master database that all others also sync with.

The program I am planning currently will start as a fairly simple program to track information. The first version might contain two tables: Items and ItemHistory. This is an example of possible fields:

Items
ItemID(PK) ?
Name String
Content String

ItemHistory
ItemHistoryID (PK) ?
ItemID (FK) ?
EventName String
CreatedOn DateTime

I've listed the field name and type, this question is about what to use for the PK types so they are missing.

The first version will be a standard desktop app, I'm currently planning on using C# with a WPF front end and SQLite for the database. Eventually I also want to create a version to run on my Android phone as well. This is where the distributed part comes in. I don't always have a signal so will need the app to run offline and allow synchronisation when online again.

Here are the ideas I have so far on how to deal with the ID's:

  1. Use a UUID for the IDs so there are no merge conflicts
  2. Use a auto increment field and set the starting number on each version of the app in some increment, e.g. 1 for first app, 10000 for second, 20000 for third etc
  3. Use a auto increment field with an offset value to avoid conflicts without the large gaps between numbers (mysql has auto_increment_increment and auto_increment_offset for this)
  4. Generate my own ID that encodes an ID for each database so they can have their own auto increment value and not cause a conflict. I found someone else that had the same idea: What data type is recommended for ID columns?

While option 1 would work and I have used it in the past I want to look at the possibility of other options to avoid the issues with UUIDs. I would like to have a solution that is easier to read while debugging and is sortable.

Option 2 would work but it does force a limit on the number of records. I know in my small application it will almost never go over that many but I would like to try and see if there's a solution that does not require such a limit. Option 3 avoids the limit by using alternating numbers, but I think you would need to know how many database are to be used or you might fill all numbers otherwise. Using a start of 1 and increment of 1 on DB1 and start of 2 and increment of 2 on DB2 would use every number alternatively. You could use 50 as the increment but then you just have another limit but now its on the number of applications that can use it. Again I know its a limit that is not going to be hit in my situation but could be an issue in an application that suddenly becomes very popular.

Option 4 seems like it could solve the issue for me, but I'm not sure if it would work in practice or not. One idea I had was to allow a prefix to be set on each application then that could be used with an auto incrementing value. e.g. PC1, PC2 for records on a pc and maybe PHONE1, PHONE2 etc for records from the Android. This would work but using numbers in strings causes the sorting issue with 1, 11, 100 showing up next to each other, that is in less leading zeros are used and then its back to a limited number of records again.

I have wondered if it would be possible to use a number for the DB ID and the auto increment. e.g PC = 1 and PHONE = 2. then we have 11, 12, 13 etc for the PC with maybe 111 for the 11th record and 2304 for the 304th record on PHONE. But I don't know how this would be done or if it can easily be done and not cause excess overheads for generating values.

At work they have used a similar numbering system, they use something like this 51122222. 5 would refer to the instance of the application, then its a 2 digit year and finally a auto incrementing number. I've not got a clear answer yet what happens if we go over 99999 records in a year. I think they might have figured that its not going to happen and are happy they have calculated the risk.

So finally a question, is there a way to create a primary key system for a distributed application that allows for sorting and does not enforce limits (besides the size of the data type itself e.g. max integer)?

EDIT: Here's a little bit more info on the app I plan to write. I want to create something that will let me store just about any type of information that I might gain, the system will include the ability to tag the entries so I can search on a topic. Types of information I see so far could be recommendations on books, dvds, websites etc. Or maybe local tips for the place I'm living. One overall idea is to stop keeping these bits of information spread across multiple computers/laptops/phones in different formats.

Zincate answered 31/8, 2011 at 14:37 Comment(3)
if you want to use an integer then you need some kind of systematic approach to reach max integer (counting etc). if you only need to reach sqrt(max integer) then random ids will work. do you really have no idea how much data you will have?Estelleesten
Eventually I intend on recording all sort of information that I useful find useful in it. But to start will I am going to use it for learning Kanji. I'm doing a set on my website at the moment that would end up about 50 I think. There are about 2000 Kanji to learn to be able to read a newspaper. So if I had a limit of 10,000 records for one version of the app that would probably work. If I ever hit the limit on one device I guess I could just move onto a different range say 50,000-60,000.Zincate
I realised something about the example from work. It does not require the fixed amount of digits for the auto increment part. So rather than adding the leading zeros I can just use the number. I don't know how they generate and store the numbers. I can only think that building a string and converting to a number would be a way. I'm going to look if there is another way as I think that might be a possible solution.Zincate
E
3

in broad terms, there are two approaches.

  1. you use sequential values. these may be divided up into groups, interleaved, whatever. they are the most efficient approach, but require collaboration and coordination.

  2. you use random values (this includes UIDs). these are much simpler but require more space. from "birthday collisions" we know that you if you need to store N values then a random key must be chosen from (more than) a range of N*N - http://en.wikipedia.org/wiki/Birthday_problem. working backwards, a 64 bit integer can hold about 32 bits of data if used as a random key - that's about 4 billion values. but that's for a probability of 50% collisions. you want a much lower probability, so a practical limit is around 10 million entries.

so, in simple terms, if you have a 64 bit key, a random approach would work for around 10 million entries a sequential approach for many more. in either case, that is probably more than you need.

if you have a 32 bit key then a random approach works for around a thousand values (a sequential approach goes to about 4 billion, as above).

obviously if you have a text value then you need to modify this accordingly, but UUIDs are designed to have "enough" values anyway http://en.wikipedia.org/wiki/Universally_unique_identifier

typically a database will provide a sequential ID and that is all you need. if not, the 64 bit random approach is usually simplest and worth the extra space.

Estelleesten answered 31/8, 2011 at 16:49 Comment(1)
Thanks, that's some useful information to read. I think I will go with the first option. While the second will work it doesn't really change one of the things I don't like about using UUIDs and that readability since it still creates quite long random numbers. Though it would work for the distributed nature of the app and would make it a purely numeric index which in theory seems like it should be more efficient compared to using a UUID.Zincate

© 2022 - 2024 — McMap. All rights reserved.