Shortest possible generated unique ID
Asked Answered
U

3

10

So we can generate a unique id with str(uuid.uuid4()), which is 36 characters long.

Is there another method to generate a unique ID which is shorter in terms of characters?

EDIT:

  • If ID is usable as primary key then even better
  • Granularity should be better than 1ms
  • This code could be distributed, so we can't assume time independence.
Unreliable answered 13/7, 2018 at 22:6 Comment(2)
For distributed, simply include the generating node as part of the IDDefinitely
pypi.org/project/shortuuidUnification
M
12

If this is for use as a primary key field in db, consider just using auto-incrementing integer instead.

str(uuid.uuid4()) is 36 chars but it has four useless dashes (-) in it, and it's limited to 0-9 a-f.

Better uuid4 in 32 chars:

>>> uuid.uuid4().hex
'b327fc1b6a2343e48af311343fc3f5a8'

Or just b64 encode and slice some urandom bytes (up to you to guarantee uniqueness):

>>> base64.b64encode(os.urandom(32))[:8]
b'iR4hZqs9'
Misti answered 13/7, 2018 at 22:11 Comment(5)
Somebody has to be that guy that points a UUID4 could technically collide. (Though the chances are infinitesimally small.) A uuid1 will be "more unique" because it is dependent upon the system time. Perhaps that better fits what is being asked for here.Tortuga
If you slice a UUID4 short (uuid4 is generated from os.urandom), you're simply upping the odds that another generated token would duplicate. Again, still tiny small probability, but the number of tokens is then 2**32.Tortuga
This does not slice the uuid short, it just removes the four - characters. For slicing urandom bytes short, you will need to add code that checks it was not used already and regenerate if necessary.Misti
Yeah, with regards to slicing it short I was talking about the second part of your answer. I do believe uuid4 is the id suggested by the django docs as wellTortuga
Adding to what @BradSolomon says, unless you have a computer that can coordinate every ID ever generated anywhere ever globally, (forget that) you will always have the chance of collision no matter what you do. Defense against this is to create a big space, which in turn means longer IDs. If a user insists on shorter IDs it can be done but always comes at a higher risk of collisions. Depending on how many you need that might be acceptable though.Tubing
N
4

TLDR

Most of the times it's better to work with numbers internally and encode them to short IDs externally. So here's a function for Python3, PowerShell & VBA that will convert an int32 to an alphanumeric ID. Use it like this:

int32_to_id(225204568)
'F2AXP8'

For distributed code use ULIDs: https://github.com/mdipierro/ulid

They are much longer but unique across different machines.

How short are the IDs?

It will encode about half a billion IDs in 6 characters so it's as compact as possible while still using only non-ambiguous digits and letters.

How can I get even shorter IDs?

If you want even more compact IDs/codes/Serial Numbers, you can easily expand the character set by just changing the chars="..." definition. For example if you allow all lower and upper case letters you can have 56 billion IDs within the same 6 characters. Adding a few symbols (like ~!@#$%^&*()_+-=) gives you 208 billion IDs.

So why didn't you go for the shortest possible IDs?

The character set I'm using in my code has an advantage: It generates IDs that are easy to copy-paste (no symbols so double clicking selects the whole ID), easy to read without mistakes (no look-alike characters like 2 and Z) and rather easy to communicate verbally (only upper case letters). Sticking to numeric digits only is your best option for verbal communication but they are not compact.

I'm convinced: show me the code

Python 3

def int32_to_id(n):
  if n==0: return "0"
  chars="0123456789ACEFHJKLMNPRTUVWXY"
  length=len(chars)
  result=""
  remain=n
  while remain>0:
    pos = remain % length
    remain = remain // length
    result = chars[pos] + result
  return result
  

PowerShell

function int32_to_id($n){
   $chars="0123456789ACEFHJKLMNPRTUVWXY"
   $length=$chars.length
   $result=""; $remain=[int]$n
   do {
      $pos = $remain % $length
      $remain = [int][Math]::Floor($remain / $length)
      $result = $chars[$pos] + $result
   } while ($remain -gt 0)
   $result
}

VBA

Function int32_to_id(n)
    Dim chars$, length, result$, remain, pos
    If n = 0 Then int32_to_id = "0": Exit Function
    chars$ = "0123456789ACEFHJKLMNPRTUVWXY"
    length = Len(chars$)
    result$ = ""
    remain = n
    Do While (remain > 0)
        pos = remain Mod length
        remain = Int(remain / length)
        result$ = Mid(chars$, pos + 1, 1) + result$
    Loop
    int32_to_id = result
End Function

Function id_to_int32(id$)
    Dim chars$, length, result, remain, pos, value, power
    chars$ = "0123456789ACEFHJKLMNPRTUVWXY"
    length = Len(chars$)
    result = 0
    power = 1
    For pos = Len(id$) To 1 Step -1
        result = result + (InStr(chars$, Mid(id$, pos, 1)) - 1) * power
        power = power * length
    Next
    id_to_int32 = result
End Function

Public Sub test_id_to_int32()
    Dim i
    For i = 0 To 28 ^ 3
        If id_to_int32(int32_to_id(i)) <> i Then Debug.Print "Error, i=", i, "int32_to_id(i)", int32_to_id(i), "id_to_int32('" & int32_to_id(i) & "')", id_to_int32(int32_to_id(i))
    Next
    Debug.Print "Done testing"
End Sub

Noctule answered 20/11, 2020 at 10:6 Comment(0)
G
0

Yes. Just use the current UTC millis. This number never repeats.

const uniqueID = new Date().getTime();

EDIT

If you have the rather seldom requirement to produce more than one ID within the same millisecond, this method is of no use as this number‘s granularity is 1ms.

Gamaliel answered 13/7, 2018 at 22:9 Comment(4)
This number does repeat if you draw two or more of them in the same millisecond.Tubing
Point taken! But the frequency wasn't mentioned here. So I didn‘t take this possibility into account.Gamaliel
Fair enough, I've updated the question to be a bit more specificUnreliable
Yeah, people have vastly different requirements for making IDs. This will work most of the time in most scenarios, but later drive-by readers should know the caveats. I myself earlier used this approach thinking it was safe, and it was, until my code grew, stuff started happening faster and the assumption broke. This is partly why people use gigantically huge UUIDs, so they can just forget about this problem and move on, at the cost of big ugly string identifiers.Tubing

© 2022 - 2024 — McMap. All rights reserved.