Tinyurl-style unique code: potential algorithm to prevent collisions
Asked Answered
U

4

7

I have a system that requires a unique 6-digit code to represent an object, and I'm trying to think of a good algorithm for generating them. Here are the pre-reqs:

  • I'm using a base-20 system (no caps, numbers, vowels, or l to prevent confusion and naughty words)
    • The base-20 allows 64 million combinations
  • I'll be inserting potentially 5-10 thousand entries at once, so in theory I'd use bulk inserts, which means using a unique key probably won't be efficient or pretty (especially if there starts being lots of collisions)
  • It's not out of the question to fill up 10% of the combinations so there's a high potential for lots of collisions
  • I want to make sure the codes are non-consecutive

I had an idea that sounded like it would work, but I'm not good enough at math to figure out how to implement it: if I start at 0 and increment by N, then convert to base-20, it seems like there should be some value for N that lets me count each value from 0-63,999,999 before repeating any.

For example, going from 0 through 9 using N=3 (so 10 mod 3): 0, 3, 6, 9, 2, 5, 8, 1, 4, 7.

Is there some magic math method for figuring out values of N for some larger number that is able to count through the whole range without repeating? Ideally, the number I choose would sort of jump around the set such that it wasn't obvious that there was a pattern, but I'm not sure how possible that is.

Alternatively, a hashing algorithm that guaranteed uniqueness for values 0-64 million would work, but I'm way too dumb to know if that's possible.

Uranus answered 10/8, 2009 at 23:44 Comment(4)
Prime numbers...I guess I really am bad at math; it seems obvious in hindsite. Thanks to everyone who answered. I gave credit to phantombrain for answering first.Uranus
Since you mention tinyurl, my answer to this question might be applicable: #1052449Lody
Don't forget to double the codes to make them non-consecutive.Notch
possible duplicate of Creating your own Tinyurl style uidAltostratus
C
8

All you need is a number that shares no factors with your key space. Easiest value is to use a prime number. You can google for large primes, or use http://primes.utm.edu/lists/small/10000.txt

Conformist answered 10/8, 2009 at 23:48 Comment(0)
O
1

Any prime number which is not a factor of the length of the sequence should be able to span the sequence without repeating. For 64000000, that means you shouldn't use 2 or 5. Of course, if you don't want them to be generated consecutively, generating them 2 or 5 apart is probably also not very good. I personally like the number 73973!

Oden answered 10/8, 2009 at 23:50 Comment(0)
A
1

There is another method to get a similar result (jumping over the entire set of the values without repeating, nonconsequtively), without using the primes - by using maximum length sequences, which you can generate using specially constructed shift registers.

Auriscope answered 11/8, 2009 at 0:21 Comment(0)
P
0

My math is a bit rusty, but I think you just need to ensure that the GCF of N and 64 million is 1. I'd go with a prime number (that doesn't divide evenly into 64 million) just in case though.

Prato answered 10/8, 2009 at 23:50 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.