Algorithm for generating a random number
Asked Answered
D

17

8

I'm looking to generate a random number and issue it to a table in a database for a particular user_id. The catch is, the same number can't be used twice. There's a million ways to do this, but I'm hoping someone very keen on algorithms has a clever way of solving the problem in an elegant solution in that the following criteria is met:

1) The least amount of queries to the database are made. 2) The least amount of crawling through a data structure in memory is made.

Essentially the idea is to do the following

1) Create a random number from 0 to 9999999
2) Check the database to see if the number exists
OR
2) Query the database for all numbers
3) See if the returned result matches whatever came from the db
4) If it matches, repeat step 1, if not, problem is solved.

Thanks.

Deach answered 26/11, 2008 at 1:44 Comment(4)
I had to -1 because the logic behind this question is flawed.Eruptive
Could add why you don't want to use a simple autoincrement field?Tamathatamaulipas
I can't figure out whether this is random or just unique and non-sequential. I can't figure out what user_id has to do with anything. And I con't figure out what 9,999,999 has to do with anything.Gypsy
There are two things you should 'never' do: try to generate your own random no.s or 'invent' your own encryption. 99 times out of 100 they will be flawed.Lighthearted
P
18

No your algorithm is not scalable. What I've done before is to issue numbers serially (+1 each time) and then pass them through an XOR operation to jumble the bits thus giving me a seemingly random numbers. Of course they aren't really random, but they look so to users eyes.


[Edit] Additional information

This algorithm's logic goes like this you use a known sequence to generate unique numbers and then you deterministically manipulate them, so they don't look serial anymore. The general solution is to use some form of encryption, which in my case was an XOR flipflop, because its as fast as it can get, and it fulfills the guarantee that numbers will never collide.

However you can use other forms of encryption, if you want prefer even more random looking numbers, over speed (say you don't need to generate many ids at a time). Now the important point in choosing an encryption algorithm is "the guarantee that numbers will never collide". And a way to prove if an encryption algorithm can fulfill this guarantee is to check if both the original number and the result of the encryption have the same number of bits, and that the the algorithm is reversible (bijection).

[Thanks to Adam Liss & CesarB for exapanding on the solution]

Pend answered 26/11, 2008 at 1:51 Comment(6)
Oooh ... clever! Use a known sequence to generate unique numbers and deterministically manipulate them. Replace XOR with "encrypt" for better randomness.Trevor
Yes encryption works, Actually XOR is a form of "cheap" encryption that has the guarantee it will never collide. Making my solution a special case of the more general solution, but you need to prove if another encryption can provide the same guarantees before using it safely.Pend
Easy to prove: if both the original number and the result of the encryption have the same number of bits, for the algorithm to be reversible, it must be a bijection (else there would be collisions in one of the directions). Thus, it only needs to be reversible and have the same number of bits.Vesper
There we go :) Thanks for mentioning a way to prove it. I'd upvote the comment if I could!Pend
@Robert Gould: why not incorporate it on your answer instead? It would make it even more useful, even for people who do not have Javascript.Vesper
If you actually care about hiding the original sequence this is a bad scheme, because if you get the plaintext for a single id you can then decrypt all others.Civics
M
17

Why don't you just use a GUID? Most languages should have a built-in way to do this. It's guaranteed to be unique (with very reasonable bounds).

Maffa answered 26/11, 2008 at 2:19 Comment(4)
GUIDs are 'Globally UNIQUE-IDs' not 'Gloablly RANDOM-IDs'Greasepaint
@andora: true; depends what the OP wants. it seemed he wanted something that seemed random, which GUIDs doMaffa
@Greasepaint While it's true that none of the initials from the acronym GUID stand for "random", the fact is GUIDs are random.Exordium
@rjmunro: If you do a little checking, you will find that they are not very random at all, they may appear so, but are not designed to be random, they are designed to be unique.Greasepaint
G
6

Want an over-the-top solution?

I assume randomness is not intended to be encryption-quality, but just enough to discourage guessing the longevity of a user, by user_id.

During development, generate a list of all 10 million numbers in string form.

Optionally, perform some simple transformation, like adding a constant string to the middle. (This is just in case the result is too predictable.)

Pass them into a tool that generates Perfect Hash functions, such as gperf.

The resulting code can be used to quickly encode the user's id at runtime into a unique hash value that is guaranteed not to clash with any other hash values.

Groovy answered 26/11, 2008 at 2:16 Comment(0)
E
3

Try the statement in mysql SELECT CAST(RAND() * 1000000 AS INT)

Estreat answered 26/11, 2008 at 7:51 Comment(0)
M
2

Assuming:

  • The randomness is needed for uniqueness, not for security
  • Your user_id is 32 bit
  • Your limit of 9999999 was just an example

You could do something simple as having the random number as a 64 bit integer, with the upper 32 bits containing the timestamp (at row insert) and the lower 32 bits the user_id. That would be unique even for multiple rows with the same user, provided you use an appropriate resolution on your timestamp depending on how often you add new rows for the same user. Combine with an unique constraint on the random column and catch any such error in your logic and then just retry.

Monometallism answered 26/11, 2008 at 2:0 Comment(0)
O
1

I think you'll find that you really do not want to do this. As the numbers in the database increase, you might spend too much time in the "make sure this number isn't taken" loop.

Personally, I've had luck with hashes as an alternative, but to come up with a better solution, I'd really need to know why you want to do it this way.

On answered 26/11, 2008 at 1:51 Comment(0)
F
1

My experience was simply using the RNG in PHP. I found that using a certain size of number (I'm using an int, so I have a max of 4G). I ran some tests and found that on average, in 500,000 iterations, I got 120 single duplicates. I never got a triplicate after running the loop a bunch of times. My "solution" was to then just insert and check if it fails, then generate a new ID and go again.

My advice is to do the same and see what your collision rate is &c and see if it's acceptable for your case.

This isn't optimal, so if anyone has suggestions I'm looking too:)

EDIT: I was limited to a 5 digit ID ([a-zA-z0-9]{5,5}), the longer the id (more combination, the few collisions). An md5 of the email would almost never conflict, for instance.

Fissirostral answered 26/11, 2008 at 1:51 Comment(0)
E
1

The problem is that if you are generating random numbers is is very possible to produce duplicates infinatly.

however:

<?php
//Lets assume we already have a connection to the db
$sql = "SELECT randField FROM tableName";
$result = mysql_query($sql);
$array = array();
while($row = mysql_fetch_assoc($result))
 {
   $array[] = $row['randField'];
 }
while(True)
 {
   $rand = rand(0, 999999);
   if(!in_array($rand))
     {
       //This number is not in the db so use it!
       break;
     }
 }
?>

While this will do what you want it too, it is a bad Idea as this won't scale for long, eventualy your array will get to large and it will take an extremely long time to generate a random that is not already in your db.

Eruptive answered 26/11, 2008 at 1:55 Comment(0)
A
1

It's easy to design a pseudorandom number generator with a long period of nonrepetition; e.g. this one, which is being used for the same thing that you want it for.

BTW, why not just issue the userid's sequentially?

Aniseikonia answered 26/11, 2008 at 2:2 Comment(0)
S
1

I like Oddthinking's idea, but instead of choosing the strongest hash function in the world, you could simply:

  • Generate the MD5's of the first 10 millions of numbers (expressed as strings, +some salt)
  • Check for duplicates offline, i.e. before going in production (I guess there won't be any)
  • Store the duplicates in an array somewhere
  • When your application starts, load the array
  • When you want to insert an ID, choose the next number, compute its MD5, check if it is in the array, and if it isn't use it as the ID in the database. Otherwise, choose next number

MD5's are fast, and checking if a string belongs to an array will avoid you a SELECT.

Subhuman answered 26/11, 2008 at 2:41 Comment(1)
Adding to this idea, if you do happen to find one or two duplicates, repeat the process with a different salt until you don't. That way, you can avoid the run-time check entirely.Groovy
B
1

I've actually previously written an article about this. It takes the same approach as Robert Gould's answer, but additionally shows how to shorten a block cipher to a suitable length using xor folding, and then how to generate the permutations over a range that isn't a power of 2, while still preserving the uniqueness property.

Bisectrix answered 26/11, 2008 at 10:13 Comment(2)
you probably changed the server url mapping, so the correct link to your article is blog.notdot.net/2007/9/… now. The one you mentioned is broken. +1 for the TEA cipherThimblerig
Thanks, fixed the link. Evidently I left out a few redirects when I migrated my blog.Bisectrix
O
1

If you really want to get "random" numbers form 0 to 9 999 999, then the solution is to do the "randomization" once, and then store the result to your disk.

It's not hard to get the result you want, but I think of it more like "make a long list with numbers", than "get a random number".

$array = range(0, 9999999);
$numbers = shuffle($array);

You also need a pointer to current position in $numbers (store it in a database); start with 0 and increment it each time you need a new number. (Or you could use array_shift() or array_pop(), if you dont like to use pointers.)

Oddson answered 27/11, 2008 at 22:41 Comment(0)
K
1

A proper PRNG (Pseudo-Random Number Generator) algorithm will have a cycle time during which it will never be in the same state. If you expose the entire state of the PRNG in the number retrieved from it, you will get a number guaranteed unique for the period of the generator.

A simple PRNG that does this is called the 'Linear Congruential' PRNG which iterates a formula:

X(i) = AX(i-1)|M

Using the right pair of factors you can get a period of 2^30 (approximately 1 billion) from a simple PRNG with a 32 bit accumulator. Note that you will need a 64 bit long long temporary variable to hold the intermediate 'AX' part of the computation. Most if not all C compilers will support this data type. You should also be able to do it with a numeric data type on most SQL dialects.

With the right values of A and M we can get a random number generator with good statistical and geometrical properties. There is a famous paper about this written by Fishman and Moore.

For M = 2^31 - 1 we get can use the values of A below to get a PRNG with a nice long period (2^30 IIRC).

Good Values of A:

742,938,285  
950,706,376  
1,226,874,159  
62,089,911  
1,343,714,438   

Note that this type of generator is (by definition) not cryptographically secure. If you know the last number generated from it you can predict what it will do next. Unfortunately I believe that you cannot get cryptographic security and guaranteed non-repeatability at the same time. For a PRNG to be cryptographically secure (e.g. Blum Blum Shub) it cannot expose sufficient state in a generated number to allow the next number in the sequence to be predicted. Therefore the internal state is wider than the generated number and (in order to have good security) the period will be longer than the number of possible values that can be generated. This means that the exposed number will not be unique within the period.

For similar reasons the same is true of long-period generators such as the Mersenne Twister.

Kenishakenison answered 27/11, 2008 at 22:59 Comment(0)
F
1

there are a couple ways to go about this one way would be to construct an array with the numbers 0000000 through 9999999 and then pick a random pick of these numbers in this array and swap the picked numbers values with the highest value Max then reduce max by 1 and pick another random member of this array up to the new maximum

each time reducing Max by one

for example (in basic) : (to the right are comments which should be removed in the actual program) Rndfunc is a call to whatever random number generator function you are using

dim array(0 to 9999999) as integer
for x% = 1 to 9999999
array(x%)=x%
next x%
maxPlus = 10000000
max =9999999
pickedrandom =int(Rndfunc*maxPlus)  picks a random indext of the array based on    
                                   how many numbers are left
maxplus = maxplus-1
swap array(pickedrandom) , array(max) swap this array value to the current end of the
                                     array 
max = max -1                   decrement the pointer of the max array value so it 
                              points to the next lowest place..

then keep doing this for each number you wish to pick , but you will need to have the option of using very big arrays

the other method would be as follows :generate a number and store it into an array that can grow dynamically then after that pick a new number and compare it to the value that is halfway from the first to the last element in the array in this case it would be the first number picked if it matches pick another random number, sort the array according to size and if there is not a match then depending on weather it is greater or smaller than the number you compared it with you go up or down in the list half of half the distance, each time that it does not match and is greater or lesser than what you are comparing it to.

each time halving it until you reach a gap size of one then you check once and stop as there is no match, and then the number is added to the list and the list is reshuffled in ascending order, so on and so on until you are done picking random numbers... hope this helps..

Fin answered 27/1, 2012 at 13:5 Comment(0)
D
0

PHP already has a function for this, uniqid. It generates a standard uuid which is great if you have to access the data from elsewhere. Don't reinvent the wheel.

Discourtesy answered 26/11, 2008 at 2:6 Comment(1)
uniqid returns a reasonably random string based on the current time, not a UUID.Apostatize
F
0

If you want to ensure that the random-numbers aren't repeating, you need a non-repeating random number-generator (as described here).

The basic idea is that the following formula seed * seed & p will produced non-repeating random-numbers for any input x such that 2x < p and p - x * x % p produces all other random-number aswell non-repeating, but only if p = 3 mod 4. So basically all you need is a single primnumber as close to 9999999 as possible. This way the effort can be reduced to a single read field, but with the downside that either too big IDs are generated or too few IDs will be generated.

This algorithm doesn't permute very well, so i'd recommend combining it with either XOR or addition or some other approach to change the exact value without destroying the 1-to-1-relation between seeds and their generated value.

Fleabite answered 4/10, 2015 at 22:49 Comment(0)
T
0

For Random Generation: ( In PHP )

Code:

It creates random numbers perfectly.

<?PHP
 /*set the bigger range if there is huge demand*/
$n=range(111111,999999);

shuffle($n); //shuffle those

for ($x=0; $x< 1; $x++) //selects unique random number.
{
echo $n[$x].' ';
}
echo "\n"

?>

Result:

First Time:

one

Second Time:

two

Third Time:

three

Texture answered 5/11, 2021 at 18:11 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.