Generating a uniform distribution of INTEGERS in C

I've written a C function that I think selects integers from a uniform distribution with range [rangeLow, rangeHigh], inclusive. This isn't homework--I'm just using this in some embedded systems tinkering that I'm doing for fun.

In my test cases, this code appears to produce an appropriate distribution. I'm not feeling fully confident that the implementation is correct, though. Could someone do a sanity check and let me know if I've done anything wrong here?

//uniform_distribution returns an INTEGER in [rangeLow, rangeHigh], inclusive.
int uniform_distribution(int rangeLow, int rangeHigh)
{
    int myRand = (int)rand(); 
    int range = rangeHigh - rangeLow + 1; //+1 makes it [rangeLow, rangeHigh], inclusive.
    int myRand_scaled = (myRand % range) + rangeLow;
    return myRand_scaled;
}
//note: make sure rand() was already initialized using srand()

P.S. I searched for other questions like this. However, it was hard to filter out the small subset of questions that discuss random integers instead of random floating-point numbers.

Indices answered 25/7, 2012 at 1:54 Comment(1)
For decent randomness you may have to go for something platform-specific or at least use something outside standard C, e.g. POSIX or BSD-spec functions. - Charlsiecharlton
E
11

On some implementations, rand() did not provide good randomness on its lower order bits, so the modulus operator would not provide very random results. If you find that to be the case, you could try this instead:

int uniform_distribution(int rangeLow, int rangeHigh) {
    double myRand = rand()/(1.0 + RAND_MAX); 
    int range = rangeHigh - rangeLow + 1;
    int myRand_scaled = (myRand * range) + rangeLow;
    return myRand_scaled;
}

Using rand() this way will produce a bias as noted by Lior. But, the technique is fine if you can find a uniform number generator to calculate myRand. One possible candidate would be drand48(). This will greatly reduce the amount of bias to something that would be very difficult to detect.
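As a rough illustration of the drand48() suggestion, here is a minimal sketch. It assumes a POSIX system (drand48() is not part of standard C), that the generator was seeded with srand48(), and that rangeHigh - rangeLow + 1 does not overflow an int. The function name uniform_distribution_drand48 is made up for this example.

```c
#define _XOPEN_SOURCE 600   /* ask strict POSIX builds to declare drand48() */
#include <assert.h>
#include <stdlib.h>

/* Returns an integer in [rangeLow, rangeHigh], inclusive.
   Assumption: srand48() was called beforehand to seed the generator. */
int uniform_distribution_drand48(int rangeLow, int rangeHigh)
{
    assert(rangeLow <= rangeHigh);
    int range = rangeHigh - rangeLow + 1;   /* assumed not to overflow */
    double myRand = drand48();              /* uniform in [0.0, 1.0) */
    return rangeLow + (int)(myRand * range);
}
```

Because drand48() works with 48 bits of state, the leftover bias for small ranges is far smaller than with a 15-bit rand(), but it is still not exactly zero.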

However, if you need something cryptographically secure, you should use an algorithm outlined in Lior's answer, assuming your rand() is itself cryptographically secure (the default one is probably not, so you would need to find one). Below is a simplified implementation of what Lior described. Instead of counting bits, we assume the range falls within RAND_MAX, and compute a suitable multiple. Worst case, the algorithm ends up calling the random number generator twice on average per request for a number in the range.

int uniform_distribution_secure(int rangeLow, int rangeHigh) {
    int range = rangeHigh - rangeLow + 1;
    int secureMax = RAND_MAX - RAND_MAX % range;
    int x;
    do x = secure_rand(); while (x >= secureMax);
    return rangeLow + x % range;
}
Ermine answered 25/7, 2012 at 3:6 Comment(1)
It should be "return rangeLow + x % range;". - Cal
I
15

Let's assume that rand() generates a uniformly-distributed value I in the range [0..RAND_MAX], and you want to generate a uniformly-distributed value O in the range [L,H].

Suppose I is in the range [0..32767] and O is in the range [0..2].

According to your suggested method, O = I%3. Note that in the given range, there are 10923 numbers for which I%3=0, 10923 numbers for which I%3=1, but only 10922 numbers for which I%3=2. Hence your method will not map a value from I into O uniformly.

As another example, suppose O is in the range [0..32766].

According to your suggested method, O=I%32767. Now you'll get O=0 for both I=0 and I=32767. Hence 0 is twice as likely as any other value, so your method is again nonuniform.
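The counts in both examples can be checked by brute force. The helper below is only a sketch for verifying the arithmetic; the name count_residues is invented here, and it simply tallies how many inputs in [0, inputMax] land on each output value under the % mapping.

```c
#include <assert.h>
#include <limits.h>

/* Tally how many inputs i in [0, inputMax] map to each residue i % range.
   The caller must supply a counts array with room for `range` entries. */
void count_residues(int inputMax, int range, long *counts)
{
    assert(inputMax >= 0 && inputMax < INT_MAX && range > 0);
    for (int r = 0; r < range; r++)
        counts[r] = 0;
    for (int i = 0; i <= inputMax; i++)
        counts[i % range]++;
}
```

With inputMax = 32767 and range = 3 this yields 10923, 10923 and 10922. With range = 32767 the value 0 is hit twice and every other value once.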


The suggested way to generate a uniform mapping is as follows:

  1. Calculate the number of bits that are needed to store a random value in the range [L,H]:

    unsigned int nRange = (unsigned int)H - (unsigned int)L + 1;
    unsigned int nRangeBits = (unsigned int)ceil(log((double)nRange) / log(2.0));

  2. Generate nRangeBits random bits

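    This can be implemented by right-shifting the result of rand() so that only its nRangeBits high-order bits remain (assuming a single rand() call supplies at least that many bits).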

  3. Ensure that the generated number is not greater than H-L. If it is - repeat step 2.

  4. Now you can map the generated number into O just by adding L.
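The four steps above could be sketched in C roughly as follows. This is one possible reading of the steps, not a definitive implementation. It assumes RAND_MAX has the form 2^k - 1 (true for common C libraries, but not guaranteed by the standard) and that the range size does not exceed RAND_MAX + 1. The names bits_needed and uniform_distribution_bits are invented for this example, and the bit count uses an integer loop in place of the floating-point log.

```c
#include <assert.h>
#include <stdlib.h>

/* Bits needed to represent every value in [0, n-1], for n >= 1.
   Integer-only equivalent of ceil(log2(n)). */
unsigned bits_needed(unsigned n)
{
    unsigned bits = 0;
    unsigned maxValue = n - 1u;
    while (maxValue != 0) {
        bits++;
        maxValue >>= 1;
    }
    return bits;
}

/* Returns an integer in [L, H], inclusive, using rejection on the
   high-order bits of rand(). Assumption: RAND_MAX == 2^k - 1. */
int uniform_distribution_bits(int L, int H)
{
    assert(L <= H);
    unsigned nRange = (unsigned)H - (unsigned)L + 1u;          /* step 1 */
    assert(nRange != 0);                                       /* full int range not supported */
    unsigned nRangeBits = bits_needed(nRange);
    unsigned randBits = bits_needed((unsigned)RAND_MAX + 1u);  /* bits per rand() call */
    assert(nRangeBits <= randBits);

    unsigned x;
    do {
        /* step 2: keep only the nRangeBits high-order bits */
        x = (unsigned)rand() >> (randBits - nRangeBits);
    } while (x > nRange - 1u);                                 /* step 3: retry if x > H-L */

    return L + (int)x;                                         /* step 4 */
}
```

Since fewer than half of the candidate values are ever rejected, the loop needs fewer than two rand() calls on average.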

Ideation answered 25/7, 2012 at 8:27 Comment(1)
I have referenced this good answer here. Small candidate improvement ceil(log((double(nRange) / log(2.)) --> ceil(log2((double)nRange)) or some other integer only computation. - Spandex
T
3

I think it is known that rand() is not very good. It just depends on how good of "random" data you need.

I suppose you could write a test then calculate the chi-squared value to see how good your uniform generator is:

http://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test
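To make the chi-squared idea concrete, here is a small sketch of the statistic for a uniform null hypothesis: bucket the generator's outputs, then sum (observed - expected)^2 / expected over the buckets. The function name chi_squared_uniform is made up for this example, and interpreting the result still requires comparing it against a chi-squared critical value with k - 1 degrees of freedom.

```c
#include <assert.h>
#include <stddef.h>

/* Pearson chi-squared statistic for k observed bucket counts, assuming
   every bucket is equally likely under the null hypothesis. */
double chi_squared_uniform(const unsigned long *counts, size_t k)
{
    assert(counts != NULL && k > 0);

    unsigned long total = 0;
    for (size_t i = 0; i < k; i++)
        total += counts[i];
    assert(total > 0);

    double expected = (double)total / (double)k;   /* same for every bucket */
    double chi2 = 0.0;
    for (size_t i = 0; i < k; i++) {
        double diff = (double)counts[i] - expected;
        chi2 += diff * diff / expected;
    }
    return chi2;
}
```

A perfectly even histogram gives 0. Larger values mean the observed counts deviate more from uniform.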

Depending on your use (don't use this for your online poker shuffler), you might consider an LFSR:

http://en.wikipedia.org/wiki/Linear_feedback_shift_register

It may be faster, if you just want some pseudo-random output. Also, supposedly they can be uniform, although I haven't studied the math enough to back up that claim.
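For reference, a 16-bit Fibonacci LFSR can be sketched in a few lines. The version below uses the taps 16, 14, 13, 11 (the example given on the Wikipedia page linked above), and the function name lfsr16_next is invented here. It is an illustration only: it is not cryptographically secure, and consecutive states are strongly correlated because each one is the previous state shifted by one bit.

```c
#include <assert.h>
#include <stdint.h>

/* Advance a 16-bit Fibonacci LFSR by one step.
   Taps 16, 14, 13, 11 (x^16 + x^14 + x^13 + x^11 + 1).
   The state must be nonzero: an all-zero state never changes. */
uint16_t lfsr16_next(uint16_t state)
{
    assert(state != 0);
    uint16_t bit = (uint16_t)(((state >> 0) ^ (state >> 2) ^
                               (state >> 3) ^ (state >> 5)) & 1u);
    return (uint16_t)((state >> 1) | (bit << 15));
}
```

Starting from any nonzero seed, this register visits all 65535 nonzero 16-bit states before repeating, so over a full period every nonzero state appears exactly once.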

Terrilynterrine answered 25/7, 2012 at 2:11 Comment(0)
B
1

A version that corrects the distribution errors noted by Lior, uses the high bits returned by rand(), and relies only on integer math (if that's desirable):

int uniform_distribution(int rangeLow, int rangeHigh)
{
    int range = rangeHigh - rangeLow + 1; //+1 makes it [rangeLow, rangeHigh], inclusive.
    int copies=RAND_MAX/range; // we can fit n-copies of [0...range-1] into RAND_MAX
    // Use rejection sampling to avoid distribution errors
    int limit=range*copies;    
    int myRand=-1;
    while( myRand<0 || myRand>=limit){
        myRand=rand();   
    }
    return myRand/copies+rangeLow;    // note that this involves the high-bits
}

//note: make sure rand() was already initialized using srand()

This should work well provided that range is much smaller than RAND_MAX, otherwise you'll be back to the problem that rand() isn't a good random number generator in terms of its low-bits.

Brummett answered 25/7, 2012 at 20:34 Comment(3)
You meant myRand < 0 || myRand >= limit, no? And why not use a do-while? - Cal
@Cal I systematically use half open intervals for stuff like this; c.f. cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html and avoid do-while as part of my "style". - Brummett
Ok dave, but myRand will never be both < 0 and >= limit. - Cal
