Does "n * (rand() / RAND_MAX)" make a skewed random number distribution?
Asked Answered
C

1

10

I'd like to find an unskewed way of getting random numbers in C (although at most I'm going to be using it for values of 0-20, and more likely only 0-8). I've seen this formula but after running some tests I'm not sure if it's skewed or not. Any help?

Here is the full function used:

int randNum() 
{ 
    return 1 + (int) (10.0 * (rand() / (RAND_MAX + 1.0)));
}

I seeded it using:

unsigned int iseed = (unsigned int)time(NULL);
srand (iseed);

The one suggested below refuses to work for me I tried

int greek; 
for (j=0; j<50000; j++) 
{ 
greek =rand_lim(5); 
printf("%d, " greek); 
greek =(int) (NUM * (rand() / (RAND_MAX + 1.0))); 
int togo=number[greek]; 
number[greek]=togo+1; 
}

and it stops working and gives me the same number 50000 times when I comment out printf.

Chanson answered 18/4, 2012 at 23:3 Comment(5)
Shouldn't RAND_MAX + 1.0 be just RAND_MAX?Galloping
@EdHeal, if you want the interval to be [0.0,1.0) (i.e. not include 1.0) you must divide by more than RAND_MAX, although I think a value smaller than 1.0 would work better.Roughhew
@MarkRansom - I know that. The question asks for values in the range 0-20. So the formula will just require the division by RAND_MAX, as it does not mention that the range does not exclude 20.Galloping
@EdHeal, your method would work if you multiply by 20.999999999999 instead of 20. Otherwise the odds of outputing 20 are 1/RAND_MAX which is about as skewed as you can get.Roughhew
Of course it gives you always the same value, since you keep calling the unshown rand_lim(5), not rand().Occult
K
16

Yes, it's skewed, unless your RAND_MAX happens to be a multiple of 10.

If you take the numbers from 0 to RAND_MAX, and try to divide them into 10 piles, you really have only three possibilities:

  1. RAND_MAX is a multiple of 10, and the piles come out even.
  2. RAND_MAX is not a multiple of 10, and the piles come out uneven.
  3. You split it into uneven groups to start with, but throw away all the "extras" that would make it uneven.

You rarely have control over RAND_MAX, and it's often a prime number anyway. That really only leaves 2 and 3 as possibilities.

The third option looks roughly like this: [Edit: After some thought, I've revised this to produce numbers in the range 0...(limit-1), to fit with the way most things in C and C++ work. This also simplifies the code (a tiny bit).

int rand_lim(int limit) {
/* return a random number in the range [0..limit)
 */

    int divisor = RAND_MAX/limit;
    int retval;

    do { 
        retval = rand() / divisor;
    } while (retval == limit);

    return retval;
}

For anybody who questions whether this method might leave some skew, I also wrote a rather different version, purely for testing. This one uses a decidedly non-random generator with a very limited range, so we can simply iterate through every number in the range. It looks like this:

#include <stdlib.h>
#include <stdio.h>

#define MAX 1009

int next_val() {
    // just return consecutive numbers
    static int v=0;

    return v++;
}

int lim(int limit) {
    int divisor = MAX/limit;
    int retval;

    do {
        retval = next_val() / divisor;
    } while (retval == limit);

    return retval;
}

#define LIMIT 10

int main() {

    // we'll allocate extra space at the end of the array:
    int buckets[LIMIT+2] = {0};
    int i;

    for (i=0; i<MAX; i++)
        ++buckets[lim(LIMIT)];

    // and print one beyond what *should* be generated
    for (i=0; i<LIMIT+1; i++)
        printf("%2d: %d\n", i, buckets[i]);
}

So, we're starting with numbers from 0 to 1009 (1009 is prime, so it won't be an exact multiple of any range we choose). So, we're starting with 1009 numbers, and splitting it into 10 buckets. That should give 100 in each bucket, and the 9 leftovers (so to speak) get "eaten" by the do/while loop. As it's written right now, it allocates and prints out an extra bucket. When I run it, I get exactly 100 in each of buckets 0..9, and 0 in bucket 10. If I comment out the do/while loop, I see 100 in each of 0..9, and 9 in bucket 10.

Just to be sure, I've re-run the test with various other numbers for both the range produced (mostly used prime numbers), and the number of buckets. So far, I haven't been able to get it to produce skewed results for any range (as long as the do/while loop is enabled, of course).

One other detail: there is a reason I used division instead of remainder in this algorithm. With a good (or even decent) implementation of rand() it's irrelevant, but when you clamp numbers to a range using division, you keep the upper bits of the input. When you do it with remainder, you keep the lower bits of the input. As it happens, with a typical linear congruential pseudo-random number generator, the lower bits tend to be less random than the upper bits. A reasonable implementation will throw out a number of the least significant bits already, rendering this irrelevant. On the other hand, there are some pretty poor implementations of rand around, and with most of them, you end up with better quality of output by using division rather than remainder.

I should also point out that there are generators that do roughly the opposite -- the lower bits are more random than the upper bits. At least in my experience, these are quite uncommon. That with which the upper bits are more random are considerably more common.

Kho answered 18/4, 2012 at 23:11 Comment(10)
I tried that damn function before and it refuses to work. It just keeps giving me the same number over and over. If I have int greek; for (j=0; j<50000; j++) { greek =rand_lim(5); printf("%d, " greek); greek =(int) (NUM * (rand() / (RAND_MAX + 1.0))); int togo=number[greek]; number[greek]=togo+1; } If I edit out the print statement it gives me the same number 50000 timesChanson
Oh yeah I turned it into a while loop instead of a do while loop which I thought would be the same.Chanson
@RobertZ: I've added a little bit of demo code to the answer.Kho
If you take the numbers from 0 to RAND_MAX there are RAND_MAX + 1 numbers, so it is RAND_MAX + 1 that must be a multiple of 10.Motorboating
A thought occurs, couldn't I make it do{retval =rand() } while (retval<(Rand_MAX-(Rand_Max%limit+1))? That way if Rand_Max = 43502 it will ignore 43500, 43501 and 43502 and the results will be divisible?Chanson
@caf: well, technically there are RAND_MAX+1 numbers, and the range is normally for 0..LIMIT, so there are LIMIT+1 numbers, so RAND_MAX+1 needs to be a multiple of LIMIT+1 -- not that it changes any of the basic idea involved.Kho
@RobertZ: I think you mean retval > (RAND_MAX(..., but yes you can re-cast it in that direction too.Kho
Oh and retval = retval%limit at the end. Although it would be between 0-9 and not 0-10.Chanson
@caf: It produces un-skewed results as it is, but is more conservative than necessary, throwing away more numbers than it really needs to. Thanks for asking though -- I'd been meaning to check that for some time, and your asking get me to do so.Kho
Off by one. "RAND_MAX is a multiple of 10, and the piles come out even." --> "RAND_MAX + 1 is a multiple of 10, and the piles come out even.", etc.Adrea

© 2022 - 2024 — McMap. All rights reserved.