How to generate a random integer number from within a range
Asked Answered
L

11

128

This is a follow on from a previously posted question:

How to generate a random number in C?

I wish to be able to generate a random number from within a particular range, such as 1 to 6 to mimic the sides of a die.

How would I go about doing this?

Laudian answered 24/3, 2010 at 16:55 Comment(7)
if you look at the second answer to the question you refer to you have the answer. rand() % 6.Centesimal
I didn't understand how it worked, so I decided to make a separate question for clarity.Laudian
Random thought: If you polled a random cross-section of programmers, you'd find a random number of them are randomly thinking of ways to randomly generate numbers. Considering the Universe is governed by precise and predictable laws, isn't it interesting that we try to generate things more randomly? Questions like this always tend to bring out the 10k+ posters.Saul
@Mats rand() % 6 can return a 0. Not good for a die.Extravagant
Can you mark https://mcmap.net/q/64548/-how-to-generate-a-random-integer-number-from-within-a-range as the accepted answer instead of the answer that links to it :) Thanks.Cattycornered
@Cattycornered It's been marked :)Laudian
Shouldn't the title of the question be "How to generate an INTEGER random number from within a range in C" instead of the current one?Eduction
A
192

All the answers so far are mathematically wrong. Returning rand() % N does not uniformly give a number in the range [0, N) unless N divides the length of the interval into which rand() returns (i.e. is a power of 2). Furthermore, one has no idea whether the moduli of rand() are independent: it's possible that they go 0, 1, 2, ..., which is uniform but not very random. The only assumption it seems reasonable to make is that rand() puts out a Poisson distribution: any two nonoverlapping subintervals of the same size are equally likely and independent. For a finite set of values, this implies a uniform distribution and also ensures that the values of rand() are nicely scattered.

This means that the only correct way of changing the range of rand() is to divide it into boxes; for example, if RAND_MAX == 11 and you want a range of 1..6, you should assign {0,1} to 1, {2,3} to 2, and so on. These are disjoint, equally-sized intervals and thus are uniformly and independently distributed.

The suggestion to use floating-point division is mathematically plausible but suffers from rounding issues in principle. Perhaps double is high-enough precision to make it work; perhaps not. I don't know and I don't want to have to figure it out; in any case, the answer is system-dependent.

The correct way is to use integer arithmetic. That is, you want something like the following:

#include <stdlib.h> // For random(), RAND_MAX

// Assumes 0 <= max <= RAND_MAX
// Returns in the closed interval [0, max]
long random_at_most(long max) {
  unsigned long
    // max <= RAND_MAX < ULONG_MAX, so this is okay.
    num_bins = (unsigned long) max + 1,
    num_rand = (unsigned long) RAND_MAX + 1,
    bin_size = num_rand / num_bins,
    defect   = num_rand % num_bins;

  long x;
  do {
   x = random();
  }
  // This is carefully written not to overflow
  while (num_rand - defect <= (unsigned long)x);

  // Truncated division is intentional
  return x/bin_size;
}

The loop is necessary to get a perfectly uniform distribution. For example, if you are given random numbers from 0 to 2 and you want only ones from 0 to 1, you just keep pulling until you don't get a 2; it's not hard to check that this gives 0 or 1 with equal probability. This method is also described in the link that nos gave in their answer, though coded differently. I'm using random() rather than rand() as it has a better distribution (as noted by the man page for rand()).

If you want to get random values outside the default range [0, RAND_MAX], then you have to do something tricky. Perhaps the most expedient is to define a function random_extended() that pulls n bits (using random_at_most()) and returns in [0, 2**n), and then apply random_at_most() with random_extended() in place of random() (and 2**n - 1 in place of RAND_MAX) to pull a random value less than 2**n, assuming you have a numerical type that can hold such a value. Finally, of course, you can get values in [min, max] using min + random_at_most(max - min), including negative values.

Accustomed answered 27/7, 2011 at 23:20 Comment(27)
tried this in xcode 4.5.1 .. am getting this error -> Undefined symbols for architecture i386: "_random_in_range",Happygolucky
@Happygolucky How is the method called in your program? This is a recursive function that calls itself.Monecious
@Adam Rosenfield,@Ryan Reich : In a related question where Adam had answered:#138283 the most upvoted answer : The usage of 'modulus' would then be incorrect, no? To generate 1..7 from 1..21,the procedure what Ryan has described should be used.Please correct me if I am wrong.Intimidate
A shortcut might be to use the library function arc4random_uniform()Roomette
@Roomette What library is that from?Accustomed
stdlib.h (assuming UNIXy system)Roomette
You should make the variables unsigned int, otherwise range could receive a negative value.Hardened
@Hardened I should really make an assert that min < max, rather.Accustomed
That's a different issue. Even if max>min, max-min may not fit in an int and result in a negative value due to overflow.Hardened
On further review, another issue here is that this won't work when max - min > RAND_MAX, which is more serious than the issue I stated above (e.g. VC++ has RAND_MAX of only 32767).Hardened
How about adding a third parameter "nesting_level" to prevent infinite recursion? What if rand() returned always the same number from outside the range, because srand wasn't initialized?Flapper
Aside: rand() could be too bad for that. Also, wouldn't it be easier to pass only one integer, the number of distinct values in the range?Almeida
I have edited my answer to reflect some of the comments I've gotten over the years (!).Accustomed
I think it should be defect = num_rand % num_bins;Sessler
@Sessler It shouldn't, because the point of defect is to ensure that the computation x/bin_size is uniformly distributed in the num_bins = num_rand/bin_size bins, which means that x must be uniformly chosen from a range evenly divisible by bin_size; thus, the loop condition.Accustomed
Imagine RAND_MAX = 3, max = 2. With your calculation you get num_bins = 3, num_rand = 4, bin_size = 1, defect = 0. The condition in the while will never fail. The routine will return values in the range 0 - 3. This is clearly wrong for max = 2Sessler
max <= RAND_MAX is not always true. Here, max is a long, yet RAND_MAX is an int. Very easy for a max to be much larger than an int. Suggest changing all long to int.Discommode
The while loop could be made more readable. Rather than performing assignment in the conditional, you probably want a do {} while().Sundae
@chux In the documentation comment, I assume that max <= RAND_MAX. This is consistent with the documentation of random(), which describes its values as being in the range [0, RAND_MAX]. Granted that it is an integer, the type is correct for the output of random() (which is preferred over rand(), see previous comment) and, in principle, one could use a random source with a larger range.Accustomed
I noticed the comment, "Assumes 0 <= max <= RAND_MAX", but code speaks louder than comments and max is long and RAND_MAX is int. It is curious the this fine answer used long rather than int as long bestows no benefits, but does incur liabilities.Discommode
@chux Even if RAND_MAX is int, it doesn't need to be INT_MAX, so declaring max as an int doesn't solve the problem you identify. In addition, random() is declared as long int random(), so it is type-correct to use a long for max (which is supposed to be comparable to the random value). Really, the only way to actually solve this problem is to do an explicit comparison of max with RAND_MAX that enforces the comment. I didn't do it because it has nothing to do with the abstract algorithm nor its implementation, but is "just" hygiene.Accustomed
Agree about RAND_MAX <= INT_MAX. Further I sincerely apologize, for it was only now I see your answer used long random(), a non-standard C function (yet common in *nix) instead of the C standard int rand() like the other answers in this post. Thus using long make perfect sense with long random().Discommode
you can get values in [min, max] using min + random_at_most(max - min + 1) the +1 actually exceeds the max range. I tested it and took + 1 off to run properly .Illona
@Illona So it does. An artifact of an older draft.Accustomed
Hey, This answer is cited by Comet OS book ;) First time I see that in a teaching bookEyecup
It is also cited in the OSTEP book :) pages.cs.wisc.edu/~remzi/OSTEP (Chapter 9, Page 4)Skirr
I'm curious why doing numbers bigger than RAND_MAX is tricky. Couldn't we generate several random numbers in [0,2^k) (where 2^k< RAND_MAX) and concatenate their binary strings?Horripilation
S
36

Following on from @Ryan Reich's answer, I thought I'd offer my cleaned up version. The first bounds check isn't required given the second bounds check, and I've made it iterative rather than recursive. It returns values in the range [min, max], where max >= min and 1+max-min < RAND_MAX.

unsigned int rand_interval(unsigned int min, unsigned int max)
{
    int r;
    const unsigned int range = 1 + max - min;
    const unsigned int buckets = RAND_MAX / range;
    const unsigned int limit = buckets * range;

    /* Create equal size buckets all in a row, then fire randomly towards
     * the buckets until you land in one of them. All buckets are equally
     * likely. If you land off the end of the line of buckets, try again. */
    do
    {
        r = rand();
    } while (r >= limit);

    return min + (r / buckets);
}
Sundae answered 9/7, 2013 at 17:27 Comment(3)
Note this will get stuck in an infinite loop if range >= RAND_MAX. Ask me how I know :/Sundae
Note that you are comparing an int to an unsigned int (r >= limit). The problem is easily solved by making limit an int (and optionally bucket too) since RAND_MAX / range < INT_MAX and buckets * range <= RAND_MAX. EDIT: I've submitted and edit proposal.Clincher
the solution from @Ryan Reich still gives me better (less-biased) distributionGlossography
K
25

Here is a formula if you know the max and min values of a range, and you want to generate numbers inclusive in between the range:

r = (rand() % (max + 1 - min)) + min
Kingkingbird answered 5/5, 2011 at 6:44 Comment(3)
As noted in Ryan's answer, this produces a biased result.Lettie
Biased result, potential int overflow with max+1-min.Discommode
this work only with integer min and max. If the min and max are float, it's no possible to do the % operationCheerful
M
17
unsigned int
randr(unsigned int min, unsigned int max)
{
       double scaled = (double)rand()/RAND_MAX;

       return (max - min +1)*scaled + min;
}

See here for other options.

Misconstrue answered 24/3, 2010 at 16:58 Comment(11)
+1: Slightly more random than the obvious solution using %.Canella
But you're adding overhead for what's a very simple operation. And C is all about efficiency and performance. Otherwise, why use it? ;-)Saul
@Canella - not really. Each distributes the slightly-higher-odds cases differently, that's all. The double math gives the impression that there's more precision there, but you could just as easily use (((max-min+1)*rand())/RAND_MAX)+min and get probably the exact same distribution (assuming that RAND_MAX is small enough relative to int to not overflow).Bristol
This is slightly dangerous: it's possible for this to (very rarely) return max + 1, if either rand() == RAND_MAX, or rand() is very close to RAND_MAX and floating-point errors push the final result past max + 1. To be safe, you should check that the result is within range before returning it.Marcelenemarcelia
Come on guys... dice doesn't need to be that random. Otherwise, how you emulate the "cheating" factor of trying to roll double sixes in monopoly. ;-)Saul
scaled whould be a random value within [0;1[, which means you should divide by RAND_MAX + 1.0; this'll prevent the return value max + 1 and converts RAND_MAX to double to prevent an overflow when RAND_MAX == INT_MAXKibbutz
@Christoph: I agree about RAND_MAX + 1.0. I'm still not sure that's good enough to prevent a max + 1 return, though: in particular, the + min at the end involves a round that could end up producing max + 1 for large values of rand(). Safer to abandon this approach altogether and use integer arithmetic.Marcelenemarcelia
If RAND_MAX is replaced by RAND_MAX+1.0 as Christoph suggests, then I believe that this is safe provided that the + min is done using integer arithmetic: return (unsigned int)((max - min + 1) * scaled) + min. The (non-obvious) reason is that assuming IEEE 754 arithmetic and round-half-to-even, (and also that max - min + 1 is exactly representable as a double, but that'll be true on a typical machine), it's always true that x * scaled < x for any positive double x and any double scaled satisfying 0.0 <= scaled && scaled < 1.0.Marcelenemarcelia
Grr. Replace "any positive double" by "any finite positive double that's strictly larger than DBL_MIN" in the previous comment.Marcelenemarcelia
It's not a precision issue. It's a random issue. Read Knuth's analysis of linear congruential random number generators. The left-most bits are more random than the right most bits.Canella
Fails for randr(0, UINT_MAX): always generates 0.Discommode
S
11

Wouldn't you just do:

srand(time(NULL));
int r = ( rand() % 6 ) + 1;

% is the modulus operator. Essentially it will just divide by 6 and return the remainder... from 0 - 5

Saul answered 24/3, 2010 at 16:58 Comment(8)
It will give results from 1 - 6. That's what the + 1 is for.Saul
Simon, show me a libc in use anywhere where rand() includes the low-order bits of the generator's state (if it uses an LCG). I haven't seen one so far—all of them (yes, including MSVC with RAND_MAX being just 32767) remove the low-order bits. Using modulus isn't recommended for other reasons, namely that it skews the distribution in favor of smaller numbers.Aristippus
@Johannes: So it's safe to say slot machines don't use modulus?Saul
How would I exclude a 0? It seems that if I run it in a loop of 30, maybe the second or third time it runs there's a 0 roughly half way into it. Is this some sort of fluke?Laudian
@Johannes: Maybe it's not so much an issue nowadays, but traditionally using the low-order bits isn't advisable. c-faq.com/lib/randrange.htmlMileage
@ato: yes, pretty much. Although there's a neat way around of it; Just detect when your generated number falls into the incomplete interval that actually skews the distribution and reject it in that case. That's what Java does and they have a pretty nice implementation of it.Aristippus
@James: For LCG and some other generators I agree. However, this should only ever be an issue if you're implementing the PRNG yourself because every library and framework I've seen so far shields you from that. Note however, that there are also generators that yield low-quality high-order bits. Since C doesn't even specify what PRNG to use, blindly recommending throwing away low-order bits doesn't do much good.Aristippus
@Joey, NetBSD and glibc (for TYPE_0) both use the low-order bits of their LCG.Checklist
C
9

For those who understand the bias problem but can't stand the unpredictable run-time of rejection-based methods, this series produces a progressively less biased random integer in the [0, n-1] interval:

r = n / 2;
r = (rand() * n + r) / (RAND_MAX + 1);
r = (rand() * n + r) / (RAND_MAX + 1);
r = (rand() * n + r) / (RAND_MAX + 1);
...

It does so by synthesising a high-precision fixed-point random number of i * log_2(RAND_MAX + 1) bits (where i is the number of iterations) and performing a long multiplication by n.

When the number of bits is sufficiently large compared to n, the bias becomes immeasurably small.

It does not matter if RAND_MAX + 1 is less than n (as in this question), or if it is not a power of two, but care must be taken to avoid integer overflow if RAND_MAX * n is large.

Checklist answered 10/7, 2013 at 0:36 Comment(7)
RAND_MAX is often INT_MAX, so RAND_MAX + 1 --> UB (like INT_MIN)Discommode
@chux that's what I mean about "care must be taken to avoid integer overflow if RAND_MAX * n is large". You need to arrange to use appropriate types for your requirements.Checklist
@chux "RAND_MAX is often INT_MAX" Yes, but only on 16 bit systems! Any reasonably modern architechture will put INT_MAX at 2^32 / 2 and RAND_MAX at 2^16 / 2. Is this an incorrect assumption?Boatwright
@Boatwright Tested today 2 32-bit int compilers, I found RAND_MAX == 32767 on one and RAND_MAX == 2147483647 on another. My overall experience (decades) is that RAND_MAX == INT_MAX more often. So disagree that a reasonably modern 32-bit architecture will certainly have a RAND_MAX at 2^16 / 2. Since the C spec allows 32767 <= RAND_MAX <= INT_MAX, I code to that anyways rather than a tendency.Discommode
@chux Interesting, thank you! Which compilers for which platforms gave this behaviour?Boatwright
@Boatwright Same 64-bit computer: Visual Studio 2010 RAND_MAX == 32767 and gcc-4.9.3-1.i686 RAND_MAX == 2147483647.Discommode
Still covered by "care must be taken to avoid integer overflow".Checklist
K
6

Here is a slight simpler algorithm than Ryan Reich's solution:

/// Begin and end are *inclusive*; => [begin, end]
uint32_t getRandInterval(uint32_t begin, uint32_t end) {
    uint32_t range = (end - begin) + 1;
    uint32_t limit = ((uint64_t)RAND_MAX + 1) - (((uint64_t)RAND_MAX + 1) % range);

    /* Imagine range-sized buckets all in a row, then fire randomly towards
     * the buckets until you land in one of them. All buckets are equally
     * likely. If you land off the end of the line of buckets, try again. */
    uint32_t randVal = rand();
    while (randVal >= limit) randVal = rand();

    /// Return the position you hit in the bucket + begin as random number
    return (randVal % range) + begin;
}

Example (RAND_MAX := 16, begin := 2, end := 7)
    => range := 6  (1 + end - begin)
    => limit := 12 (RAND_MAX + 1) - ((RAND_MAX + 1) % range)

The limit is always a multiple of the range,
so we can split it into range-sized buckets:
    Possible-rand-output: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    Buckets:             [0, 1, 2, 3, 4, 5][0, 1, 2, 3, 4, 5][X, X, X, X, X]
    Buckets + begin:     [2, 3, 4, 5, 6, 7][2, 3, 4, 5, 6, 7][X, X, X, X, X]

1st call to rand() => 13
    → 13 is not in the bucket-range anymore (>= limit), while-condition is true
        → retry...
2nd call to rand() => 7
    → 7 is in the bucket-range (< limit), while-condition is false
        → Get the corresponding bucket-value 1 (randVal % range) and add begin
    => 3
Kinney answered 14/5, 2015 at 12:10 Comment(1)
RAND_MAX + 1 can readily overflow int addition. In that case, (RAND_MAX + 1) % range will generate questionable results. Consider (RAND_MAX + (uint32_t)1)Discommode
S
4

While Ryan is correct, the solution can be much simpler based on what is known about the source of the randomness. To re-state the problem:

  • There is a source of randomness, outputting integer numbers in range [0, MAX) with uniform distribution.
  • The goal is to produce uniformly distributed random integer numbers in range [rmin, rmax] where 0 <= rmin < rmax < MAX.

In my experience, if the number of bins (or "boxes") is significantly smaller than the range of the original numbers, and the original source is cryptographically strong - there is no need to go through all that rigamarole, and simple modulo division would suffice (like output = rnd.next() % (rmax+1), if rmin == 0), and produce random numbers that are distributed uniformly "enough", and without any loss of speed. The key factor is the randomness source (i.e., kids, don't try this at home with rand()).

Here's an example/proof of how it works in practice. I wanted to generate random numbers from 1 to 22, having a cryptographically strong source that produced random bytes (based on Intel RDRAND). The results are:

Rnd distribution test (22 boxes, numbers of entries in each box):     
 1: 409443    4.55%
 2: 408736    4.54%
 3: 408557    4.54%
 4: 409125    4.55%
 5: 408812    4.54%
 6: 409418    4.55%
 7: 408365    4.54%
 8: 407992    4.53%
 9: 409262    4.55%
10: 408112    4.53%
11: 409995    4.56%
12: 409810    4.55%
13: 409638    4.55%
14: 408905    4.54%
15: 408484    4.54%
16: 408211    4.54%
17: 409773    4.55%
18: 409597    4.55%
19: 409727    4.55%
20: 409062    4.55%
21: 409634    4.55%
22: 409342    4.55%   
total: 100.00%

This is as close to uniform as I need for my purpose (fair dice throw, generating cryptographically strong codebooks for WWII cipher machines such as http://users.telenet.be/d.rijmenants/en/kl-7sim.htm, etc). The output does not show any appreciable bias.

Here's the source of cryptographically strong (true) random number generator: Intel Digital Random Number Generator and a sample code that produces 64-bit (unsigned) random numbers.

int rdrand64_step(unsigned long long int *therand)
{
  unsigned long long int foo;
  int cf_error_status;

  asm("rdrand %%rax; \
        mov $1,%%edx; \
        cmovae %%rax,%%rdx; \
        mov %%edx,%1; \
        mov %%rax, %0;":"=r"(foo),"=r"(cf_error_status)::"%rax","%rdx");
        *therand = foo;
  return cf_error_status;
}

I compiled it on Mac OS X with clang-6.0.1 (straight), and with gcc-4.8.3 using "-Wa,q" flag (because GAS does not support these new instructions).

Septavalent answered 19/10, 2014 at 0:16 Comment(3)
Compiled with gcc randu.c -o randu -Wa,q (GCC 5.3.1 on Ubuntu 16) or clang randu.c -o randu (Clang 3.8.0) works, but dumps core at runtime with Illegal instruction (core dumped). Any ideas?Boatwright
First, I don't know whether your CPU actually supports RDRAND instruction. Your OS is fairly recent, but the CPU may not be. Second (but this is less likely) - I've no idea what kind of assembler Ubuntu includes (and Ubuntu tends to be fairly backwards wrt. updating packages). Check the Intel site I referred to for ways to test whether your CPU supports RDRAND.Septavalent
You have indeed good points. What I still cannot get is what is so wrong with rand(). I tried some tests and posted this question but I cannot find a definitive answer yet.Eduction
R
4

In order to avoid the modulo bias (suggested in other answers) you can always use:

arc4random_uniform(MAX-MIN)+MIN

Where "MAX" is the upper bound and "MIN" is lower bound. For example, for numbers between 10 and 20:

arc4random_uniform(20-10)+10

arc4random_uniform(10)+10

Simple solution and better than using "rand() % N".

Rouge answered 4/3, 2016 at 21:51 Comment(2)
Woohoo, this is a billion times better than the other answers. Worth noting you need to #include <bsd/stdlib.h> first. Also, any idea how to get this on Windows without MinGW or CygWin?Boatwright
No, it is not per se better than the other answers, because the other answers are more generic. Here you are limited to arc4random, the other answers allow you to choose a different random source, operate with different number-types,... and last but not least they might help someone to understand the problem. Don't forget that the question is also interesting for other people who might have some special requirements or no access to arc4random... Nonetheless, if you have access to it and want a quick solution, it is indeed a very good answer 😊Kinney
W
1

As said before modulo isn't sufficient because it skews the distribution. Heres my code which masks off bits and uses them to ensure the distribution isn't skewed.

static uint32_t randomInRange(uint32_t a,uint32_t b) {
    uint32_t v;
    uint32_t range;
    uint32_t upper;
    uint32_t lower;
    uint32_t mask;

    if(a == b) {
        return a;
    }

    if(a > b) {
        upper = a;
        lower = b;
    } else {
        upper = b;
        lower = a; 
    }

    range = upper - lower;

    mask = 0;
    //XXX calculate range with log and mask? nah, too lazy :).
    while(1) {
        if(mask >= range) {
            break;
        }
        mask = (mask << 1) | 1;
    }


    while(1) {
        v = rand() & mask;
        if(v <= range) {
            return lower + v;
        }
    }

}

The following simple code lets you look at the distribution:

int main() {

    unsigned long long int i;


    unsigned int n = 10;
    unsigned int numbers[n];


    for (i = 0; i < n; i++) {
        numbers[i] = 0;
    }

    for (i = 0 ; i < 10000000 ; i++){
        uint32_t rand = random_in_range(0,n - 1);
        if(rand >= n){
            printf("bug: rand out of range %u\n",(unsigned int)rand);
            return 1;
        }
        numbers[rand] += 1;
    }

    for(i = 0; i < n; i++) {
        printf("%u: %u\n",i,numbers[i]);
    }

}
Weinberg answered 17/10, 2013 at 6:28 Comment(2)
Becomes quite inefficient when you reject numbers from the rand(). This will be especially inefficient when the range has a size that can be written as 2^k + 1. Then nearly half of all your attempts from a slow rand() call will be rejected by the condition. Would it may be better to calc RAND_MAX modulo range. Like: v = rand(); if (v > RAND_MAX - (RAND_MAX % range) -> reject and try again; else return v % range; I understand that modulo is a much slower operation than masking, but I still think ..... it should be tested.Delate
rand() returns an int in the range [0..RAND_MAX]. That range can easily be a subrange of uint32_t and then randomInRange(0, ,b) never generates values in the range (INT_MAX...b].Discommode
P
0

Will return a floating point number in the range [0,1]:

#define rand01() (((double)random())/((double)(RAND_MAX)))
Penstock answered 2/12, 2018 at 5:39 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.