How do you generate uniformly distributed random integers?
Asked Answered
M

14

189

I need a function which would generate a random integer in a given range (including boundary values). I don't have unreasonable quality/randomness requirements; I have four requirements:

  • I need it to be fast. My project needs to generate millions (or sometimes even tens of millions) of random numbers and my current generator function has proven to be a bottleneck.
  • I need it to be reasonably uniform (use of rand() is perfectly fine).
  • the minimum-maximum ranges can be anything from <0, 1> to <-32727, 32727>.
  • it has to be seedable.

I currently have the following C++ code:

output = min + (rand() * (int)(max - min) / RAND_MAX)

The problem is that it is not really uniform - max is returned only when rand() = RAND_MAX (for Visual C++ it is 1/32727). This is a major issue for small ranges like <-1, 1>, where the last value is almost never returned.

So I grabbed pen and paper and came up with following formula (which builds on the (int)(n + 0.5) integer rounding trick):

( (max - min) * rand() + (RAND_MAX / (2 * (max - min))) ) / RAND_MAX

But it still doesn't give me a uniform distribution. Repeated runs with 10000 samples give me ratio of 37:50:13 for values values -1, 0. 1.

Is there a better formula? (Or even whole pseudo-random number generator function?)

Mohammedanism answered 15/2, 2011 at 19:55 Comment(14)
See: #2254998Blubber
Is there anything wrong with Modulus?Churn
@Bill MaGriff: yes. It has the same problem. A simplified version is: how can you divide 10 pieces of candy among 3 children evenly (without breaking any of the candies)? The answer is, you can't -- you have to give three to each child, and just not give the tenth one to anybody.Blubber
Have you looked at Boost.Random?Gamopetalous
@Jerry - that suggests another long-term-unbiassed alternative. On each occasion, you could give the extra piece to a different child. Bresenhams (or maybe that span optimised version) covers some similar ground, IIRC. Probably not worth the hassle, though, just to save a few sticks of candy.Demurrage
@Jerry - I was going to point out a problem with my comment, but if an alien shapeshifter is able to take the place of a different one of your children on each occasion, ensuring it only gets a fair share of the candy is the least of your problems.Demurrage
@Steve314: Here we're dealing with RAND_MAX, which is basically "all the candies that can ever be given to these children in their lives."Blubber
@Jerry - On Linux GCC yes, but Windows compilers seem a bit short of candies.Demurrage
Check the Andrew Koenig article "A simple problem that is almost never solved correctly": drdobbs.com/blog/archives/2010/11/a_simple_proble.htmlVoyeur
@Gene Bushuyev: Both Andrew and I have been harping on this subject for quite a while now. See: groups.google.com/group/comp.lang.c++/browse_frm/thread/…, and: groups.google.com/group/comp.os.ms-windows.programmer.tools.mfc/…Blubber
@GeneBushuyev: Unfortunately, the link is broken now and I cannot find the article with a simple Google search. Do you have the new link?Valonia
@musiphil: Internet Archive Wayback Machine.Rudnick
Your code is C, not C++. There is no such thing like (int) in C++.Grained
@Bill MaGriff: yes. It has the same problem. A simplified version is: how can you divide 10 pieces of candy among 3 children evenly (without breaking any of the candies)? The answer is, you can't -- you have to give three to each child, and just not give the tenth one to anybody. – thanks. now i am sure with itCentury
M
119

A fast, somewhat better than yours, but still not properly uniform distributed solution is

output = min + (rand() % static_cast<int>(max - min + 1))

Except when the size of the range is a power of 2, this method produces biased non-uniform distributed numbers regardless the quality of rand(). For a comprehensive test of the quality of this method, please read this.

Mortenson answered 15/2, 2011 at 20:16 Comment(5)
Thanks, this seems to be good enough for me from quick tests - its distribution for the -1, 0, 1 is nearly 33:33:33.Sussex
It returns max value always. Am I missing here something? :|Mol
rand() should be considered harmful in C++ there are much better ways of getting something that is uniformly distributed and actually random.Koressa
Does it really return a correct number within range 100% of the time? I've found some other stackoverflow answer here that is using recursion to do it "the right way": https://mcmap.net/q/64548/-how-to-generate-a-random-integer-number-from-within-a-rangeCowhide
Since it is a highly upvoted (than desired) answer, which seems reliable source of information for many new readers, I think it's very important to mention the quality and potential dangers of this solution, so I made an edit.Grained
D
351

The simplest (and hence best) C++ (using the 2011 standard) answer is:

#include <random>

std::random_device rd;     // Only used once to initialise (seed) engine
std::mt19937 rng(rd());    // Random-number engine used (Mersenne-Twister in this case)
std::uniform_int_distribution<int> uni(min,max); // Guaranteed unbiased

auto random_integer = uni(rng);

There isn't any need to reinvent the wheel, worry about bias, or worry about using time as the random seed.

Diacaustic answered 1/11, 2013 at 14:8 Comment(14)
Nowadays this should be the answer. Pseudo-random number generation reference for more features.Lackaday
I agree on the "simplest" (and most idiomatic), not on the "best". Unfortunately the Standard gives no guarantee about random_device, which might be completely broken in some cases. Moreover, mt19937, while a very good general-purpose choice, is not the fastest of the good-quality generators (see this comparison) and therefore might be not the ideal candidate for the OP.Hyatt
@AlbertoM Unfortunately, the comparison your referring to does not provide enough details and is not reproducible, which renders it dubious (moreover, it's from 2015, while my answer dates back to 2013). It may well be true that there are better methods around (and hopefully in future, minstd will be such a method), but that's progress. As to the poor implementation of random_device -- that's horrible and should be considered a bug (possibly also of the C++ standard, if it allows it).Diacaustic
I totally agree with you; I did not actually want to criticize your solution per se, just wanted to warn the casual reader that the definitive answer on the matter, despite the promises of C++11, is yet to be written. I am going to post an overview of the subject as of 2015 as answer of a related question.Hyatt
That is "simplest"? Could you elaborate why the clearly much simpler rand() is not an option, and does it matter for non-critical use, like generating a random pivot index? Also, do I have to worry about constructing random_device/mt19937/uniform_int_distribution in a tight loop / inlined function? Should I rather prefer to pass them around?Lambart
@Lambart You seem to not quite understand the issue. The first problem with rand() is that does not return a random integer between the user given limits (min and max), but between 0 and RAND_MAX, and mapping this to the range [min, max] w/o any bias is non-trivial. The second problem is that of seeding the random sequence. The random_device is only needed for seeding and does not need to be carried around (and you can references around if you're worried). The third problem with rand() is its use of a (hidden) global state and that it isn't guaranteed to be threadsafe.Diacaustic
@Lambart You would never seed/initialise/construct a random number sequence/generator within a tight loop.Diacaustic
@TeodorKolev Perhaps related to the comment by Alberto? (see also this post).Diacaustic
@Diacaustic Any reason for using auto instead of int?Ileus
@AndreyPortnoy I always use auto for automatic variables, if possible, because this make maintenance easier. It will automatically pick up the correct type, even if I later change the template parameter for uniform_int_distribution<> to something else, say int64_t.Diacaustic
"No need to re-invent the wheel. No need to worry about bias. No need to worry about using time as random seed." - You only need to worry about whether std::random_device is truly random, which is not for an ESP32 microcontroller for example.Cao
@Cao That's unfortunately allowed by the standard, if the hardware does not support a truly random device. So if you're coding for such hardware, you must make amends. It's implementation dependent as well (which implementation did you have the issue with?)Diacaustic
Careful here: max is inclusive.Disingenuous
@apnkpr Sincestd::default_random_engine is implementation defined, it is not portable and hence should never be used. In other words, using but not caring about the random number engine is risky (as for anything which you don't understand but use).Diacaustic
M
119

A fast, somewhat better than yours, but still not properly uniform distributed solution is

output = min + (rand() % static_cast<int>(max - min + 1))

Except when the size of the range is a power of 2, this method produces biased non-uniform distributed numbers regardless the quality of rand(). For a comprehensive test of the quality of this method, please read this.

Mortenson answered 15/2, 2011 at 20:16 Comment(5)
Thanks, this seems to be good enough for me from quick tests - its distribution for the -1, 0, 1 is nearly 33:33:33.Sussex
It returns max value always. Am I missing here something? :|Mol
rand() should be considered harmful in C++ there are much better ways of getting something that is uniformly distributed and actually random.Koressa
Does it really return a correct number within range 100% of the time? I've found some other stackoverflow answer here that is using recursion to do it "the right way": https://mcmap.net/q/64548/-how-to-generate-a-random-integer-number-from-within-a-rangeCowhide
Since it is a highly upvoted (than desired) answer, which seems reliable source of information for many new readers, I think it's very important to mention the quality and potential dangers of this solution, so I made an edit.Grained
B
61

If your compiler supports C++0x and using it is an option for you, then the new standard <random> header is likely to meet your needs. It has a high quality uniform_int_distribution which will accept minimum and maximum bounds (inclusive as you need), and you can choose among various random number generators to plug into that distribution.

Here is code that generates a million random ints uniformly distributed in [-57, 365]. I've used the new std <chrono> facilities to time it as you mentioned performance is a major concern for you.

#include <iostream>
#include <random>
#include <chrono>

int main()
{
    typedef std::chrono::high_resolution_clock Clock;
    typedef std::chrono::duration<double> sec;
    Clock::time_point t0 = Clock::now();
    const int N = 10000000;
    typedef std::minstd_rand G;                // Select the engine
    G g;                                       // Construct the engine
    typedef std::uniform_int_distribution<> D; // Select the distribution
    D d(-57, 365);                             // Construct the distribution
    int c = 0;
    for (int i = 0; i < N; ++i)
        c += d(g);                             // Generate a random number
    Clock::time_point t1 = Clock::now();
    std::cout << N/sec(t1-t0).count() << " random numbers per second.\n";
    return c;
}

For me (2.8 GHz Intel Core i5) this prints out:

2.10268e+07 random numbers per second.

You can seed the generator by passing in an int to its constructor:

    G g(seed);

If you later find that int doesn't cover the range you need for your distribution, this can be remedied by changing the uniform_int_distribution like so (e.g., to long long):

    typedef std::uniform_int_distribution<long long> D;

If you later find that the minstd_rand isn't a high enough quality generator, that can also easily be swapped out. E.g.:

    typedef std::mt19937 G;  // Now using mersenne_twister_engine

Having separate control over the random number generator, and the random distribution can be quite liberating.

I've also computed (not shown) the first four "moments" of this distribution (using minstd_rand) and compared them to the theoretical values in an attempt to quantify the quality of the distribution:

min = -57
max = 365
mean = 154.131
x_mean = 154
var = 14931.9
x_var = 14910.7
skew = -0.00197375
x_skew = 0
kurtosis = -1.20129
x_kurtosis = -1.20001

(The x_ prefix refers to "expected".)

Biel answered 26/2, 2011 at 21:14 Comment(3)
This answer could use a short summary code snippet that shows only the code that is actually needed to generate a random integer from a range.Charland
The problem is made easier by the fact that min and max of the distribution never change. What if you had to create d at every iteration with different bounds? How much would it slow down the loop?Rooker
The design of <random> is such that you don't have to create a new distribution to use other parameters. You can just insert new parameters to use at the call site where you ask for another random value: d(g, D::param_type{m, M}). This will not impact performance at all as the overload that doesn't have an explicit param_type parameter typically calls the overload that does: github.com/llvm/llvm-project/blob/main/libcxx/include/__random/…Biel
C
17

Let's split the problem into two parts:

  • Generate a random number n in the range 0 through (max-min).
  • Add min to that number

The first part is obviously the hardest. Let's assume that the return value of rand() is perfectly uniform. Using modulo will add bias to the first (RAND_MAX + 1) % (max-min+1) numbers. So if we could magically change RAND_MAX to RAND_MAX - (RAND_MAX + 1) % (max-min+1), there would no longer be any bias.

It turns out that we can use this intuition if we are willing to allow pseudo-nondeterminism into the running time of our algorithm. Whenever rand() returns a number which is too large, we simply ask for another random number until we get one which is small enough.

The running time is now geometrically distributed, with expected value 1/p where p is the probability of getting a small enough number on the first try. Since RAND_MAX - (RAND_MAX + 1) % (max-min+1) is always less than (RAND_MAX + 1) / 2, we know that p > 1/2, so the expected number of iterations will always be less than two for any range. It should be possible to generate tens of millions of random numbers in less than a second on a standard CPU with this technique.

Although the above is technically correct, DSimon's answer is probably more useful in practice. You shouldn't implement this stuff yourself. I have seen a lot of implementations of rejection sampling and it is often very difficult to see if it's correct or not.

Culdesac answered 15/2, 2011 at 20:37 Comment(3)
For completeness: This is Rejection Sampling.Cioffred
Fun fact: Joel Spolsky once mentioned a version of this question as an example of what StackOverflow was good at answering. I looked through the answers on the site involving rejection sampling at that time and every single one was incorrect.Elseelset
One of the tricky aspects of this is that RAND_MAX is often equal to INT_MAX, so that RAND_MAX + 1 overflows causing undefined behavior.Rheta
N
13

Use the Mersenne Twister. The Boost implementation is rather easy to use and is well tested in many real-world applications. I've used it myself in several academic projects, such as artificial intelligence and evolutionary algorithms.

Here's their example where they make a simple function to roll a six-sided die:

#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_int.hpp>
#include <boost/random/variate_generator.hpp>

boost::mt19937 gen;

int roll_die() {
    boost::uniform_int<> dist(1, 6);
    boost::variate_generator<boost::mt19937&, boost::uniform_int<> > die(gen, dist);
    return die();
}

Oh, and here's some more pimping of this generator just in case you aren't convinced you should use it over the vastly inferior rand():

The Mersenne Twister is a "random number" generator invented by Makoto Matsumoto and Takuji Nishimura; their website includes numerous implementations of the algorithm.

Essentially, the Mersenne Twister is a very large linear-feedback shift register. The algorithm operates on a 19,937 bit seed, stored in an 624-element array of 32-bit unsigned integers. The value 2^19937-1 is a Mersenne prime; the technique for manipulating the seed is based on an older "twisting" algorithm -- hence the name "Mersenne Twister".

An appealing aspect of the Mersenne Twister is its use of binary operations -- as opposed to time-consuming multiplication -- for generating numbers. The algorithm also has a very long period, and good granularity. It is both fast and effective for non-cryptographic applications.

Nonparticipation answered 15/2, 2011 at 20:13 Comment(7)
The Mersenne twister is a good generator, but the problem he's dealing with remains, regardless of the underlying generator itself.Blubber
I don't want to use Boost just for the random generator, because (since my project is a library) it means introducing another dependency to the project. I will probably be forced to use it anyways in the future, so then I can switch to this generator.Sussex
@Jerry Coffin Which problem? I offered it because it satisfied all of his requirements: it's fast, it's uniform (using the boost::uniform_int distribution), you can transform the min max ranges into anything you like, and it's seedable.Nonparticipation
@mzabsky I probably wouldn't let that stop me, when I had to ship my projects to my professors for submission, I just included the relevant boost header files I was using; you shouldn't have to package the entire 40mb boost library with your code. Of course in your case this might not be feasible for other reasons such as copyright...Nonparticipation
@Nonparticipation My project is not really a scientific simulator or something that needs really uniform distribution. I used the old generator for 1.5 years without any issue, I only noticed the biased distribution when I first needed it to generate numbers from very small range (3 in this case). The speed is still argument to consider the boost solution though. I will look into its license to see whether I can just add the few needed files to my project - I like the "Checkout -> F5 -> ready to use" as it is now.Sussex
@Aphex: Sorry, I should have been more clear: the Boost library does a fine job of reducing range -- but the Mersenne twister itself does nothing to cure the problem being cited by the OP.Blubber
you may also find a mersenne twister implementation at tr1/randomWeathered
W
12
int RandU(int nMin, int nMax)
{
    return nMin + (int)((double)rand() / (RAND_MAX+1) * (nMax-nMin+1));
}

This is a mapping of 32768 integers to (nMax-nMin+1) integers. The mapping will be quite good if (nMax-nMin+1) is small (as in your requirement). Note however that if (nMax-nMin+1) is large, the mapping won't work (For example - you can't map 32768 values to 30000 values with equal probability). If such ranges are needed - you should use a 32-bit or 64-bit random source, instead of the 15-bit rand(), or ignore rand() results which are out-of-range.

Wootan answered 15/2, 2011 at 20:47 Comment(2)
Despite its unpopularity, this is also what I use for my non-scientific projects. Easy to understand (you don't need a math degree) and performs adequately (never had to profile any code using it). :) In case of large ranges, I guess we could string two rand() values together and get a 30-bit value to work with (assuming RAND_MAX = 0x7fff, i.e. 15 random bits)Choric
change RAND_MAX to (double) RAND_MAX to avoid integer overflow warning.Herwin
A
6

Assume min and max are integer values,

  • [ and ] means include this value,
  • ( and ) means do not include this value,

using the above to get the right value using C++'s rand().

Reference:

For ()[] define, visit Interval (mathematics).

For the rand and srand function or RAND_MAX define, visit std::rand.

[min, max]

int randNum = rand() % (max - min + 1) + min

(min, max]

int randNum = rand() % (max - min) + min + 1

[min, max)

int randNum = rand() % (max - min) + min

(min, max)

int randNum = rand() % (max - min - 1) + min + 1
Archaean answered 1/12, 2017 at 7:37 Comment(1)
This isn't uniform. See Why do people say there is modulo bias when using a random number generator?.Jaimeejaimes
F
5

Here is an unbiased version that generates numbers in [low, high]:

int r;
do {
  r = rand();
} while (r < ((unsigned int)(RAND_MAX) + 1) % (high + 1 - low));
return r % (high + 1 - low) + low;

If your range is reasonably small, there is no reason to cache the right-hand side of the comparison in the do loop.

Fleeta answered 15/2, 2011 at 20:7 Comment(4)
IMO, none of the solutions presented there is really much improvement. His loop-based solution works, but likely to be quite inefficient, especially for a small range like the OP discusses. His uniform deviate solution doesn't actually produce uniform deviates at all. At most it kind of camouflages the lack of uniformity.Blubber
@Jerry: Please check the new version.Fleeta
I'm a bit uncertain about that working correctly. It might, but correctness doesn't seem obvious, at least to me.Blubber
@Jerry: Here's my reasoning: Assume the range is [0, h) for simplicity. Calling rand() has RAND_MAX + 1 possible return values; taking rand() % h collapses (RAND_MAX + 1) / h of them to each of the h output values, except that (RAND_MAX + 1) / h + 1 of them are mapped to the values that are less than (RAND_MAX + 1) % h (because of the last partial cycle through the h outputs). We therefore remove (RAND_MAX + 1) % h possible outputs to get an unbiased distribution.Fleeta
R
4

I recommend the Boost.Random library. It's super detailed and well-documented, lets you explicitly specify what distribution you want, and in non-cryptographic scenarios can actually outperform a typical C library rand implementation.

Renter answered 15/2, 2011 at 20:15 Comment(0)
A
2

The following is the idea presented by Walter. I wrote a self-contained C++ class that will generate a random integer in the closed interval [low, high]. It requires C++11.

#include <random>

// Returns random integer in closed range [low, high].
class UniformRandomInt {

    std::random_device _rd{};
    std::mt19937 _gen{_rd()};
    std::uniform_int_distribution<int> _dist;

    public:

        UniformRandomInt() {
            set(1, 10);
        }
        UniformRandomInt(int low, int high) {
            set(low, high);
        }

        // Set the distribution parameters low and high.
        void set(int low, int high) {
            std::uniform_int_distribution<int>::param_type param(low, high);
            _dist.param(param);
        }

        // Get random integer.
        int get() {
            return _dist(_gen);
        }

};

Example usage:

UniformRandomInt ur;
ur.set(0, 9); // Get random int in closed range [0, 9].

int value = ur.get()
Aquaplane answered 9/9, 2021 at 21:47 Comment(0)
S
1

In answers to this question, rejection sampling was already addressed, but I wanted to suggest one optimization based on the fact that rand() % 2^something does not introduce any bias as already mentioned above.

The algorithm is really simple:

  • calculate the smallest power of 2 greater than the interval length
  • randomize one number in that "new" interval
  • return that number if it is less than the length of the original interval
    • reject otherwise

Here's my sample code:

int randInInterval(int min, int max) {
    int intervalLen = max - min + 1;
    //now calculate the smallest power of 2 that is >= than `intervalLen`
    int ceilingPowerOf2 = pow(2, ceil(log2(intervalLen)));

    int randomNumber = rand() % ceilingPowerOf2; //this is "as uniform as rand()"

    if (randomNumber < intervalLen)
        return min + randomNumber;      //ok!
    return randInInterval(min, max);    //reject sample and try again
}

This works well especially for small intervals, because the power of 2 will be "nearer" to the real interval length, and so the number of misses will be smaller.

PS: Obviously avoiding the recursion would be more efficient (there isn't any need to calculate over and over the log ceiling...), but I thought it was more readable for this example.

Sigfrid answered 21/1, 2018 at 15:37 Comment(0)
R
1

Notice that in most suggestions the initial random value that you have got from rand() function, which is typically from 0 to RAND_MAX, is simply wasted. You are creating only one random number out of it, while there is a sound procedure that can give you more.

Assume that you want [min,max] region of integer random numbers. We start from [0, max-min]

Take base b=max-min+1

Start from representing a number you got from rand() in base b.

That way you have got floor(log(b,RAND_MAX)) because each digit in base b, except possibly the last one, represents a random number in the range [0, max-min].

Of course the final shift to [min,max] is simple for each random number r+min.

int n = NUM_DIGIT-1;
while(n >= 0)
{
    r[n] = res % b;
    res -= r[n];
    res /= b;
    n--;
}

If NUM_DIGIT is the number of digit in base b that you can extract and that is

NUM_DIGIT = floor(log(b,RAND_MAX))

then the above is as a simple implementation of extracting NUM_DIGIT random numbers from 0 to b-1 out of one RAND_MAX random number providing b < RAND_MAX.

Ribwort answered 25/1, 2020 at 22:6 Comment(0)
B
0

The formula for this is very simple, so try this expression,

 int num = (int) rand() % (max - min) + min;  
 //Where rand() returns a random number between 0.0 and 1.0
Buff answered 6/8, 2014 at 10:10 Comment(2)
Whole problem was using C/C++'s rand which returns integer in a range specified by the runtime. As demonstrated in this thread, mapping random integers from [0, RAND_MAX] to [MIN, MAX] isn't entirely straightforward, if you want to avoid destroying their statistical properties or performance. If you have doubles in range [0, 1], the mapping is easy.Sussex
Your answer is wrong, you should use modulus instead: int num = (int) rand() % (max - min) + min;Eck
L
-1

The following expression should be unbiased if I am not mistaken:

std::floor( ( max - min + 1.0 ) * rand() ) + min;

I am assuming here that rand() gives you a random value in the range between 0.0 and 1.0 not including 1.0 and that max and min are integers with the condition that min < max.

Lezlielg answered 19/7, 2013 at 18:25 Comment(1)
std::floor returns double, and we need an integer value here. I would just cast to int instead of using std::floor.Valonia

© 2022 - 2024 — McMap. All rights reserved.