Uniformity of random numbers taken modulo N
One common way of choosing a random number in [0, n) is to take the result of rand() modulo n: rand() % n. However, even if the results returned by the available rand() implementation are fully uniform, shouldn't there be a problem with the uniformity of the resulting [0, n) numbers when RAND_MAX + 1 does not divide evenly by n? E.g. suppose RAND_MAX is 2, and n is 2. Then out of 3 possible rand() outputs: 0, 1 and 2, we get 0, 1 and 0 respectively when we use them modulo n. Therefore the output will not be uniform at all.
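To make the skew concrete, here is a tiny sketch that enumerates the three equally likely outputs of that hypothetical RAND_MAX == 2 generator and tallies them modulo 2 (illustration only, no real rand() involved):

#include <stdio.h>

int main(void)
{
    int counts[2] = {0, 0};
    for (int r = 0; r <= 2; r++)  /* each output 0, 1, 2 is equally likely */
        counts[r % 2]++;
    printf("0: %d, 1: %d\n", counts[0], counts[1]);  /* prints "0: 2, 1: 1" */
    return 0;
}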

Is this a real problem in practice? What is a better way of choosing random numbers in [0, n) uniformly deriving from rand() output, preferably without any floating point arithmetic?

Illassorted answered 27/10, 2012 at 21:46 Comment(5)
possible duplicate of What is the optimal algorithm for generating an unbiased random integer within a range?Boudreaux
Not quite a duplicate, since the bias issue is taken for granted and the question is "is this really a problem in practice?" I've tried to quantify the bias in my answer.Jollenta
See: eternallyconfuzzled.com/arts/jsw_art_rand.aspxThousand
@MuriloVasconcelos: yes, I've just accepted one.Illassorted
Possible duplicate of Why do people say there is modulo bias when using a random number generator?Feinleib

You are correct: rand() % N is not, in general, precisely uniformly distributed. How much that matters depends on the range of numbers you want and on how much randomness you need, but if you care about randomness enough to worry about the bias, you don't want to use rand() anyway. Get a real random number generator.

That said, to get a truly uniform distribution, take the result modulo the next power of 2 and resample until you get a value in the range you want (e.g. for 0-9, use do { n = rand() % 0x10; } while (n >= 10);).
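A sketch of that loop as a self-contained function (the name rand_below_10 is mine, and it assumes the low four bits of rand() are uniform, which, as a comment below notes, is not guaranteed everywhere):

#include <stdlib.h>

/* Uniform value in [0, 10): reduce modulo the next power of two (16),
   then reject and redraw anything outside the target range. */
int rand_below_10(void)
{
    int n;
    do {
        n = rand() % 0x10;  /* uniform over 0..15, given uniform low bits */
    } while (n >= 10);      /* reject 10..15; expected draws: 16/10 = 1.6 */
    return n;
}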

Nursemaid answered 27/10, 2012 at 22:3 Comment(14)
+1 for the workaround, but usually the low bits of rand() have very bad entropy. It would be smarter to use the high bits.Merrythought
Why would you not consider rand() a real RNG?Illassorted
@Illassorted It's just not a very good (high-entropy) RNG. As R says, the low bits are particularly bad, but the whole thing is not as random as it could be. If you want better randomness, use something like the Gnu Scientific Library (GSL).Nursemaid
@Kevin: Are you judging any particular implementation of rand(), i.e. the one found in modern glibc?Illassorted
@Nursemaid — Actually, rand() % N is perfectly uniformly distributed when N is a power of 2, provided that rand() is also uniformly distributed, which it had darn well better be. So it's not entirely correct to say that rand() % N is not uniformly distributed, because it depends on N.Comical
@R.. — That might have been true 20 years ago. Not true today.Comical
@ToddLehman: Citation needed. I suspect if you do a survey of rand implementations you'll find LCG is the most common. glibc is the prime exception and it has conformance bugs (sharing state with random last I checked).Merrythought
@R.. — But just because a given implementation uses LCG at its core does not mean that the generator has poor entropy in the lower bits. It's trivial to work around the issue by invoking the LCG twice and bit shifting and xor'ing. For example, in a 64-bit PRNG: static uint64_t x = 0; uint64_t r = (x = lcg(x)); r ^= (x = lcg(x)) >> 32; return r; But yes, OK, citation is needed for the claim.Comical
@ToddLehman On my system (OSX 10.10), the low bits are certainly not uniform. Run this on the command line for a live-updating count: pastebin.com/D5r7we3HNursemaid
@ToddLehman: I meant my suspicion is that most implementations use raw LCG output, not just LCG "at [the] core".Merrythought
@R.. — Let's hope not!Comical
@Nursemaid — Cool program. Compiled it with Apple LLVM version 6.0 (clang-600.0.54) on OS X 10.9 & 10.10. Looks super uniform to me (I tried it with N=32, 16, 4, and 2).Comical
@ToddLehman: I think your hopes are misplaced. rand is hopelessly bad (it can generate at most UINT_MAX possible sequences due to srand's signature) and trying to make it "better" only encourages people to use it. Rolling your own decent (non-cryptographic) PRNG is just a few lines and that's really what you should do - beyond sequence quality, it gives you useful properties like lack of global state, resumability, and cross platform same-sequence-for-same-seed.Merrythought
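(For a sketch of what "just a few lines" can look like, e.g. Marsaglia's xorshift64; my example, not code from this thread:)

#include <stdint.h>

/* Marsaglia's xorshift64: no global state, resumable, and the same
   sequence for the same nonzero seed on every platform. Not cryptographic. */
uint64_t xorshift64(uint64_t *state)  /* seed *state to any nonzero value */
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}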
@R.. — Good point. I see a couple people have written "rand() considered harmful" articles. Personally, I haven't used rand since the 1980s because of its terrible range. Lately, been using a two-phase 64-bit LCG with values from Knuth and am pretty satisfied with that (I don't need crypto strength randomness, just enough for games). I guess I was arguing more in theory, but I think you're absolutely right that people should eschew rand because it can't be trusted. Rolling your own is tricky, but at least it's a known thing.Comical

That depends on:

  • The value of RAND_MAX
  • Your value of N

Let us assume your RAND_MAX is 2^32, so rand() has 2^32 + 1 equally likely outputs. If N is rather small (let's say 2) then the bias is 1 / 2^31 -- far too small to notice.

But if N is quite a bit larger, say 2^20, then the bias is 1 / 2^12, or about 1 in 4096. A lot bigger, but still pretty small.
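In general: if rand() has M equally likely outputs and M = q*N + r with 0 < r < N, then r of the residues occur q + 1 times and the other N - r occur q times, so the worst relative bias is 1/q, roughly N/M. A quick sketch to compute it (mod_bias is just an illustrative helper):

#include <stdint.h>
#include <stdio.h>

/* Worst-case relative bias of (x % n) when x is uniform over m values:
   r = m % n residues occur one extra time out of q = m / n draws,
   so they are (q + 1) / q times as likely -- a relative bias of 1/q. */
double mod_bias(uint64_t m, uint64_t n)
{
    return (m % n == 0) ? 0.0 : 1.0 / (double)(m / n);
}

int main(void)
{
    uint64_t m = (1ULL << 32) + 1;  /* number of outputs when RAND_MAX == 2^32 */
    printf("N = 2:    %g\n", mod_bias(m, 2));        /* ~1/2^31 */
    printf("N = 2^20: %g\n", mod_bias(m, 1 << 20));  /* ~1/2^12 = 1/4096 */
    return 0;
}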

Jollenta answered 27/10, 2012 at 21:55 Comment(4)
On the contrary, I think the answer is right on point. We're assuming a PRNG that generates numbers with perfect distribution. The question is, do we care about the bias? I tried to provide a way to quantify the bias, so the asker can determine for himself whether it is tolerable to him. This is all very language non-specific.Jollenta
Some systems have a RAND_MAX of 0xffff, leading to a much bigger bias.Nursemaid
Worse. Visual C++ implementations have RAND_MAX==0x7FFF, left over from 16-bit MSC 3.0 on MS-DOS.Taxpayer
@Jollenta can you direct me to link/resource for a formal way of calculating the bias?Lemal

One approach is the following:

Knowing the value of N, compute R_MAX = (RAND_MAX / N) * N, the largest multiple of N that fits in rand()'s range (written this way so it cannot overflow when RAND_MAX == INT_MAX), and discard any result at or above it for uniformity.

So you can write your custom rand() function:

#include <stdlib.h>

int custom_rand(int mod) {
    // Largest multiple of mod not exceeding RAND_MAX + 1, computed as
    // (RAND_MAX / mod) * mod to avoid overflow when RAND_MAX == INT_MAX.
    const int R_MAX = (RAND_MAX / mod) * mod;
    int x;

    do {
        x = rand();
    } while (x >= R_MAX); // discard results from the partial bucket at the top

    return x % mod;
}
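For example, a uniform draw in [0, 10):

int r = custom_rand(10); // uniform over 0..9, assuming rand() itself is uniform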
Meliorate answered 27/10, 2012 at 22:11 Comment(3)
what if rand_max is 2^32-1?Bloomington
When RAND_MAX == INT_MAX (a common occurrence). RAND_MAX + 1 --> undefined behavior - (maybe INT_MIN).Belongings
I think you actually want R_MAX = (RAND_MAX / N) * N; and while (x >= R_MAX), otherwise you have a bias for producing more zeros because R_MAX % mod == 0. Also, a do { x = rand(); } while (x >= R_MAX) would be better here, because then you wouldn't be writing x = rand(); twice.Comical

There are two problems with using the remainder (% is not a "modulo" operator in C) to reduce a uniform random number to a smaller range. First, there is a slight bias toward smaller numbers (mentioned above), and second, typical PRNGs tend to be less random in the low-order bits. I seem to recall that from Knuth (The Art of Computer Programming, Vol. II, Seminumerical Algorithms) along with the claim that (after translating from MIX to C) rand() % 2 is a poor source of random single bits. It's better to pick (rand() > RAND_MAX / 2) (or test a high-order bit, if RAND_MAX is nearly a power of 2).
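A sketch of that comparison trick as a function (the name is mine):

#include <stdlib.h>

/* Fair coin flip drawn from the high end of rand()'s range: when RAND_MAX
   is odd (e.g. 2^31 - 1), exactly half the outputs fall on each side. */
int rand_bit(void)
{
    return rand() > RAND_MAX / 2;
}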

The remainder should be good enough for casual use on small intervals. Avoid it for simulations. Actually, avoid rand() altogether for large simulations or "Monte Carlo" computations. Implementations tend to have a period on the order of 2^32 or less, and it's not hard to exceed 4 billion trials on a 2+ GHz processor.

Taxpayer answered 27/10, 2012 at 22:42 Comment(2)
Typical LCGs (Linear Congruential Generators) do tend to be less random in the lower bits, yes. For example, multiplying by an odd number and adding an odd number always inverts the least significant bit when you have a power-of-two modulus. But it doesn't follow that typical PRNGs have this problem. All one has to do is run the LCG twice and bit-shift and xor to mix things up nicely, or other tricks. Any C library providing a rand() having less randomness in the lower bits is severely broken.Comical
@ToddLehman: From the C11 specification, in a footnote to the rand() function description: "295) There are no guarantees as to the quality of the random sequence produced and some implementations are known to produce sequences with distressingly non-random low-order bits. Applications with particular requirements should use a generator that is known to be sufficient for their needs." So, such a rand() function is not considered seriously broken by ISO. Distressing, yes, but not broken.Taxpayer
