If we seed c++11 mt19937 as the same on different machines, will we get the same sequence of random numbers
Asked Answered
S

3

12

Inspired from this and the similar questions, I want to learn how does mt19937 pseudo-number generator in C++11 behaves, when in two separate machines, it is seeded with the same input.

In other words, say we have the following code;

std::mt19937 gen{ourSeed};
std::uniform_int_distribution<int> dest{0, 10000};
int randNumber = dist(gen);

If we try this code on different machines at different times, will we get the same sequence of randNumber values or a different sequence each time ?

And in either case, why this is the case ?

A further question:

Regardless of the seed, will this code generate randomly numbers infinitely ? I mean for example, if we use this block of code in a program that runs for months without stopping, will there be any problem in the generation of the number or in the uniformity of the numbers ?

Sonni answered 11/2, 2018 at 10:7 Comment(5)
IIRC, the C++ standard doesn't specify the algorithm that uniform_int_distribution must use to produce random values from the entropy source.Capybara
@OliverCharlesworth I'm sorry, I couldn't understand what you mean.Sonni
There's no guarantee you'll get the same sequence of randNumbers. You would get the same sequence of numbers from invoking gen() directly, however. The Mersenne Twister algorithm is fully deterministic given a seed.Decoction
You should specify what exactly you mean by "on different machines". A built executable, or the same source being built into one?Decoction
@StoryTeller Consider both.Sonni
E
20

The generator will generate the same values.

The distributions may not, at least with different compilers or library versions. The standard did not specify their behaviour to that level of detail. If you want stability between compilers and library versions, you have to roll your own distribution.

Barring library/compiler changes, that will return the same values in the same sequence. But if you care write your own distribution.

...

All PRNGs have patterns and periods. mt19937 is named after its period of 2^19937-1, which is unlikely to be a problem. But other patterns can develop. MT PRNGs are robust against many statistical tests, but they are not crytographically secure PRNGs.

So it being a problem if you run for months will depend on specific details of what you'd find to be a problem. However, mt19937 is going to be a better PRNG than anything you are likely to write yourself. But assume attackers can predict its future behaviour from past evidence.

Excrete answered 11/2, 2018 at 10:15 Comment(4)
What exactly do you mean by the period ? I mean what does that number represent ?Sonni
@onur Period, like period of a sin wave is 2 pi. It repeats itself after 2^19937 -1 steps. As 2^10 is about 10^3, this is 1 followed by 6000 0s (approx). At 1 GHz (10^9) it would take 10^6000 (approx) seconds to repeat itself. I repeated that number on purpose -- 1 billion is trivial next to it. So is the time since the big bang. Note that a long period is just one of many properties you want a PRNG to have, not evidence of perfection,Excrete
I see, thanks both for your answer, and further explanations.Sonni
You can also just use boost distributions with standard generators rather than writing your own.Inactive
L
3

Regardless of the seed, will this code generate randomly numbers infinitely ? I mean for example, if we use this block of code in a program that runs for months without stopping, will there be any problem in the generation of the number or in the uniformity of the numbers ?

RNG we deal with with standard C++ are called pseudo-random RNGs. By definition, this is pure computational device, with multi-bit state (you could think about state as large bit vector) and three functions:

  • state seed2state(seed);
  • state next_state(state);
  • uint(32|64)_t state2output(state);

and that is it. Obviously, state has finite size, 19937 bits in case of MT19937, so total number of states are 219937 and thus MT19937 next_state() function is a periodic one, with max period no more than 219937. This number is really HUGE, and most likely more than enough for typical simulation

But output is at max 64 bits, so output space is 264. It means that during large run any particular output appears quite a few times. What matters is when not only some 64bit number appears again, but number after that, and after that and after that - this is when you know RNG period is reached.

If we try this code on different machines at different times, will we get the same sequence of randNumber values or a different sequence each time?

Generators are defined rather strictly, and you'll get the same bit stream. For example for MT19937 from C++ standard (https://timsong-cpp.github.io/cppwp/rand)

class mersenne_twister_engine {
...
static constexpr result_type default_seed = 5489u;
...

and function seed2state described as (https://timsong-cpp.github.io/cppwp/rand#eng.mers-6)

Effects: Constructs a mersenne_­twister_­engine object. Sets X−n to value mod 2w. Then, iteratively for i=−n,…,−1, sets Xi to ...

Function next_state is described as well together with test value at 10000th invocation. Standard says (https://timsong-cpp.github.io/cppwp/rand#predef-3)

using mt19937 = mersenne_twister_engine<uint_fast32_t,32,624,397,31,0x9908b0df,11,0xffffffff,7,0x9d2c5680,15,0xefc60000,18,1812433253>;


3
#Required behavior: The 10000th consecutive invocation of a default-constructed object
of type mt19937 shall produce the value 4123659995.

Big four compilers (GCC, Clang, VC++, Intel C++) I used produced same MT19937 output.

Distributions, from the other hand, are not specified that well, and therefore vary between compilers and libraries. If you need portable distributions you either roll your own or use something from Boost or similar libraries

Lophophore answered 11/2, 2018 at 18:40 Comment(0)
F
0

Any pseudo RNG which takes a seed will give you the same sequence for the same seed every time, on every machine. This happens since the generator is just a (complex) mathematical function, and has nothing actually random about it. Most times when you want to randomize, you take the seed from the system clock, which constantly changes so each run will be different. It is useful to have the same sequence in computer games for example when you have a randomly generated world and want to generate the exact same one, or to avoid people cheating using save games in a game with random chances.

Fulmination answered 11/2, 2018 at 10:15 Comment(5)
But the C++ standard doesn't specify the function for uniform_int_distribution...Capybara
@OliverCharlesworth As long as you run a compiled version, it doesn't matter. This will only matter if you compile on different machines with different compilersFulmination
I guess it's not entirely clear what the OP means by "try this code on different machines" - are they talking about a binary, or source code?Capybara
@OliverCharlesworth Consider both of them, what would be the difference ?Sonni
@onurcanbektas If you are running the same binary compiled code, it will behave predictably. If you compile on different compilers, you might get different behavior, but most likely not, especially with something like the mersenne twisterFulmination

© 2022 - 2024 — McMap. All rights reserved.