Boost Mersenne Twister: how to seed with more than one value?
I'm using the boost mt19937 implementation for a simulation.

The simulation needs to be reproducible, and that means storing and potentially reusing the RNG seeds later. I'm using the Windows crypto API to generate the seed values because I need an external source for the seeds, not because of any particular guarantees of randomness. The output of any simulation run will have a note including the RNG seed, so the seed needs to be reasonably short. On the other hand, as part of the analysis of the simulation, I'll be comparing several runs. To be sure that these runs are actually different, I'll need to use different seeds, so the seed needs to be long enough to avoid accidental collisions.

I've determined that 64 bits of seeding should suffice: as a rough approximation, the chance of a collision reaches 50% after about 2^32 runs, and that probability is low enough that the average error caused by it is negligible to me. Using just 32 bits of seed is tricky: the chance of a collision reaches 50% after only about 2^16 runs, and that's a little too likely for my tastes.

Unfortunately, the boost implementation seeds either with a full state vector, which is far, far too long, or with a single 32-bit unsigned long, which isn't ideal.

How can I seed the generator with more than 32 bits but less than a full state vector? I tried just padding the vector or repeating the seeds to fill the state vector, but even a cursory glance at the output shows that this gives poor results.

Inhale answered 26/5, 2010 at 16:40 Comment(3)
You can just get the current state and modify it...Volcanic
Your collision math is not quite correct. For instance, for a 64-bit seed, the probability of a duplicate is >= 0.5 after 77163 != 65536 runs.Bridgettebridgewater
The collision math is just an easy approximation - I assume you mean a 32-bit seed, incidentally, not a 64-bit seed?Inhale
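
For reference, the figures traded in the question and the comments above are consistent with the usual birthday approximation. The block below is only a sketch of that estimate, with N as the number of possible seeds and n as the number of runs (both symbols are introduced here, not taken from the thread); it is an approximation, not an exact count.

```latex
\begin{align*}
p(n) &\approx 1 - \exp\!\left(-\frac{n(n-1)}{2N}\right)
  && \text{probability of at least one repeated seed} \\
n_{1/2} &\approx \sqrt{2 \ln 2 \cdot N} \approx 1.177\,\sqrt{N}
  && \text{runs needed for } p \approx 0.5 \\
N = 2^{32} &: \quad n_{1/2} \approx 1.177 \cdot 2^{16} \approx 7.7 \times 10^{4} \\
N = 2^{64} &: \quad n_{1/2} \approx 1.177 \cdot 2^{32} \approx 5.1 \times 10^{9}
\end{align*}
```

So the "2^16" and "2^32" figures in the question are low by a factor of roughly 1.18, which does not change the conclusion.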
S
3

Your assumptions are mistaken. For a simulation, you don't need cryptographically strong seeds. In fact, using seeds 1,2,3,4, etcetera is often a better idea. The output values of the Mersenne Twister will be uncorrelated, yet nobody will question whether you cherry-picked your seeds to get desired simulation outputs.

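For other people who do have a real need, one easy way is to seed with the low 32 bits and then discard the first (seed>>32) values generated. If the discard count can range over k values, this adds about log2(k) bits of seed. However, the cost is linear in the number of discarded values, so it is only efficient if you need a few extra bits. Adding a full 32 bits this way means discarding up to 2^32 values, which is probably too slow.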
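
A minimal sketch of the discard idea, written against std::mt19937 so that it is self-contained; boost::mt19937 implements the same algorithm, and the assumption here is that the Boost version in use also provides discard(). The function name engine_from_seed_with_skip is made up for this illustration.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: the low 32 bits seed the engine as usual, and the
// high 32 bits are used as the number of initial outputs to throw away.
std::mt19937 engine_from_seed_with_skip(std::uint64_t seed) {
    const auto base = static_cast<std::uint32_t>(seed & 0xFFFFFFFFu);
    const std::uint64_t skip = seed >> 32;

    std::mt19937 engine(base);
    // Cost is linear in `skip`, so this is only practical when the high
    // part of the seed is restricted to a small range (a few extra bits).
    engine.discard(skip);
    return engine;
}
```

With a seed whose high half is 3, the returned engine is in the same state as an engine seeded with the low half after three draws.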

A faster approach is to generate the full state vector for the target generator. The approaches mentioned in the question (repeating or padding the seeds) aren't so good because the resulting state vector contains too little randomness. But if you fill the initial state vector with the output of mersenne_twister(seed1) ^ mersenne_twister(seed2), that is, the element-wise XOR of two streams seeded with the two 32-bit halves, this is not an issue at all.
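
One way to read this suggestion, sketched with std::mt19937 instead of Boost so that it compiles on its own. XorSeedSeq and make_xor_seeded_engine are hypothetical names, not part of Boost or the standard library. The sketch relies on the engine's seed-sequence overload, which asks the sequence for one 32-bit word per state word. Rejecting identical halves is a choice made here, because their XOR would produce an all-zero state vector.

```cpp
#include <cstdint>
#include <random>
#include <stdexcept>

// Hypothetical seed-sequence-like type: generate() emits the element-wise
// XOR of two independently seeded Mersenne Twister streams.
class XorSeedSeq {
public:
    using result_type = std::uint32_t;

    XorSeedSeq(std::uint32_t seed1, std::uint32_t seed2)
        : a_(seed1), b_(seed2) {
        // Identical seeds give identical streams, whose XOR is all zeros.
        if (seed1 == seed2) {
            throw std::invalid_argument("seed halves must differ");
        }
    }

    template <class It>
    void generate(It first, It last) {
        for (; first != last; ++first) {
            *first = static_cast<std::uint32_t>(a_() ^ b_());
        }
    }

private:
    std::mt19937 a_;
    std::mt19937 b_;
};

// Split a 64-bit seed into two 32-bit halves and fill the whole state
// vector of the target engine from the XOR of the two helper streams.
std::mt19937 make_xor_seeded_engine(std::uint64_t seed) {
    XorSeedSeq seq(static_cast<std::uint32_t>(seed),
                   static_cast<std::uint32_t>(seed >> 32));
    std::mt19937 engine;
    engine.seed(seq);
    return engine;
}
```

Because XOR is commutative, swapping the two halves yields the same engine state, so this scheme distinguishes slightly fewer than 2^63 seeds rather than 2^64.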

Snobbish answered 27/5, 2010 at 9:33 Comment(8)
I'm not worried about being cryptographically secure; I just need to get seed values from somewhere. The disadvantage of using sequential seeds is partially management: if I use some predetermined sequence of seeds that means that every run will be identical unless I somehow administer those seeds (i.e. use a file somewhere to ensure I don't reuse seeds accidentally).Inhale
A problem with mersenne_twister(seed1) ^ mersenne_twister(seed2) is that the seed values may be identical, in which case the outputs are identical too and their XOR is 0. Filling the entire state vector with zeros results in a stuck twister (I'm not sure it can escape at all, but certainly not quickly).Inhale
However, I could use your idea of using sequential seeds if I distinguished between individually inspectable runs (for which a 32-bit seed is fine) and aggregated analysis runs (for which sequential seeds are perfectly fine).Inhale
I'm not sure I can really mark this as an answer (yet?) since it doesn't actually get me 64-bit seeding; but you certainly did solve my direct need, +1 and lots of thanks for that!Inhale
Note that you are free to pick the pair of 32-bit seeds; just don't pick identical ones. That is, if your crypto source gives two identical values, discard one and get another. This means you lose 2^32 out of 2^64 possible seeds - no big deal.Snobbish
Since XOR is commutative, I'd be losing out another bit - but that's fair enough.Inhale
MT isn't cryptographically strong anyway, but it is highly respected in the physics Monte Carlo community as it can provide very large uncorrelated tuples.Dipnoan
Can you clarify what you mean by mersenne_twister(seed1) ^ mersenne_twister(seed2)? Both seed1 and seed2 are numbers in the same 32-bit space, and so is the XOR of the first values. Do you mean to loop through and fill the state of the new MT as [ v for v in m1()^m2() ]? It's not clear to me that this does not have the same problem. The values of synced pairs from m1(), m2() are from the same 32-bit seed space, and m1()^m2() will be too. But if you fill the first half of the new MT state from m1() and the other half from m2(), you fill it with independent data from a 64-bit seed space.Lampkin
D
3

Looking at boost sources of mersenne_twister template:

  void seed(UIntType value)
  {
    // New seeding algorithm from 
    // http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html
    // In the previous versions, MSBs of the seed affected only MSBs of the
    // state x[].
    const UIntType mask = ~0u;
    x[0] = value & mask;
    for (i = 1; i < n; i++) {
      // See Knuth "The Art of Computer Programming" Vol. 2, 3rd ed., page 106
      x[i] = (1812433253UL * (x[i-1] ^ (x[i-1] >> (w-2))) + i) & mask;
    }
  }

For mt19937 UIntType is uint32_t, w is 32. For 64-bit seed, maybe you could use the lower 32 bits to calculate every even index of the state (x) and the higher 32 bits to calculate the odd indices of the state, using that algorithm.

(This is a cargo-cult suggestion, though.)
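
A rough sketch of what that interleaving could look like, only to make the idea concrete. The function name interleaved_state is hypothetical, and the statistical quality of the resulting state is untested, as the answer itself warns. Each half of the 64-bit seed starts its own chain of the recurrence quoted above, with the low half on the even indices and the high half on the odd ones.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kStateSize = 624;  // n for mt19937

// Hypothetical helper: build a full mt19937 state vector from a 64-bit seed
// by running the quoted recurrence separately over even and odd indices.
std::array<std::uint32_t, kStateSize> interleaved_state(std::uint64_t seed) {
    std::array<std::uint32_t, kStateSize> x{};
    x[0] = static_cast<std::uint32_t>(seed);        // low half: even chain
    x[1] = static_cast<std::uint32_t>(seed >> 32);  // high half: odd chain
    for (std::size_t i = 2; i < kStateSize; ++i) {
        // Same constants as the Boost code above (w = 32, so w - 2 = 30),
        // but chained from x[i-2] so the two halves never mix.
        const std::uint32_t prev = x[i - 2];
        x[i] = 1812433253u * (prev ^ (prev >> 30)) + static_cast<std::uint32_t>(i);
    }
    return x;
}
```

The resulting array would then be handed to the generator through the full-state-vector seeding the question mentions, assuming the Boost version in use exposes an iterator-range seed(first, last) overload.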

Drawee answered 26/5, 2010 at 22:40 Comment(0)