How to succinctly, portably, and thoroughly seed the mt19937 PRNG?

Asked 12/7, 2017 at 23:44 Answered 28/9, 2022 at 9:16

143

I seem to see many answers in which someone suggests using <random> to generate random numbers, usually along with code like this:

std::random_device rd;  
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(0, 5);
dis(gen);

Usually this replaces some kind of "unholy abomination" such as:

srand(time(NULL));
rand()%6;

We might criticize the old way by arguing that time(NULL) provides low entropy, time(NULL) is predictable, and the end result is non-uniform.

But all of that is true of the new way: it just has a shinier veneer.

rd() returns a single unsigned int. This has at least 16 bits and probably 32. That's not enough to seed MT's 19937 bits of state.
Using std::mt19937 gen(rd());gen() (seeding with 32 bits and looking at the first output) doesn't give a good output distribution. 7 and 13 can never be the first output. Two seeds produce 0. Twelve seeds produce 1226181350. (Link)
std::random_device can be, and sometimes is, implemented as a simple PRNG with a fixed seed. It might therefore produce the same sequence on every run. (Link) This is even worse than time(NULL).

Worse yet, it is very easy to copy and paste the foregoing code snippets, despite the problems they contain. Some solutions to the this require acquiring largish libraries which may not be suitable to everyone.

In light of this, my question is How can one succinctly, portably, and thoroughly seed the mt19937 PRNG in C++?

Given the issues above, a good answer:

Must fully seed the mt19937/mt19937_64.
Cannot rely solely on std::random_device or time(NULL) as a source of entropy.
Should not rely on Boost or other libaries.
Should fit in a small number of lines such that it would look nice copy-pasted into an answer.

Thoughts

My current thought is that outputs from std::random_device can be mashed up (perhaps via XOR) with time(NULL), values derived from address space randomization, and a hard-coded constant (which could be set during distribution) to get a best-effort shot at entropy.
std::random_device::entropy() does not give a good indication of what std::random_device might or might not do.

Edulcorate answered 12/7, 2017 at 23:44 Comment(19)

I don't know what the answer is but https://mcmap.net/q/67118/-why-do-i-get-the-same-sequence-for-every-run-with-std-random_device-with-mingw-gcc4-8-1 suggests a third-party library is currently your best bet. https://mcmap.net/q/67120/-what-is-the-difference-between-using-std-random_device-with-prng-e-g-std-mt19937-and-without agrees with your concern. My first reaction is that std::mt19937 is a nice portable framework that allows you to plug in external, non-standard sources of entropy. Note that rand() did not allow you to do that; perhaps that's the benefit? https://mcmap.net/q/67121/-c-11-how-to-set-seed-using-lt-random-gt agrees that it's down to your implementation, really, which frankly makes sense. – Solitta 12/7, 2017 at 23:50

My personal thought was that perhaps values could be drawn from std::random_device, time(NULL), and function addresses, then XORed together to produce a kind of best-effort entropy source. – Edulcorate 12/7, 2017 at 23:53

It would be nice if there was function such as does_random_device_actually_work() so one could at least degrade gracefully, or produce warnings or errors for the user. – Maffei 13/7, 2017 at 0:0

@NeilButterworth: True that. And the std::random_device::entropy() function would seem well-suited to that purpose. But no. – Edulcorate 13/7, 2017 at 0:2

The proper solution is not short, the short solution won't be proper. My approach I use in my seed11 library is basically to implement std::random_device properly on the platforms you're planning to run your program on, and provide a helper function that creates a seeded generator (seed11::make_seeded<std::mt19937>()) – Approver 13/7, 2017 at 0:5

@Edulcorate I suppose you could call on the device a 100 (say) times and see if you always get the same value back? – Maffei 13/7, 2017 at 0:12

@NeilButterworth: This doesn't identify if a statically-seeded PRNG is being used, as is the case in MinGW. – Edulcorate 13/7, 2017 at 0:14

@Edulcorate Even if you re-create the device each time? – Maffei 13/7, 2017 at 0:18

@NeilButterworth: I see your point. Indeed, this may detect the issue, but I'm looking for a robust response to it. – Edulcorate 13/7, 2017 at 0:25

random_utils.hpp is linked from another pcg-random blog post on the subject and does much of what you mention in your "thoughts" section. – Faggot 13/7, 2017 at 0:51

Are there any portable ways to turn the address of a function into an integer? Cast to void* then INT_PTR? – Impalpable 13/7, 2017 at 1:54

Does mt19937::seed not do what you want? – Seth 13/7, 2017 at 3:38

@Gambit: No. it's equivalent to passing a value to the constructor. The difficulty is in finding a good unpredictable value to pass. – Iota 13/7, 2017 at 3:44

Aside: your second bullet doesn't add anything new. It is not surprising that you found some value that appears 12 times. You should expect there to be just over three values that appear exactly 12 times, assuming that you have 2^32 independent, uniformly random samples. – Embrangle 13/7, 2017 at 7:2

@Hurkyl: True, but that's not really the point :) If you run a program using this scheme lots of times, and take the first number it outputs, you want to get an (at least approximately) uniform distribution, whereas in fact the output will in fact be skewed in a way that will make a noticeable difference to many applications. If you use a seed with the full possible entropy, those differences will be smoothed out much more. – Salicin 13/7, 2017 at 9:21

This later blog post in the series you linked should give you lots of good information. – Haricot 13/7, 2017 at 9:21

If you need cryptographic randomness, you shouldn't be using std::mt19937. It leaks internal state much too fast to be used for cryptographic purposes. – Midway 14/7, 2017 at 1:20

"unholy abomination" +1 – Atcliffe 8/8, 2018 at 14:39

@Edulcorate std::random_device in MinGW was fixed a while back: gcc.gnu.org/bugzilla/show_bug.cgi?id=85494 – Chaco 1/10, 2023 at 22:29

I would argue the greatest flaw with std::random_device is the that it is allowed a deterministic fallback if no CSPRNG is available. This alone is a good reason not to seed a PRNG using std::random_device, since the bytes produced may be deterministic. It unfortunately doesn't provide an API to find out when this happens, or to request failure instead of low-quality random numbers.

That is, there is no completely portable solution: however, there is a decent, minimal approach. You can use a minimal wrapper around a CSPRNG (defined as sysrandom below) to seed the PRNG.

Windows

You can rely on CryptGenRandom, a CSPRNG. For example, you may use the following code:

bool acquire_context(HCRYPTPROV *ctx)
{
    if (!CryptAcquireContext(ctx, nullptr, nullptr, PROV_RSA_FULL, 0)) {
        return CryptAcquireContext(ctx, nullptr, nullptr, PROV_RSA_FULL, CRYPT_NEWKEYSET);
    }
    return true;
}


size_t sysrandom(void* dst, size_t dstlen)
{
    HCRYPTPROV ctx;
    if (!acquire_context(&ctx)) {
        throw std::runtime_error("Unable to initialize Win32 crypt library.");
    }

    BYTE* buffer = reinterpret_cast<BYTE*>(dst);
    if(!CryptGenRandom(ctx, dstlen, buffer)) {
        throw std::runtime_error("Unable to generate random bytes.");
    }

    if (!CryptReleaseContext(ctx, 0)) {
        throw std::runtime_error("Unable to release Win32 crypt library.");
    }

    return dstlen;
}

Unix-Like

On many Unix-like systems, you should use /dev/urandom when possible (although this is not guaranteed to exist on POSIX-compliant systems).

size_t sysrandom(void* dst, size_t dstlen)
{
    char* buffer = reinterpret_cast<char*>(dst);
    std::ifstream stream("/dev/urandom", std::ios_base::binary | std::ios_base::in);
    stream.read(buffer, dstlen);

    return dstlen;
}

Other

If no CSPRNG is available, you might choose to rely on std::random_device. However, I would avoid this if possible, since various compilers (most notably, MinGW) implement it with as a PRNG (in fact, producing the same sequence every time to alert humans that it's not properly random).

Seeding

Now that we have our pieces with minimal overhead, we can generate the desired bits of random entropy to seed our PRNG. The example uses (an obviously insufficient) 32-bits to seed the PRNG, and you should increase this value (which is dependent on your CSPRNG).

std::uint_least32_t seed;    
sysrandom(&seed, sizeof(seed));
std::mt19937 gen(seed);

Comparison To Boost

We can see parallels to boost::random_device (a true CSPRNG) after a quick look at the source code. Boost uses MS_DEF_PROV on Windows, which is the provider type for PROV_RSA_FULL. The only thing missing would be verifying the cryptographic context, which can be done with CRYPT_VERIFYCONTEXT. On *Nix, Boost uses /dev/urandom. IE, this solution is portable, well-tested, and easy-to-use.

Linux Specialization

If you're willing to sacrifice succinctness for security, getrandom is an excellent choice on Linux 3.17 and above, and on recent Solaris. getrandom behaves identically to /dev/urandom, except it blocks if the kernel hasn't initialized its CSPRNG yet after booting. The following snippet detects if Linux getrandom is available, and if not falls back to /dev/urandom.

#if defined(__linux__) || defined(linux) || defined(__linux)
#   // Check the kernel version. `getrandom` is only Linux 3.17 and above.
#   include <linux/version.h>
#   if LINUX_VERSION_CODE >= KERNEL_VERSION(3,17,0)
#       define HAVE_GETRANDOM
#   endif
#endif

// also requires glibc 2.25 for the libc wrapper
#if defined(HAVE_GETRANDOM)
#   include <sys/syscall.h>
#   include <linux/random.h>

size_t sysrandom(void* dst, size_t dstlen)
{
    int bytes = syscall(SYS_getrandom, dst, dstlen, 0);
    if (bytes != dstlen) {
        throw std::runtime_error("Unable to read N bytes from CSPRNG.");
    }

    return dstlen;
}

#elif defined(_WIN32)

// Windows sysrandom here.

#else

// POSIX sysrandom here.

#endif

OpenBSD

There is one final caveat: modern OpenBSD does not have /dev/urandom. You should use getentropy instead.

#if defined(__OpenBSD__)
#   define HAVE_GETENTROPY
#endif

#if defined(HAVE_GETENTROPY)
#   include <unistd.h>

size_t sysrandom(void* dst, size_t dstlen)
{
    int bytes = getentropy(dst, dstlen);
    if (bytes != dstlen) {
        throw std::runtime_error("Unable to read N bytes from CSPRNG.");
    }

    return dstlen;
}

#endif

Other Thoughts

If you need cryptographically secure random bytes, you should probably replace the fstream with POSIX's unbuffered open/read/close. This is because both basic_filebuf and FILE contain an internal buffer, which will be allocated via a standard allocator (and therefore not wiped from memory).

This could easily be done by changing sysrandom to:

size_t sysrandom(void* dst, size_t dstlen)
{
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd == -1) {
        throw std::runtime_error("Unable to open /dev/urandom.");
    }
    if (read(fd, dst, dstlen) != dstlen) {
        close(fd);
        throw std::runtime_error("Unable to read N bytes from CSPRNG.");
    }

    close(fd);
    return dstlen;
}

Thanks

Special thanks to Ben Voigt for pointing out FILE uses buffered reads, and therefore should not be used.

I would also like to thank Peter Cordes for mentioning getrandom, and OpenBSD's lack of /dev/urandom.

Teilo answered 13/7, 2017 at 0:5 Comment(8)

This is what I've done in the past, but the, or at least a, question is WTF can't the library writers for these platforms do this for us? I expect file access and threads (for example) to be abstracted by library implementations, so why not random number generation? – Maffei 13/7, 2017 at 0:13

@etarion, you can simply use a larger seed then. The sysrandom logic intentionally avoids any requirements on the input size. You can easily up the logic to the pool size of /dev/urandom (Linux 4.8 and below) to 512 bytes. This answer simply uses a simple example to show how to seed the generator, in most examples, fully seeding an mt19337 PRNG will not be practical (although you can reasonable do much more than what I've done above). – Teilo 13/7, 2017 at 8:32

OP here: It would be nice if this answer demonstrated the seeding a bit better. As much as possible, I am hoping for answers which generate copy-pastable code that does the job better than the simple example I posted in my question without requiring much technical interpretation or thought on the part of the coder. – Edulcorate 13/7, 2017 at 19:43

I thought /dev/random would be the better choice for seeding an RNG, but apparently /dev/urandom is still considered computationally secure even when /dev/random would block because of low available entropy, so urandom is the recommended choice for everything except maybe one-time pads. See also unix.stackexchange.com/questions/324209/…. Beware of predictable seeds from urandom very early after bootup, though. – Viaduct 13/7, 2017 at 20:36

Linux's getrandom(2) system call is like opening and reading /dev/urandom, except it will block if the kernel's randomness sources haven't been initialized yet. I think this saves you from the early-boot low-quality-randomness problem without blocking in other cases like /dev/random does. – Viaduct 13/7, 2017 at 20:54

@PeterCordes, for sure, and that's a great option when available. However, it does not work on BSD or other *Nixes, which is something /dev/urandom generally works on. The Python mailing list discussion on this is something I generally subscribe to: bugs.python.org/issue27266 – Teilo 13/7, 2017 at 21:12

Suggestion: whitelist implementations where std::random_device is known to not suck, and use it instead of actually coding up a read() from /dev/urandom? Then you don't have to worry about maintaining the platform support code if some OS decides there's an even better way to get randomness. It's unlikely that an implementation will regress from usable to unusable. Or if you don't use it for a security-sensitive purpose, it might make sense to take the riskier approach of only random_device on a few known-bad platforms like MinGW. – Viaduct 15/7, 2017 at 5:47

Also, for any x86 target, you can check CPUID and run RDSEED (int _rdseed64_step( unsigned __int64 *) if CPUID says it's available. (RDRAND apparently doesn't have as high a randomness-quality guarantee, hence the existence of RDSEED which is compliant to NIST SP800-90B and NIST SP800-90C.) But note that you still need a fallback because the API allows it to report an error (e.g. if it's physically damaged in a given CPU). IDK if real hardware ever does that. – Viaduct 15/7, 2017 at 5:54

In a sense, this can't be done portably. That is, one can conceive a valid fully-deterministic platform running C++ (say, a simulator which steps the machine clock deterministically, and with "determinized" I/O) in which there is no source of randomness to seed a PRNG.

Enough answered 12/7, 2017 at 23:51 Comment(13)

If interaction with the user is part of the platform, then it cannot be deterministic. That's not much practical help, though. – Argentic 12/7, 2017 at 23:56

@kbelder: 1. Who says the user is a person? 2. Not all programs have user interaction and you certainly can't assume there's always a user around... – Enough 13/7, 2017 at 0:7

I appreciate this response, but also feel as though a program should make a reasonable best-effort attempt. – Edulcorate 13/7, 2017 at 0:12

@Edulcorate Agreed, but the issue is the C++ standard writers have to (or at least try their darndest to) accommodate these sorts of bizarre situations. That's why you get these sorts of wishy-washy standard definitions, where you might get decent results, but the compiler can still be standards-compliant even if it gives back something that's functionally worthless. -- So your restrictions ("short and cannot rely on other libraries") rule out any response, as you effectively need a platform-by-platform/compiler-by-compiler special casing. (e.g. what Boost does so well.) – Integrant 13/7, 2017 at 15:13

@Edulcorate what it explains, though, is that you get what you get in the standard because there is no portable way to do better. If you want to do better (which is a noble goal) you will have to accept some greater or lesser amount of abomination :) – Grainy 13/7, 2017 at 19:8

@hobbs: I am all about balanced abomination. :-) – Edulcorate 13/7, 2017 at 19:13

@Richard: Sometimes you just have to accept that it's possible to make a standards-compliant C++ implementation which isn't useful. Since the implementations people use for anything that matters are designed to be useful, you sometimes have to live with arguments like "any sane implementation will do something reasonable". I would have hoped that std::random_device would be in that category, but apparently it's not if some real implementations use a fixed-seed PRNG! That goes way beyond einpoklum's argument. – Viaduct 13/7, 2017 at 21:5

@PeterCordes: Actually, I would argue that there probably are real platforms (with C++ compilers targeting them?) in which you just don't have effective access to a randomness source. For sure some embedded devices could be like that. – Enough 13/7, 2017 at 22:26

@einpoklum: A best-effort attempt to gather some randomness would maybe still be better than a fixed seed, though. e.g. sample the low bits of a high-precision clock a few times, mix in a PID and maybe some other things that vary between runs. Some argue that producing identical sequences every time is the best way to indicate that there isn't a good quality RNG available, but IMO that's a sign of an over-simplified API. If there was an optional high_quality_only=false arg, you could run rd(true) to request failure if crypto-quality randomness wasn't available. – Viaduct 14/7, 2017 at 0:32

But code that wanted good-enough randomness for non-security use cases could get what they want from rd(). And of course it's just a massive quality-of-implementation issue that MinGW is using a fixed seed on an OS that has a randomness source. It sucks a lot that we can't trust std::random_device to give something close to as good as the platform can provide. – Viaduct 14/7, 2017 at 0:35

@PeterCordes: The point is that on some systems you probably can't sample the clock with high precision, and there are no processes. – Enough 14/7, 2017 at 8:39

Yes, I understand there will be some platforms where there's just so little randomness that you can't really do anything. But as you say, usually only embedded, in which case any software that runs on it will (should) be tested carefully for that platform. Not compiled from source by someone who just downloaded it, like will happen on MinGW, which is ruining std::random_device for everyone by providing an unexpectedly terrible implementation. – Viaduct 14/7, 2017 at 13:36

@PeterCordes: Fair enough. – Enough 14/7, 2017 at 15:44

You can use a std::seed_seq and fill it to at least the requires state size for the generator using Alexander Huszagh's method of getting the entropy:

size_t sysrandom(void* dst, size_t dstlen); //from Alexander Huszagh answer above

void foo(){

    std::array<std::mt19937::UIntType, std::mt19937::state_size> state;
    sysrandom(state.begin(), state.length*sizeof(std::mt19937::UIntType));
    std::seed_seq s(state.begin(), state.end());

    std::mt19937 g;
    g.seed(s);
}

If there was a proper way to fill or create a SeedSequence from a UniformRandomBitGenerator in the standard library using std::random_device for seeding properly would be much simpler.

Matrilocal answered 13/7, 2017 at 10:52 Comment(2)

seed_seq has issues though, pcg-random.org/posts/developing-a-seed_seq-alternative.html – Haricot 13/7, 2017 at 11:28

There is nothing in the C++ standard or anywere to guarantee that the random number generator will use the whole array when you seed from seed_seq. This method will lead to failure if you are using the rng for a scientific simulation, and obviously also for cryptography. About the only use case for this will be to randomize a videogame, but there it would be an overkill. – Keramics 4/12, 2019 at 9:42

The implementation I am working on takes advantage of the state_size property of the mt19937 PRNG to decide how many seeds to provide on initialization:

using Generator = std::mt19937;

inline
auto const& random_data()
{
    thread_local static std::array<typename Generator::result_type, Generator::state_size> data;
    thread_local static std::random_device rd;

    std::generate(std::begin(data), std::end(data), std::ref(rd));

    return data;
}

inline
Generator& random_generator()
{
    auto const& data = random_data();

    thread_local static std::seed_seq seeds(std::begin(data), std::end(data));
    thread_local static Generator gen{seeds};

    return gen;
}

template<typename Number>
Number random_number(Number from, Number to)
{
    using Distribution = typename std::conditional
    <
        std::is_integral<Number>::value,
        std::uniform_int_distribution<Number>,
        std::uniform_real_distribution<Number>
    >::type;

    thread_local static Distribution dist;

    return dist(random_generator(), typename Distribution::param_type{from, to});
}

I think there is room for improvement because std::random_device::result_type could differ from std::mt19937::result_type in size and range so that should really be taken into account.

A note about std::random_device.

According to the C++11(/14/17) standard(s):

26.5.6 Class random_device [ rand.device ]

2 If implementation limitations prevent generating non-deterministic random numbers, the implementation may employ a random number engine.

This means the implementation may only generate deterministic values if it is prevented from generating non-deterministic ones by some limitation.

The MinGW compiler on Windows famously does not provide non-deterministic values from its std::random_device, despite them being easily available from the Operating System. So I consider this a bug and not likely a common occurrence across implementations and platforms.

Lethe answered 12/7, 2017 at 23:58 Comment(15)

This may fill the MT state, but still relies solely on std::random_device, and is therefore vulnerable to issues stemming from it. – Edulcorate 13/7, 2017 at 0:0

@Edulcorate What issues are those? – Lethe 13/7, 2017 at 0:1

I think I stated them clearly enough in the question. Happy to clarify/discuss, though. – Edulcorate 13/7, 2017 at 0:3

@Edulcorate Are there any real systems that don't actually implement a reasonable std::random_device? I know the standard allows a PRNG as fall back but I feel that's just to cover themselves as it is hard to demand that every device that uses C++ has a non-deterministic random source. And if they don't then what could you do about that anyway? – Lethe 13/7, 2017 at 0:9

@Edulcorate So I suspect that std::random_device is as good as it gets in terms of portable non-deterministic random source. – Lethe 13/7, 2017 at 0:10

@Galik, yes, MinGW does not: it's deterministic. The fact that true randomness is not guaranteed should be reason enough to ignore it except as a fallback. – Teilo 13/7, 2017 at 0:11

In fact, MinGW deliberately chose an identical starting sequence to make the fact that std::random_device is a PRNG, and not a CSPRNG, obvious. #18881154 – Teilo 13/7, 2017 at 0:13

@AlexanderHuszagh I don't use windows but I was under the impression that the problem with MinGW was a bug, not a feature, but I could be wrong. Is that true of all versions? – Lethe 13/7, 2017 at 0:16

@Galik, I am unsure, but I have confirmed it with my own code. I believe it is mostly unimplemented, and the consistency of the results is a feature (to notify an observant user). The fact that std::random_device is not guaranteed to be a CSPRNG is reason enough for me to consider it unreliable, and that other options should be used when possible. – Teilo 13/7, 2017 at 0:19

@AlexanderHuszagh I think MinGW is a port so it tends to suffer from incomplete implementation for longer than other compilers perhaps. But I don't think compiler implementers are going to avoid implementing a proper non-deterministic source unless the device simply doesn't have one. That's what the "get out" clause of for. – Lethe 13/7, 2017 at 0:24

That's dangerous though, since you are then making your portable solution highly dependent on the compiler and compiler version. Remember, this was back in 2013: it hasn't changed since. std::random_device may be good if a fallback is needed, but inherently, it's not reliable and should be considered a last resort if known CSPRNGs aren't available. – Teilo 13/7, 2017 at 0:37

@AlexanderHuszagh I'm not so sure. My intention is to make my "portable solution" dependent on the device because if the device supports non-deterministic generators then so should std::random_device. I believe that is the spirit of the standard. So I have searched and can only find MinGW that is broken in this respect. No one seems to be reporting this problem with anything else that I have found. So, in my library, I have simply marked MinGW as unsupported. If there was a wider problem then I'd re-think it. I just don't see the evidence of that right now. – Lethe 13/7, 2017 at 1:25

This code is wrong (at least) for 64-bit-MTs. std::array<typename Generator::result_type, Generator::state_size> does not work since seed_seq uses only 32 bit per element (and you only fill 32 bit of it anyway), so for a 64-bit MT you'd need at least twice as many elements. – Haricot 13/7, 2017 at 8:33

@Galik: If you were to XOR the random device with time, then it would at least generate separate results for each run of the program (to the resolution of the timer). This seems to me an improvement over the current MinGW situation. – Edulcorate 13/7, 2017 at 19:47

I'm really disappointed that MinGW is ruining std::random_device for everyone by making it available in a form that doesn't deliver the platform's randomness capabilities. Low-quality implementations defeat the purpose of the API existing. It would be better IMO if they just didn't implement it at all until they have it working. (Or better, if the API provided a way to request failure if high-quality randomness wasn't available, so MinGW could avoid causing security risks while still giving different seeds for games or whatever.) – Viaduct 14/7, 2017 at 0:42

There are already a lot of good answers here, but I wanted to add two things:

The MinGW bug, here cited as the most notable example of deterministic std::random_device implementation, was fixed in recent versions (link to the bug report).
In c++20, there is a way to fill a std::seed_seq with values from std::random_device without using a buffer array:

#include <random>
#include <ranges>

int main(){

    std::random_device rd;
    auto rd_range = std::views::iota(0) | std::views::transform([&rd](auto){return rd();});
    std::seed_seq seeds(rd_range.begin(), rd_range.begin() + std::mt19937::state_size);
    std::mt19937 gen(seeds);

    return 0;

}

Experientialism answered 28/9, 2022 at 9:16 Comment(0)

There's nothing wrong with seeding by using time, assuming you don't need it to be secure (and you didn't say this was necessary). The insight is that you can use hashing to fix the non-randomness. I've found this works adequately in all cases, including and in-particular for heavy Monte Carlo simulations.

One nice feature of this approach is that it generalizes to initialization from other not-really-random sets of seeds. For example, if you want each thread to have its own RNG (for threadsafety), you can just initialize based on hashed thread ID.

The following is a SSCCE, distilled from my codebase (for simplicity; some OO support structures elided):

#include <cstdint> //`uint32_t`
#include <functional> //`std::hash`
#include <random> //`std::mt19937`
#include <iostream> //`std::cout`

static std::mt19937 rng;

static void seed(uint32_t seed) {
    rng.seed(static_cast<std::mt19937::result_type>(seed));
}
static void seed() {
    uint32_t t = static_cast<uint32_t>( time(nullptr) );
    std::hash<uint32_t> hasher; size_t hashed=hasher(t);
    seed( static_cast<uint32_t>(hashed) );
}

int main(int /*argc*/, char* /*argv*/[]) {
    seed();
    std::uniform_int_distribution<> dis(0, 5);
    std::cout << dis(rng);
}

Loveinidleness answered 13/7, 2017 at 22:40 Comment(3)

I agree with your point that seeding with the time is probably good enough in practice, if you don't need it to be secure. But I can't agree with the rest of your answer. Seeding with the hash of the time is no better than seeding with the time itself. – Circumsolar 14/7, 2017 at 13:59

@Circumsolar Empirically, it is much better. The reason is that the hash is discontinuous and spans a much wider range of values (try this yourself: seed with 1 and 2 and observe that the sequence of floats generated by them takes a while to really diverge). – Loveinidleness 14/7, 2017 at 18:27

I don't see why that matters. We're only running on a single seed at a time. The space of possible values for the seed (the entropy of the seed) is the same either way -- hashing doesn't increase entropy. Perhaps you could edit the question to explain why hashing is better? – Circumsolar 14/7, 2017 at 18:36

Here's my own stab at the question:

#include <random>
#include <chrono>
#include <cstdint>
#include <algorithm>
#include <functional>
#include <iostream>

uint32_t LilEntropy(){
  //Gather many potential forms of entropy and XOR them
  const  uint32_t my_seed = 1273498732; //Change during distribution
  static uint32_t i = 0;        
  static std::random_device rd; 
  const auto hrclock = std::chrono::high_resolution_clock::now().time_since_epoch().count();
  const auto sclock  = std::chrono::system_clock::now().time_since_epoch().count();
  auto *heap         = malloc(1);
  const auto mash = my_seed + rd() + hrclock + sclock + (i++) +
    reinterpret_cast<intptr_t>(heap)    + reinterpret_cast<intptr_t>(&hrclock) +
    reinterpret_cast<intptr_t>(&i)      + reinterpret_cast<intptr_t>(&malloc)  +
    reinterpret_cast<intptr_t>(&LilEntropy);
  free(heap);
  return mash;
}

//Fully seed the mt19937 engine using as much entropy as we can get our
//hands on
void SeedGenerator(std::mt19937 &mt){
  std::uint_least32_t seed_data[std::mt19937::state_size];
  std::generate_n(seed_data, std::mt19937::state_size, std::ref(LilEntropy));
  std::seed_seq q(std::begin(seed_data), std::end(seed_data));
  mt.seed(q);
}

int main(){
  std::mt19937 mt;
  SeedGenerator(mt);

  for(int i=0;i<100;i++)
    std::cout<<mt()<<std::endl;
}

The idea here is to use XOR to combine many potential sources of entropy (fast time, slow time, std::random-device, static variable locations, heap locations, function locations, library locations, program-specific values) to make a best-effort attempt at initializing the mt19937. As long as at least once source is "good", the result will be at least that "good".

This answer is not as short as would be preferable and may contain one or more mistakes of logic. So I'm considering it a work in progress. Please comment if you have feedback.

Edulcorate answered 13/7, 2017 at 1:36 Comment(12)

The addresses might have very little randomness. You always have the same allocations, so on smaller embedded systems where you have access the the whole memory, it's likely to get the same results each time. I'd say it's likely good enough for a big system, but might do shit on a microcontroller. – Vaish 13/7, 2017 at 2:0

I would guess &i ^ &myseed should have considerably less entropy than either one alone, since both are objects with static storage duration in the same translation unit and therefore likely to be rather close together. And you don't seem to actually use the special value from the initialization of myseed? – Simp 13/7, 2017 at 2:57

Converting dealocated pointers to ints is undefined behaviour; do it while it still exists. ^ is a horrible hash combiner; if two values both have lots of entropy, but little compared to each other, it removes it. + is usually better (as x+x only burns 1 bit of entropy in x, while x^x burns them all). Function is not thead safe I suspect (rd()) – Impalpable 13/7, 2017 at 3:22

Oh and by + I mean on unsigned (+ on signed is UB-bait). While these are somewhat ridiculous UB cases, you did say portable. Also consider getting the address of a function as an integral value if possible (uncertain if it is?) – Impalpable 13/7, 2017 at 3:28

@meneldal: Even on a full-strength PC, although the allocations might get different physical locations (depending on state of the machine external to the process), the pointers are abstracted by the process virtual address space, and likely highly repeatable, particularly is ASLR isn't in effect. – Iota 13/7, 2017 at 3:42

@Ben that's true, it's just worse on a system where all allocations are predictable. ASLR has been a thing for a while, so I forgot this wasn't always the standard. – Vaish 13/7, 2017 at 4:20

I am unaware if reinterpret_cast<intptr_t>(&Entropyinator) is defined behavior under the C++ standard. Function pointers cannot be converted to void*, I don't know if they can be converted to intptr_t or not. Prove this one way or the other, or don't use it in portable code. – Impalpable 13/7, 2017 at 14:9

@Yakk: XOR's poor performance as a hash combiner was what I was afraid of. Thanks for the addition suggestion. – Edulcorate 13/7, 2017 at 19:53

@meneldal: The point of mucking with the addresses and timers is to try to find a best-effort offset such that even if random_device is a PRNG that restarts on each run of the application it will yield different results (provided the memory loads differently). – Edulcorate 13/7, 2017 at 19:55

Don't add signed ints and cause undefined behavior as overflow when I told you it was a problem 6 comments above. – Impalpable 13/7, 2017 at 20:21

You should sample the low bits of the high-res clock between every other operation. On x86, the CPU's timestamp counter increments at one count per reference cycle, where a reference cycle is the advertised CPU frequency (not the current turbo or power-saving core clock frequency). Differences in system load and other variations will change the number of cycles that calling other functions takes, especially functions that are being called for the first time in your program since they will probably miss in cache. – Viaduct 14/7, 2017 at 0:49

If you can scatter clock-sampling through other parts of your startup code (especially slow parts) before you need to seed your RNG, that's much better. You'll get timestamps with more unpredictable variations in spacing, especially if you make some system calls or especially do any file or network I/O before you need the RNG. – Viaduct 14/7, 2017 at 0:55

Use getentropy() to seed a pseudorandom number generator (PRNG).
Use getrandom() if you want random values (instead of, say, /dev/urandom or /dev/random).

These are available on modern UNIX-like systems, such as Linux, Solaris, and OpenBSD.

Emprise answered 21/10, 2019 at 15:23 Comment(0)

-2

A given platform might have a source of entropy, such as /dev/random. Nanoseconds since the Epoch with std::chrono::high_resolution_clock::now() is probably the best seed in the Standard Library.

I previously have used something like (uint64_t)( time(NULL)*CLOCKS_PER_SEC + clock() ) to get more bits of entropy for applications that aren’t security-critical.

Ietta answered 13/7, 2017 at 0:26 Comment(3)

You really should use /dev/urandom, especially in a case like this. /dev/random blocks, and often without good reasons for doing so ([insert long explanation about how many different OSes estimate the randomness of the bytes produced by /dev/random]). – Teilo 13/7, 2017 at 0:46

@AlexanderHuszagh True, although I’ve had to code on systems where /dev/urandom didn’t exist, and the alternative to blocking was determinism. A box might have /dev/hwrng or /dev/hw_random as well, which should be even better. – Ietta 13/7, 2017 at 8:24

Okay, I said, “such as /dev/random,” and that seems to have sparked a holy war about /dev/random versus /dev/urandom on Linux that I did not intend when I gave that example.. – Ietta 13/7, 2017 at 20:27

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++