Usefulness of `rand()` - or who should call `srand()`?

4

46

Background: I use rand(), std::rand(), std::random_shuffle() and other functions in my code for scientific calculations. To be able to reproduce my results, I always explicitly specify the random seed, and set it via srand(). That worked fine until recently, when I figured out that libxml2 would also call srand() lazily on its first usage - which was after my early srand() call.

I filed a bug report with libxml2 about its srand() call, but I got the answer:

Initialize libxml2 first then. That's a perfectly legal call to be made from a library. You should not expect that nobody else calls srand(), and the man page nowhere states that using srand() multiple time should be avoided

This is actually my question now. If the general policy is that every lib can/should/will call srand(), and I can/might also call it here and there, I don't really see how that can be useful at all. Or how is rand() useful then?

That is why I thought, the general (unwritten) policy is that no lib should ever call srand() and the application should call it only once in the beginning. (Not taking multi-threading into account. I guess in that case, you anyway should use something different.)

I also tried to research a bit which other libraries actually call srand(), but I didn't find any. Are there any?

My current workaround is this ugly code:

{
    // On the first call to xmlDictCreate,
    // libxml2 will initialize some internal randomize system,
    // which calls srand(time(NULL)).
    // So, do that first call here now, so that we can use our
    // own random seed.
    xmlDictPtr p = xmlDictCreate();
    xmlDictFree(p);
}

srand(my_own_seed);

Probably the only clean solution would be to not use that at all and only to use my own random generator (maybe via C++11 <random>). But that is not really the question. The question is, who should call srand(), and if everyone does it, how is rand() useful then?

Soler answered 10/10, 2014 at 7:33 Comment(20)
If you can use C++11 libs, you should have a look at en.cppreference.com/w/cpp/numeric/random Red
Using rand for scientific calculations isn't necessarily a good idea anyway: the underlying implementation is frequently a simple LCG, which generally just doesn't deliver values of sufficient quality for scientific applications.Ba
IMO it's a design flaw on your side. When you need predictable random values (a contradictio in adiecto by itself), you should never use a global random generator that every other part of your program, including the libs you use, might also use. Even if those other parts do not call srand(), just by calling rand() they will also influence the global random generator.Overcloud
"the general (unwritten) policy is that no lib should ever call srand()" - that's the sensible approach, but then some library authors would get lots of complaints and queries with people wondering why "random" behaviours repeated between runs.... Perhaps the best thing would be to have a srand_if_not_yet_initialised() function in the API, but too late for that. Prefer the C++11 versions.Drais
There is little you can do to control the global function calls, essentially all code using them will "interfere" with each other. To isolate yourself (or another piece of code), use the C++11 features instead.Munda
Sorry, using srand/rand for security reasons, as the libxml2 guy claims, is ridiculous.Cowen
Note also that even if you could prevent anyone else from calling srand(), it's still a bad idea to rely on the sequence generated by rand() for testing/validation/etc, as it may change if/when you switch to another platform or even a newer version of the libraries/OS.Clean
@ammoQ It tends to not be that you want predictable random numbers, but more that when you get a result, you want it to be repeatable. While things should of course work for the general case, it allows you to show that it does work for certain inputs if needed (working input configuration), but also means that you can trace how changes affect the program more easily because you are using the same input data.Unearthly
A library which modifies global state (and this includes calling srand()) is broken. Their answer is just waffling, trying to justify the unjustifiable. For the rest: if you need reproducible random numbers for scientific calculations, rand() doesn't fill the bill. It's fine for games, or just playing around, but that's about it. Otherwise, in pre-C++11, you implement your own, and in C++11, you use <random>. (But that doesn't let the library implementers off the hook.)Doiron
@TonyD If a library implementer needs random numbers, they could require that srand() be called before initializing the library. Or simply use their own, private random number generator (in which case, they should provide a means of specifying the seed, so that you can test code which uses the library).Doiron
@JamesKanze "A library which modifies global state (and this include calling srand()) is broken." -- Calling rand modifies global state just as srand does. What is broken is that these functions have global state.Wimsatt
Why the heck does libxml need randomness at all?Condominium
@Cowen Security? He is using it to make his results reproducible. It is a very good idea when trying to debug algorithms with a stochastic component.Cabal
I agree that the application should set the seed and leave it at that. This is more moral support than an answer, hence it's a comment.Cabal
@JimBalter Yup. A library which needs random numbers should either have its own RNG (one of those in <random> is fine if the library doesn't need to support pre-C++11), or ask the user to provide one (like std::random_shuffle).Doiron
@Mehrdad I can think of a few reasons, but for none of them would rand() be appropriate. (Most would require some sort of cryptographically secure RNG. Which means that neither rand() nor anything in <random> would be appropriate.)Doiron
While it's easy to bag out rand and appropriate to recommend new C++11 alternatives, a point that seems to be overlooked here is that in many single-threaded applications, after calling srand() with the same seed, and with the same (if any) external inputs, the execution both in app and library code will be entirely deterministic. Having a library call rand() changes the global state, but no less predictably than if the higher level app code had done so; if the calls are interspersed in the same way, no harm done.Drais
@JamesKanze: Well that's my question: why might XML parsing require a cryptographically secure RNG (never mind that rand isn't one)?Condominium
@user515430: rand() is probably used to protect against algorithmic complexity attacks, not for any serious cryptography.Papal
In addition to the other answers and comments, IMO you should decouple your source of pseudo-random numbers from the rest of the algorithm. Create a class RandomValues that you can switch between deterministic (maybe even hardcoded values) and any random number generator you wish, and access that class from your algorithm. That would allow you to painlessly switch between deterministic and random inputs.Gorgoneion
34

Use the new <random> header instead. It allows for multiple engine instances, using different algorithms and more importantly for you, independent seeds.
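
As a rough sketch of what "independent seeds" means in practice: each engine object carries its own state, so nothing a library does to the global rand()/srand() state can disturb it. The function names below (draw, hypothetical_library_call) are made up for illustration, and std::mt19937 is just one possible engine choice.

```cpp
#include <cstdint>
#include <cstdlib>
#include <random>
#include <vector>

// Draw `count` raw 32-bit values from an engine that the caller owns.
std::vector<std::uint32_t> draw(std::mt19937& engine, int count) {
    std::vector<std::uint32_t> values;
    for (int i = 0; i < count; ++i) {
        values.push_back(static_cast<std::uint32_t>(engine()));
    }
    return values;
}

// Hypothetical stand-in for a library that touches the global C generator.
void hypothetical_library_call() {
    std::srand(12345u);
    (void)std::rand();
}
```

Two engines constructed with the same seed, e.g. std::mt19937 a(2014u) and std::mt19937 b(2014u), should yield the same values from draw() whether or not hypothetical_library_call() runs in between. The raw output of std::mt19937 is specified by the standard; the distribution classes in <random> are not, so values produced through a distribution may differ between standard library implementations.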

[edit] To answer the "useful" part, rand generates random numbers. That's what it's good for. If you need fine-grained control, including reproducibility, you should not only have a known seed but a known algorithm. srand at best gives you a fixed seed, so that's not a complete solution anyway.

Devoirs answered 10/10, 2014 at 7:36 Comment(18)
Yea, sure, I know that. But that is not really my question.Soler
If you consider the consequences, you see why the library can't assume that you called srand() - that initializes the global PRNG. You might not have any reason to do so because you're using your own PRNG. Thus they have to call it.Devoirs
@Albert: the answer to your literal question is "it isn't useful, at least not from a testability/repeatability perspective". The only way round that is to encapsulate your own RNG's state; the new C++ library does that for you.Patronage
@MSalters: That is my question. Should the lib not assume so? If the lib does not assume that, it means that all tools which call srand() after libxml2 init will reset the random generator used by the lib. So, in both cases, it doesn't really make sense for the lib to call srand(). Or is there some flaw in my thinking?Soler
@Albert: Which does not matter in the least for all those applications which use rand() only to get random numbers. They only need to know that srand() is called at least once.Devoirs
Re:their response, since srand is neither thread-safe nor reentrant having it called multiple times from different threads is not safe at all (rand has similar problems). There is a real, fundamental problem here which is addressed with the possibility for multiple generators in C++11.Whitman
@Soler Your question is much like "Shouldn't there be world peace?" ... regardless of the answer, there isn't. That libxml2 calls srand is a fact. Just move on and do as this answer suggests.Wimsatt
My somewhat implicit question, whether this is a bug/problem in libxml2, is still not really answered. Or is the answer that it doesn't really matter that much for libxml2? (And in my app, I should anyway use something different - but that would have been my consequence anyway.)Soler
@Soler It's a pointless philosophical exercise. The maintainer isn't going to change it just because of a discussion here. It's certainly not a bug, but it seems to be a problem for you ... so again, do something else.Wimsatt
@Devoirs "does not matter in the least... only need to know that srand() is called at least once." - it can matter: the period guarantees have been lost/reset, so earlier sequences may be repeated much earlier than otherwise (not only - but most dramatically - when several of the calls to srand() use say time(NULL) as a seed).Drais
@TonyD: Period guarantees? I don't think the standard makes those. Also, it doesn't matter that libXML gets the same random values as you do because they too called srand() with the same seed.Devoirs
@MSalters: many implementations document a period of 2^32, and it's reasonable for code to rely on implementation-defined behaviours if that suits its portability objectives. "Also, it doesn't matter that libXML gets the same random values as you do because they too called srand() with the same seed." - that's misunderstanding how rand() works - it's not going to give libXML the same sequence as the app because they share state, the problem is that if say the app is the serious consumer of random numbers then libXML's srand() call can restart a [sub]sequence the app already generated.Drais
So? Let A be the part of the repeating pattern generated between the two calls of srand and B the remainder, then the resulting sequence will be AABABABA etcetera, which is more random than ABABABA.Devoirs
@Devoirs If you seriously believe ignoring the lessened period and reasoning that way makes sense, then I guess I'd best call it quits... ;-P.Drais
@TonyD: The sequence AABABAB... isn't even periodic! Regardless, given a sub-sequence of just A, only the latter lets you predict with 100% certainty that the output will continue with B. Zero entropy.Devoirs
"that's misunderstanding how rand() works" -- Uh, no, you seem to have completely misunderstood what he wrote. " it's not going to give libXML the same sequence as the app because they share state" -- he didn't say anything about "because they share state", he said if srand is called with the same seed.Wimsatt
@JimBalter: I'm confident I haven't misunderstood anything here. As MSalters wrote in his last comment "The sequence AABABAB... isn't even periodic!" - that's the very problem I've been highlighting. Whether it was the app or lib that first called srand() and consumed A, it might reasonably expect A not to be repeated so soon, but the second call to srand() invalidates that expectation and A is seen twice (minus any elements the other lib/app consumes by itself calling rand()). I mentioned "because they share state" because subsequences from A may be (re)seen by the app and lib.Drais
I'm sure such confidence is comforting.Wimsatt
26

Well, the obvious thing has been stated a few times by others, use the new C++11 generators. I'm restating it for a different reason, though.
You use the output for scientific calculations, and rand usually implements a rather poor generator (by now, many mainstream implementations use MT19937, which apart from bad state recovery isn't so bad, but you have no guarantee of a particular algorithm, and at least one mainstream compiler still uses a really poor LCG).

Don't do scientific calculations with a poor generator. It doesn't really matter if you have things like hyperplanes in your random numbers if you do some silly game shooting little birds on your mobile phone, but it matters big time for scientific simulations. Don't ever use a bad generator. Don't.

Important note: std::random_shuffle (the version with two parameters) may actually call rand, which is a pitfall to be aware of if you're using that one, even if you otherwise use the new C++11 generators found in <random>.
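
A minimal sketch of how to sidestep that pitfall, assuming C++11 or later: std::shuffle takes the engine as an explicit argument, so it never falls back on rand. The helper name shuffled_range is made up for this example.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Return the values 0..n-1 shuffled with an engine seeded from `seed`.
// The engine is local, so the global rand() state plays no part here.
std::vector<int> shuffled_range(int n, std::uint32_t seed) {
    std::vector<int> values(n);
    std::iota(values.begin(), values.end(), 0);
    std::mt19937 engine(seed);
    std::shuffle(values.begin(), values.end(), engine);
    return values;
}
```

Calling shuffled_range(10, 7u) twice should give the same permutation within one build. The exact permutation is not guaranteed to match across different standard library implementations, because the standard does not fix the algorithm std::shuffle uses.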

About the actual issue, calling srand twice (or even more often) is no problem. You can in principle call it as often as you want; all it does is change the seed, and consequently the pseudorandom sequence that follows. I'm wondering why an XML library would want to call it at all, but they're right in their response: it is not illegitimate for them to do it. But it also doesn't matter.
The only important thing to make sure is that either you don't care about getting any particular pseudorandom sequence (that is, any sequence will do, you're not interested in reproducing an exact sequence), or you are the last one to call srand, which will override any prior calls.
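
To illustrate the "last one to call srand wins" point, here is a small sketch. hypothetical_library_init is a made-up stand-in for a library that seeds the global generator; it is not libxml2's actual code, and the seed 12345 is arbitrary.

```cpp
#include <cstdlib>
#include <vector>

// Made-up stand-in for a library that reseeds the global generator.
void hypothetical_library_init() { std::srand(12345u); }

// Seed last, then draw: the sequence depends only on `seed`.
std::vector<int> draw_after_seeding(unsigned seed, int count) {
    std::srand(seed);
    std::vector<int> values;
    for (int i = 0; i < count; ++i) values.push_back(std::rand());
    return values;
}

// Seed first, let the library reseed afterwards, then draw:
// the sequence now follows the library's seed, not ours.
std::vector<int> draw_with_late_library_init(unsigned seed, int count) {
    std::srand(seed);
    hypothetical_library_init();
    std::vector<int> values;
    for (int i = 0; i < count; ++i) values.push_back(std::rand());
    return values;
}
```

With this, draw_after_seeding(42u, 5) returns the same values before and after hypothetical_library_init() has run, while draw_with_late_library_init(42u, 5) matches draw_after_seeding(12345u, 5) instead. This only covers srand; any rand call made by other code between your draws still shifts your sequence.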

That said, implementing your own generator with good statistical properties and a sufficiently long period in 3-5 lines of code isn't all that hard either, with a little care. The main advantage (apart from speed) is that you control exactly where your state is and who modifies it.
It is unlikely that you will ever need periods much longer than 2^128 because of the sheer forbidding time to actually consume that many numbers. A 3GHz computer consuming one number every cycle will run for about 10^21 years on a 2^128 period, so there's not much of an issue for humans with average lifespans. Even assuming that the supercomputer you run your simulation on is a trillion times faster, your great-great-grandchildren won't live to see the end of the period.
In that light, periods like 2^19937 which current "state of the art" generators deliver are really ridiculous; that's trying to improve the generator at the wrong end if you ask me (it's better to make sure they're statistically firm and that they recover quickly from a worst-case state, etc.). But of course, opinions may differ here.
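
For anyone who wants to check the arithmetic behind that estimate, a small sketch follows. The function name and the 365.25-day year are choices made for this example only.

```cpp
#include <cmath>

// Years needed to consume 2^period_bits numbers at a fixed rate
// (numbers per second), using a 365.25-day year.
double years_to_exhaust_period(int period_bits, double numbers_per_second) {
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;
    const double period = std::ldexp(1.0, period_bits);  // 2^period_bits
    return period / numbers_per_second / seconds_per_year;
}
```

years_to_exhaust_period(128, 3e9) comes out at roughly 3.6e21 years, which is where the order of magnitude 10^21 comes from. A machine a trillion times faster still needs a few billion years.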

This site lists a couple of fast generators with implementations. They're xorshift generators combined with an addition or multiplication step and a small (from 2 to 64 machine words) lag, which results in both fast and high quality generators (there's a test suite as well, and the site's author wrote a couple of papers on the subject, too). I'm using a modification of one of these (the 2-word 128-bit version ported to 64-bits, with shift triples modified accordingly) myself.
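
To give an idea of how small such a generator is, here is a sketch of a xorshift generator with an addition step and two 64-bit words of state, in the style commonly published as xorshift128+ (shift constants 23, 18, 5). Whether it matches any generator on the site mentioned above is an assumption, and it is not the modified version I use myself. The seeding via a splitmix64-style mixer and the class name are choices made for this example.

```cpp
#include <cstdint>

// Sketch of a xorshift128+-style generator: 128 bits of private state,
// one xorshift step per call, output formed by an addition.
class Xorshift128Plus {
public:
    explicit Xorshift128Plus(std::uint64_t seed) {
        // Expand the single seed into two state words; consecutive
        // splitmix64 outputs are distinct, so the state is never all zero.
        state_[0] = splitmix64(seed);
        state_[1] = splitmix64(seed);
    }

    std::uint64_t next() {
        std::uint64_t s1 = state_[0];
        const std::uint64_t s0 = state_[1];
        state_[0] = s0;
        s1 ^= s1 << 23;
        state_[1] = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5);
        return state_[1] + s0;
    }

private:
    // splitmix64-style mixer, advancing the counter `x` on each call.
    static std::uint64_t splitmix64(std::uint64_t& x) {
        std::uint64_t z = (x += 0x9e3779b97f4a7c15ULL);
        z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
        z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
        return z ^ (z >> 31);
    }

    std::uint64_t state_[2];
};
```

All state lives inside the object, so two instances built from the same seed produce the same sequence regardless of what any other code does with rand or srand. Treat this as an illustration of the structure rather than a vetted generator; check the statistical quality yourself before using anything like it for real simulations.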

Dalrymple answered 10/10, 2014 at 9:29 Comment(0)
8

This problem is being tackled in C++11's random number generation, i.e. you can create an instance of a class:

std::default_random_engine e1;

which gives you full control over the random numbers generated from the object e1 (as opposed to whatever would be used in libxml). The general rule of thumb would then be to use the new construct, as you can generate your random numbers independently.

Very good documentation

To address your concerns - I also think that it would be bad practice to call srand() in a library like libxml. However, it's more that srand() and rand() are not designed to be used in the context you are trying to use them in - they are enough when you just need some random numbers, as libxml does. However, when you need reproducibility and want to be sure that you are independent of others, the new <random> header is the way to go for you. So, to sum up, I don't think it's good practice on the library's side, but it's hard to blame them for doing that. Also, I could not imagine them changing that, as a billion other pieces of software probably depend on it.

Selfinsurance answered 10/10, 2014 at 7:39 Comment(0)
7

The real answer here is that if you want to be sure that YOUR random number sequence isn't being altered by someone else's code, you need a random number context that is private to YOUR work. Note that calling srand is only one small part of this. For example, if you call some function in some other library that calls rand, it will also disrupt the sequence of YOUR random numbers.

In other words, if you want predictable behaviour from your code, based on random number generation, it needs to be completely separate from any other code that uses random numbers.

Others have suggested using the C++ 11 random number generation, which is one solution.

On Linux and other POSIX-compatible systems, you could also use rand_r, which takes a pointer to an unsigned int seed that holds the state for that sequence. So if you initialize a seed variable and then use it with all calls to rand_r, it will produce a sequence that belongs to YOUR code alone. This is of course still the same old rand generator, just with a separate seed. The main reason I mention this is that you could fairly easily do something like this:

int myrand()
{
   static unsigned int myseed = ... some initialization of your choice ...;
   return rand_r(&myseed);
}

and simply call myrand instead of std::rand (it should also be doable to work this into the std::random_shuffle overload that takes a random generator parameter).
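
A variation on the same idea, sketched under the assumption that rand_r is available (it is POSIX, not standard C or C++, so this will not build everywhere): keep the seed in an object instead of a function-local static, so that several independent sequences can coexist. The name PrivateRand is made up for this example.

```cpp
#include <stdlib.h>  // rand_r is declared here on POSIX systems

// Each instance owns its seed, so each instance has its own sequence
// that does not touch the global state used by rand()/srand().
struct PrivateRand {
    unsigned int seed;

    explicit PrivateRand(unsigned int initial_seed) : seed(initial_seed) {}

    // Same value range as rand(): 0..RAND_MAX.
    int operator()() { return rand_r(&seed); }
};
```

Two instances constructed with the same seed return the same values, no matter how often other code calls srand or rand in between. The quality of the numbers is no better than that of rand_r itself, which is usually modest.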

Ashby answered 10/10, 2014 at 8:1 Comment(1)
Yes! So right. Just moving the srand() call out of the libxml library won't really make your results very reproducible. It might work somewhat, but obviously some libxml functions are calling rand() at some point (calculating UUIDs?) and this will alter the pseudorandom sequence your program will receive from rand() after that...Sphygmograph
