(I am not a cryptographer but I've absorbed a lot over the years. I had to help scrutinize a client's random number generation years back which is what lead to finding that Crypt::Random bug mentioned below.)
The seed code makes more sense if you properly indent all those ifdefs. This is the code in 5.16.0.
U32
Perl_seed(pTHX)
{
dVAR;
/*
* This is really just a quick hack which grabs various garbage
* values. It really should be a real hash algorithm which
* spreads the effect of every input bit onto every output bit,
* if someone who knows about such things would bother to write it.
* Might be a good idea to add that function to CORE as well.
* No numbers below come from careful analysis or anything here,
* except they are primes and SEED_C1 > 1E6 to get a full-width
* value from (tv_sec * SEED_C1 + tv_usec). The multipliers should
* probably be bigger too.
*/
#if RANDBITS > 16
# define SEED_C1 1000003
# define SEED_C4 73819
#else
# define SEED_C1 25747
# define SEED_C4 20639
#endif
#define SEED_C2 3
#define SEED_C3 269
#define SEED_C5 26107
#ifndef PERL_NO_DEV_RANDOM
int fd;
#endif
U32 u;
#ifdef VMS
# include <starlet.h>
/* when[] = (low 32 bits, high 32 bits) of time since epoch
* in 100-ns units, typically incremented ever 10 ms. */
unsigned int when[2];
#else
# ifdef HAS_GETTIMEOFDAY
struct timeval when;
# else
Time_t when;
# endif
#endif
/* This test is an escape hatch, this symbol isn't set by Configure. */
#ifndef PERL_NO_DEV_RANDOM
# ifndef PERL_RANDOM_DEVICE
/* /dev/random isn't used by default because reads from it will block
* if there isn't enough entropy available. You can compile with
* PERL_RANDOM_DEVICE to it if you'd prefer Perl to block until there
* is enough real entropy to fill the seed. */
# define PERL_RANDOM_DEVICE "/dev/urandom"
# endif
fd = PerlLIO_open(PERL_RANDOM_DEVICE, 0);
if (fd != -1) {
if (PerlLIO_read(fd, (void*)&u, sizeof u) != sizeof u)
u = 0;
PerlLIO_close(fd);
if (u)
return u;
}
#endif
#ifdef VMS
_ckvmssts(sys$gettim(when));
u = (U32)SEED_C1 * when[0] + (U32)SEED_C2 * when[1];
#else
# ifdef HAS_GETTIMEOFDAY
PerlProc_gettimeofday(&when,NULL);
u = (U32)SEED_C1 * when.tv_sec + (U32)SEED_C2 * when.tv_usec;
# else
(void)time(&when);
u = (U32)SEED_C1 * when;
# endif
#endif
u += SEED_C3 * (U32)PerlProc_getpid();
u += SEED_C4 * (U32)PTR2UV(PL_stack_sp);
#ifndef PLAN9 /* XXX Plan9 assembler chokes on this; fix needed */
u += SEED_C5 * (U32)PTR2UV(&when);
#endif
return u;
}
The code is so confusing because it's really several different ways of getting entropy all interleaved together. There's basically two paths: the system random device and gathering from the state of the interpreter and environment.
- The system random device.
This is the simplest and probably strongest method. If your OS has a random device which does not block, ie. /dev/urandom
read 32 bits from it. Done! #ifndef PERL_NO_DEV_RANDOM
(nice double negative) controls that bit. This is done on pretty much every Unix system. At this point the analysis of Perl's random seed switches to the implementation of your particular OS' /dev/urandom
.
- Derive something from the clock, pid and stack pointer.
If your system does not have a random device, basically Windows, Perl falls back to deriving a seed by mixing up some hopefully hard to predict system values.
- The time in microseconds or just seconds, depends on if
gettimeofday()
exists.
- The process id,
PerlProc_getpid()
.
- The memory location of the current stack pointer,
PTR2UV(PL_stack_sp)
.
What it should do with that information, and this is what the big comment at the start is about, is mash them together using a real hashing algorithm. Instead it multiplies them by various constants (SEED_C1
, SEED_C2
and so on) and adds them up. This is sure to be flawed.
All of that information is, in theory, predictable. I don't know what the state of the art is in predicting system information, but time + pid + stack pointer is a fairly common method of getting entropy and there's sure to be papers on the subject.
There is an additional flaw in common with all of Perl's methods, it does this all using only 32 bits even on 64 bit machines. It will not pull 64 bits out of /dev/urandom
, just 32. It will only look at 32 bits of the process id, stack pointer or time information even if there's 64 bits of information.
After reading through the code I three concerns.
- Its use of just 32 bits of randomness.
It's possible multi-GPU system could brute force that.
- (Unix) How good is your
/dev/urandom
.
/dev/urandom
can run out of entropy if you pull too much from it too fast. Instead of blocking it will generate weaker entropy. This is out of Perl's control but is a system wide weakness. In addition, some programs may pull more entropy than they need exhausting /dev/urandom
. We found a bug years ago in Crypt::Random which was doing just that.
- (Windows) That weak hashing algorithm.
Next to the 32 bit issue, this is probably the weakest link.
- What random function is it using?
Once a seed has been provided, what random number function is it passing it to? A poor rand function makes it easier to guess the seed. Perl looks for several, usually winding up with drand48
. You can see what its using with: use Config; print $Config{randfunc}'
. I have no idea how well that works, but the OS X drand48 man page says random(3)
is more powerful and the Linux man page says drand48 is obsolete.
The function hasn't been touched since... oh dear the late 90s. It was moved to util.c but hasn't been seriously touched. git blame 132efe8bfb7cd0fb1beb15aaf284e33bf44eb1fa^ pp.c
shows the real history, look for S_seed
. It probably needs some love. Most other languages have more advanced random number generators.