What are the weaknesses of Perl's srand() default seed, post version 5.004?
Asked Answered
F

1

5

I can find plenty of documentation as to issues with use of time() prior to Perl version 5.004, but nothing following.

For a homework assignment, we are asked to reverse-engineer a program's results based on the assumption that the default Perl srand() is still flawed in default seeding. The changelog for the perl 5.004 release states that the srand() default seed is now based on "a heavy mix of difficult-to-predict system-dependent values."

Is that the case, and, if so, what are those values and do they have any inherent weaknesses?

Filippa answered 19/9, 2012 at 14:39 Comment(1)
perl5.git.perl.org/perl.git/blob/HEAD:/util.c#l5563 seems like the place to look (now). It apparently used to live in pp.c. perl5.git.perl.org/perl.git/blob/HEAD:/pp.c#l2700 appears to call it, but I don't pretend to grok any of that.Socket
P
3

(I am not a cryptographer but I've absorbed a lot over the years. I had to help scrutinize a client's random number generation years back which is what lead to finding that Crypt::Random bug mentioned below.)

The seed code makes more sense if you properly indent all those ifdefs. This is the code in 5.16.0.

U32
Perl_seed(pTHX)
{
    dVAR;
    /*
     * This is really just a quick hack which grabs various garbage
     * values.  It really should be a real hash algorithm which
     * spreads the effect of every input bit onto every output bit,
     * if someone who knows about such things would bother to write it.
     * Might be a good idea to add that function to CORE as well.
     * No numbers below come from careful analysis or anything here,
     * except they are primes and SEED_C1 > 1E6 to get a full-width
     * value from (tv_sec * SEED_C1 + tv_usec).  The multipliers should
     * probably be bigger too.
     */
#if RANDBITS > 16
#  define SEED_C1   1000003
#  define SEED_C4   73819
#else
#  define SEED_C1   25747
#  define SEED_C4   20639
#endif

#define   SEED_C2   3
#define   SEED_C3   269
#define   SEED_C5   26107

#ifndef PERL_NO_DEV_RANDOM
    int fd;
#endif

    U32 u;

#ifdef VMS
#  include <starlet.h>
    /* when[] = (low 32 bits, high 32 bits) of time since epoch
     * in 100-ns units, typically incremented ever 10 ms.        */
   unsigned int when[2];
#else
#  ifdef HAS_GETTIMEOFDAY
       struct timeval when;
#  else
       Time_t when;
#  endif
#endif

/* This test is an escape hatch, this symbol isn't set by Configure. */
#ifndef PERL_NO_DEV_RANDOM
#    ifndef PERL_RANDOM_DEVICE
         /* /dev/random isn't used by default because reads from it will block
          * if there isn't enough entropy available.  You can compile with
          * PERL_RANDOM_DEVICE to it if you'd prefer Perl to block until there
          * is enough real entropy to fill the seed. */
#        define PERL_RANDOM_DEVICE "/dev/urandom"
#    endif
     fd = PerlLIO_open(PERL_RANDOM_DEVICE, 0);
     if (fd != -1) {
        if (PerlLIO_read(fd, (void*)&u, sizeof u) != sizeof u)
        u = 0;
    PerlLIO_close(fd);
    if (u)
        return u;
    }
#endif

#ifdef VMS
    _ckvmssts(sys$gettim(when));
    u = (U32)SEED_C1 * when[0] + (U32)SEED_C2 * when[1];
#else
#  ifdef HAS_GETTIMEOFDAY
        PerlProc_gettimeofday(&when,NULL);
        u = (U32)SEED_C1 * when.tv_sec + (U32)SEED_C2 * when.tv_usec;
#  else
        (void)time(&when);
        u = (U32)SEED_C1 * when;
#  endif
#endif

    u += SEED_C3 * (U32)PerlProc_getpid();
    u += SEED_C4 * (U32)PTR2UV(PL_stack_sp);

#ifndef PLAN9           /* XXX Plan9 assembler chokes on this; fix needed  */
    u += SEED_C5 * (U32)PTR2UV(&when);
#endif

    return u;
}

The code is so confusing because it's really several different ways of getting entropy all interleaved together. There's basically two paths: the system random device and gathering from the state of the interpreter and environment.

  • The system random device.

This is the simplest and probably strongest method. If your OS has a random device which does not block, ie. /dev/urandom read 32 bits from it. Done! #ifndef PERL_NO_DEV_RANDOM (nice double negative) controls that bit. This is done on pretty much every Unix system. At this point the analysis of Perl's random seed switches to the implementation of your particular OS' /dev/urandom.

  • Derive something from the clock, pid and stack pointer.

If your system does not have a random device, basically Windows, Perl falls back to deriving a seed by mixing up some hopefully hard to predict system values.

  • The time in microseconds or just seconds, depends on if gettimeofday() exists.
  • The process id, PerlProc_getpid().
  • The memory location of the current stack pointer, PTR2UV(PL_stack_sp).

What it should do with that information, and this is what the big comment at the start is about, is mash them together using a real hashing algorithm. Instead it multiplies them by various constants (SEED_C1, SEED_C2 and so on) and adds them up. This is sure to be flawed.

All of that information is, in theory, predictable. I don't know what the state of the art is in predicting system information, but time + pid + stack pointer is a fairly common method of getting entropy and there's sure to be papers on the subject.

There is an additional flaw in common with all of Perl's methods, it does this all using only 32 bits even on 64 bit machines. It will not pull 64 bits out of /dev/urandom, just 32. It will only look at 32 bits of the process id, stack pointer or time information even if there's 64 bits of information.

After reading through the code I three concerns.

  • Its use of just 32 bits of randomness.

It's possible multi-GPU system could brute force that.

  • (Unix) How good is your /dev/urandom.

/dev/urandom can run out of entropy if you pull too much from it too fast. Instead of blocking it will generate weaker entropy. This is out of Perl's control but is a system wide weakness. In addition, some programs may pull more entropy than they need exhausting /dev/urandom. We found a bug years ago in Crypt::Random which was doing just that.

  • (Windows) That weak hashing algorithm.

Next to the 32 bit issue, this is probably the weakest link.

  • What random function is it using?

Once a seed has been provided, what random number function is it passing it to? A poor rand function makes it easier to guess the seed. Perl looks for several, usually winding up with drand48. You can see what its using with: use Config; print $Config{randfunc}'. I have no idea how well that works, but the OS X drand48 man page says random(3) is more powerful and the Linux man page says drand48 is obsolete.

The function hasn't been touched since... oh dear the late 90s. It was moved to util.c but hasn't been seriously touched. git blame 132efe8bfb7cd0fb1beb15aaf284e33bf44eb1fa^ pp.c shows the real history, look for S_seed. It probably needs some love. Most other languages have more advanced random number generators.

Papule answered 20/10, 2012 at 23:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.