controlling seeds with mclapply
Asked Answered
Y

1

10

Imagine we are doing a number of processes where i want to set one overall seed at the beginning of a program: e.g.

mylist <- list( as.list(rep(NA,3)), as.list(rep(NA,3)) )
foo <- function(x){  for(i in 1:length(x)){ 
                       x[[i]] <- sample(100,1)
                         }
                      return(x) 
                     } 

# start block
set.seed(1)
l1 <- lapply(mylist, foo)
l2 <- lapply(mylist, foo)
# end

of course within a block l1 and l2 are different, but if i run the above block again l1 will be the same as before and l2 will be the same as before.

Imagine foo is horribly time consuming so i want to use mclapply not lapply, so i do:

library(parallel)

# start block
set.seed(1)
mclapply(mylist , foo,  mc.cores = 3)
mclapply(mylist , foo,  mc.cores = 3)
# end

If i run this block again I will get different results the next time. How do I produce the same behaviour as with setting one overall seed with lapply but using mclappy. I have looked through mclapply doc but I am not sure because using:

set.seed(1)
l1 <-  mclapply(mylist , foo,  mc.cores = 3, mc.set.seed=FALSE)
l2 <-  mclapply(mylist , foo,  mc.cores = 3, mc.set.seed=FALSE)

result in l1 and l2 being the same, which is not what I want...

Yamada answered 26/5, 2015 at 10:53 Comment(4)
You can use clusterSetupRNG ... #8358598Impostume
cheers but the library(doRNG) example appears to be dated and no longer functional and the clusterSetupRNG is not really quite what i requested, unless you can show me otherwise.Yamada
Seems to have changed slightly... have a look at pg three of the doRNG reference manual. Or use snowImpostume
Actually the help page for ?'%dorng%' gives an exampleImpostume
C
11

The parallel package comes with special support for the "L'Ecuyer-CMRG" random number generator which was introduced at the same time as parallel. You can read the documentation for that support using:

library(parallel)
?mc.reset.stream

To use it, you first need to enable "L'Ecuyer-CMRG":

RNGkind("L'Ecuyer-CMRG")

After doing that, code such as:

set.seed(1)
mclapply(mylist, foo, mc.cores=3)
mclapply(mylist, foo, mc.cores=3)

will be reproducible, but the two calls to mclapply will return identical results. This is because the state of the random number generator in the master process isn't changed by calling mclapply.

I've used the following function to skip over the random number streams used by the mclapply workers:

skip.streams <- function(n) {
  x <- .Random.seed
  for (i in seq_len(n))
    x <- nextRNGStream(x)
  assign('.Random.seed', x, pos=.GlobalEnv)
}

You can use this function to get the behavior that I think you want:

set.seed(1)
mclapply(mylist, foo, mc.cores=3)
skip.streams(3)
mclapply(mylist, foo, mc.cores=3)
skip.streams(3)
Crenulation answered 26/5, 2015 at 21:21 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.