Using jags.parallel from within a function (R language Error in get(name, envir = envir) : object 'y' not found)

Asked 21/5, 2014 at 17:50 Answered 24/5, 2023 at 9:57

Solved r parallel-processing r2jags jags.parallel

Using jags.parallel from the command line or a script works fine. I can run this modified example from http://www.inside-r.org/packages/cran/R2jags/docs/jags just fine

# An example model file is given in:
  model.file <- system.file(package="R2jags", "model", "schools.txt")
#=================#
# initialization  #
#=================#

  # data
  J <- 8.0
  y <- c(28.4,7.9,-2.8,6.8,-0.6,0.6,18.0,12.2)
  sd <- c(14.9,10.2,16.3,11.0,9.4,11.4,10.4,17.6)

  jags.data <- list("y","sd","J")
  jags.params <- c("mu","sigma","theta")
  jags.inits <- function(){
    list("mu"=rnorm(1),"sigma"=runif(1),"theta"=rnorm(J))
  }


#===============================#
# RUN jags and postprocessing   #
#===============================#
#  jagsfit <- jags(data=jags.data, inits=jags.inits, jags.params, 
#    n.iter=5000, model.file=model.file)

  # Run jags parallely, no progress bar. R may be frozen for a while, 
  # Be patient. Currenlty update afterward does not run parallelly

  print("Running Parallel") 
  jagsfit <- jags.parallel(data=jags.data, inits=jags.inits, jags.params, 
    n.iter=5000, model.file=model.file)

However if I wrap it in a function

testparallel <- functions(out){
# An example model file is given in:
    .
    .
    .
jagsfit <- jags.parallel(data=jags.data, inits=jags.inits, jags.params, 
  n.iter=5000, model.file=model.file)
print(out)
return(jagsfit)
}

Then I get the error: Error in get(name, envir = envir) : object 'y' not found Based on what I found here I know that it is an issue with the environment exported to the cluster and I have fixed it by changing

J <- 8.0
y <- c(28.4,7.9,-2.8,6.8,-0.6,0.6,18.0,12.2)
sd <- c(14.9,10.2,16.3,11.0,9.4,11.4,10.4,17.6)

  assign("J",8.0,envir=globalenv()) 
  assign("y",c(28.4,7.9,-2.8,6.8,-0.6,0.6,18.0,12.2),envir=globalenv()) 
  assign("sd",c(14.9,10.2,16.3,11.0,9.4,11.4,10.4,17.6),envir=globalenv())

Is there a better way to get around this?

Thank you, Greg

P.S.

I am working on this code for someone else so I don't really want to changes things in the R2jags package to let me pass in the environment to export though I plan on suggesting it to the authors of the package.

Somewhat answered 21/5, 2014 at 17:50 Comment(0)

So I have contacted the author of R2jags and he has added an addition argument to jags.parallel that lets you pass envir, which is then past onto clusterExport.

This works well except it allows clashes between the name of my data and variables in the jags.parallel function.

Somewhat answered 10/2, 2015 at 23:2 Comment(0)

if you use intensively JAGS in parrallel I can suggest you to look the package rjags combined with the package dclone. I think dclone is realy powerfull because the syntaxe was exactly the same as rjags. I have never see your problem with this package.

If you want to use R2jags I think you need to pass your variables and your init function to the workers with the function:

clusterExport(cl, list("jags.data", "jags.params", "jags.inits"))

Renick answered 23/5, 2014 at 14:38 Comment(3)

The clusterExport call is inside of jags.parallel. The problem is that clusterExport automatically exports the global environment. Since I am calling jags.parallel from within another function the variables (jags.data, etc.) are in the function environment not the global environment. Thus the processors don't get the variables and throws the error I cite. clusterExport does have an optional argument for selecting which environment to export, but jags.parallel does not. I have suggested a change to the authors. I am working on this code for someone else I don't want to edit the package myself. – Somewhat 28/5, 2014 at 13:56

I saw dclone, but I am new to both jags and R and working on large amount of preexisting code, so I was looking for a way to parallelize the code requiring the smallest amount of changes. If you know of a good source for learning about rjags and how to use dclone I'll look into it more. Thank you for answering. – Somewhat 28/5, 2014 at 14:0

After checking the jags.parallel function I think they have not the argument you need to find a solution to your problem. But with the dclone package the makeSOCKcluster function are not in the jag.parfit function. So you can control the environment of your variable. The equivalent fonction to jags.parallel is jags.parfit. The main difference in the arguments is the cl (clusters created before to run the function). The manual of jags and the exemples of the jags.parfit function are relatively substantial I think. – Renick 28/5, 2014 at 15:27

Without changing the code of R2jags, you can still assign those data variables to the global environment in an easier way by using list2env.

Obviously, there is is a concern that those variable names could be overwritten in the global environment, but you probably can control for that.

Below is the same code as the example given in the original post except I put the data into a list and sent that list's data into the global environment using the list2env function. (Also I took out the unused "out" variable in the function.) This currently runs fine for me; you may have to add more chains and/or add more iterations to see the parallelism in action, though.

testparallel <- function(){

    library(R2jags)

    model.file <- system.file(package="R2jags", "model", "schools.txt")

    # Make a list of the data with named items.
    jags.data.v2 <- list(
        J=8.0, 
        y=c(28.4,7.9,-2.8,6.8,-0.6,0.6,18.0,12.2),
        sd=c(14.9,10.2,16.3,11.0,9.4,11.4,10.4,17.6) )

    # Store all that data explicitly in the globalenv() as
    # was previosly suggesting using the assign(...) function.
    # This will do that for you.
    # Now R2jags will have access to the data without you having 
    # to explicitly "assign" each to the globalenv.
    list2env( jags.data.v2, envir=globalenv() )

    jags.params <- c("mu","sigma","theta")
    jags.inits <- function(){
        list("mu"=rnorm(1),"sigma"=runif(1),"theta"=rnorm(J))
    }

    jagsfit <- jags.parallel(
        data=names(jags.data.v2), 
        inits=jags.inits, 
        jags.params, 
        n.iter=5000, 
        model.file=model.file)

    return(jagsfit)
}

Malva answered 10/2, 2015 at 2:32 Comment(1)

Thank you. My problem was that I didn't like having to make the variables global, but it is nice to there is a better way to make them global. – Somewhat 10/2, 2015 at 22:56

This is esentially what r2jags is doing, but rewritten so you can put those variables into the environment by hand in clusterExport(), which loads variables into the blank R session set up separately for each cluster:

jinits <- function(){list(.RNG.name = 'lecuyer::RngStream',
                            .RNG.seed = round(1e+06*runif(1)))}
cl <- makeCluster(mcmc_params$nchains,methods=F, type="PSOCK")
JAGSmod <- function(seed){
      set.seed(seed) #note this affects jinits, but not rjags itself
      jags_mod <- jags.model(mod_path,datastruct,inits=jinits,n.adapt=mcmc_params$nadapt) 
      update(jags_mod, n.iter=mcmc_params$nburnin)
      mod_samp <- coda.samples(jags_mod, monitorparams, n.iter=mcmc_params$nsamples, thin=mcmc_params$nthin)
      return(mod_samp)
    }
clusterExport(cl,c('mod_path','datastruct','mcmc_params','monitorparams','jinits'),envir=environment()) ## could just do 'JAGSmod','jags.model','coda.samples','load.module' here instead but:
clusterEvalQ(cl, {
      library(rjags)
      load.module('lecuyer')
      load.module('glm')
    }) #runs code in each blank cl instance of R
res <- parLapply(cl,1:mcmc_params$nchains,JAGSmod)
stopCluster(cl)
l_mcmc <- as.mcmc.list(lapply(res,as.mcmc))
parsum <- summary(window(l_mcmc))

This can easily be wrapped in a function, although might require you to pass in c(datastruct,mcmc_params,monitorparams).

Pard answered 24/5, 2023 at 9:57 Comment(0)

Recommended topics

Hot tags