Exceeded maximum number of DLLs in R

I am using RStan to sample from a large number of Gaussian Processes (GPs), i.e., using the function stan(). For every GP that I fit, another DLL gets loaded, as can be seen by running the R command

getLoadedDLLs()
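
For instance, the count can be watched growing as fits accumulate:

# Each stan() call compiles and loads one more model DLL
length(getLoadedDLLs())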

The problem I'm running into is that, because I need to fit so many unique GPs, I'm exceeding the maximum number of DLLs that can be loaded, at which point I receive the following error:

Error in dyn.load(libLFile) : 
unable to load shared object '/var/folders/8x/n7pqd49j4ybfhrm999z3cwp81814xh/T//RtmpmXCRCy/file80d1219ef10d.so':
maximal number of DLLs reached...

As far as I can tell, this is set in Rdynload.c of the base R code, as follows:

#define MAX_NUM_DLLS 100

So my question is: what can be done to fix this? Building R from source with a larger MAX_NUM_DLLS isn't an option, as my code will be run by collaborators who wouldn't be comfortable with that process. I've tried the naive approach of simply unloading DLLs with dyn.unload(), in the hope that they would be reloaded when needed again. The unloading itself works fine, but when I try to use the fit again, R fairly unsurprisingly crashes with an error like:

*** caught segfault ***
address 0x121366da8, cause 'memory not mapped'
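
Roughly, that attempt looked like this, with the DLL path taken from the fit's dso slot:

# Unload the DLL behind a fit, hoping R reloads it on demand (it doesn't)
lib <- file.path(tempdir(),
                 paste0(get_stanmodel(fit)@dso@dso_filename,
                        .Platform$dynlib.ext))
dyn.unload(lib)  # the unload itself succeeds...
print(fit)       # ...but touching the fit afterwards segfaults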

I've also tried detaching RStan in the hopes that the DLLs would be automatically unloaded, but they persist even after unloading the package (as expected, given the following in the help for detach: "detaching will not in general unload any dynamically loaded compiled code (DLLs)").

From this question, Can Rcpp package DLLs be unloaded without restarting R?, it seems that library.dynam.unload() might have some role in the solution, but I haven't had any success using it to unload the DLLs, and I suspect that after unloading the DLL I'd run into the same segfault as before.
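
For reference, library.dynam.unload() is meant for DLLs that a package loaded via library.dynam(), so a call would look something like this (with a placeholder package name):

# Unload a DLL that a package loaded through library.dynam()
library.dynam.unload("somepkg", system.file(package = "somepkg"))

Since the model DLLs here are loaded with dyn.load() rather than by a package, this route didn't get me anywhere.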

EDIT: adding a minimal, fully-functional example:

The R code:

require(rstan)

x <- c(1,2)
N <- length(x)

fits <- list()
for(i in 1:100)
{
    fits[[i]] <- stan(file="gp-sim.stan", data=list(x=x,N=N), iter=1, chains=1)
}

This code requires the following model definition to be saved as gp-sim.stan in the working directory (the model is one of the examples included with Stan):

// Sample from Gaussian process
// Fixed covar function: eta_sq=1, rho_sq=1, sigma_sq=0.1

data {
  int<lower=1> N;
  real x[N];
}
transformed data {
  vector[N] mu;
  cov_matrix[N] Sigma;
  for (i in 1:N)
    mu[i] <- 0;
  for (i in 1:N)
    for (j in 1:N)
      Sigma[i,j] <- exp(-pow(x[i] - x[j],2)) + if_else(i==j, 0.1, 0.0);
}
parameters {
  vector[N] y;
}
model {
  y ~ multi_normal(mu,Sigma);
}

Note: this code takes quite some time to run, as it compiles ~100 Stan models.

Cap answered 18/7, 2014 at 19:1 Comment(2)
I am surprised that another DLL gets loaded for every process. I wonder if it would be easiest to prevent this from happening in the first place. Can you supply a minimal, but fully functional, example of code that captures your problem? – Sawyer
That's a (R)Stan design issue and limitation. Rcpp just helps to create the dynamically loadable library; it has no view on whether it is advisable to load hundreds of them. Eventually you will hit an OS limit (beyond the hardcoded R limit you identified), I suspect. – Moltke

I can't speak to the issues regarding DLLs, but you shouldn't need to compile the model each time. You can compile the model once and reuse it, which avoids this problem entirely and speeds up your code.

The function stan is a wrapper around stan_model, which compiles the model, and sampling, which draws samples from it. Run stan_model once to compile the model and save the result in an object, then call sampling on that object whenever you need draws.

require(rstan)

x <- c(1,2)
N <- length(x)

fits <- list()
mod <- stan_model("gp-sim.stan")
for(i in 1:100)
{
    fits[[i]] <- sampling(mod, data=list(x=x,N=N), iter=1, chains=1)
}

This is similar to the problem of running parallel chains, discussed on the RStan wiki. Your code could be sped up by replacing the for loop with something that runs the sampling in parallel.
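
For example, on a Unix-alike where forking is available, something like the following should work (on Windows, parallel::parLapply with a cluster is the analogue):

library(parallel)

# Reuse the one compiled model; draw from it in parallel, one fit per element
fits <- mclapply(1:100, function(i) {
  sampling(mod, data = list(x = x, N = N), iter = 1, chains = 1)
}, mc.cores = detectCores())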

Temporize answered 18/7, 2014 at 22:35 Comment(1)
For completeness, if you did have a valid reason to load 100 DLLs in an R session, I think you could use the dyn.unload function to unload some of them with dyn.unload(file.path(tempdir(), paste0(get_stanmodel(stanfit)@dso@dso_filename, .Platform$dynlib.ext))), where stanfit is an object produced by the sampling or stan functions. Or you could replace get_stanmodel(stanfit) with the object produced by stan_model. However, you would be very limited in what you could subsequently do with the stanfit object without crashing R (no monitor, print, log_prob, etc.). – Younglove

Here is what I use to run several Stan models in a row (Win10, R 3.3.0).

I needed not only to unload the DLL files but also to delete them and other temporary files. Also, the filename in my case differed from the one found in the stan object, as Ben suggested.

dso_filenames <- dir(tempdir(), pattern=.Platform$dynlib.ext)
filenames <- dir(tempdir())
# Unload every model DLL still sitting in the session's temp directory
for (i in seq(dso_filenames))
  dyn.unload(file.path(tempdir(), dso_filenames[i]))
# Then delete the temporary files; some files with long names didn't
# like to be removed, hence the nchar() cutoff
for (i in seq(filenames))
  if (file.exists(file.path(tempdir(), filenames[i])) && nchar(filenames[i]) < 42)
    file.remove(file.path(tempdir(), filenames[i]))
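
Running this between fits keeps the number of loaded DLLs below the limit; as discussed above, though, fits whose DLLs have been unloaded can no longer be used safely.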
Sitra answered 9/6, 2016 at 13:30 Comment(0)
