Below is an MWE of my problem: I have programmed a progress bar for a function that uses the bootstrap (via the boot function from the boot package).
This works fine as long as I don't use parallel processing (res_1core below). If I use parallel processing by setting parallel = "multicore" and ncpus = 2, the progress bar isn't displayed properly (res_2core below).
library(boot)

rsq <- function(formula, data, R, parallel = c("no", "multicore", "snow"), ncpus = 1) {
  # Counter and progress bar live in this function's environment
  env <- environment()
  counter <- 0
  progbar <- txtProgressBar(min = 0, max = R, style = 3)

  # Statistic: R-squared of the model refitted on the resampled rows
  bootfun <- function(formula, data, indices) {
    d <- data[indices, ]
    fit <- lm(formula, data = d)
    # Advance the counter and redraw the bar on every call
    curVal <- get("counter", envir = env)
    assign("counter", curVal + 1, envir = env)
    setTxtProgressBar(get("progbar", envir = env), curVal + 1)
    return(summary(fit)$r.squared)
  }

  res <- boot(data = data, statistic = bootfun, R = R, formula = formula,
              parallel = parallel, ncpus = ncpus)
  return(res)
}

res_1core <- rsq(mpg ~ wt + disp, data = mtcars, R = 1000)
res_2core <- rsq(mpg ~ wt + disp, data = mtcars, R = 1000, parallel = "multicore", ncpus = 2)
I have read that this is related to the fact that the boot function calls lapply for single-core processing and mclapply for multicore processing. Does anyone know of an easy workaround for this? I would like the progress bar to reflect the progress of all the parallel processes combined.
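To illustrate why a shared counter behaves differently under lapply and mclapply, here is a minimal sketch. The names state and bump are made up for this illustration and have nothing to do with boot's internals. It assumes a Unix-alike system, where mclapply forks the R process; on Windows it falls back to a single core. Each forked child increments its own copy of the counter, so on a Unix-alike the parent's counter should still be 0 after the forked run.

```r
library(parallel)

# Hypothetical shared state, mirroring the counter kept in rsq()'s environment
state <- new.env()
assign("counter", 0, envir = state)

bump <- function(i) {
  # Increment the counter in whichever process evaluates this call
  cur <- get("counter", envir = state) + 1
  assign("counter", cur, envir = state)
  cur
}

# Serial: all calls run in the current process, so the counter accumulates
invisible(lapply(1:4, bump))
serial_count <- get("counter", envir = state)

# Forked: each child process works on its own copy of `state`
assign("counter", 0, envir = state)
# Forking is unavailable on Windows, where mc.cores must be 1
cores <- if (.Platform$OS.type == "windows") 1L else 2L
invisible(mclapply(1:4, bump, mc.cores = cores))
forked_count <- get("counter", envir = state)

c(serial = serial_count, forked = forked_count)
```

This is presumably why the progress bar stalls or jumps in the multicore case: every child redraws the bar from its own partial count, and none of them sees the total.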
Update
Thanks to the input of Karolis Koncevičius, I have found a workaround (just use the updated rsq function below):
rsq <- function(formula, data, R, parallel = c("no", "multicore", "snow"), ncpus = 1) {
  # Statistic: R-squared of the model refitted on the resampled rows
  bootfun <- function(formula, data, indices) {
    d <- data[indices, ]
    fit <- lm(formula, data = d)
    return(summary(fit)$r.squared)
  }

  env <- environment()
  counter <- 0
  progbar <- txtProgressBar(min = 0, max = R, style = 3)
  flush.console()

  # Wrapper: each process sees only its own share of the replicates,
  # so advance the counter by ncpus per call to approximate overall progress
  intfun <- function(formula, data, indices) {
    curVal <- get("counter", envir = env) + ncpus
    assign("counter", curVal, envir = env)
    setTxtProgressBar(get("progbar", envir = env), curVal)
    bootfun(formula, data, indices)
  }

  res <- boot(data = data, statistic = intfun, R = R, formula = formula,
              parallel = parallel, ncpus = ncpus)
  return(res)
}
Unfortunately, this only works for multicore processing when I run R from the terminal. Any ideas how to patch this so that it also displays properly in the R console or RStudio?
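One possible direction, sketched here under assumptions rather than offered as a tested fix: run the bootstrap in several smaller batches and update the bar from the parent process between batches, so that no child process ever writes to the console. The helper rsq_chunked and its chunks argument are hypothetical names, not part of the boot package. The batches are merged by hand, which means the returned object carries the first batch's call and seed, so functions that rely on those (for example to regenerate the resampling indices) may not behave as they would for a single boot run.

```r
library(boot)

# Hypothetical helper (not part of boot): run the bootstrap in `chunks` batches
# and advance the progress bar from the parent process after each batch.
rsq_chunked <- function(formula, data, R, chunks = 10,
                        parallel = "no", ncpus = 1) {
  bootfun <- function(formula, data, indices) {
    fit <- lm(formula, data = data[indices, ])
    summary(fit)$r.squared
  }

  # Split R into batch sizes that sum to R
  sizes <- diff(round(seq(0, R, length.out = chunks + 1)))
  sizes <- sizes[sizes > 0]

  progbar <- txtProgressBar(min = 0, max = R, style = 3)
  on.exit(close(progbar))

  done <- 0
  runs <- vector("list", length(sizes))
  for (i in seq_along(sizes)) {
    runs[[i]] <- boot(data = data, statistic = bootfun, R = sizes[i],
                      formula = formula, parallel = parallel, ncpus = ncpus)
    done <- done + sizes[i]
    setTxtProgressBar(progbar, done)  # only the parent writes to the console
  }

  # Merge by hand: keep the first batch's metadata, stack all replicates
  res <- runs[[1]]
  res$t <- do.call(rbind, lapply(runs, `[[`, "t"))
  res$R <- sum(sizes)
  res
}

res <- rsq_chunked(mpg ~ wt + disp, data = mtcars, R = 1000)
```

The bar then moves in steps of one batch rather than one replicate, and each batch pays the cost of starting the workers again, so the number of batches is a trade-off between smoothness and overhead. On a Unix-alike, passing parallel = "multicore" and ncpus = 2 should work the same way.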
mclapply, which is called by boot when parallel='multicore': "It is strongly discouraged to use these functions in GUI or embedded environments, because it leads to several processes sharing the same GUI which will likely cause chaos (and possibly crashes)." stat.ethz.ch/R-manual/R-devel/library/parallel/html/… – Kiddy