txtProgressBar for parallel bootstrap not displaying properly
Asked Answered
H

1

6

Below is a MWE of my problem: I have programmed a progress bar for some function using the bootstrap (via the boot function from the boot package).

This works fine as long as I don't use parallel processing (res_1core below). If I want to use parallel processing by setting parallel = "multicore" and ncpus = 2, the progress bar isn't displayed properly (res_2core below).

library(boot)

rsq <- function(formula, data, R, parallel = c("no", "multicore", "snow"), ncpus = 1) {
  env <- environment()
  counter <- 0
  progbar <- txtProgressBar(min = 0, max = R, style = 3)
  bootfun <- function(formula, data, indices) {
    d <- data[indices,]
    fit <- lm(formula, data = d)
    curVal <- get("counter", envir = env)
    assign("counter", curVal + 1, envir = env)
    setTxtProgressBar(get("progbar", envir = env), curVal + 1)
    return(summary(fit)$r.square)
  }
  res <- boot(data = data, statistic = bootfun, R = R, formula = formula, parallel = parallel, ncpus = ncpus)
  return(res)
}

res_1core <- rsq(mpg ~ wt + disp, data = mtcars, R = 1000)
res_2core <- rsq(mpg ~ wt + disp, data = mtcars, R = 1000, parallel = "multicore", ncpus = 2)

I have read that this is related to the fact that the boot function calls on lapply for single core processing and mclapply for multicore processing. Does anyone know of an easy workaround to deal with this? I mean, I would like to display the progress taking into account all of the parallel processes.

Update

Thanks to the input of Karolis Koncevičius, I have found a workaround (just use the updated rsq function below):

rsq <- function(formula, data, R, parallel = c("no", "multicore", "snow"), ncpus = 1) {
  bootfun <- function(formula, data, indices) {
    d <- data[indices,]
    fit <- lm(formula, data = d)
    return(summary(fit)$r.square)
  }

  env <- environment()
  counter <- 0
  progbar <- txtProgressBar(min = 0, max = R, style = 3)
  flush.console()

  intfun <- function(formula, data, indices) {
    curVal <- get("counter", envir = env) + ncpus
    assign("counter", curVal, envir = env)
    setTxtProgressBar(get("progbar", envir = env), curVal)
    bootfun(formula, data, indices)
  }
  res <- boot(data = data, statistic = intfun, R = R, formula = formula, parallel = parallel, ncpus = ncpus)
  return(res)
}

Unfortunately, this only works for multicore processing when I run R from the terminal. Any ideas how to patch this so it also displays properly in R console or Rstudio?

Hermia answered 19/11, 2014 at 14:21 Comment(2)
it's not surprising that this is hard -- you'd have to set up a 'listener' of some sort to receive progress messages from the individual workers ...Landbert
Regarding mclapply, which is called by boot when parallel='multicore': "It is strongly discouraged to use these functions in GUI or embedded environments, because it leads to several processes sharing the same GUI which will likely cause chaos (and possibly crashes)." stat.ethz.ch/R-manual/R-devel/library/parallel/html/…Kiddy
R
6

Not exactly what you ordered, but might be helpful.

A simple statistics function to boot:

library(boot)

bootfun <- function(formula, data, indices) {
    d <- data[indices,]
    fit <- lm(formula, data=d)
    summary(fit)$r.square
}

Higher order function to display progress:

progressReporter <- function(total, nBars=100, f, ...) {
    count <- 1
    step <- ceiling(total/nBars)
    cat(paste(rep("|", nBars), collapse=""), "\r")
    flush.console()
    function(...) {
        if (count %% step==0) {
            cat(".")
        }
        count <<- count + 1
        f(...)
    }
}

Now this function is cheating - it displays progress every "step" of iterations. If you have 1000 iterations, use two cores and print every 10th iteration - it will do the job. The cores don't share state, but they each will run the counter up to 500, and the function will respond to both counters.

On the other hand if you do 1000 iterations, run 10 cores and report every 200 - the function will stay silent, as all the cores will count to 100 each. None will reach 200 - no progress bar. Hope you get the idea. I think it should be ok in most of the cases.

Try it out:

res_1core <- boot(formula="mpg ~ wt + disp", data=mtcars, R=1000, statistic=progressReporter(1000, nBars=100, f=bootfun))
res_2core <- boot(formula="mpg ~ wt + disp", data=mtcars, R=1000, statistic=progressReporter(1000, nBars=100, f=bootfun), parallel="multicore", ncpus=2)
Refugiorefulgence answered 22/11, 2014 at 21:34 Comment(4)
Many thanks! I get the idea. I've got two questions, though. 1) Is there any reason why this only works for multicore processing if I run R from terminal (rather than via R console or Rstudio on mac)? 2) Did you resort to your own kind of progress bar because this approach won't work with txtProgressBar? Or did you did this for no particular reason?Hermia
1) I cannot answer first question yet, as I have never used Rstudio or R console. But I am interested why it would not work. Maybe "\r" is not working hmm. 2) The second question - txtProgressBar has states and it needs to keep track of some counter. However with multiprocessing I was getting back separate counters for each. So I thought: I am able to "append" something every time I get back a number, but I cannot keep the track of the progress. I solved this with "cat" by constructing that error-bar :) The progress is stored on screen (number of dots), but not internally in R.Featureless
So for example the "count" in the progressReporter will reach 500 two times (with 2 cores). But it will never reach 1000. So I cannot use a number to track states. But with cat(".") it reaches 500 two times and adds a dot each step. It does not matter that 1000 won't be reached - instead each step (in the example step=10) will be reached twice. It's a hack but I liked it because it's really simple.Featureless
Thanks! I've actually managed to integrate your idea using txtProgressBar. However, it still only works when I run R from terminal. I'll skim the internet and see whether I can find a solution for that, as this must be an issue beyond the current one.Hermia

© 2022 - 2024 — McMap. All rights reserved.