Is it possible to get a progress bar with foreach and a "multicore-kind" of backend

While using "multicore" parallelism using foreach and the doMC backend (I use doMC as at the time I looked into it other package did not allow logging from the I would like to get a progress bar, using the progress package, but any progress (that works on a linux terminal ie no tcltk popups) could do.

Given that it uses forking, I can imagine it might not be possible, but I am not sure.

The intended use is to indicate progress while I load and concatenate hundreds of files in parallel (usually from a #!Rscript script).

I've looked at a few posts, like How do you create a progress bar when using the “foreach()” function in R?. I'm happy to award a bounty on this.

EDIT

A 500-point bounty is offered for someone showing me how to:

  1. use foreach and a multicore (forking) type of parallelism
  2. get a progress bar
  3. get logging using futile.logger

Reprex

# load packages
library("futile.logger")
library("data.table")
library("foreach")
library("doMC")
# register the multicore (forking) backend
registerDoMC(parallel::detectCores())
# create temp dir
tmp_dir <- tempdir()
# create names for 200 files to be created
nb_files <- 200L
file_names <- file.path(tmp_dir, sprintf("file_%s.txt", 1:nb_files))
# make it reproducible
set.seed(1L)
nb_rows <- 1000L
nb_columns <- 10L
# create those 200 files sequentially
foreach(file_i = file_names) %do%
{
    DT <- as.data.table(matrix(data = runif(n = nb_rows * nb_columns), nrow = nb_rows))
    fwrite(x = DT, file = file_i)
    flog.info("Creating file %s", file_i)
} -> tmp

# Load back the files
foreach(file_i = file_names, .final = rbindlist) %dopar%
{
    flog.info("Loading file %s", file_i)
    # >>> SOME PROGRESS BAR HERE <<<
    fread(file_i)
} -> final_data
# show data
final_data

Desired output

(Note that the progress bar is not messed up by the logging lines.)

INFO [2018-07-18 19:03:48] Loading file /tmp/RtmpB13Tko/file_197.txt
INFO [2018-07-18 19:03:48] Loading file /tmp/RtmpB13Tko/file_198.txt
INFO [2018-07-18 19:03:48] Loading file /tmp/RtmpB13Tko/file_199.txt
INFO [2018-07-18 19:03:48] Loading file /tmp/RtmpB13Tko/file_200.txt
[ =======>                          ] 4% 

EDIT 2

After the bounty ended, nothing came close to the expected result.

Logging while the progress bar is displayed messes everything up. If someone gets the correct result, I'll give another result-based bounty.

Dishwasher answered 6/7, 2018 at 15:9 Comment(3)
Related questions: #28222736, https://mcmap.net/q/371759/-parsapply-and-progress-bar, #44450970 – Undertrump
Can you please add an example of how you specify the parallel process (using doMC)? – Hippy
Or even better - can you give an actual code snippet that you want to run/parallelize with a progress bar? – Hippy

Here's a solution (not perfect) using a custom function.

The function prints a progress bar to the console (using message).

  • ii is the current iteration.
  • N is the total number of iterations to perform.
  • per controls how often the bar is updated (every per-th iteration). We need this because, when many iterations run in parallel, the progress bar would otherwise be updated too often and the output would get messed up.

Function:

progBar <- function(ii, N, per = 10) {
    # update only every `per`-th iteration so the console is not flooded
    if (ii %in% seq(1, N, per)) {
        x <- round(ii * 100 / N)
        # redraw the bar on the same line (carriage return, no newline)
        message("[ ",
                paste(rep("=", x), collapse = ""),
                paste(rep("-", 100 - x), collapse = ""),
                " ] ", x, "%", "\r",
                appendLF = FALSE)
        if (ii == N) cat("\r")
    }
}

Code to test:

library(doMC)
library(foreach)
registerDoMC(10)

nIteration <- 1e3
foreach(i = 1:nIteration, ii = icount()) %dopar% {
    # For progBar's ii I'm using icount(), because the
    # user might iterate over all kinds of objects
    progBar(ii, nIteration)
    Sys.sleep(1 / 10)
}

(screenshot of the progress bar running in the terminal)

PS: It's not perfect, because:

  • The bar does not always run to 100% (depending on the number of iterations it can stop at 99%)
  • Sometimes the output gets messed up (depends on the number of iterations and how often the workers switch) - still debugging this part
  • The console is not flushed if you use print/cat within foreach (a sketch applying this to the question's reprex follows below)
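
For reference, a rough sketch of how progBar() could be dropped into the question's reprex, assuming the file_names vector from the question and an arbitrary number of cores; it shares the imperfections listed above, and the futile.logger lines will still interleave with the bar:

library(doMC)          # forking backend; also attaches iterators for icount()
library(foreach)
library(data.table)
library(futile.logger)

registerDoMC(cores = 4)  # arbitrary number of workers

# load the files in parallel, logging each one and ticking the bar
foreach(file_i = file_names, ii = icount(), .final = rbindlist) %dopar%
{
    flog.info("Loading file %s", file_i)
    progBar(ii, length(file_names))
    fread(file_i)
} -> final_data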
Hippy answered 10/7, 2018 at 14:10 Comment(1)
Is it possible if I use doParallel instead of doMC? – Humpbacked

You can refer to this link, Progress bar parallel, for a few insights (maybe not the exact solution) that will help in creating a progress bar in parallel.

txtProgressBar only works when the style is 2 or 3.

library("foreach")
library("doParallel")
library("progress")

registerDoParallel(parallel::makeCluster(7, outfile = ""))

pb <- progress_bar$new(
            format = " [:bar] :percent in :elapsed",
            total = 30, clear = FALSE, width = 80, force = T)
a <- foreach (i  = 1:30) %dopar% {
    pb$tick()
    Sys.sleep(0.5)
}


pb <- txtProgressBar(title = "Iterative training", min = 0, max = 30, style = 3)

foreach (i  = 1:30) %dopar% {
    setTxtProgressBar(pb, i)
    Sys.sleep(0.5)
}

Refer to this link, Monitoring the function with progress bar, for the different ways a progress bar can be implemented depending on your needs.

Using multicore: you can register a different parallel backend later, or deregister doMC by registering the sequential backend with the registerDoSEQ function. For example, consider the following program:

x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]
trials <- 10000
ptime <- system.time({
    r <- foreach(icount(trials), .combine = cbind) %dopar% {
        ind <- sample(100, 100, replace = TRUE)
        result1 <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
        coefficients(result1)
    }
})[3]
ptime
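
To actually run the program above on multiple cores, a backend has to be registered first. A minimal sketch of registering doMC and then deregistering it with registerDoSEQ (the number of cores is an arbitrary choice here):

library(doMC)
library(foreach)

registerDoMC(cores = 4)   # fork-based multicore backend

# ... run %dopar% loops such as the bootstrap example above ...

registerDoSEQ()           # back to sequential execution of %dopar%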
Nunci answered 15/7, 2018 at 7:37 Comment(3)
This is not using multicore parallelism, is it? – Dishwasher
@statquant.. I have updated the above answer with a small example on multicore. – Nunci
It doesn't work on Windows 10, R 3.6.1; my bet is that it's an OS problem. The progress bar won't progress, not even with flush.console(). – Antimonous

A package I have used that does this in parallel for processing lists is pbmcapply; hope this helps.
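
For illustration, a minimal sketch (assuming the file_names vector from the question's reprex and an arbitrary number of cores) of reading the files in parallel with pbmcapply's built-in progress bar:

library(pbmcapply)   # mclapply-style forking with a progress bar
library(data.table)

# read all files in parallel; pbmclapply draws a progress bar on the terminal
dt_list <- pbmclapply(file_names, fread, mc.cores = 4)
final_data <- rbindlist(dt_list)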

Lauds answered 6/7, 2018 at 19:5 Comment(3)
Well, I'd like to keep the foreach syntax that I like. Is that a multicore type of parallelism? – Dishwasher
It is multicore; it operates the same way lapply does, but in addition you can specify the number of cores you'd like to use. I found it most beneficial when wanting to parallelize a function over an object / list of data frames; it works similarly to map, but in parallel. cran.r-project.org/web/packages/pbmcapply/index.html – Lauds
OK, so it means it should be doable with foreach then. – Dishwasher
