Wrapper to FOR loops with progress bar
Asked Answered
S

8

20

I like to use a progress bar while running slow for loops. This could be done easily with several helpers, but I do like the tkProgressBar from tcltk package.

A small example:

pb <- tkProgressBar(title = "Working hard:", min = 0, max = length(urls), width = 300)
for (i in 1:300) {
    # DO SOMETHING
    Sys.sleep(0.5)
    setTkProgressBar(pb, i, label=paste( round(i/length(urls)*100, 0), "% ready!"))
}
close(pb)

And I would like to set up a small function to store in my .Rprofile named to forp (as: for loop with progressbar), to call just like for but with auto added progress bar - but unfortunately have no idea how to implement and grab the expr part of the loop function. I had some experiments with do.call but without success :(

Imaginary working example (which acts like a for loop but creates a TkProgressBar and auto updates it in each iteration):

forp (i in 1:10) {
    #do something
}

UPDATE: I think the core of the question is how to write a function which not only has parameters in the parentheses after the function (like: foo(bar)), but also can handle expr specified after the closing parentheses, like: foo(bar) expr.


BOUNTY OFFER: would go to any answer that could modify my suggested function to work like the syntax of basic for loops. E.g. instead of

> forp(1:1000, {
+   a<-i
+ })
> a
[1] 1000

it could be called like:

> forp(1:1000) {
+   a<-i
+ }
> a
[1] 1000

Just to clarify the task again: how could we grab the { expression } part of a function call? I am afraid that this is not possible, but will leave on the bounty for a few days for the pros :)

Sedan answered 8/9, 2011 at 14:21 Comment(6)
Can we add something between the forp and the expression, like forp(1:1000) %do% { expression }? If so, it should be similar to what the foreach package does, and maybe even could be used directly. I don't think you can do it without adding that, but am open to being corrected.Langston
Thanks @Aaron for your comment. I hope there could be some kind of solution without extra tweaks in syntax. If not, then the working function below will be just fine without any modification.Sedan
We'll see if anyone comes forward with a way with no modification; in the meantime, I did code up my above suggestion, which at least only needs modification at the top of the loop (no extra ) at the end, that is).Langston
That is cool @Aaron, thanks! If there won't be a solution to escape even a small modification, then the bounty will go to you :)Sedan
You are trying to modify the language. I would be very careful with this... You may forgot many things (like break/continue statements etc.) and prepare ground for future mystery-errors. Be careful with that.Laura
@Tomas, I think the spirit of the question is more "Can I do this without modifying the language, and if not, how close can I get?" I don't think @Sedan is looking to get into the R source and make a new forp command, though it does seem that would be the only way to precisely do what he's hoping for.Langston
F
6

Given the other answers supplied, I suspect that it is impossible tough to do in exactly the way you specify.

However, I believe there is a way of getting very close, if you use the plyr package creatively. The trick is to use l_ply which takes a list as input and creates no output.

The only real differences between this solution and your specification is that in a for loop you can directly modify variables in the same environment. Using l_ply you need to send a function, so you will have to be more careful if you want to modify stuff in the parent environment.

Try the following:

library(plyr)
forp <- function(i, .fun){
  l_ply(i, .fun, .progress="tk")
}

a <- 0
forp(1:100, function(i){
  Sys.sleep(0.01)
  a<<-a+i
  })
print(a)
[1] 5050

This creates a progress bar and modifies the value of a in the global environment.


EDIT.

For the avoidance of doubt: The argument .fun will always be a function with a single argument, e.g. .fun=function(i){...}.

For example:

for(i in 1:10){expr} is equivalent to forp(1:10, function(i){expr})

In other words:

  • i is the looping parameter of the loop
  • .fun is a function with a single argument i
Framboise answered 8/9, 2011 at 16:37 Comment(5)
This does look like a nice version of my pseudocode answer. But: what happens if you want to run an existing function of more than one variable? lply(i, myfunc(x,y)) isn't going to work so far as I can tell.Koheleth
@CarlWitthoft And that's OK, isn't it? Since you can only ever have one variable in a for loop. Any other variables are simply referred to inside the body of the function... Because of scoping up the call stack it will work - in exactly the same way as a for loop.Framboise
Andrie, I think I see yr point. i <- c(1,3,5,6,7,8,9); forp(i,myfunc(x=i,y)) is the way it would work.Koheleth
Thank you very much, this is a neat solution with some compromise (+1). Unfortunately it is bit far from what I am after, but it seems that my goal is unattainable.Sedan
@CarlWitthoft I am not sure that will work. I have edited my answer to give a bit more detail. The equivalent to for(i in seq(1, 9, by=2){expr} would be forp(i=seq(1, 9, by=2), .fun=function(i){expr}). In other words, .fun will always be a function with only a single argument.Framboise
L
6

My solution is very similar to Andrie's except it uses base R, and I second his comments on the need to wrap what you want to do in a function and the subsequent need to use <<- to modify stuff in a higher environment.

Here's a function that does nothing, and does it slowly:

myfun <- function(x, text) {
  Sys.sleep(0.2)
  cat("running ",x, " with text of '", text, "'\n", sep="")
  x
}

Here's my forp function. Note that regardless of what we're actually looping over, it instead loops over the sequence 1:n instead and get the right term of what we actually want within the loop. plyr does this automatically.

library(tcltk)
forp <- function(x, FUN, ...) {
  n <- length(x)
  pb <- tkProgressBar(title = "Working hard:", min = 0, max = n, width = 300)
  out <- vector("list", n)
  for (i in seq_len(n)) {
    out[[i]] <- FUN(x[i], ...)
    setTkProgressBar(pb, i, label=paste( round(i/n*100, 0), "% ready!"))
  }
  close(pb)
  invisible(out)
}

And here's how both for and forp might be used, if all we want to do is call myfun:

x <- LETTERS[1:5]
for(xi in x) myfun(xi, "hi")
forp(x, myfun, text="hi")

And here's how they might be used if we want to modify something along the way.

out <- "result:"
for(xi in x) {
  out <- paste(out, myfun(xi, "hi"))
}

out <- "result:"
forp(x, function(xi) {
    out <<- paste(out, myfun(xi, "hi"))
})

For both versions the result is

> out
[1] "result: A B C D E"

EDIT: After seeing your (daroczig's) solution, I have another idea that might not be quite so unwieldy, which is to evaluate the expression in the parent frame. This makes it easier to allow for values other than i (now specified with the index argument), though as of right now I don't think it handles a function as the expression, though just to drop in instead a for loop that shouldn't matter.

forp2 <- function(index, x, expr) {
  expr <- substitute(expr)
  n <- length(x)
  pb <- tkProgressBar(title = "Working hard:", min = 0, max = n, width = 300)
  for (i in seq_len(n)) {
    assign(index, x[i], envir=parent.frame())
    eval(expr, envir=parent.frame())
    setTkProgressBar(pb, i, label=paste( round(i/n*100, 0), "% ready!"))
  }
  close(pb)
}

The code to run my example from above would be

out <- "result:"
forp2("xi", LETTERS[1:5], {
    out <- paste(out, myfun(xi, "hi"))
})

and the result is the same.

ANOTHER EDIT, based on the additional information in your bounty offer:

The syntax forX(1:1000) %doX$ { expression } is possible; that's what the foreach package does. I'm too lazy right now to build it off of your solution, but building off mine, it could look like this:

`%doX%` <- function(index, expr) {
  x <- index[[1]]
  index <- names(index)
  expr <- substitute(expr)
  n <- length(x)
  pb <- tkProgressBar(title = "Working hard:", min = 0, max = n, width = 300)
  for (i in seq_len(n)) {
    assign(index, x[i], envir=parent.frame())
    eval(expr, envir=parent.frame())
    setTkProgressBar(pb, i, label=paste( round(i/n*100, 0), "% ready!"))
  }
  close(pb)
  invisible(out)
}

forX <- function(...) {
  a <- list(...)
  if(length(a)!=1) {
    stop("index must have only one element")
  }
  a
}

Then the use syntax is this, and the result is the same as above.

out <- "result:"
forX(xi=LETTERS[1:5]) %doX% {
  out <- paste(out, myfun(xi, "hi"))
}
out
Langston answered 8/9, 2011 at 17:5 Comment(2)
Thank you Aaron, that is great too (+1). Does not perfectly fit my demand, but close :)Sedan
Thank you Aaron again, especially for the updated script. As I wrote before, if we cannot find a "perfect" solution, than the bounty should be awarded to you. Thanks!Sedan
E
3

If you use the plyr family of commands instead of a for loop (generally a good idea if possible), you get as an added bonus a whole system of progress bars.

R.utils also has some progress bars built into it, and there exist instructions for using them in for loops.

Ecumenicism answered 8/9, 2011 at 14:43 Comment(1)
Thanks for the answer: plyr is a really great tool in most of the time, but I definitely need for loops sometimes (with complex structures where data is spread in several datasets). Unfortunately the linked resource just show an example like I entered in my question, so just several manual ways of adding a progress bar to a for loop, but no ideas of an automatic progress bar what I am after (with e.g. forp function).Sedan
I
3

R's syntax doesn't let you do exactly what you want, ie:

forp (i in 1:10) {
    #do something
}

But what you can do is create some kind of iterator object and loop using while():

while(nextStep(m)){sleep.milli(20)}

Now you have the problem of what m is and how you make nextStep(m) have side effects on m in order to make it return FALSE at the end of your loop. I've written simple iterators that do this, as well as MCMC iterators that let you define and test for a burnin and thinning period within your loop.

Recently at the R User conference I saw someone define a 'do' function that then worked as an operator, something like:

do(100) %*% foo()

but I'm not sure that was the exact syntax and I'm not sure how to implement it or who it was put that up... Perhaps someone else can remember!

Infundibulum answered 8/9, 2011 at 15:5 Comment(2)
Your latter example looks similar to the foreach syntax from the foreach package.Openminded
Thank you @Infundibulum too! I am not sure right now how your suggestions can help me to make up a forp function, but will try to catch up :) Will report back.Sedan
K
3

What you're hoping for, I think would be something that looks like

body(for)<- as.call(c(as.name('{'),expression([your_updatebar], body(for))))

And yep, the problem is that "for" is not a function, or at least not one whose "body" is accessible. You could, I suppose, create a "forp" function that takes as arguments 1) a string to be turned into the loop counter, e.g., " ( i in seq(1,101,5) )" , and 2) the body of your intended loop, e.g., y[i]<- foo[i]^2 ; points(foo[i],y[i], and then jump thru some getcallparse magic to execute the actual for loop. Then , in pseudocode (not close to actual R code, but I think you see what should happen)

forp<-function(indexer,loopbody) { 

pseudoparse( c("for (", indexer, ") {" ,loopbody,"}") }

Koheleth answered 8/9, 2011 at 16:9 Comment(0)
C
3

The problem is that the for-loop in R is treated special. A normal function is not allowed to look like that. Some small tweaks can make it loop pretty close though. And as @Aaron mentioned, the foreach package's %dopar% paradigm seems like the best fit. Here's my version of how it could work:

`%doprogress%` <- function(forExpr, bodyExpr) {
   forExpr <- substitute(forExpr)
   bodyExpr <- substitute(bodyExpr)

   idxName <- names(forExpr)[[2]]
   vals <- eval(forExpr[[2]])

   e <- new.env(parent=parent.frame())

   pb <- tkProgressBar(title = "Working hard:", min = 0, max = length(vals), width = 300)
   for (i in seq_along(vals)) {
     e[[idxName]] <- vals[[i]]
     eval(bodyExpr, e)
     setTkProgressBar(pb, i, label=paste( round(i/length(vals)*100, 0), "% ready!"))
   }
}


# Example usage:

foreach(x = runif(10)) %doprogress% { 
  # do something
  if (x < 0.5) cat("small\n") else cat("big")
}

As you can see, you have to type x = 1:10 instead of x in 1:10, and the infix operator %<whatever>% is needed to get hold of the looping construct and the loop body. I currently don't do any error checking (to avoid muddling the code). You should check the name of the function ("foreach"), the number of arguments to it (1) and that you actually get a valid loop variable ("x") and not an empty string.

California answered 20/9, 2011 at 18:57 Comment(2)
If you choose to use foreach directly, I'd suggest also using the iteration function in the foreach package, iter; similarly to foreach:::doSEQ.Langston
Thanks Tommy, that is cool also. As @Aaron was faster and especially for %doX% being shorter than %doprogress% the bounty goes to him :) I can only award you with an upvote.Sedan
A
3

I propose hereby two solutions that use the standard for syntax, both are using the great package progress from Gábor Csárdi and Rich FitzJohn

  • 1) we can override temporarily or locally the for function to wrap around base::for and support progress bars.
  • 2) we can define the unused for<-, and wrap around base::for using syntax pb -> for(it in seq) {exp} where pb is progress bar built with progress::progress_bar$new().

Both solutions behave as standard for calls :

  • The values changed at the previous iteration are available
  • on error the modified variables will have the value they had just before the error

I packaged my solution and will demo them below then will go through the code


Usage

#devtools::install_github("moodymudskipper/pbfor")
library(pbfor)

Using pb_for()

By default pb_for() will override the for function for one run only.

pb_for()
for (i in 1:10) {
  # DO SOMETHING
  Sys.sleep(0.5)
}

Using parameters from progress::progress_bar$new() :

pb_for(format = "Working hard: [:bar] :percent :elapsed", 
       callback = function(x) message("Were'd done!"))
for (i in 1:10) {
  # DO SOMETHING
  Sys.sleep(0.5)
}

Using for<-

The only restriction compared to a standard for call is that the first argument must exist and can't be NULL.

i <- NA 
progress_bar$new() -> for (i in 1:10) {
  # DO SOMETHING
  Sys.sleep(0.5)
}

We can define a custom progress bar, and maybe define it conveniently in an initialisation script or in one's R profile.

pb <- progress_bar$new(format = "Working hard: [:bar] :percent :elapsed", 
       callback = function(x) ("Were'd done!"))
pb  -> for (i in 1:10) {
  # DO SOMETHING
  Sys.sleep(0.5)
}

For nested progress bars we can use the following trick :

pbi <- progress_bar$new(format = "i: [:bar] :percent\n\n")
pbj <- progress_bar$new(format = "j: [:bar] :percent  ")
i <- NA
j <- NA
pbi  -> for (i in 1:10) {
  pbj  -> for (j in 1:10) {
    # DO SOMETHING
    Sys.sleep(0.1)
  }
}

note that due to operator precedence the only way to call for<- and benefit from the syntax of for calls is to use the left to right arrow ´->´.


how they work

pb_for()

pb_for() creates a for function object in its parent environment, then the new for :

  • sets up a progress bar
  • modifies the loop content
  • adds a `*pb*`$tick() at the end of the loop content expression
  • feeds it back to base::`for` in a clean environment
  • assigns on exit all modified or created variables to the parent environment.
  • removes itself if once is TRUE (the default)

It's generally sensitive to override an operator, but it cleans after itself and won't affect the global environment if used in a function so I think it's safe enough to use.

for<-

This approach :

  • doesn't override for
  • allows the use of progress bar templates
  • has an arguably more intuitive api

It has a few drawbacks however:

  • its first argument must exist, which is the case for all assignment functions (fun<-).
  • it does some memory magic to find the name of its first argument as it's not easily done with assignment functions, this might have a performance cost, and I'm not 100% sure about the robustness
  • we need the package pryr

What it does :

  • find the name of the first argument, using a helper function
  • clone the progress bar input
  • edit it to account for the number of iterations of the loop (the length of the second argument of for<-

After this it's similar to what is described for pb_for() in the section above.


The code

pb_for()

pb_for <-
  function(
    # all args of progress::progress_bar$new() except `total` which needs to be
    # infered from the 2nd argument of the `for` call, and `stream` which is
    # deprecated
    format = "[:bar] :percent",
    width = options("width")[[1]] - 2,
    complete = "=",
    incomplete = "-",
    current =">",
    callback = invisible, # doc doesn't give default but this seems to work ok
    clear = TRUE,
    show_after = .2,
    force = FALSE,
    # The only arg not forwarded to progress::progress_bar$new()
    # By default `for` will self detruct after being called
    once = TRUE) {

    # create the function that will replace `for`
    f <- function(it, seq, expr){
      # to avoid notes at CMD check
      `*pb*` <- IT <- SEQ <- EXPR <- NULL

      # forward all arguments to progress::progress_bar$new() and add
      # a `total` argument computed from `seq` argument
      pb <- progress::progress_bar$new(
        format = format, width = width, complete = complete,
        incomplete = incomplete, current = current,
        callback = callback,
        clear = clear, show_after = show_after, force = force,
        total = length(seq))

      # using on.exit allows us to self destruct `for` if relevant even if
      # the call fails.
      # It also allows us to send to the local environment the changed/created
      # variables in their last state, even if the call fails (like standard for)
      on.exit({
        vars <- setdiff(ls(env), c("*pb*"))
        list2env(mget(vars,envir = env), envir = parent.frame())
        if(once) rm(`for`,envir = parent.frame())
      })

      # we build a regular `for` loop call with an updated loop code including
      # progress bar.
      # it is executed in a dedicated environment and the progress bar is given
      # a name unlikely to conflict
      env <- new.env(parent = parent.frame())
      env$`*pb*` <-  pb
      eval(substitute(
        env = list(IT = substitute(it), SEQ = substitute(seq), EXPR = substitute(expr)),
        base::`for`(IT, SEQ,{
          EXPR
          `*pb*`$tick()
        })), envir = env)
    }
    # override `for` in the parent frame
    assign("for", value = f,envir = parent.frame())
  }

for<- (and fetch_name())

`for<-` <-
  function(it, seq, expr, value){
    # to avoid notes at CMD check
    `*pb*` <- IT <- SEQ <- EXPR <- NULL
    # the symbol fed to `it` is unknown, R uses `*tmp*` for assignment functions
    # so we go get it by inspecting the memory addresses
    it_chr <- fetch_name(it)
    it_sym <-as.symbol(it_chr)

    #  complete the progress bar with the `total` parameter
    # we need to clone it because progress bars are environments and updated
    # by reference
    pb <- value$clone()
    pb$.__enclos_env__$private$total <- length(seq)

    # when the script ends, even with a bug, the values that have been changed
    # are written to the parent frame
    on.exit({
      vars <- setdiff(ls(env), c("*pb*"))
      list2env(mget(vars, env),envir = parent.frame())
    })

    # computations are operated in a separate environment so we don't pollute it
    # with it, seq, expr, value, we need the progress bar so we name it `*pb*`
    # unlikely to conflict by accident
    env <- new.env(parent = parent.frame())
    env$`*pb*` <-  pb
    eval(substitute(
      env =  list(IT = it_sym, SEQ = substitute(seq), EXPR = substitute(expr)),
      base::`for`(IT, SEQ,{
        EXPR
        `*pb*`$tick()
      })), envir = env)

    # because of the `fun<-` syntax we need to return the modified first argument
    invisible(get(it_chr,envir = env))
  }

helpers:

fetch_name <- function(x,env = parent.frame(2)) {
  all_addresses       <- sapply(ls(env), address2, env)
  all_addresses       <- all_addresses[names(all_addresses) != "*tmp*"]
  all_addresses_short <- gsub("(^|<)[0x]*(.*?)(>|$)","\\2",all_addresses)

  x_address       <- tracemem(x)
  untracemem(x)
  x_address_short <- tolower(gsub("(^|<)[0x]*(.*?)(>|$)","\\2",x_address))

  ind    <- match(x_address_short, all_addresses_short)
  x_name <- names(all_addresses)[ind]
  x_name
}

address2 <- getFromNamespace("address2", "pryr")
Airman answered 4/5, 2019 at 12:27 Comment(0)
S
0

Thanks for everyone for your kind answers! As none of those fit my wacky needs, I started to steal some pieces of the given answers and made up a quite customized version:

forp <- function(iis, .fun) {
    .fun <- paste(deparse(substitute(.fun)), collapse='\n')
    .fun <- gsub(' <- ', ' <<- ', .fun, fixed=TRUE)
    .fun <- paste(.fun, 'index.current <- 1 + index.current; setTkProgressBar(pb, index.current, label=paste( round(index.current/index.max*100, 0), "% ready!"))', sep='\n')
    ifelse(is.numeric(iis), index.max <- max(iis), index.max <- length(iis))
    index.current <- 1
    pb <- tkProgressBar(title = "Working hard:", min = 0, max = index.max, width = 300) 
    for (i in iis) eval(parse(text=paste(.fun)))
    close(pb)
}

This is quite lengthy for a simple function like this, but depends only on base (anf of course: tcltk) and has some nice features:

  • can be used on expressions, not just functions,
  • you do not have to use <<- in your expressions to update global environment, <- are replaced to <<- in the given expr. Well,this might be annoying for someone.
  • can be used with non-numeric indexes (see below). That is why the code become so long :)

Usage is similar to for except for you do not have to specify the i in part and you have to use i as index in the loop. Other drawback is that I did not find a way to grab the {...} part specified after a function, so this must be included in the parameters.

Example #1: Basic use

> forp(1:1000, {
+   a<-i
+ })
> a
[1] 1000

Try it to see the neat progress bar on your computer! :)

Example #2: Looping through some characters

> m <- 0
> forp (names(mtcars), {
+   m <- m + mean(mtcars[,i])
+ })
> m
[1] 435.69
Sedan answered 8/9, 2011 at 20:48 Comment(2)
Just be careful, as a <<- b will be replaced with ` a<<<- b` :-(Koheleth
That's true :) Thanks for pointing out @Carl Witthoft! I have updated my function according to this problem, thought thanks to this modification writing the expression part of the forp function will require the user to use properly formatted syntax (leaving a space before and after the <-).Sedan

© 2022 - 2024 — McMap. All rights reserved.