How to avoid 'sink stack is full' error when sink() is used to capture messages in foreach loop
R

In order to see the console messages produced by a function running in a foreach() loop, I followed the advice of this guy and added a sink() call like so:

    library(foreach)
    library(doMC)
    cores <- detectCores()
    registerDoMC(cores)

    X <- foreach(i=1:100) %dopar% {
      sink("./out/log.branchpies.txt", append=TRUE)
      cat(paste("\n","Starting iteration",i,"\n"), append=TRUE)
      myFunction(data, argument1="foo", argument2="bar")
    }

However, at iteration 77 I got the error 'sink stack is full'. There are well-answered questions about avoiding this error when using for-loops, but not foreach. What's the best way to write the otherwise-hidden foreach output to a file?
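For context, a minimal experiment of my own (not from the linked tutorial): each call to sink(file) pushes a diversion onto R's sink stack, and sink() with no arguments pops one. The stack has a small fixed limit, so a loop that keeps pushing without ever popping eventually fails with "sink stack is full", which is exactly what happens once a single worker has run enough iterations:

```r
# Each sink(file) pushes a diversion onto R's sink stack;
# sink() with no arguments pops one. Unbalanced pushes
# eventually hit the stack limit.
tmp <- tempfile()
for (i in 1:3) sink(tmp, append = TRUE)  # three pushes, no pops
n_open <- sink.number()                  # 3 diversions now open
while (sink.number() > 0) sink()         # pop them all back off
cat("diversions that were open:", n_open, "\n")
```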

Rushton answered 10/10, 2014 at 9:35 Comment(7)
Are you actually running this in parallel? Why are you using sink and cat with a file?Exosmosis
I am running the same computationally-intensive function on 100 elements of a list in parallel using foreach because it would take forever using a for loop, or even mclapply (I've tried and it's much slower). I'm using sink and cat because the linked page recommended I do, and because it helps keep track of which iteration the foreach loop is up to.Rushton
You didn't answer the question. You don't show how you set up the cluster. Also, the tutorial you link to doesn't use the file argument of cat.Exosmosis
mclapply shouldn't be slower than foreach if you set up the cluster correctly.Exosmosis
Sorry, I didn't actually use the file argument in cat—that was something I was experimenting with. I mistyped the code. I'll fix it now.Rushton
@Exosmosis he may be running this on Windows, where mclapply doesn't do anything.Waterway
@Hong @Exosmosis I'm using a Mac. mclapply resulted in a definite speed increase relative to a for-loop but it was meagre compared to foreach.Rushton

This runs without errors on my Mac:

library(foreach)    
library(doMC)
cores <- detectCores()
registerDoMC(cores)

X <- foreach(i=1:100) %dopar%{
  sink("log.branchpies.txt", append=TRUE)
  cat(paste("\n","Starting iteration",i,"\n"))
  sink() #end diversion of output
  rnorm(i*1e4)
}

This is better:

library(foreach)    
library(doMC)
cores <- detectCores()
registerDoMC(cores)
sink("log.branchpies.txt", append=TRUE)
X <- foreach(i=1:100) %dopar%{
  cat(paste("\n","Starting iteration",i,"\n"))
  rnorm(i*1e4)
}
sink() #end diversion of output

This works too:

library(foreach)    
library(doMC)
cores <- detectCores()
registerDoMC(cores)

X <- foreach(i=1:100) %dopar%{
  cat(paste("\n","Starting iteration",i,"\n"), 
       file="log.branchpies.txt", append=TRUE)
  rnorm(i*1e4)
}
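A further option, not shown in this answer (it assumes the doParallel package rather than doMC, since the fork backend has no equivalent): with a PSOCK cluster, makeCluster() accepts an outfile argument that redirects all worker output to a single file, so nothing needs to be diverted inside the loop at all:

```r
library(foreach)
library(doParallel)

# outfile must be set when the cluster is created;
# outfile = "" would echo worker output to the master console instead.
cl <- makeCluster(2, outfile = "log.branchpies.txt")
registerDoParallel(cl)

X <- foreach(i = 1:10) %dopar% {
  cat("Starting iteration", i, "\n")  # captured via outfile
  rnorm(i * 1e3)
}

stopCluster(cl)
```

Note that PSOCK workers are separate R processes, so unlike doMC they do not share memory with the master; any large objects used in the loop get serialised to each worker.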
Exosmosis answered 10/10, 2014 at 17:8 Comment(1)
Thanks. The problem with my original code must have been that it didn't include sink() to end the diversion of output.Rushton

As suggested by this guy, it is quite tricky to keep track of the sink stack. It is therefore advisable to use cat's ability to write to a file, as suggested in the answer above:

cat(..., file="log.txt", append=TRUE)

To save some typing, you could create a wrapper function that diverts output to a file every time cat is called:

catf <- function(..., file="log.txt", append=TRUE){
  cat(..., file=file, append=append)
}

Then, when you call foreach, you would use something like this:

library(foreach)    
library(doMC)
cores <- detectCores()
registerDoMC(cores)

X <- foreach(i=1:100) %dopar%{
  catf(paste("\n","Starting iteration",i,"\n"))
  rnorm(i*1e4)
}

Hope it helps!

Artless answered 13/12, 2015 at 18:41 Comment(0)

Unfortunately, none of the above-mentioned approaches worked for me: with sink() inside the foreach() loop, it kept throwing the "sink stack is full" error; with sink() outside the loop, the file was created but never updated.

To me, the easiest way of creating a log file to keep track of a parallelised foreach() loop's progress is the good old write.table() function.

    library(foreach)
    library(doParallel)

    availableClusters <- makeCluster(detectCores() - 1) # use all CPU threads but one (reserved for the OS)
    registerDoParallel(availableClusters) # register the available cores for the parallelisation

    x <- foreach(i = 1:100) %dopar% {
      log.text <- paste0(Sys.time(), " processing loop run ", i, "/100")
      write.table(log.text, "loop-log.txt", append = TRUE, row.names = FALSE, col.names = FALSE)

      # your statements here
    }

And don't forget (as I did several times...) to use append = TRUE within write.table().
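One caveat with a single shared log file: appends from several workers can interleave. A hypothetical workaround (the helper name log_progress and the PID-based file naming are mine, not from this answer) is to give each worker process its own file keyed by Sys.getpid():

```r
# Hypothetical helper: each worker process appends to its own
# log file, so lines from different workers cannot interleave.
log_progress <- function(..., dir = tempdir()) {
  file <- file.path(dir, paste0("log_", Sys.getpid(), ".txt"))
  cat(..., "\n", file = file, append = TRUE)
  invisible(file)
}

f <- log_progress(format(Sys.time()), "processing loop run", 1, "/100")
```

Inside a %dopar% loop each worker then writes to its own log_<pid>.txt; afterwards the files can be concatenated and sorted by timestamp.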

Numismatist answered 1/3, 2022 at 18:7 Comment(0)

Call sink() with no arguments at the end of each iteration to close the diversion, and you will not get this error again.
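A caveat, per the comment below: if the body of the loop throws an error, the iteration exits before reaching the closing sink(), leaving the diversion open. Wrapping the body in tryCatch() with a finally clause guarantees the pop runs either way; a sketch, with rnorm() standing in for the real work:

```r
library(foreach)
library(doMC)
registerDoMC(2)

X <- foreach(i = 1:10) %dopar% {
  sink("log.branchpies.txt", append = TRUE)
  tryCatch({
    cat("Starting iteration", i, "\n")
    rnorm(i * 1e3)
  }, finally = sink())  # the pop runs even if the body errors
}
```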

Jamijamie answered 7/7, 2016 at 9:30 Comment(1)
Doesn't work for me; I suspect each worker is erroring out of the loop before reaching the sink() call, so this only works if the loop actually gets there.Laughton

© 2022 - 2024 — McMap. All rights reserved.