Trying to get started with doParallel and foreach but no improvement

I am trying to use the doParallel and foreach packages, but I'm seeing a reduction in performance with the bootstrapping example from the package guide on its CRAN page.

library(doParallel)
library(foreach)
registerDoParallel(3)

# Bootstrapping example from the doParallel vignette:
# repeatedly fit a logistic regression to resampled iris data
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
trials <- 10000
ptime <- system.time({
  r <- foreach(icount(trials), .combine=cbind) %dopar% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})[3]
ptime

This example returns an elapsed time of 56.87 seconds.

When I change %dopar% to %do% to run the loop sequentially instead of in parallel, it returns 36.65.
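
For reference, the sequential variant is the same loop with only the operator swapped; a minimal sketch (I've named the result stime here):

# Sequential baseline: identical loop body, %do% instead of %dopar%
stime <- system.time({
  r <- foreach(icount(trials), .combine=cbind) %do% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})[3]
stime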

If I do registerDoParallel(6), the parallel time comes down to 42.11, but that is still slower than the sequential run. registerDoParallel(8) gets 40.31, still worse than sequential.

If I increase trials to 100,000 then the sequential run takes 417.16 and the parallel run with 3 workers takes 597.31. With 6 workers in parallel it takes 425.85.

My system is:

  • Dell Optiplex 990

  • Windows 7 Professional 64-bit

  • 16GB RAM

  • Intel i7-2600 3.6GHz quad-core with hyperthreading

Am I doing something wrong here? If I do the most contrived thing I can think of (replacing the computational code with Sys.sleep(1)), then I get a reduction in runtime closely proportional to the number of workers. I'm left wondering why the example in the guide slows things down for me when it sped things up for its authors.
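
Here is roughly what that contrived test looked like (a sketch; the task count of 9 is arbitrary):

# Each task just sleeps for one second, so per-task overhead is tiny
# relative to task duration and the speedup tracks the worker count
registerDoParallel(3)
system.time(
  r <- foreach(icount(9)) %dopar% Sys.sleep(1)
)[3]
# roughly 3 seconds elapsed with 3 workers vs. about 9 with %do%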

Uniplanar asked 24/5/2013 at 12:15 (2 comments)
This is almost a FAQ: your test with Sys.sleep() was just fine, and it shows that setting up the workers takes more time than the computation itself. Try increasing the size of the problem, e.g. sample(10000), and you will see an improvement. However, your machine effectively has only 4 cores, so nothing improves beyond 4 workers. I have never seen an effect from hyperthreading (under Windows, and without special R builds). – Turquoise
@DieterMenne: Your point about this being FAQ-ish is well taken. The fact that the guide's example didn't produce a benefit for me threw me off. You were right that increasing the sample size would get me to where running in parallel was an improvement. Also, thanks for the tip about HT; I did a test with 4 vs. 8 workers and the times were basically the same. – Uniplanar

The underlying problem is that doParallel calls attach for every task executed on the workers of a PSOCK cluster, in order to add the exported variables to the search path. This resolves various scoping issues, but it can hurt performance significantly, particularly with short-duration tasks and large amounts of exported data. It doesn't happen on Linux and Mac OS X with your example, since they use mclapply rather than clusterApplyLB, but it will happen on all platforms if you explicitly register a PSOCK cluster.
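
For example (a sketch; the cluster size of 3 is arbitrary), explicitly creating and registering a PSOCK cluster forces this code path on every platform:

# Explicit PSOCK registration: doParallel then uses clusterApplyLB
# (and the per-task attach) even on Linux and Mac OS X
library(doParallel)
cl <- makeCluster(3)   # "PSOCK" is the default cluster type
registerDoParallel(cl)
# ... run the foreach loop as before ...
stopCluster(cl)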

I believe I've figured out how to resolve the task scoping problems in a different way that doesn't hurt performance, and I'm working with Revolution Analytics to get the fix into the next release of doParallel and of doSNOW, which has the same problem.

You can work around this problem by using task chunking:

ptime2 <- system.time({
  # Create one chunk of trials per worker
  chunks <- getDoParWorkers()
  r <- foreach(n=idiv(trials, chunks=chunks), .combine='cbind') %dopar% {
    # Each task runs its n bootstrap iterations in a plain lapply
    y <- lapply(seq_len(n), function(i) {
      ind <- sample(100, 100, replace=TRUE)
      result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
      coefficients(result1)
    })
    # Combine the task's coefficient vectors into a single matrix
    do.call('cbind', y)
  }
})[3]

This results in only one task per worker, so each worker only executes attach once, rather than trials / 3 times. It also results in fewer but larger socket operations, which can be performed more efficiently on most systems, but in this case, the critical issue is attach.
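
You can check how idiv divides the work directly (a quick sketch, assuming trials = 10000 and 3 workers):

# idiv yields one count per chunk; the counts sum to trials
library(iterators)
it <- idiv(10000, chunks=3)
nextElem(it)   # 3334
nextElem(it)   # 3333
nextElem(it)   # 3333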

Betwixt answered 24/5/2013 at 21:26 (3 comments)
With 4 workers this took 9.2 seconds. When I did chunks=1 it took 28.18. – Uniplanar
I was going to email the maintainer to let them know, but it's Revolution Analytics and I don't want to be hit up by salespeople. – Uniplanar
@DeanMacGregor Don't worry: I've already contacted them, particularly since I discovered what the real underlying problem is and believe I have fixed it. I'm working with Revolution right now to have it fixed in the next release of both doSNOW and doParallel. – Betwixt
