I'm running the following code (taken from the doParallel vignette) on a Linux PC with 4 physical and 8 logical cores.
When I run the code with iter=1e+6 or less, everything is fine and I can see from the CPU usage that all cores are used for the computation. However, with a larger number of iterations (e.g. iter=4e+6), parallel computing does not seem to work: when I monitor the CPU usage, just one core is involved in the computation (100% usage).
Example1
require("doParallel")
require("foreach")
registerDoParallel(cores=8)
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
iter=4e+6
ptime <- system.time({
  r <- foreach(i=1:iter, .combine=rbind) %dopar% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})[3]
Do you have any idea what could be the reason? Could memory be the cause?
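As a rough check on the memory question, the size of the results can be estimated in R. This is only a back-of-the-envelope sketch: the variable names are mine, the named vector `one` is a stand-in for what coefficients() returns for this model, and object.size() over-counts the shared name strings, so the second figure is an upper-end estimate rather than a measurement.

```r
iter <- 4e6

# Final combined result: iter rows x 2 double-precision coefficients, 8 bytes each
final_bytes <- iter * 2 * 8
final_bytes / 2^20  # about 61 MiB

# One per-task result: a named numeric vector of length 2
# (stand-in for the output of coefficients(); names assumed from the model formula)
one <- c("(Intercept)" = 0, "x[ind, 1]" = 0)
per_task_bytes <- as.numeric(object.size(one))

# Rough size of all per-task results if each is held as a separate object
iter * per_task_bytes / 2^20
```

The final matrix itself is small compared to 8GB; the per-task overhead (names, serialization between processes, intermediate copies made while combining) is harder to estimate and is not captured by this sketch.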
I googled around and found THIS, which is relevant to my question, but I'm not given any kind of error, and the OP there seems to have come up with a solution by loading the necessary packages inside the foreach loop. No package is used inside my loop, however, as can be seen.
UPDATE1
My problem is still not solved. Based on my experiments, I don't think that memory is the reason. The system has 8GB of memory, and on it I run the following simple parallel iteration (over all 8 logical cores):
Example2
require("doParallel")
require("foreach")
registerDoParallel(cores=8)
iter=4e+6
ptime <- system.time({
  r <- foreach(i=1:iter, .combine=rbind) %dopar% {
    i
  }
})[3]
This code runs without problems, but when I monitor the CPU usage, just one core (out of 8) is at 100%.
UPDATE2
As for Example2, @SteveWeston (thanks for pointing this out) stated in the comments: "The example in your update is suffering from having tiny tasks. Only the master has any real work to do, which consists of sending tasks and processing results. That's fundamentally different than the problem with the original example which did use multiple cores on a smaller number of iterations."
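One common way to avoid tiny tasks is to hand each worker a chunk of iterations instead of a single one, so the master only sends and combines a handful of results. The following is a minimal sketch of that idea, not something suggested in the comments: it reuses the loop body from Example1, uses splitIndices() from the parallel package to build one chunk per worker, and deliberately uses a small iter and only 2 workers so that it finishes quickly. Whether chunking also changes the behaviour seen in Example1 with iter=4e+6 is not established here.

```r
library(doParallel)  # also attaches foreach and parallel

registerDoParallel(cores = 2)  # small worker count, for illustration only

x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]
iter <- 200  # deliberately small so the sketch runs in seconds

# One chunk of iteration indices per registered worker
chunks <- splitIndices(iter, getDoParWorkers())

r <- foreach(idx = chunks, .combine = rbind) %dopar% {
  # Each task fits length(idx) models and returns them as a single matrix,
  # so the master combines one result per chunk instead of one per iteration
  t(vapply(idx, function(i) {
    ind <- sample(100, 100, replace = TRUE)
    coefficients(glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit)))
  }, numeric(2)))
}

stopImplicitCluster()
```

With this layout r still has one row per iteration and two columns, as in Example1, but the number of tasks equals the number of workers rather than iter.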
However, Example1 still remains unsolved. When I run it and monitor the processes with htop, here is what happens in more detail:
Let's name the 8 created processes p1 through p8. The status (column S in htop) for p1 is R, meaning that it is running, and it remains unchanged. However, for p2 through p8, after some minutes the status changes to D (i.e. uninterruptible sleep) and, after some more minutes, changes again to Z (i.e. terminated but not reaped by its parent). Do you have any idea why this happens?
[...] cl <- makePSOCKcluster(8); registerDoParallel(cl)? – Conklin
[...] iter=15e+6? Could you perhaps comment your hardware specifications (CPU and memory) and type of OS on which you run the code? – Scrimmage
[...] Rscript.exe processes in my Resource Monitor though they were only using a small amount of the CPU core they were running on (but that's not necessarily a problem -- it can fluctuate due to inefficient chunking). I was running it on Windows Server 2008 with 24 physical CPUs. – Conklin
[...] registerDoParallel() is also OK as it's written in the vignette document provided by package authors. I have the feeling that it has something to do with the memory. – Scrimmage
[...] 15e+6 but about 3.8e+6. I actually tried to test with a toy example (like above) whether everything goes fine and then run my experiments. In my experiments, I need to repeat foreach around 100 times. So I think I have to break down the 100 repetitions rather than foreach itself. – Scrimmage
[...] getDoParRegistered() and getDoParWorkers()? – Uchida
[...] getDoParRegistered() I get TRUE and 8 for getDoParWorkers(). – Scrimmage
[...] iter=3e+6? – Scrimmage
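For anyone who wants to repeat the backend check discussed in the comments, here is a minimal sketch. It assumes a 2-worker backend purely for illustration (the question uses cores=8), and the variable names registered, backend and workers are mine.

```r
library(doParallel)

registerDoParallel(cores = 2)  # illustration only; the question registers 8

registered <- getDoParRegistered()  # TRUE once a parallel backend is registered
backend    <- getDoParName()        # name of the registered backend
workers    <- getDoParWorkers()     # number of workers foreach will use

print(registered)
print(backend)
print(workers)

stopImplicitCluster()
```

These calls only confirm that a backend is registered and how many workers it reports; they do not show whether the workers are actually busy, which is why htop is still needed for that.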