Why does parLapplyLB not actually balance load?

I'm testing out the parLapplyLB() function to understand what it does to balance a load. But I'm not seeing any balancing happening. For example,

cl <- parallel::makeCluster(2)

system.time(
  parallel::parLapplyLB(cl, 1:4, function(y) {
    if (y == 1) {
      Sys.sleep(3)
    } else {
      Sys.sleep(0.5)
    }}))
##   user  system elapsed 
##  0.004   0.009   3.511 

parallel::stopCluster(cl)

If it were truly balancing the load, the first job (job 1), which sleeps for 3 seconds, would run on the first node, and the other three jobs (jobs 2:4) would sleep for a total of 1.5 seconds on the other node. The total elapsed time should then be only about 3 seconds.

Instead, I think that jobs 1 and 2 are given to node 1 and jobs 3 and 4 are given to node 2. This results in a total elapsed time of 3 + 0.5 = 3.5 seconds. If we run the same code above with parLapply() instead of parLapplyLB(), we get the same elapsed time of about 3.5 seconds.

What am I not understanding or doing wrong?

Ronni answered 6/7, 2016 at 18:6 Comment(2)
I think R doesn't do automatic load balancing. I think it divides the tasks across as many cores as are available, regardless of how long each task takes or when each task completes. It's not like there is a queue of tasks from which a worker grabs the next one when it finishes. Each core was assigned two tasks, hence 3 + 0.5 on the first worker and a total of 3.5. Would be happy to be wrong. - Vertebrate
Yes, that's where the 3.5 is coming from. It's not balancing the load. But parLapplyLB claims to balance. - Ronni

NOTE: Since R-3.5.0, the behavior/bug noted by the OP and explained below has been fixed. As noted in R's NEWS file at the time:

* parLapplyLB and parSapplyLB have been fixed to do load balancing
  (dynamic scheduling).  This also means that results of
  computations depending on random number generators will now
  really be non-reproducible, as documented.
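
To check the fixed behavior on your own installation, something like the following sketch may help. It assumes R >= 3.5.0, where parLapplyLB() accepts a chunk.size argument; the helper name slow_first is made up for this example, and each job returns its input only so that the result can be inspected.

```r
library(parallel)

# The chunk.size argument is assumed to exist, i.e. R >= 3.5.0.
stopifnot(getRversion() >= "3.5.0")

# Same workload as in the question; slow_first is a hypothetical helper
# that returns its input so the result can be checked.
slow_first <- function(y) {
  Sys.sleep(if (y == 1) 3 else 0.5)
  y
}

cl <- makeCluster(2)
elapsed <- system.time(
  # chunk.size = 1 requests one job per scheduling unit.
  res <- parLapplyLB(cl, 1:4, slow_first, chunk.size = 1)
)[["elapsed"]]
stopCluster(cl)

elapsed  # roughly 3 seconds if the jobs are scheduled dynamically
```

If elapsed comes out near 3.5 seconds instead, the jobs were most likely still grouped into two fixed chunks.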

ORIGINAL ANSWER (now only relevant for R versions < 3.5.0)

For a task like yours (and, for that matter, for any task for which I've ever needed parallel) parLapplyLB isn't really the right tool for the job. To see why not, have a look at the way that it's implemented:

parLapplyLB
# function (cl = NULL, X, fun, ...) 
# {
#     cl <- defaultCluster(cl)
#     do.call(c, clusterApplyLB(cl, x = splitList(X, length(cl)), 
#         fun = lapply, fun, ...), quote = TRUE)
# }
# <bytecode: 0x000000000f20a7e8>
# <environment: namespace:parallel>

## Have a look at what `splitList()` does:
parallel:::splitList(1:4, 2)
# [[1]]
# [1] 1 2
# 
# [[2]]
# [1] 3 4

The problem is that it first splits its list of jobs up into equal-sized sublists that it then distributes among the nodes, each of which runs lapply() on its given sublist. So here, your first node runs jobs on the first and second inputs, while the second node runs jobs using the third and fourth inputs.
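
The timing difference can be reproduced without a cluster. The following is only a rough simulation of the two scheduling policies, not the actual parallel internals: it uses the exported parallel::splitIndices() to mimic the chunking shown above, and a simple "next job goes to the worker that is free first" loop as a stand-in for dynamic scheduling. The variable names are made up for this sketch.

```r
# Sleep times (in seconds) of the four jobs from the question.
durations <- c(3, 0.5, 0.5, 0.5)
n_workers <- 2

# Static chunking: contiguous, equal-sized chunks, one per worker.
chunks <- parallel::splitIndices(length(durations), n_workers)
static_elapsed <- max(vapply(chunks, function(i) sum(durations[i]), numeric(1)))

# Dynamic scheduling: each job is handed to the worker that becomes free first.
busy_until <- numeric(n_workers)
for (d in durations) {
  w <- which.min(busy_until)
  busy_until[w] <- busy_until[w] + d
}
dynamic_elapsed <- max(busy_until)

static_elapsed   # 3.5
dynamic_elapsed  # 3
```

The static result matches the 3.5 seconds observed in the question, and the dynamic result matches the roughly 3 seconds one would expect from real load balancing.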

Instead, use the more versatile clusterApplyLB(), which works just as you'd hope:

system.time(
  parallel::clusterApplyLB(cl, 1:4, function(y) {
    if (y == 1) {
      Sys.sleep(3)
    } else {
      Sys.sleep(0.5)
    }}))
# user  system elapsed 
# 0.00    0.00    3.09 
Ralaigh answered 6/7, 2016 at 18:45 Comment(5)
Thanks! That's what I was looking for. I can't think of a case where parLapplyLB would produce something different from parLapply, so I'm not sure what its purpose is. - Ronni
Does clusterApplyLB(cl, X, fun) have the same intended behavior as parLapplyLB? I've been trying this out on my system, and it seems to give the same output when X is a list, but I'm a little nervous about just swapping out parLapplyLB for clusterApplyLB. - Demagoguery
Useful info here, as well as a user-defined parLapplyLB: detritus.fundacioace.com/pub/books/… - Asternal
@Asternal Very interesting, especially pages 13-22 (and extra-especially pp. 20-22). Thanks! - Maryettamaryjane
One thing not mentioned: avoid mclapply if you're working on tasks with large files. sendMaster doesn't like returning anything 2 GB or larger, and turning off prescheduling just seemed to run lapply on only one core. parLapply (sock or fork) works and has similar performance. - Asternal

parLapplyLB is not balancing the load because it has a semantic bug. We found the bug and provided a fix (see here). Now it's up to the R devs to include the fix.

Doyenne answered 14/2, 2018 at 6:57 Comment(0)
