How to configure a batch script to parallelize an R script with future.batchtools (SLURM)
I seek to parallelize an R file on a SLURM HPC using the future.batchtools package. While the script is executed on multiple nodes, it only uses 1 CPU instead of the 12 that are available.

So far, I have tried different configurations (cf. the code below), none of which lead to the expected result. My bash file with the configuration is as follows:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --cpus-per-task=12

R CMD BATCH test.R output

In R, I use a foreach loop:

# First level = cluster
# Second level = multiprocess 
# https://cran.r-project.org/web/packages/future.batchtools/vignettes/future.batchtools.html
plan(list(batchtools_slurm, multiprocess))

# Parallel for loop
result <- foreach(i in 100) %dopar% {
  Sys.sleep(100)
  return(i)
}

I would appreciate guidance on how to configure the code to use multiple nodes and multiple cores.

Granddaughter answered 26/7, 2019 at 14:50 Comment(5)
Your R script uses a single layer of foreach parallelization, which will parallelize using batchtools_slurm. If you use two layers of foreach parallelization, the second layer will use multiprocess (given that plan()). FYI, foreach(i in 100) will only iterate over a single value.Etoile
Thanks for the response! I actually use a nested foreach with this plan (I didn't paste it here for the sake of a parsimonious example): result <- foreach(i = 1:100) %dopar% { foreach(jRun = 1:100) %dopar% { # calculation }}. This code is closer to the one I am using. Still, I do not exploit multiple nodes on the HPC. Is there anything else I am overlooking?Granddaughter
Ah... you need to specify the number of cores per task when you set up the plan - something like plan(list(tweak(batchtools_slurm, resources = list("ntasks-per-node"=4)), multiprocess)), cf. github.com/HenrikBengtsson/future.batchtools. To verify that this works: With that new plan, call f <- future( availableCores() ); ncores <- value(f). That will show how many cores the second layer will have available. You should get four (4).Etoile
Thanks for your help. For others who face the same challenge: it is necessary to start the .R file with srun in order to allow parallel processes.Granddaughter
Michael, did you manage to find the right setup in the end? If so, how about posting the solution you found here?Taunt
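Putting the advice from the comments together, a minimal sketch of a two-layer setup might look like the following. This is an illustration, not a verified configuration: the resources names must match your cluster's batchtools SLURM template, the core count of 12 mirrors the question's --cpus-per-task, and multisession is used in place of the now-deprecated multiprocess.

```r
library(future)
library(future.batchtools)
library(doFuture)

registerDoFuture()  # make %dopar% dispatch via the future framework

# Outer layer: each outer iteration becomes a SLURM job requesting 12 CPUs.
# Inner layer: multisession workers on the CPUs of the allocated node.
plan(list(
  tweak(batchtools_slurm, resources = list("ntasks-per-node" = 12)),
  multisession
))

# Sanity check: how many cores does the second layer see on a worker node?
f <- future(availableCores())
value(f)  # should match the CPUs requested, if the resources took effect

result <- foreach(i = 1:100) %dopar% {   # one SLURM job per i
  foreach(j = 1:100) %dopar% {           # parallel on the node's cores
    i * j                                # placeholder for the real work
  }
}
```

The key point is the nesting: the first element of the plan list governs the outer foreach, the second governs the inner one, so a single flat loop will only ever use the outer (one-core-per-job) layer.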
There are various ways of parallelizing R jobs with Slurm. A few to note:

  1. You can use a single node with multiple cores, in which case mclapply is a nice alternative, since in principle it is faster and more memory-efficient than, say, parLapply.

  2. You can use job arrays with Slurm, meaning that you can write a single R script that Slurm will run multiple times when you specify the option --array=1-[# of replicates]. You can control what each job does via the SLURM_ARRAY_TASK_ID environment variable (captured in R with Sys.getenv("SLURM_ARRAY_TASK_ID")) and an if/else statement based on it.

  3. As @george-ostrouchov mentioned, you can use MPI, for which you would need the Rmpi package installed; that may be a bit painful at times.

  4. Another thing you can try is creating a SOCKET cluster object using the parallel package. This can be a multi-node cluster, which would allow you to spread your calculations across multiple nodes using parLapply and friends.
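As an illustration of the job-array approach (option 2), a minimal R script could look like the following; the file names, the placeholder computation, and the array size are hypothetical:

```r
# replicate.R -- submitted, e.g., with: sbatch --array=1-100 run.sh
# (run.sh would contain "Rscript replicate.R"; both names are illustrative)

# Which replicate is this? Slurm sets this variable for each array task.
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID"))

# Branch on the task id, or simply use it to parameterize the work:
set.seed(task_id)         # reproducible per-replicate seed
res <- mean(rnorm(1e6))   # placeholder computation

# One output file per array task, collected afterwards in a separate step
saveRDS(res, sprintf("result_%04d.rds", task_id))
```

Each array task is an independent Slurm job, so this pattern scales across nodes without any parallel framework inside R.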

I have a tutorial in which I explain these options and how you can use the slurmR package (which I'm working on; soon on CRAN) or the rslurm package (already on CRAN). You can take a look at the tutorial here.

Salamis answered 19/11, 2019 at 18:44 Comment(0)
Since you are running in batch mode and using more than one node, consider combining MPI with mclapply's multicore forking. These are closer to what actually happens in hardware and give you control over both the number of R instances per node and the core use of each instance. Example SLURM and PBS scripts, with an accompanying R batch script, are at https://github.com/RBigData/mpi_balance, illustrating how to balance multicore and multinode parallelism.

Peppi answered 9/8, 2019 at 4:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.