MPI cluster based parallel calculation in R on WestGrid (pbs file)

I am now dealing with a large dataset and I want to use parallel computation to accelerate the process. WestGrid is a Canadian computing system that has clusters with an interconnect.

I use two packages, doSNOW and parallel, to do the parallel jobs. My question is how I should write the pbs file. When I submit the job using qsub, an error occurs: mpirun noticed that the job aborted, but has no info as to the process that caused that situation.

Here is the R script code:

install.packages("fume_1.0.tar.gz")
library(fume)
library(foreach)
library(doSNOW)
load("spei03_df.rdata",.GlobalEnv)

cl <- makeCluster(mpi.universe.size(), type='MPI' )
registerDoSNOW(cl)
MK_grid <- 
  foreach(i=1:6000, .packages="fume",.combine='rbind') %dopar% {
    abc <- mkTrend(as.matrix(spei03_data)[i,])
    data.frame(P_value=abc$`Corrected p.value`, Slope=abc$`Sen's Slope`*10,Zc=abc$Zc)
  }
stopCluster(cl)
save(MK_grid,file="MK_grid.rdata")
mpi.exit()

The "fume" package is download from https://cran.r-project.org/src/contrib/Archive/fume/ .

Here is the pbs file:

#!/bin/bash
#PBS -l nodes=2:ppn=12
#PBS -l walltime=2:00:00 
module load application/R/3.3.1
cd $PBS_O_WORKDIR 

export OMP_NUM_THREADS=1
mpirun -np 1 -hostfile $PBS_NODEFILE R CMD BATCH Trend.R

Can anyone help? Thanks a lot.

Carminecarmita asked 2/12, 2016 at 21:39

It's difficult to give advice on how to use a compute cluster that I've never used, since each cluster is set up somewhat differently, but I can give you some general advice that may help.

Your job script looks reasonable to me. It's very similar to what I use on one of our Torque/Moab clusters. It's a good idea to verify that you're able to load all of the necessary R packages interactively, because sometimes additional module files may need to be loaded. If you need to install packages yourself, make sure you install them in the standard "personal library", which is called something like "~/R/x86_64-pc-linux-gnu-library/3.3". That often avoids errors loading packages in the R script when executing in parallel.
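
For example, here is a minimal sketch of installing the archived fume tarball into that personal library from an interactive R session before submitting the job (the library path is just the default mentioned above; check .libPaths() on your cluster and adjust the tarball name/path as needed):

# Run once in an interactive R session on a login node, before submitting
# the job, so the batch script only needs library(fume).
personal_lib <- "~/R/x86_64-pc-linux-gnu-library/3.3"  # default personal library
dir.create(personal_lib, recursive = TRUE, showWarnings = FALSE)
install.packages("fume_1.0.tar.gz", repos = NULL, type = "source",
                 lib = personal_lib)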

I have more to say about your R script:

  • You need to load the Rmpi package in your R script using library(Rmpi). It isn't automatically loaded when loading doSNOW, so you will get an error when calling mpi.universe.size().

  • I don't recommend installing R packages in the R script itself. That will fail if install.packages needs to prompt you for the CRAN repository, for example, since you can't execute interactive functions from an R script executed via mpirun.

  • I suggest starting mpi.universe.size() - 1 cluster workers when calling makeCluster. Since mpirun starts one worker, it may not be safe for makeCluster to spawn mpi.universe.size() additional workers, since that would result in a total of mpi.universe.size() + 1 MPI processes. That works on some clusters, but it fails on at least one of our clusters (see the sketch after this list).

  • While debugging, try using the makeCluster outfile='' option. Depending on your MPI installation, that may let you see error messages that would otherwise be hidden.
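
Putting these suggestions together, here is a sketch of how your R script could look. It assumes fume is already installed in your personal library and that spei03_df.rdata provides spei03_data, as in your original script:

library(Rmpi)      # needed for mpi.universe.size(); not loaded by doSNOW
library(foreach)
library(doSNOW)
library(fume)      # installed beforehand, not from inside the script

load("spei03_df.rdata", .GlobalEnv)

# mpirun already starts one process, so spawn one fewer worker;
# outfile = "" shows worker output/errors while debugging
cl <- makeCluster(mpi.universe.size() - 1, type = "MPI", outfile = "")
registerDoSNOW(cl)

MK_grid <-
  foreach(i = 1:6000, .packages = "fume", .combine = "rbind") %dopar% {
    abc <- mkTrend(as.matrix(spei03_data)[i, ])
    data.frame(P_value = abc$`Corrected p.value`,
               Slope = abc$`Sen's Slope` * 10,
               Zc = abc$Zc)
  }

stopCluster(cl)
save(MK_grid, file = "MK_grid.rdata")
mpi.exit()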

Aaron answered 3/12, 2016 at 15:55
