The R package Rsge allows job submission to SGE managed clusters. It basically saves the required environment to disk, builds job submission scripts, executes them via qsub and then collates the results and returns them to you.
Because it basically wraps calls to qsub, it should work with PBS too (although since I don't know PBS, I can't guarantee it). You can alter the qsub command and the options used by altering the Rsge associated global options (prefixed sge. in the options()
output)
Its is no longer on CRAN, but it is availible from github: https://github.com/bodepd/Rsge, although it doesn't look like its maintained any more.
To use it use one of the apply type functions supplied with the package: sge.apply , sge.parRapply, sge.parCapply, sge.parLapply and sge.parSapply, which are parallel equivalents to apply, rapply, rapply(t(x),…), lapply and sapply respectively. In addition to the standard parameters passed to the non-parallel functions a couple of other parameters are needed:
njobs: Number of parallel jobs to use
global.savelist: Character vector giving the names of variables
from the global environment that should be imported.
function.savelist: Character vector giving the variables to save from
the local environment.
packages: List of library packages to be loaded by each worker process
before computation is started.
The two savelist parameters and the packages parameters basically specify what variables, functions and packages should be loaded into the new instances of R running on the cluster machines before your code is executed. The different components of X (either list items or data.frame rows/columns) are divided between njobs different jobs and submitted as a job array to SGE. Each node starts an instance of R loads the specified variables, functions and packages, executes the code, saves and save the results to a tmp file. The controlling R instance checks when the jobs are complete, loads the data from the tmp files and joins the results back together to get the final results.
For example computing a statistic on a random sample of a gene list:
library(Rsge)
library(some.bioc.library)
gene.list <- read.delim(“gene.list.tsv”)
compute.sample <- function(gene.list) {
gene.list.sample <- sample(1000, gene.list)
statistic <- some.slow.bioc.function(gene.list.sample)
return (statistic)
}
results <- sge.parSapply(1:10000, function(x) compute.sample,
njobs = 100,
global.savelist = c(“gene.list”),
function.savelist(“compute.sample”),
packages = c(“some.bioc.library”))
system()
. You might or might not want to have qsub callRscript
as a means of getting R to execute yourmyscript.R
. – Messuagemyscript.R
called Rscript; but I can remove this and replace/bin/bash
with/usr/bin/Rscript
in the qsub command ...qsub -S /usr/bin/Rscript ...
, similar function but cleaner since it doesn't call bash and then Rscript. – Dodge