Using Rcpp within parallel code via snow to make a cluster

I've written a function in Rcpp and compiled it with inline. Now I want to run it in parallel on different cores, but I'm getting a strange error. Here's a minimal example: the function funCPP1 compiles and runs fine on its own, but cannot be called through snow's clusterCall function. Run in parallel, it gives the following error:

Error in checkForRemoteErrors(lapply(cl, recvResult)) : 
  2 nodes produced errors; first error: NULL value passed as symbol address

And here is some code:

## Load and compile
library(inline)
library(Rcpp)
library(snow)
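src1 <- '
     // clone() so that the input matrix is not modified in place
     Rcpp::NumericMatrix xbem = Rcpp::clone(Rcpp::NumericMatrix(xbe));
     int nrows = xbem.nrow();
     Rcpp::NumericVector gv(g);
     // each row becomes g times the (already updated) previous row, plus itself
     for (int i = 1; i < nrows; i++) {
      xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_);
     }
     return xbem;
'
funCPP1 <- cxxfunction(signature(xbe = "numeric", g = "numeric"), body = src1, plugin = "Rcpp")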

## Single process
A <- matrix(rnorm(400), 20,20)
funCPP1(A, 0.5)

## Parallel
cl <- makeCluster(2, type = "SOCK") 
clusterExport(cl, 'funCPP1') 
clusterCall(cl, funCPP1, A, 0.5)
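For reference, the C++ loop computes a row-wise recursion: row i becomes g times the already-updated row i-1, plus row i. A plain-R equivalent is handy for checking what any compiled or parallel version returns. This is only a sketch; the name `xbem_r` is hypothetical and not part of the question:

```r
# Plain-R version of the recursion in src1 (hypothetical helper name).
# Row i becomes g times the already-updated row i-1, plus row i.
xbem_r <- function(xbe, g) {
  xbem <- xbe
  if (nrow(xbem) > 1) {
    for (i in 2:nrow(xbem)) {
      xbem[i, ] <- xbem[i - 1, ] * g + xbem[i, ]
    }
  }
  xbem
}

# Rows (1, 3) and (2, 4): the second row becomes (0.5 * 1 + 2, 0.5 * 3 + 4)
xbem_r(matrix(c(1, 2, 3, 4), nrow = 2), 0.5)
```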
Sabinasabine asked 20/5, 2011 at 15:40

Think it through -- what does inline do? It creates a C/C++ function for you, then compiles and links it into a dynamically-loadable shared library. Where does that one sit? In R's temp directory.

So you tried the right thing by shipping the R front end that calls that shared library to the other process (which has its own temp directory!), but that does not get the .dll / .so file there.
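A quick way to see the "its own temp directory" point, using only the base parallel package. This is a sketch; the actual paths differ on every machine and run:

```r
library(parallel)

cl <- makeCluster(2, type = "PSOCK")

# Session temp directory of the master process (where inline builds its shared library)
master_tmp <- tempdir()

# Session temp directory of each worker process
worker_tmp <- unlist(clusterCall(cl, tempdir))

stopCluster(cl)

print(master_tmp)
print(worker_tmp)
# Each worker reports its own directory, distinct from the master's
print(worker_tmp == master_tmp)
```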

Hence the advice is to create a local package, install it and have both snow processes load and call it.
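A sketch of that workflow, assuming Rcpp's `Rcpp.package.skeleton()` and a working compiler toolchain. The package name `xbemPkg`, the function name `xbemRecursion`, and the private library under `tempdir()` are hypothetical choices for illustration. The C++ uses Rcpp attributes and `clone()` so the input matrix is left untouched:

```r
library(Rcpp)
library(parallel)

# Hypothetical names and locations, chosen only for this illustration
pkg_name <- "xbemPkg"
pkg_dir  <- file.path(tempdir(), pkg_name)
cpp_file <- file.path(tempdir(), "xbemRecursion.cpp")
lib_dir  <- file.path(tempdir(), "Rlib")
dir.create(lib_dir, showWarnings = FALSE)

# The C++ function, marked with an Rcpp attribute so it is exported to R
writeLines('
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix xbemRecursion(NumericMatrix xbe, double g) {
  NumericMatrix xbem = clone(xbe);  // leave the input untouched
  int nrows = xbem.nrow();
  for (int i = 1; i < nrows; i++) {
    xbem(i, _) = xbem(i - 1, _) * g + xbem(i, _);
  }
  return xbem;
}
', cpp_file)

# Create the package skeleton with the C++ file in src/, then (re)generate
# the R/C++ glue code from the attribute
Rcpp.package.skeleton(pkg_name, path = tempdir(), cpp_files = cpp_file,
                      example_code = FALSE, force = TRUE)
compileAttributes(pkg_dir)

# Build and install the package into the private library
install.packages(pkg_dir, lib = lib_dir, repos = NULL, type = "source")

cl <- makeCluster(2, type = "PSOCK")

# Each worker puts the private library on its search path and loads the
# package, which loads the compiled shared library inside that process
invisible(clusterCall(cl, function(lib, pkg) {
  .libPaths(c(lib, .libPaths()))
  library(pkg, character.only = TRUE)
  TRUE
}, lib_dir, pkg_name))

A <- matrix(rnorm(400), 20, 20)
# The anonymous function looks xbemRecursion up on the worker, not the master
res <- clusterCall(cl, function(m, g) xbemRecursion(m, g), A, 0.5)

stopCluster(cl)
```

Installing into a throwaway library keeps the example from touching your regular library; for real use you would install the package normally, and the workers would only need `library(xbemPkg)`.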

(And as always: better quality answers may be had on the rcpp-devel list, which is read by more Rcpp contributors than SO is.)

Husky answered 20/5, 2011 at 15:47 | Comments (1):
Makes perfect sense. For some reason, I assumed this was snow-specific, which is why I posted here. Thanks! - Sabinasabine

Old question, but I stumbled across it while browsing the top Rcpp questions, so maybe this answer will still be of use.

I think Dirk's answer is the proper approach once the code is fully debugged and does what you want, but writing a new package for a piece of code as small as the one in the example can be a hassle. Instead, you can export the source code and a "helper" function that compiles it, then run that helper on every node. This makes the compiled function available on each worker, and a second helper can then call it. For instance:

# The snow package is not needed here: this functionality now ships with base R in "parallel".
library(parallel)

# Keep your source as an object
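src1 <- '
     // clone() so that the input matrix is not modified in place
     Rcpp::NumericMatrix xbem = Rcpp::clone(Rcpp::NumericMatrix(xbe));
     int nrows = xbem.nrow();
     Rcpp::NumericVector gv(g);
     // each row becomes g times the (already updated) previous row, plus itself
     for (int i = 1; i < nrows; i++) {
      xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_);
     }
     return xbem;
'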
# Save the signature
sig <- signature(xbe = "numeric", g="numeric")

# make a function that compiles the source, then assigns the compiled function 
# to the global environment
c.inline <- function(name, sig, src){
    library(Rcpp)
    funCXX <- inline::cxxfunction(sig = sig, body = src, plugin="Rcpp")
    assign(name, funCXX, envir=.GlobalEnv)
}
# and the function which retrieves and calls this newly-compiled function 
c.namecall <- function(name,...){
    funCXX <- get(name)
    funCXX(...)
}

# Keep your example matrix
A <- matrix(rnorm(400), 20,20)

# What are we calling the compiled function?
fxname <- "TestCXX"

## Parallel
cl <- makeCluster(2, type = "PSOCK") 

# Export all the pieces
clusterExport(cl, c("src1","c.inline","A","fxname")) 

# Call the compiler function
clusterCall(cl, c.inline, name=fxname, sig=sig, src=src1)

# Notice how the function now named "TestCXX" is available in the environment
# of every node?
clusterCall(cl, ls, envir=.GlobalEnv)

# Call the function through our wrapper
clusterCall(cl, c.namecall, name=fxname, A, 0.5)
# Works in my testing

I've written a package, ctools (shameless self-promotion), which wraps up a lot of the functionality in the parallel and Rhpc packages for cluster computing, with both PSOCK and MPI. It already has a function called "c.sourceCpp", which calls "Rcpp::sourceCpp" on every node in much the same way as above. Now that I see how useful this is, I'm going to add a "c.inlineCpp" that does the above.
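ctools' own `c.sourceCpp` is not reproduced here. The following is only a minimal sketch of the same idea with plain parallel plus `Rcpp::sourceCpp()`, assuming Rcpp and a compiler on every node; the function name `xbemRecursion` is hypothetical:

```r
library(parallel)

# Hypothetical function name; the Rcpp attribute marks it for export to R
cpp_src <- '
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix xbemRecursion(NumericMatrix xbe, double g) {
  NumericMatrix xbem = clone(xbe);  // leave the input untouched
  int nrows = xbem.nrow();
  for (int i = 1; i < nrows; i++) {
    xbem(i, _) = xbem(i - 1, _) * g + xbem(i, _);
  }
  return xbem;
}
'

cl <- makeCluster(2, type = "PSOCK")

# Compile on every worker; xbemRecursion() is defined in each worker's global env
invisible(clusterCall(cl, function(code) {
  Rcpp::sourceCpp(code = code, env = globalenv())
  TRUE
}, cpp_src))

A <- matrix(rnorm(400), 20, 20)
# Like c.namecall, the anonymous wrapper looks the function up on the worker
res <- clusterCall(cl, function(m, g) xbemRecursion(m, g), A, 0.5)

stopCluster(cl)
```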

Edit:

In light of Coatless' comments: Rcpp::cppFunction() removes the need for the c.inline helper here, though c.namecall is still needed.

src2 <- '
 NumericMatrix TestCpp(NumericMatrix xbe, double g){
        // clone() so that the input matrix is not modified in place
        NumericMatrix xbem = clone(xbe);
        int nrows = xbem.nrow();
        for (int i = 1; i < nrows; i++) {
            xbem(i,_) = xbem(i-1,_) * g + xbem(i,_);
        }
        return xbem;
 }
'

# Compile TestCpp into each worker's global environment
clusterCall(cl, Rcpp::cppFunction, code=src2, env=.GlobalEnv)

# Call the function through our wrapper
clusterCall(cl, c.namecall, name="TestCpp", A, 0.5)

# Shut the workers down when done
stopCluster(cl)
Calabar answered 5/9, 2016 at 15:45 | Comments (6):
Please do not use cxxfunction(); use cppFunction() instead. - Macklin
I think it would work just the same; I just wanted to use the original example. With cppFunction() the src1 code block would be slightly different. Is there any particular reason not to use cxxfunction? - Calabar
The end result is the same, but how you get there is different. In particular, cxxfunction() presents the code in a non-function form. With cppFunction() I can write the C++ code as a function, e.g.: Rcpp::NumericMatrix sig(Rcpp::NumericMatrix xbem, Rcpp::NumericVector gv){ int nrows = xbem.nrow(); for (int i = 1; i < nrows; i++) { xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_); } return xbem; }. The focus is then on the actual calculation rather than on keeping track of inputs or casting objects. - Macklin
In essence, many things have changed since the introduction of Rcpp Attributes. It's a shame to see people not taking advantage of them. - Macklin
Good point. I personally prefer working with individual source files and using sourceCpp(), so I've never had to distinguish between the two inline-style functions. Answer updated to use cppFunction(). - Calabar
For those who come later, I decided to write a post about this... thecoatlessprofessor.com/programming/rcpp/… - Macklin

I resolved it by sourcing, on each cluster node, an R file containing the desired inline C function:

clusterEvalQ(cl, 
    {
     library(inline)
     invisible(source("your_C_func.R"))
    })

And your file your_C_func.R should contain the C function definition:

c_func <- cfunction(...)
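A fuller sketch of this approach, assuming the inline package and a C compiler on every node, and R 4.0 or later for the raw-string syntax. The file contents, the name `c_func`, and the C body (a plain C version of the question's row recursion using R's C API) are illustrative assumptions, not the original answer's code. The file path is passed to the workers explicitly so their working directory does not matter:

```r
library(parallel)

# Hypothetical file holding the inline definition
code_file <- file.path(tempdir(), "your_C_func.R")

# Write the R file that defines c_func with inline::cfunction().
# The C body copies the input, then updates each row from the previous one.
writeLines(r"---(
c_func <- inline::cfunction(
  sig = c(xbe = "numeric", g = "numeric"),
  body = '
    SEXP tmp = PROTECT(coerceVector(xbe, REALSXP));
    SEXP out = PROTECT(duplicate(tmp));
    int nr = nrows(out), nc = ncols(out);
    double gg = asReal(g);
    double *p = REAL(out);
    for (int j = 0; j < nc; j++)
      for (int i = 1; i < nr; i++)
        p[i + j * nr] += gg * p[(i - 1) + j * nr];
    UNPROTECT(2);
    return out;
  ',
  language = "C"
)
)---", code_file)

cl <- makeCluster(2, type = "PSOCK")

# Same idea as the clusterEvalQ() call above: load inline and source the
# file on every node, which compiles c_func inside each worker process
invisible(clusterCall(cl, function(path) {
  library(inline)
  source(path)
  TRUE
}, code_file))

A <- matrix(rnorm(400), 20, 20)
res <- clusterCall(cl, function(m, g) c_func(m, g), A, 0.5)

stopCluster(cl)
```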
Kahlil answered 22/3, 2019 at 21:38

© 2022 - 2024 — McMap. All rights reserved.