Where can I learn how to write C code to speed up slow R functions? [closed]

4

119

What's the best resource for learning how to write C code for use with R? I know about the "System and foreign language interfaces" section of Writing R Extensions, but I find it pretty hard going. What are good resources (both online and offline) for writing C code for use with R?

To clarify, I don't want to learn how to write C code; I want to learn how to better integrate R and C. For example, how do I convert from a C integer vector to an R integer vector (or vice versa), or from a C scalar to an R vector?

Angara answered 5/11, 2010 at 13:20 Comment(0)
77

Well there is the good old Use the source, Luke! --- R itself has plenty of (very efficient) C code one can study, and CRAN has hundreds of packages, some from authors you trust. That provides real, tested examples to study and adapt.

But as Josh suspected, I lean more towards C++ and hence Rcpp. It also has plenty of examples.

Edit: There were two books I found helpful:

  • The first one is Venables and Ripley's "S Programming" even though it is getting long in the tooth (and there have been rumours of a 2nd edition for years). At the time there was simply nothing else.
  • The second is Chambers' "Software for Data Analysis", which is much more recent and has a much nicer R-centric feel -- and two chapters on extending R. Both C and C++ get mentioned. Plus, John shreds me for what I did with digest, so that alone is worth the price of admission.

That said, John is growing fond of Rcpp (and contributing) as he finds the match between R objects and C++ objects (via Rcpp) to be very natural -- and ReferenceClasses help there.

Edit 2: With Hadley's refocussed question, I very strongly urge you to consider C++. There is so much boilerplate nonsense you have to do with C---very tedious and very avoidable. Have a look at the Rcpp-introduction vignette. Another simple example is this blog post where I show that instead of worrying about 10% differences (in one of the Radford Neal examples) we can get eightyfold increases with C++ (on what is of course a contrived example).

Edit 3: There is complexity in that you may run into C++ errors that are, to put it mildly, hard to grok. But if you just use Rcpp rather than extend it, you should hardly ever run into them. And while this cost is undeniable, it is far eclipsed by the benefit of simpler code, less boilerplate, no PROTECT/UNPROTECT, no memory management, and so on. Doug Bates just yesterday stated that he finds C++ and Rcpp to be much more like writing R than writing C++. YMMV and all that.

Tapdance answered 5/11, 2010 at 13:36 Comment(6)
I expected I'd get a "use Rcpp" answer ;) It would be really useful if you could spell out the disadvantages of using C++ instead of C. One major one would seem to be that C++ is much more complex than C - does this make it harder to use? (Or in practice, can you write C++ code that's very similar to C?) I would also appreciate more reference material that's aimed at new users who are not familiar with the existing C API. - Angara
See Edit 3 and yes, you can. Meyers calls C++ a 'four paradigm' language and you do not have to use all four. Using it as 'just a better C' and using Rcpp as glue to R is perfectly fine. Nobody forces a style on you -- this ain't Java ;-) - Tapdance
@Dirk: thx for the elaboration. The question has come up in our office before, as C is commonly used here instead of C++. When would the use of C over C++ be beneficial, or do you simply say "never C, always C++"? - Muss
Hadley: Cool. We would be very interested in your feedback. Please do join rcpp-devel and do not hold back. We know we are short on documentation -- but a fresh set of eyes could help tremendously. - Tapdance
Joris: What part of my answer, three edits and additional comments did not clarify sufficiently ;-) ? I just don't like the C interface for the R API; it is (IMHO) inconsistent and insufficient. We give examples in our talks and in the Rcpp-introduction vignette; see e.g. the big table in there. Also, I am currently working on rewriting another third-party C package in C++/Rcpp and will hopefully have an illustrated case study "soon". - Tapdance
@Angara Does that mean we can expect some speed improvements in ggplot? - Interinsurance
59

Hadley,

You can definitely write C++ code that is similar to C code.
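
To make that concrete, here is a minimal sketch in plain C++ (no Rcpp or R headers needed). The function names mean_c_style and mean_vector are made up for this illustration. The first function is what you would write in C; the second uses std::vector as the only C++ feature, so the container carries its own length and cleans up after itself.

```cpp
#include <cstddef>
#include <vector>

// C-style: raw pointer plus explicit length, exactly as in C.
// (mean_c_style is a hypothetical name used only for this sketch.)
double mean_c_style(const double* x, std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; i++) {
        total += x[i];
    }
    return n > 0 ? total / n : 0.0;   // define the mean of nothing as 0.0
}

// Same loop body; the only C++ feature is std::vector,
// which knows its own size and frees its own memory.
double mean_vector(const std::vector<double>& x) {
    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); i++) {
        total += x[i];
    }
    return x.empty() ? 0.0 : total / x.size();
}
```

Nothing here needs classes of your own, templates of your own, or anything beyond what a C programmer already reads fluently.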

I understand what you say about C++ being more complicated than C. That is true if you want to master everything: objects, templates, the STL, template metaprogramming, etc. Most people don't need these things and can just rely on others to do it. The implementation of Rcpp is very complicated, but just because you don't know how your fridge works does not mean you cannot open the door and grab fresh milk ...

From your many contributions to R, what strikes me is that you find R somewhat tedious (data manipulation, graphics, string manipulation, etc.). Well, get prepared for many more surprises with the internal C API of R. It is very tedious.

From time to time, I read the R-exts or R-ints manuals. This helps. But most of the time, when I really want to find out about something, I go into the R source, and also in the source of packages written by e.g. Simon (there is usually lots to learn there).

Rcpp is designed to make these tedious aspects of the API go away.

You can judge for yourself what you find more complicated, obfuscated, etc ... based on a few examples. This function creates a character vector using the C API:

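#include <Rinternals.h>

SEXP foobar(){
  SEXP ab;
  PROTECT(ab = allocVector(STRSXP, 2));    /* shield ab from the garbage collector */
  SET_STRING_ELT( ab, 0, mkChar("foo") );
  SET_STRING_ELT( ab, 1, mkChar("bar") );
  UNPROTECT(1);                            /* balance the PROTECT above */
  return ab;
}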

Using Rcpp, you can write the same function as:

SEXP foobar(){
   return Rcpp::CharacterVector::create( "foo", "bar" ) ;
}

or:

SEXP foobar(){
   Rcpp::CharacterVector res(2) ;
   res[0] = "foo" ;
   res[1] = "bar" ;
   return res ;
}

As Dirk said, there are other examples in the various vignettes. We also usually point people towards our unit tests, because each of them tests a very specific part of the code and is somewhat self-explanatory.

I'm obviously biased here, but I would recommend getting familiar with Rcpp instead of learning the C API of R, and then coming to the mailing list if something is unclear or does not seem doable with Rcpp.

Anyway, end of the sales pitch.

I guess it all depends what sort of code you want to write eventually.

Romain

Defenestration answered 5/11, 2010 at 14:47 Comment(3)
"Rcpp is designed to make these tedious aspects of the API go away" = exactly what I'm looking for. Thanks! What would be really useful would be a v. brief C++ primer for someone who is familiar with C and wants to use Rcpp. - Angara
Nice, that short example of Rcpp got me sold. I am assuming allocXX and UNPROTECT(1) are handled much like smart pointers manage a resource, i.e. RAII. Is there any notable performance penalty from using Rcpp over the vanilla C API? - Breccia
We address that in the Rcpp-introduction with a benchmark example (which is also in the sources / installed package). In short, no penalty at all. - Tapdance
30

@hadley: unfortunately, I don't have specific resources in mind to help you get started on C++. I picked it up from Scott Meyers's books (Effective C++, More Effective C++, etc.), but these are not really what one could call introductory.

We almost exclusively use the .Call interface to call C++ code. The rules are easy enough:

  • The C++ function must return an R object. All R objects are SEXP.
  • The C++ function takes between 0 and 65 R objects as input (again SEXP).
  • It must (not really, but we can save this for later) be declared with C linkage, either with extern "C" or the RcppExport alias that Rcpp defines.

So a .Call function gets declared like this in some header file:

#include <Rcpp.h>

RcppExport SEXP foo( SEXP x1, SEXP x2 ) ;

and implemented like this in a .cpp file :

SEXP foo( SEXP x1, SEXP x2 ){
   ...
}

There is not much more to know about the R API to be using Rcpp.

Most people only want to deal with numeric vectors in Rcpp. You do this with the NumericVector class. There are several ways to create a numeric vector :

From an existing object that you pass down from R:

 SEXP foo( SEXP x_) {
    Rcpp::NumericVector x( x_ ) ;
    ...
 }

With given values using the ::create static function:

 // unnamed elements
 Rcpp::NumericVector x = Rcpp::NumericVector::create( 1.0, 2.0, 3.0 ) ;
 // named elements; _ is Rcpp::_, in scope after `using namespace Rcpp ;`
 Rcpp::NumericVector y = Rcpp::NumericVector::create( 
    _["a"] = 1.0, 
    _["b"] = 2.0, 
    _["c"] = 3.0
 ) ;

Of a given size:

 Rcpp::NumericVector x( 10 ) ;      // filled with 0.0
 Rcpp::NumericVector x( 10, 2.0 ) ; // filled with 2.0

Then once you have a vector, the most useful thing is to extract one element from it. This is done with the operator[], with 0-based indexing, so for example summing values of a numeric vector goes something like this:

SEXP sum( SEXP x_ ){
   Rcpp::NumericVector x(x_) ;
   double res = 0.0 ;
   for( int i=0; i<x.size(); i++){
      res += x[i] ;
   }
   return Rcpp::wrap( res ) ;
}

But with Rcpp sugar we can do this much more nicely now:

using namespace Rcpp ;
SEXP sum( SEXP x_ ){
   NumericVector x(x_) ;
   double res = sum( x ) ;
   return wrap( res ) ;
}
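
If you come from C, the step from the explicit loop to sugar is much like the step from a hand-written loop to a standard algorithm in plain C++. The following is only an analogy, not Rcpp code: std::vector stands in for NumericVector, std::accumulate stands in for the sugar sum, and the names sum_loop and sum_algorithm are made up for this sketch.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Explicit 0-based loop, mirroring the first version above.
double sum_loop(const std::vector<double>& x) {
    double res = 0.0;
    for (std::size_t i = 0; i < x.size(); i++) {
        res += x[i];
    }
    return res;
}

// Same result with a standard algorithm doing the iteration.
// The 0.0 initial value keeps the accumulation in double.
double sum_algorithm(const std::vector<double>& x) {
    return std::accumulate(x.begin(), x.end(), 0.0);
}
```

Both functions add the elements in the same order, so they return the same value; the second simply says what is wanted instead of how to iterate.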

As I said before, it all depends on what sort of code you want to write. Look into what people do in packages that rely on Rcpp, check the vignettes, the unit tests, come back to us on the mailing list. We are always happy to help.

Defenestration answered 8/11, 2010 at 12:32 Comment(0)
20

@jbremnant: That's right. Rcpp classes implement something close to the RAII pattern. When an Rcpp object is created, the constructor takes appropriate measures to ensure the underlying R object (SEXP) is protected from the garbage collector. The destructor withdraws the protection. This is explained in the Rcpp-introduction vignette. The underlying implementation relies on the R API functions R_PreserveObject and R_ReleaseObject.
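
The pattern can be sketched without R at all. In the sketch below everything is a stand-in: preserve_object and release_object merely count calls and play the role of R_PreserveObject and R_ReleaseObject, and Guarded is a hypothetical class, not the actual Rcpp implementation.

```cpp
#include <cassert>

// Stand-ins for R_PreserveObject / R_ReleaseObject (assumption: they only
// count how many objects are currently protected).
static int g_preserved = 0;

void preserve_object(void* obj) { (void)obj; ++g_preserved; }

void release_object(void* obj) {
    (void)obj;
    assert(g_preserved > 0);   // a release must match an earlier preserve
    --g_preserved;
}

int preserved_count() { return g_preserved; }

// Hypothetical guard class: protect in the constructor, release in the
// destructor, so protection is tied to the lifetime of the C++ object.
class Guarded {
public:
    explicit Guarded(void* obj) : obj_(obj) { preserve_object(obj_); }
    ~Guarded() { release_object(obj_); }

    // Copying would release twice in this simple sketch, so forbid it.
    Guarded(const Guarded&) = delete;
    Guarded& operator=(const Guarded&) = delete;

    void* get() const { return obj_; }

private:
    void* obj_;
};
```

While a Guarded object is in scope, the count is one higher; when the scope ends, including through an early return or an exception, the destructor brings it back down. That is the bookkeeping you otherwise do by hand with PROTECT and UNPROTECT.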

There is indeed a performance penalty due to C++ encapsulation. We try to keep it to a minimum with inlining, etc. The penalty is small, and when you take into account the time saved in writing and maintaining code, it is not that relevant.

Calling R functions through the Rcpp class Function is slower than calling eval directly with the C API. This is because we take precautions and wrap the function call in a tryCatch block, so that we capture R errors and promote them to C++ exceptions that can be dealt with using the standard try/catch in C++.
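
The idea of promoting an error to a C++ exception at the boundary can be sketched in plain C++. This is an analogy only: checked_divide is a made-up stand-in for a call that can fail and reports it through a status code, and none of the names below come from Rcpp or the R API.

```cpp
#include <stdexcept>

// Stand-in for a call that can fail (hypothetical): returns 0 on success
// and a nonzero status on failure, writing the result through `out`.
int checked_divide(double num, double den, double* out) {
    if (den == 0.0) {
        return 1;              // failure: nothing written to *out
    }
    *out = num / den;
    return 0;
}

// Boundary wrapper: always checks the status and turns a failure into a
// C++ exception. The extra check on every call is the price of safety.
double divide_or_throw(double num, double den) {
    double result = 0.0;
    if (checked_divide(num, den, &result) != 0) {
        throw std::runtime_error("division by zero");
    }
    return result;
}

// Callers can then use ordinary try/catch.
bool divide_succeeds(double num, double den) {
    try {
        divide_or_throw(num, den);
        return true;
    } catch (const std::exception&) {
        return false;
    }
}
```

The wrapper does a little more work per call than the raw function, which is the same trade-off described above: a small cost in speed for errors that surface as normal C++ exceptions.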

Most people want to use vectors (especially NumericVector), and the penalty is very small with this class. The examples/ConvolveBenchmarks directory contains several variants of the notorious convolution function from R-exts, and the vignette has benchmark results. In that benchmark, the Rcpp version turns out to be faster than the code that uses the R API.
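
For readers who have not seen it, the convolution in question is a simple double loop. Here is a minimal sketch of the same computation in plain C++, with std::vector standing in for the R vectors; it is not one of the benchmarked variants, and the signature is an assumption made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Discrete convolution of a and b: ab[i + j] accumulates a[i] * b[j].
// The result has a.size() + b.size() - 1 elements.
std::vector<double> convolve(const std::vector<double>& a,
                             const std::vector<double>& b) {
    assert(!a.empty() && !b.empty());   // the output length needs both inputs non-empty
    std::vector<double> ab(a.size() + b.size() - 1, 0.0);
    for (std::size_t i = 0; i < a.size(); i++) {
        for (std::size_t j = 0; j < b.size(); j++) {
            ab[i + j] += a[i] * b[j];
        }
    }
    return ab;
}
```

The inner statement does one indexed read from each input and one indexed update, so the benchmark mostly measures how cheap element access is for each vector type.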

Defenestration answered 8/11, 2010 at 12:12 Comment(0)
