Return a dynamic vector from C to R
Asked Answered
W

2

11

I am writing some code in C to be called dynamically from R.

This code generates a path of a random Poisson process up to a desired time T. So at every call to my C function, the length of the returned vector will be different depending on the random numbers generated.

What R data structure will I have to create? a LISTSXP? another one?

How can I create it, and how can I append to it? And especially how can I return it back to R?

Thanks for help.

Witty answered 9/1, 2012 at 23:30 Comment(1)
You may want to check the examples in the "Writing R Extensions" manual cran.r-project.org/doc/manuals/R-exts.pdf (chapter 5). You either need a C vector of doubles double * or an R vector REALSXP.Retrieve
S
6

It's really up to you what you want to use as temporary structure, because in the end you'll have to allocate a vector for result anyway. So whatever you'll be using is not what you'll return. There are several possibilities:

  1. use Calloc + Realloc + Free which allows you to expand the temporary memory as needed. Once you have the full set, you allocate the result vector and return it.
  2. if you can overestimate the size easily, you can over-allocate the result vector and use SETLENGTH before you return it. There are issues with this, though, because the result will remain over-allocated until duplicated later on.
  3. You can use what you hinted at which is a chained list of vector blocks, i.e. allocate and protect one pairlist of which you append vectors to the tail as you need them. At the end you allocate the return vector and copy content of the blocks you allocated. This is more convoluted than both of the above.

Each of them has drawbacks and benefits, so it really depends on your application to pick the one best suited for you.

Edit: added an example of using the pairlists approach. I would still recommend the Realloc approach since it's much easier, but nonetheless:

#define BLOCK_SIZE xxx /* some reasonable size for increments - could be adaptive, too */ 
SEXP block; /* last vector block */
SEXP root = PROTECT(list1(block = allocVector(REALSXP, BLOCK_SIZE)));
SEXP tail = root;
double *values = REAL(block);
int count = 0, total = 0;
do { /* your code to generate values - if you want to add one 
        first try to add it to the existing block, otherwise allocate new one */
    if (count == BLOCK_SIZE) { /* add a new block when needed */
        tail = SETCDR(tail, list1(block = allocVector(REALSXP, BLOCK_SIZE)));
        values = REAL(block);
        total += count;
        count = 0;
    }
    values[count++] = next_value;
} while (...);
total += count;
/* when done, we need to create the result vector */
{
    SEXP res = allocVector(REALSXP, total);
    double *res_values = REAL(res);
    while (root != R_NilValue) {
        int size = (CDR(root) == R_NilValue) ? count : BLOCK_SIZE;
        memcpy(res_values, REAL(CAR(root)), sizeof(double) * size);
        res_values += size;
        root = CDR(root);
    }
    UNPROTECT(1);
    return res;
 }
Supercargo answered 10/1, 2012 at 0:46 Comment(4)
Thanks for answer. I will then create a linked list structure to memorize my temporary data, before having the entire length and copying it into a vector. I thought such structure existed in R Internals.Witty
A pairlist (LISTSXP) is a linked list so you don't need to create that structure - the only reason to not use Realloc (which is otherwise the easiest and most convenient) is precisely to keep all objects as R objects - as I said I would allocate and protect a pairlist as the root of your chain and append at the tail newly allocated vectors as you go (that solves the protection nightmare).Supercargo
Thanks a lot.. Please, can you give a samle C code, constructing a LISTXP of doubles, and adding some values to it, then retrieving these values back.. I could not find any API reference to this kind of R internal type.. Thanks again.Witty
Thanks a lot, your code helped me a lot to understand these mechanismsWitty
H
5

If you're open to switching from C to C++, you get added layer of Rcpp for free. Here is an example from the page of the (still rather incomple) RcppExample package:

#include <RcppClassic.h>
#include <cmath>

RcppExport SEXP newRcppVectorExample(SEXP vector) {
BEGIN_RCPP

    Rcpp::NumericVector orig(vector);                // keep a copy 
    Rcpp::NumericVector vec(orig.size());            // create vector same size

    // we could query size via
    //   int n = vec.size();
    // and loop over the vector, but using the STL is so much nicer
    // so we use a STL transform() algorithm on each element
    std::transform(orig.begin(), orig.end(), vec.begin(), ::sqrt);

    return Rcpp::List::create(Rcpp::Named( "result" ) = vec,
                              Rcpp::Named( "original" ) = orig) ;

END_RCPP
}

As you see, no explicit memory allocation, freeing, PROTECT/UNPROTECT etc, and you get a first class R list object back.

There are lots more examples, included in other SO questions such as this one.

Edit: And you didn't really say what you paths would do, but as a simple illustration, here is C++ code using the Rcpp additions cumsum() and rpois() which behave just like they do in R:

R> library(inline)
R> 
R> fun <- cxxfunction(signature(ns="integer", lambdas="numeric"),
+                    plugin="Rcpp",
+                    body='
+    int n = Rcpp::as<int>(ns);
+    double lambda = Rcpp::as<double>(lambdas);
+ 
+    Rcpp::RNGScope tmp;                  // make sure RNG behaves
+ 
+    Rcpp::NumericVector vec = cumsum( rpois( n, lambda ) );
+ 
+    return vec;
+ ')
R> set.seed(42)
R> fun(3, 0.3)
[1] 1 2 2
R> fun(4, 0.4)
[1] 1 1 1 2

And as a proof, back in R, if we set the seed, we can generate exactly the same numbers:

R> set.seed(42)
R> cumsum(rpois(3, 0.3))
[1] 1 2 2
R> cumsum(rpois(4, 0.4))
[1] 1 1 1 2
R> 
Hinkle answered 10/1, 2012 at 1:1 Comment(1)
Thanks for your answer. It really seems Rcpp deserves a close look. I will try to dive into it.Witty

© 2022 - 2024 — McMap. All rights reserved.