Rcpp and int64 NA value

How can I pass an NA value from Rcpp to R in a 64 bit vector?

My first approach would be:

#include <Rcpp.h>
#include <cstdint>
#include <cstring>

// [[Rcpp::export]]
Rcpp::NumericVector foo() {
  Rcpp::NumericVector res(2);

  int64_t val = 1234567890123456789;
  std::memcpy(&(res[0]), &(val), sizeof(double));
  res[1] = NA_REAL;

  res.attr("class") = "integer64";
  return res;
}

But it yields

#> foo()
integer64
[1] 1234567890123456789 9218868437227407266

I need to get

#> foo()
integer64
[1] 1234567890123456789 <NA>
Mu answered 23/4, 2020 at 11:35 Comment(9)
You can't use NA_REAL after the memcpy, because at that point the bit pattern is that of an int64. - Aspectual
I'd also edit the title. The default 64-bit NA is just NA_REAL, which is not what your question is about. - Aspectual
But the memcpy copies only 64 bits (sizeof(double)), right? So res[0] gets 64 bits from val, and then setting res[1] = ... uses the next 64 bits. I agree with the outcome, but don't really follow your first comment. - Mu
I had hoped that NA_REAL uses the same bit pattern that the bit64 package uses (1000...). I guess I was wrong there. - Mu
The whole point is that the content of the vector is then, bit by bit, an int64_t that is merely "parked" inside a double vector (aka NumericVector). There is no magic logic copy. Jens is doing all the hard work by hand, including mapping NAs. - Aspectual
I've programmed for a few years now and I think I can conclude that hope is not a valid long-term strategy. ;-) - Aspectual
Ah, that makes more sense! Thanks for the background! - Mu
Interestingly, using std::memcpy(&(res[1]), &(NA_REAL), sizeof(double)); doesn't work... - Mu
That. Is. What. I. Have. Been. Trying. To. Explain. Look at e.g. the R source for the existing NA defines. Look at some packages using int64 and see what they do. - Aspectual
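To make the comments above concrete, here is a minimal standalone C++ sketch (no Rcpp needed) of why res[1] = NA_REAL prints as 9218868437227407266. It assumes that R builds NA_REAL as a NaN with high word 0x7FF00000 and low word 1954; the constant and helper names are made up for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: R's NA_REAL is the NaN with high word 0x7FF00000 and
// low word 1954, i.e. this 64-bit pattern.
constexpr std::uint64_t kNaRealBits = 0x7FF00000000007A2ULL;

// Hypothetical helper: read the bits of the double NA as a signed 64-bit
// integer, which is what an integer64 printer does with that vector slot.
inline std::int64_t na_real_seen_as_int64() {
    double na_real;
    std::memcpy(&na_real, &kNaRealBits, sizeof na_real);  // the double NA
    std::int64_t as_int;
    std::memcpy(&as_int, &na_real, sizeof as_int);        // same bits, int64 view
    return as_int;
}
```

Under that assumption the helper returns 9218868437227407266, the second value printed in the question. It is an ordinary int64 as far as bit64 is concerned, not its NA.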

Alright, I think I found an answer... (not beautiful, but working).

Short Answer:

#include <Rcpp.h>
#include <cstdint>
#include <cstring>

// [[Rcpp::export]]
Rcpp::NumericVector foo() {
  Rcpp::NumericVector res(2);

  int64_t val = 1234567890123456789;
  std::memcpy(&(res[0]), &(val), sizeof(double));

  // This is the magic: only the sign bit set (1000...0), bit64's NA pattern
  int64_t v = 1ULL << 63;
  std::memcpy(&(res[1]), &(v), sizeof(double));

  res.attr("class") = "integer64";
  return res;
}

which results in

#> foo()
integer64
[1] 1234567890123456789 <NA>

Longer Answer

Inspecting how bit64 stores an NA

# the last value is meant to be the max int64; as a double literal it rounds up to 2^63, which is out of range
a <- bit64::as.integer64(c(1, 2, NA, 9223372036854775807))
a
#> integer64
#> [1] 1    2    <NA> <NA>
bit64::as.bitstring(a[3])
#> [1] "1000000000000000000000000000000000000000000000000000000000000000"
bit64::as.bitstring(a[4])
#> [1] "1000000000000000000000000000000000000000000000000000000000000000"

Created on 2020-04-23 by the reprex package (v0.3.0)

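we see that it is 1000...0: a one in the sign bit followed by 63 zeros. This can be recreated in Rcpp with int64_t v = 1ULL << 63;. Using memcpy() instead of a plain assignment with = ensures that no bits are changed: an assignment would convert the integer to the nearest double value rather than copy its bit pattern.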
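The difference between memcpy() and assignment can be sketched in plain C++. The helper names below are hypothetical, not part of Rcpp or bit64; in Rcpp one would memcpy straight into the vector element, as the answer does.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers, not part of Rcpp or bit64.

// Store the bits of an int64_t in a double without changing them.
inline double park_int64(std::int64_t value) {
    static_assert(sizeof(double) == sizeof(std::int64_t), "need a 64-bit double");
    double slot;
    std::memcpy(&slot, &value, sizeof slot);
    return slot;
}

// Recover the int64_t that was parked in a double.
inline std::int64_t unpark_int64(double slot) {
    std::int64_t value;
    std::memcpy(&value, &slot, sizeof value);
    return value;
}

// What plain assignment does instead: a numeric conversion to the nearest
// double, followed here by a conversion back to an integer.
inline std::int64_t via_assignment(std::int64_t value) {
    double converted = static_cast<double>(value);
    return static_cast<std::int64_t>(converted);
}
```

For 1234567890123456789 the memcpy round trip returns the original value, while the assignment round trip does not, because a double has only 53 bits of mantissa and cannot represent that integer exactly.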

Mu answered 23/4, 2020 at 11:47 Comment(1)
Yes. If you look at some source packages you will see a corresponding #define statement that declares one bit pattern (often either min or max) to be the NA value. - Aspectual

It's really much, much simpler. The behaviour of an int64 in R is offered by (several) add-on packages, the best of which is bit64, which gives us the integer64 S3 class and associated behaviour.

And it defines the NA internally as follows:

#define NA_INTEGER64 LLONG_MIN

And that is all there is. R and its packages are foremost C code, and LLONG_MIN exists there and goes back (almost) all the way to the founding fathers.
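As a small illustration of that define, the following standalone C++ sketch checks that LLONG_MIN is exactly the pattern with only the sign bit set. The names kNaInteger64, is_na_integer64 and bitstring are made up here and only loosely mirror what bit64 offers on the R side.

```cpp
#include <bitset>
#include <climits>
#include <cstdint>
#include <string>

// Mirrors the define quoted above; the name kNaInteger64 is hypothetical.
constexpr std::int64_t kNaInteger64 = LLONG_MIN;

// NA check: a plain integer comparison against the reserved value.
inline bool is_na_integer64(std::int64_t x) { return x == kNaInteger64; }

// Render the 64 bits of x, most significant bit first.
inline std::string bitstring(std::int64_t x) {
    return std::bitset<64>(static_cast<std::uint64_t>(x)).to_string();
}
```

Here bitstring(kNaInteger64) yields a 1 followed by 63 zeros, the same pattern as the 1ULL << 63 used elsewhere in this thread.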

There are two lessons here. The first concerns how R extends IEEE 754, which defines NaN and Inf for floating-point values. R goes well beyond that and adds NA for each of its types, in pretty much the way shown above: by reserving one particular bit pattern. (Which, in one case, encodes the birth year of one of the two original R creators.)

The other is to admire the metric ton of work Jens did with the bit64 package and all the required conversion and operator functions. Seamlessly converting all possible values, including NA, NaN, Inf, ..., is no small task.
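To give a flavour of what such a conversion has to handle, here is a rough C++ sketch of a double-to-int64 conversion that maps NaN, Inf and out-of-range values to the NA pattern. This is an assumption-laden illustration, not bit64's actual code: the names are hypothetical, and truncation toward zero is simply what static_cast does.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical NA constant: the minimum int64, as in the define above.
constexpr std::int64_t kNaInt64 = std::numeric_limits<std::int64_t>::min();

// Sketch only: convert a double to int64, returning the NA pattern for
// anything that has no valid int64 representation.
inline std::int64_t double_to_integer64(double d) {
    constexpr double kTwoPow63 = 9223372036854775808.0;  // 2^63, exact as a double
    // NaN (which includes a double NA), +/-Inf, and values outside the
    // int64 range all become NA; -2^63 itself is the NA pattern anyway.
    if (std::isnan(d) || d >= kTwoPow63 || d <= -kTwoPow63) {
        return kNaInt64;
    }
    return static_cast<std::int64_t>(d);  // truncates toward zero
}
```

Note that the double literal 9223372036854775807.0 rounds to 2^63, so this sketch returns NA for it, consistent with the <NA> shown for that input in the other answer.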

And it is a neat topic that not too many people know. I am glad you asked the question because we now have a record here.

Aspectual answered 23/4, 2020 at 12:30 Comment(0)