How does R represent NA internally?

R seems to support an efficient NA value in floating point arrays. How does it represent it internally?

My (perhaps flawed) understanding is that modern CPUs can carry out floating point calculations in hardware, including efficient handling of Inf, -Inf and NaN values. How does NA fit into this, and how is it implemented without compromising performance?

Macaque answered 4/8, 2018 at 10:54 Comment(1)
For integers: #56508248 Baler

With IEEE 754 doubles, +Inf and -Inf are represented with all bits of the exponent (the 2nd through 12th bits) set to one and all bits of the mantissa set to zero. A NaN also has all exponent bits set to one, but its mantissa is non-zero. R uses different mantissa values to represent NaN and NA_real_. We can use a simple C++ function to make this explicit:

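Rcpp::cppFunction('void print_hex(double x) {
    // Copy the raw bytes of the double into a 64-bit integer.
    uint64_t y;
    static_assert(sizeof x == sizeof y, "Size does not match!");
    std::memcpy(&y, &x, sizeof y);
    // Print the bit pattern in hexadecimal.
    Rcpp::Rcout << std::hex << y << std::endl;
}', plugins = "cpp11", includes = c("#include <cstdint>", "#include <cstring>"))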
print_hex(NA_real_)
#> 7ff00000000007a2
print_hex(NaN)
#> 7ff8000000000000
print_hex(Inf)
#> 7ff0000000000000
print_hex(-Inf)
#> fff0000000000000
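
To read those four hex strings, it can help to split each one into its sign, exponent and mantissa fields. The following is a minimal standalone sketch, not part of R or Rcpp; the names Fields, split, classify and check_examples are made up for illustration, and it assumes double is a 64-bit IEEE 754 value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// The three fields of an IEEE 754 binary64 value (hypothetical helper type).
struct Fields {
    unsigned sign;           // 1 bit
    unsigned exponent;       // 11 bits
    std::uint64_t mantissa;  // 52 bits
};

// Copy the raw bytes of a double into a 64-bit integer.
std::uint64_t to_bits(double x) {
    std::uint64_t y;
    static_assert(sizeof x == sizeof y, "Size does not match!");
    std::memcpy(&y, &x, sizeof y);
    return y;
}

// Split a 64-bit pattern into sign, exponent and mantissa.
Fields split(std::uint64_t bits) {
    Fields f;
    f.sign = static_cast<unsigned>(bits >> 63);
    f.exponent = static_cast<unsigned>((bits >> 52) & 0x7ffu);
    f.mantissa = bits & 0xfffffffffffffULL;  // low 52 bits
    return f;
}

// Classify a pattern the way the answer describes:
// exponent all ones and mantissa zero is Inf, mantissa non-zero is NaN.
std::string classify(std::uint64_t bits) {
    const Fields f = split(bits);
    if (f.exponent != 0x7ff) return "finite";
    if (f.mantissa == 0) return f.sign ? "-Inf" : "+Inf";
    return "NaN";
}

// Usage: the four patterns printed by print_hex in the answer.
void check_examples() {
    assert(classify(0x7ff00000000007a2ULL) == "NaN");   // NA_real_
    assert(classify(0x7ff8000000000000ULL) == "NaN");   // NaN
    assert(classify(0x7ff0000000000000ULL) == "+Inf");  // Inf
    assert(classify(0xfff0000000000000ULL) == "-Inf");  // -Inf
    // Only the mantissa separates the two NaN patterns.
    assert(split(0x7ff00000000007a2ULL).mantissa == 0x7a2);
    assert(split(0x7ff8000000000000ULL).mantissa == 0x8000000000000ULL);
}
```

Seen this way, NA_real_ and NaN both fall into the NaN class as far as the hardware is concerned, and differ only in the mantissa bits.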

Here are some source code references.
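
On the performance part of the question: since NA_real_ is just a NaN with a particular mantissa, the CPU treats it like any other NaN, so arithmetic on it needs no special casing. Only telling NA apart from a plain NaN needs an extra look at the bits. Below is a rough standalone sketch of that idea. The helper names (from_bits, looks_like_na) are hypothetical, and the test on the low 32 bits being 1954 (0x7a2) is an assumption based on the bit pattern printed above, not a copy of R's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Bit pattern that print_hex reports for NA_real_ in the answer above.
const std::uint64_t kNaBits = 0x7ff00000000007a2ULL;

// Copy the raw bytes of a double into a 64-bit integer.
std::uint64_t to_bits(double x) {
    std::uint64_t y;
    static_assert(sizeof x == sizeof y, "Size does not match!");
    std::memcpy(&y, &x, sizeof y);
    return y;
}

// Build a double from a raw 64-bit pattern.
double from_bits(std::uint64_t bits) {
    double x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Hypothetical check: a NaN whose low 32 bits equal 1954 (0x7a2).
// Assumption: this marker is what distinguishes NA from other NaNs.
bool looks_like_na(double x) {
    return std::isnan(x) && (to_bits(x) & 0xffffffffULL) == 1954;
}

// Usage: NA behaves as a NaN in arithmetic; only the bit check singles it out.
void check_na() {
    const double na = from_bits(kNaBits);
    const double nan = std::numeric_limits<double>::quiet_NaN();
    assert(std::isnan(na));        // the hardware sees an ordinary NaN
    assert(std::isnan(na + 1.0));  // and propagates it like one
    assert(looks_like_na(na));
    assert(!looks_like_na(nan));   // plain NaN lacks the 1954 marker
}
```

Whether the 1954 marker survives an arithmetic operation such as na + 1.0 depends on how the platform propagates NaN payloads, so this sketch only asserts that the result is still a NaN.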

Myriam answered 4/8, 2018 at 14:30 Comment(2)
I find this answer a bit confusing. I'm not an expert, but I don't think Infs are considered a kind of NaN. If the exponent is all ones, and the mantissa is all zeros, that's an Inf (positive or negative depending on the sign bit). If the exponent is all ones, and the mantissa has at least one high bit, that's a NaN. In this case, the mantissa is called the "payload" of the NaN. R represents NA as a NaN with a particular payload, 0x80000000007a2. For the "plain old" NaN, R uses the payload 0x8000000000000, as you can see if you do print_hex(NaN). en.wikipedia.org/wiki/NaN Ragouzis
Agreed, Inf is not an NaN. I tried to make this clear with an update. Myriam
