Internal representation of int NA

This is a question about R internals. How are integer NA values represented in R? Unlike floating point, integers have no magic bit sequence reserved for NaNs.

# Create a big vector. Newer versions of R won't allocate memory to store the data;
# instead, only the start/end values are stored internally
a <- 1:1e6

# Change an arbitrary value. This will cause an array to be allocated
a[123] <- NA
typeof(a)

At this point a is still a vector of integers. How is a[123] represented internally? Does R use some magic number to indicate that an integer is NA?

My primary interest in the internal representation of integers is related to binary read/write (readBin/writeBin). How should NA be handled when performing binary I/O with external sources, e.g. via sockets?

Paste answered 8/6, 2019 at 15:34 Comment(1)
Maybe this will give you an answer: #51685361 (Bathometer)

R uses the minimum integer value to represent NA. With 4-byte (32-bit) integers, valid values usually range from -2,147,483,648 to 2,147,483,647, but in R the smallest usable integer is -2,147,483,647:

> .Machine$integer.max
[1] 2147483647
> -.Machine$integer.max
[1] -2147483647
> -.Machine$integer.max - 1L
[1] NA
Warning message:
In -.Machine$integer.max - 1L : NAs produced by integer overflow
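
The same reserved value can be seen at the byte level, which is what matters for readBin/writeBin. The following is a minimal sketch, assuming 4-byte integers and an explicitly chosen byte order; the variable names are only illustrative.

```r
# Raw bytes of NA_integer_ in each byte order (size = 4 is R's native integer size).
na_big    <- writeBin(NA_integer_, raw(), size = 4L, endian = "big")
na_little <- writeBin(NA_integer_, raw(), size = 4L, endian = "little")
print(na_big)     # bytes 80 00 00 00
print(na_little)  # bytes 00 00 00 80

# Reading the minimum-integer bit pattern back yields NA, not -2147483648.
incoming <- as.raw(c(0x80, 0x00, 0x00, 0x00))
value <- readBin(incoming, what = "integer", n = 1L, size = 4L, endian = "big")
print(is.na(value))  # TRUE
```

Passing endian explicitly avoids depending on the byte order of the machine running R, which is usually what you want when the other end of a socket is not R.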

Also,

> .Internal(inspect(NA_integer_))
@7fe69bbb79c0 13 INTSXP g0c1 [NAM(7)] (len=1, tl=0) -2147483648
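
For binary I/O with external sources, a practical consequence is that a peer sending its own minimum 32-bit integer will show up as NA on the R side. If that value must survive, one option is to read the data as doubles instead. The sketch below assumes big-endian signed 32-bit input; read_int32_as_double is a hypothetical helper, not part of R.

```r
# Hypothetical helper: decode big-endian signed 32-bit values into doubles,
# so that -2147483648 is kept as a number instead of becoming NA_integer_.
read_int32_as_double <- function(bytes) {
  stopifnot(is.raw(bytes), length(bytes) %% 4L == 0L)
  # Read unsigned 16-bit halves; readBin supports signed = FALSE for size 1 and 2.
  halves <- readBin(bytes, what = "integer", n = length(bytes) / 2L,
                    size = 2L, signed = FALSE, endian = "big")
  hi <- halves[c(TRUE, FALSE)]   # upper 16 bits of each value
  lo <- halves[c(FALSE, TRUE)]   # lower 16 bits of each value
  unsigned <- hi * 65536 + lo    # unsigned 32-bit value, held in a double
  # Map the upper half of the unsigned range back to negative numbers.
  ifelse(unsigned >= 2^31, unsigned - 2^32, unsigned)
}

wire <- as.raw(c(0x80, 0x00, 0x00, 0x00,   # minimum 32-bit integer
                 0xff, 0xff, 0xff, 0xff,   # -1
                 0x00, 0x00, 0x00, 0x01))  # 1
decoded <- read_int32_as_double(wire)
print(decoded)
```

If the external protocol never sends the minimum integer, plain readBin(..., what = "integer") is simpler, and the sender can use that bit pattern deliberately to transmit NA.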
Incapacitate answered 8/6, 2019 at 15:44 Comment(2)
I have updated my question, but I think the answer is evident from your post (Paste)
.Machine$integer.max + 1L is outside the 'usual' range, so is replaced by the NA value. (Incapacitate)

Ordered by bit pattern (as serialize() writes it, most significant byte first), integers are stored in this order:

  • first 0L
  • then positive integers (ascending)
  • then NA_integer_
  • then negative integers (ascending)

So I wouldn't say that NA_integer_ is stored as the minimal integer, even if .Internal(inspect(NA_integer_)) suggests it. Maybe better to say it is stored where other languages usually store the minimal integer?

Look at the last 4 bytes of each output below.

# first zero
serialize(0L, connection = NULL)
#>  [1] 58 0a 00 00 00 03 00 04 04 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
#> [26] 00 0d 00 00 00 01 00 00 00 00

# then positive integers (ascending)
serialize(1L, connection = NULL)
#>  [1] 58 0a 00 00 00 03 00 04 04 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
#> [26] 00 0d 00 00 00 01 00 00 00 01
serialize(.Machine$integer.max, connection = NULL)
#>  [1] 58 0a 00 00 00 03 00 04 04 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
#> [26] 00 0d 00 00 00 01 7f ff ff ff

# then NA_integer_
serialize(NA_integer_, connection = NULL)
#>  [1] 58 0a 00 00 00 03 00 04 04 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
#> [26] 00 0d 00 00 00 01 80 00 00 00

# then negative integers (ascending, i.e. descending in absolute value)
serialize(-.Machine$integer.max, connection = NULL)
#>  [1] 58 0a 00 00 00 03 00 04 04 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
#> [26] 00 0d 00 00 00 01 80 00 00 01
serialize(-1L, connection = NULL)
#>  [1] 58 0a 00 00 00 03 00 04 04 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
#> [26] 00 0d 00 00 00 01 ff ff ff ff

Created on 2024-10-09 with reprex v2.1.0
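
The ordering listed in this answer is what you get when each 4-byte pattern is read as an unsigned number. A minimal sketch of that check follows; as_unsigned is a hypothetical helper written for this illustration, and it returns a double because the values exceed R's integer range.

```r
# Hypothetical helper: big-endian bytes of one integer, read as an unsigned
# 32-bit value and returned as a double.
as_unsigned <- function(x) {
  b <- as.integer(writeBin(x, raw(), size = 4L, endian = "big"))
  sum(b * 256^(3:0))
}

vals <- c(0L, 1L, .Machine$integer.max, NA_integer_, -.Machine$integer.max, -1L)
u <- vapply(vals, as_unsigned, numeric(1))
print(u)
print(is.unsorted(u))  # FALSE: the unsigned values are already in ascending order
```

Read as signed two's complement, the same pattern for NA_integer_ (80 00 00 00) is the minimum 32-bit integer, so both descriptions refer to the same bits.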

Copious answered 9/10 at 8:17 Comment(0)
