as.integer() on an int64 dataframe produces unexpected result

I was reviewing some code and came across this odd result. If you have a dataframe with one value of type integer and you coerce it to integer you get what I think you would expect:

library(dplyr)

tibble(x = as.integer(c(1))) %>% as.integer()

[1] 1

But if it's of type int64, you get something weird:

library(bit64)

tibble(x = as.integer64(c(1))) %>% as.integer()

[1] 0

What gives? I assume it has something to do with the int64 class. But why would I get zero? Is this just bad error handling?

Update

OK, there's a hint as to what's going on when you call dput on the int64 dataframe:

structure(list(x = structure(4.94065645841247e-324, 
                             class = "integer64")), 
          row.names = c(NA, -1L), 
          class = c("tbl_df", "tbl", "data.frame"))

So as.integer() is rightly converting 4.94065645841247e-324 to zero. But why is that what's stored in the DF?

Also, to see that this is not a bit64 issue, I get a very similar structure on the actual df I get back from my database:

structure(list(max = structure(2.78554211125295e-320,
                               class = "integer64")),
          class = "data.frame", 
          row.names = c(NA, -1L))

Penult answered 15/12, 2021 at 14:59 Comment(3)
That double is how the integer is represented in the R object storing the data, as I mentioned below. The class just tells R which method to use when you want to convert. In fact, the DBI documentation mentions bit64 specifically. - Pedagogics
OK, so is int64 an option in base R? - Penult
No, bit64 just reuses the 64 bits of a double to hold a 64-bit integer, which allows for bigger numbers. - Pedagogics

I think this is a limitation of bit64. bit64 provides the S3 method as.integer.integer64 to convert from int64 to int, but that method is only dispatched for integer64 vectors. Base as.integer can be applied to other objects, such as a data.frame, but it doesn't know how to convert the int64 values inside them.
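
To make that concrete, here is a small sketch in base R only, using the two doubles copied from the dput output in the question. The variable names are just illustrative, and it assumes the usual IEEE 754 doubles that R uses on common platforms.

```r
# The two doubles copied from the dput() output in the question.
stored_one <- 4.94065645841247e-324
stored_max <- 2.78554211125295e-320

# Raw bytes of the first value, least significant byte first.
# Read as a 64-bit integer, this bit pattern is the value integer64 stores.
bytes_one <- writeBin(stored_one, raw(), endian = "little")
print(bytes_one)

# Base R ignores the class here and simply truncates the tiny double.
print(as.integer(stored_one))

# Subnormal doubles are whole multiples of the smallest one, so this ratio
# recovers the 64-bit integer held in the database column.
print(stored_max / stored_one)
```

So the double you see in dput is not the number itself, only the same 64 bits viewed as a double.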

So after loading bit64, as.integer will actually call as.integer.integer64 on any int64 vector, but not on a data.frame or tibble that contains one.
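
A minimal sketch of the difference, and of a column-wise workaround, assuming bit64 is installed. The data frame and the helper name is_i64 are made up for illustration; this is one way to do it, not something taken from the bit64 docs.

```r
library(bit64)

x <- as.integer64(1)

# On the vector itself, S3 dispatch finds as.integer.integer64.
print(as.integer(x))

# Without the class attribute, only the raw double is left.
print(unclass(x))

# Hypothetical data frame with one integer64 column and one other column.
df <- data.frame(x = x, y = "a")

# Convert column by column so that each integer64 vector is dispatched on.
# Values outside the 32-bit integer range cannot be represented as integer.
is_i64 <- vapply(df, inherits, logical(1), what = "integer64")
df[is_i64] <- lapply(df[is_i64], as.integer)

str(df)
```

With dplyr, something like mutate(across(where(is.integer64), as.integer)) should do the same job, since it also works one column at a time.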

Pedagogics answered 15/12, 2021 at 15:34 Comment(8)
There's quite a lot of strange behavior. Look at the output of matrix(as.integer64(x)) for any value of x: it's always approximately 0. Considering that the documentation says these should be usable within vectors, matrices, etc., this seems quite strange. I imagine this behavior is related. - Hymenium
Agreed. I just tried as.vector(as.integer64(1), mode = 'integer'), which I think is roughly how base as.integer would handle a data.frame. It returns 0. - Pedagogics
Looking at the docs, it seems that from base R's point of view, integer64s are just doubles. as.integer is just converting the integer64's double representation into an integer, but that representation is not the actual value. - Pedagogics
Interesting! Good find! - Hymenium
I just used bit64 to make things easier here, but the actual example came out of a database without the use of bit64. I'll update the post. - Penult
I think you're up against the same issue if you are using DBI, for example. as.integer should work on a column inside mutate, since the .integer64 method will be called, but it would have no way to work on a data.frame. - Pedagogics
Check out the update above: this is going on in the structure of the object. - Penult
So, bottom line: should this be considered a "bug" in bit64? You definitely shouldn't get 0 when you call as.integer on an int64 dataframe. It should either give you the expected result or throw an error. - Penult
