Long Numbers As A Character String

As part of my dataset, one of the columns is a series of 24-digit numbers.

Example:

bigonumber <- 429382748394831049284934

When I import it using either data.table::fread or read.csv, it shows up as numeric in exponential format (e.g. 4.293827e+23).

options(digits=...) won't work since the number is longer than 22 digits.

When I do

as.character(bigonumber) 

what I get is "4.29382748394831e+23"

Is there a way to get bigonumber converted to a character string and show all of the digits as characters? I don't need to do any math on it, but I do need to search against it and do dplyr joins on it.

I need to do this after import, since the column number varies from month to month.

(Yes, in the perfect world, my upstream data provider would use a hash instead of a long number and a static number of columns that stay the same every month, but I don't get to dictate that to them.)

Salamander answered 1/9, 2015 at 19:39 Comment(1)
?fread and ?read.csv both include and explain colClasses. - Serviceberry
16

You can specify colClasses in your fread or read.csv call. For example, with a file bignums.txt containing:

bignums
429382748394831049284934
429382748394831049284935
429382748394831049284936
429382748394831049284937
429382748394831049284938
429382748394831049284939

bignums <- read.csv("~/Desktop/bignums.txt", sep="", colClasses = 'character')
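Since the question mentions that the column's position changes from month to month, here is a minimal sketch of selecting the column by name instead. It assumes the file has a header row; the inline CSV text, the `id` column, and the `lookup` table are made up for illustration, and base `merge` stands in for a dplyr join.

```r
# Hypothetical inline data standing in for the monthly file.
csv_text <- "id,bignums
1,429382748394831049284934
2,429382748394831049284935"

# A named colClasses vector applies to the column called "bignums",
# wherever it sits; other columns are still type-guessed as usual.
dat <- read.csv(text = csv_text, colClasses = c(bignums = "character"))

# Every digit survives because the value was never parsed as a number.
nchar(dat$bignums)

# Joining on the string column (hypothetical lookup table).
lookup <- data.frame(bignums = "429382748394831049284935",
                     label = "match",
                     stringsAsFactors = FALSE)
joined <- merge(dat, lookup, by = "bignums")
joined
```

The same named-vector form should also work for a real file path in place of `text =`.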
Friar answered 1/9, 2015 at 19:50 Comment(2)
This is the right answer. colClasses works for fread too. - Rone
You can even do read.csv("~/Desktop/bignums.txt", sep="", colClasses = c(bignums='character')) to read only that one column as character. - Sawhorse
14

You can suppress the scientific notation with

options(scipen=999)

If you then define the number

bigonumber <- 429382748394831049284934

you can convert it into a string:

big.o.string <- as.character(bigonumber)

Unfortunately, this does not work, because R stores the number as a double, which holds only about 15 to 17 significant decimal digits, so precision is lost:

#[1] "429382748394831019507712"

The last digits are not preserved, as pointed out by @SabDeM. Even setting

options(digits=22)

doesn't help: 22 is the largest value allowed, and your number has 24 digits. So it seems that you will have to read the data directly as character or factor. Great answers have been posted showing how this can be achieved.
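To make the precision loss concrete, here is a small base-R sketch. The variable names are only illustrative, and the second literal is simply the first one plus 1.

```r
# Two different 24-digit literals...
x <- 429382748394831049284934
y <- 429382748394831049284935

# ...are stored as the same double, so R cannot tell them apart.
x == y

# A double has a 53-bit significand, so integers above 2^53
# are no longer all representable.
.Machine$double.digits
2^53 + 1 == 2^53   # TRUE

# Printing every digit of the stored value does not bring the original back.
back <- sprintf("%.0f", x)
back == "429382748394831049284934"   # FALSE
```

This is why no print option can help here: the digits are already gone by the time the value is a numeric.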

As a side note, there is a package called gmp that allows using arbitrarily large integer numbers. However, there is a catch: they have to be read as characters (again, in order to prevent R's internal conversion into double).

library(gmp)
bigonumber <- as.bigz("429382748394831049284934")
bigonumber
# Big Integer ('bigz') :
# [1] 429382748394831049284934
class(bigonumber)
# [1] "bigz"

The advantage is that you can indeed treat these entries as numbers and perform calculations while preserving all the digits.

bigonumber * 2
# Big Integer ('bigz') :
# [1] 858765496789662098569868

This package and my answer here may not solve your problem, because reading the numbers directly as characters is an easier way to achieve your goal, but I thought I would post this anyway as information for users who need to work with integers of more than 22 digits.

Atmospheric answered 1/9, 2015 at 19:45 Comment(0)
3

Use digest::digest on bigonumber to generate an md5 hash of the number yourself?

bigonumber <- 429382748394831049284934
hash_big <- digest::digest(bigonumber)
hash_big
# "e47e7d8a9e1b7d74af6a492bf4f27193"
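If the column has been read as character first, a sketch along these lines hashes the full digit strings rather than the rounded doubles. It assumes the digest package is installed; the `ids` vector is a made-up stand-in for the real column.

```r
library(digest)

# Hypothetical IDs, already read as character so no digits were lost.
ids <- c("429382748394831049284934", "429382748394831049284935")

# digest() hashes one object per call, so apply it element by element.
# serialize = FALSE hashes the string itself rather than R's serialized object.
hashes <- vapply(ids, digest, character(1),
                 algo = "md5", serialize = FALSE, USE.NAMES = FALSE)

# The two IDs differ only in the last digit but still get distinct hashes.
hashes[1] != hashes[2]

# Stored as numeric they would be indistinguishable before hashing.
as.numeric(ids[1]) == as.numeric(ids[2])
```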
Havens answered 1/9, 2015 at 19:48 Comment(2)
That is super tasty. I think going forward I'll do that as a new column. - Salamander
but this only works if you haven't already lost precision by trying to store a too-many-digit number as numeric ... - Rone
3

I saw this before I posted my answer, but don't see it here anymore.

Set options(scipen) to a large value so that the number prints without scientific notation:

options(scipen = 999)
bigonumber <- 429382748394831049284934
bigonumber
# [1] 429382748394831019507712
as.character(bigonumber)
# [1] "429382748394831019507712"
Havens answered 1/9, 2015 at 19:57 Comment(1)
This looks like the best of a few good options. Thanks! ETA: Hrmph. The last few are wrong indeed. No idea what causes that. - Salamander
2

Use "scan" to read the file - the "what" parameter lets you define the input type of each column.
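A minimal sketch of what that could look like, using inline text in place of a file. The two-column layout and the names `id` and `bignums` are assumptions for illustration.

```r
# Hypothetical whitespace-separated data: a small id and a 24-digit number.
txt <- "1 429382748394831049284934
2 429382748394831049284935"

# 'what' is a list with one template per column; character() keeps the
# long numbers as strings, integer() parses the first column normally.
cols <- scan(text = txt,
             what = list(id = integer(), bignums = character()),
             quiet = TRUE)

# scan() returns a named list, one vector per column.
str(cols)
```

For a real file, pass its path as the first argument instead of `text =`, and add `skip = 1` if there is a header line.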

Highness answered 1/9, 2015 at 19:49 Comment(0)
2

If you want to keep the numbers as numbers, you can't print all of their digits. The digits option, which print.default uses, accepts values from 1 to 22. You can set it with:

options( digits = 22 )

Even with this option, the printed digits will change. I don't know why that happens; most likely the object you are about to print (the number) is longer than the allowed number of digits, so R does some weird stuff. I'll investigate it.

Leach answered 1/9, 2015 at 19:49 Comment(0)
