How to convert a data frame column to numeric type?
S

19

323

How do you convert a data frame column to a numeric type?

Shem answered 18/2, 2010 at 12:17 Comment(0)
T
333

Since nobody has received the check mark yet, I assume that you have a practical issue in mind, mostly because you haven't specified what type of vector you want to convert to numeric. I suggest using the transform function to complete your task.

Now I'm about to demonstrate a certain "conversion anomaly":

# create dummy data.frame
d <- data.frame(char = letters[1:5], 
                fake_char = as.character(1:5), 
                fac = factor(1:5), 
                char_fac = factor(letters[1:5]), 
                num = 1:5, stringsAsFactors = FALSE)

Let us have a glance at the data.frame:

> d
  char fake_char fac char_fac num
1    a         1   1        a   1
2    b         2   2        b   2
3    c         3   3        c   3
4    d         4   4        d   4
5    e         5   5        e   5

and let us run:

> sapply(d, mode)
       char   fake_char         fac    char_fac         num 
"character" "character"   "numeric"   "numeric"   "numeric" 
> sapply(d, class)
       char   fake_char         fac    char_fac         num 
"character" "character"    "factor"    "factor"   "integer" 

Now you probably ask yourself "Where's an anomaly?" Well, I've bumped into quite peculiar things in R, and this is not the most confounding thing, but it can confuse you, especially if you read this before rolling into bed.

Here goes: the first two columns are character. I've deliberately called the 2nd one fake_char. Spot the similarity of this character variable with the one that Dirk created in his reply. It's actually a numerical vector converted to character. The 3rd and 4th columns are factors, and the last one is "purely" numeric.

If you use the transform function, you can convert fake_char into numeric, but not the char variable itself.

> transform(d, char = as.numeric(char))
  char fake_char fac char_fac num
1   NA         1   1        a   1
2   NA         2   2        b   2
3   NA         3   3        c   3
4   NA         4   4        d   4
5   NA         5   5        e   5
Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion

but if you do the same thing on fake_char and char_fac, you'll be lucky and get away with no NAs (for char_fac, the resulting numbers are the factor's integer level codes):

> transform(d, fake_char = as.numeric(fake_char), 
               char_fac = as.numeric(char_fac))

  char fake_char fac char_fac num
1    a         1   1        1   1
2    b         2   2        2   2
3    c         3   3        3   3
4    d         4   4        4   4
5    e         5   5        5   5
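
For comparison, the same two conversions can also be written as plain column assignment instead of transform(). This is only a minimal sketch on a cut-down copy of d (the two-column data frame here is an assumption made to keep the example short); both styles should give the same column classes.

```r
# Cut-down copy of the dummy data.frame used above
d <- data.frame(fake_char = as.character(1:5),
                char_fac  = factor(letters[1:5]),
                stringsAsFactors = FALSE)

# Direct assignment: overwrite each column with its converted version
d$fake_char <- as.numeric(d$fake_char)          # "1".."5" become 1..5
d[["char_fac"]] <- as.numeric(d[["char_fac"]])  # factor becomes its integer level codes

sapply(d, class)
```

Unlike transform(), which returns a modified copy, the assignment form changes d in place.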

If you save the transformed data.frame and check for mode and class, you'll get:

> D <- transform(d, fake_char = as.numeric(fake_char), 
                    char_fac = as.numeric(char_fac))

> sapply(D, mode)
       char   fake_char         fac    char_fac         num 
"character"   "numeric"   "numeric"   "numeric"   "numeric" 
> sapply(D, class)
       char   fake_char         fac    char_fac         num 
"character"   "numeric"    "factor"   "numeric"   "integer"

So, the conclusion is: yes, you can convert a character vector into a numeric one, but only if its elements are "convertible" to numeric. If there's even one non-numeric element in the vector, you'll get a warning when converting it, and that element becomes NA.

And just to prove my point:

> err <- c(1, "b", 3, 4, "e")
> mode(err)
[1] "character"
> class(err)
[1] "character"
> char <- as.numeric(err)
Warning message:
NAs introduced by coercion 
> char
[1]  1 NA  3  4 NA
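
If you want to know which elements caused the NAs, one possible approach is to convert quietly and compare the result with the input. This is a hedged sketch; the names num and failed are made up for illustration.

```r
err <- c(1, "b", 3, 4, "e")

# Convert quietly, then compare against the input to find the failures
num <- suppressWarnings(as.numeric(err))
failed <- is.na(num) & !is.na(err)   # TRUE where coercion produced a new NA

err[failed]     # the values that could not be parsed
which(failed)   # their positions in the vector
```

Checking !is.na(err) as well keeps values that were already missing from being reported as conversion failures.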

And now, just for fun (or practice), try to guess the output of these commands:

> fac <- as.factor(err)
> fac
???
> num <- as.numeric(fac)
> num
???

Kind regards to Patrick Burns! =)

Teaser answered 19/2, 2010 at 0:31 Comment(6)
'stringsAsFactors = FALSE' is important for when reading in data files.Volar
I know this is old ... but... why did you choose transform() over df$fake_char <- as.integer(df$fake_char) ? There are multiple ways to do the same operation in R and I get stuck understanding the "correct" way of doing it. Thank you.Sago
So it is absolutely impossible to turn err <- c(1, "b", 3, 4, "e") into a numeric vector? In excel, there's a button that allows you to "convert to number". making whatever value the column a numeric. I am trying to mimic that in r.Headway
Warning != Error. You don't get an error converting mixed numeric/character to numeric, you get a warning and some NA values.Sherbet
I really don't understand why there are so many different ways to convert datatypes in R, do we really need mutate, transform, apply, when all of this can be done with a simple assignment?Ronen
tl;dr use transformDivisor
L
167

Something that has helped me: if you have ranges of variables to convert (or just more than one), you can use sapply.

A bit nonsensical but just for example:

data(cars)
cars[, 1:2] <- sapply(cars[, 1:2], as.factor)

Say columns 3, 6-15 and 37 of your data frame need to be converted to numeric; one could:

dat[, c(3,6:15,37)] <- sapply(dat[, c(3,6:15,37)], as.numeric)
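
A variant worth trying uses lapply() instead of sapply(). Because sapply() simplifies its result to a matrix, factor columns end up as character; lapply() returns a list of columns, so each one keeps its own class. A minimal sketch on a hypothetical three-column dat:

```r
# Hypothetical data: every column starts out as character
dat <- data.frame(a = c("1", "2", "3"),
                  b = c("x", "y", "x"),
                  c = c("4.5", "5.5", "6.5"),
                  stringsAsFactors = FALSE)

# lapply() returns a list of columns, so each column keeps its own class
dat[c("a", "c")] <- lapply(dat[c("a", "c")], as.numeric)
dat["b"] <- lapply(dat["b"], as.factor)

sapply(dat, class)
```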
Lipstick answered 18/2, 2010 at 16:15 Comment(4)
as.factor in the above code makes the column characterTecla
sapply is better than transform, when handling vectors of indices rather than variable namesGermanic
@Tecla is correct, at least with my data. The original df won't take on the "converted" columns as factors; they'll remain character. If you wrap the sapply call in as.data.frame() on the right hand side, as @Mehrad Mahmoudian suggested below, it will work.Tend
Will this work for a matrix? I'm trying it with the exact same code, yet when I check the class() of a column after, it still says "character" and not "numeric"Condom
P
114

If x is the name of a column in data frame dat, and x is a factor, use:

as.numeric(as.character(dat$x))
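
To see why the as.character() step matters, here is a small sketch on a made-up factor f. It also shows the as.numeric(levels(f))[f] idiom, which, as far as I know, is the form suggested in ?factor as slightly more efficient.

```r
# Made-up factor whose labels look like numbers
f <- factor(c("10", "20", "10", "30"))

codes      <- as.numeric(f)                # internal level codes, not the values
via_char   <- as.numeric(as.character(f))  # the labels, parsed as numbers
via_levels <- as.numeric(levels(f))[f]     # same result, parsing each level once

codes
via_char
via_levels
```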
Pennyroyal answered 18/2, 2010 at 12:22 Comment(4)
adding as.character indeed is what I was looking for. Otherwise the conversion sometimes goes wrong. At least in my case.Scirrhus
Why is the as.character needed? I was getting an error: Error: (list) object cannot be coerced to type 'double' though I was reasonably sure that my vector had no characters / punctuations. Then i tried as.numeric(as.character(dat$x)) and it worked. Now i'm not sure whether my column is in fact only integers or not!Elkeelkhound
If you do as.numeric to a factor it will convert the levels to numeric not the actual values. Hence as.character is needed to first convert the factor to character and then as.numericTecla
This is the best answer hereHildredhildreth
C
43

I would have added a comment, but I can't (my reputation is too low).

Just to add to the answers by user276042 and pangratz:

dat$x = as.numeric(as.character(dat$x))

This will overwrite the values of the existing column x.

Combustion answered 6/12, 2014 at 5:58 Comment(0)
D
20

With the following code you can convert all data frame columns to numeric (X is the data frame whose columns we want to convert):

as.data.frame(lapply(X, as.numeric))

To convert a whole matrix to numeric, you have two options. Either:

mode(X) <- "numeric"

or:

X <- apply(X, 2, as.numeric)

Alternatively you can use data.matrix function to convert everything into numeric, although be aware that the factors might not get converted correctly, so it is safer to convert everything to character first:

X <- sapply(X, as.character)
X <- data.matrix(X)

I usually use this last one if I want to convert to matrix and numeric simultaneously
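
If factors are involved, another way to get a numeric matrix is to go through character column by column. This is a sketch on a hypothetical X, not necessarily what the code above does on every R version:

```r
# Hypothetical data frame mixing a factor and a character column
X <- data.frame(a = factor(c("10", "20", "30")),
                b = c("1.5", "2.5", "3.5"),
                stringsAsFactors = FALSE)

# Convert each column via character, then let sapply() bind them into a matrix
M <- sapply(X, function(col) as.numeric(as.character(col)))

is.numeric(M)
M
```

Note that sapply() only simplifies to a matrix when X has more than one row; with a single row it returns a named vector.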

Decant answered 18/3, 2014 at 23:27 Comment(0)
M
19

While your question is strictly on numeric, there are many conversions that are difficult to understand when beginning R. I'll aim to address methods to help. This question is similar to This Question.

Type conversion can be a pain in R because (1) factors can't be converted directly to numeric, they need to be converted to character class first, (2) dates are a special case that you typically need to deal with separately, and (3) looping across data frame columns can be tricky. Fortunately, the "tidyverse" has solved most of the issues.

This solution uses mutate_all() to apply a function to all columns in a data frame. In this case, we want to apply the type.convert() function, which converts strings to numeric where it can. Because R loves factors (not sure why), character columns that should stay character get changed to factor. To fix this, the mutate_if() function is used to detect columns that are factors and change them to character. Last, I wanted to show how lubridate can be used to change a timestamp in character class to date-time, because this is also often a stumbling block for beginners.


library(tidyverse) 
library(lubridate)

# Recreate data that needs converted to numeric, date-time, etc
data_df
#> # A tibble: 5 × 9
#>             TIMESTAMP SYMBOL    EX  PRICE  SIZE  COND   BID BIDSIZ   OFR
#>                 <chr>  <chr> <chr>  <chr> <chr> <chr> <chr>  <chr> <chr>
#> 1 2012-05-04 09:30:00    BAC     T 7.8900 38538     F  7.89    523  7.90
#> 2 2012-05-04 09:30:01    BAC     Z 7.8850   288     @  7.88  61033  7.90
#> 3 2012-05-04 09:30:03    BAC     X 7.8900  1000     @  7.88   1974  7.89
#> 4 2012-05-04 09:30:07    BAC     T 7.8900 19052     F  7.88   1058  7.89
#> 5 2012-05-04 09:30:08    BAC     Y 7.8900 85053     F  7.88 108101  7.90

# Converting columns to numeric using "tidyverse"
data_df %>%
    mutate_all(type.convert) %>%
    mutate_if(is.factor, as.character) %>%
    mutate(TIMESTAMP = as_datetime(TIMESTAMP, tz = Sys.timezone()))
#> # A tibble: 5 × 9
#>             TIMESTAMP SYMBOL    EX PRICE  SIZE  COND   BID BIDSIZ   OFR
#>                <dttm>  <chr> <chr> <dbl> <int> <chr> <dbl>  <int> <dbl>
#> 1 2012-05-04 09:30:00    BAC     T 7.890 38538     F  7.89    523  7.90
#> 2 2012-05-04 09:30:01    BAC     Z 7.885   288     @  7.88  61033  7.90
#> 3 2012-05-04 09:30:03    BAC     X 7.890  1000     @  7.88   1974  7.89
#> 4 2012-05-04 09:30:07    BAC     T 7.890 19052     F  7.88   1058  7.89
#> 5 2012-05-04 09:30:08    BAC     Y 7.890 85053     F  7.88 108101  7.90
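
For readers without the tidyverse, roughly the same result can be sketched in base R. This is an alternative, not the method used above; the two-row data_df below is a shortened stand-in for the tibble, and tz = "UTC" is an assumption in place of Sys.timezone().

```r
# Shortened stand-in for the tibble above; every column is character
data_df <- data.frame(
  TIMESTAMP = c("2012-05-04 09:30:00", "2012-05-04 09:30:01"),
  SYMBOL    = c("BAC", "BAC"),
  PRICE     = c("7.8900", "7.8850"),
  SIZE      = c("38538", "288"),
  stringsAsFactors = FALSE
)

# type.convert() picks integer/double where possible; as.is = TRUE keeps
# the remaining columns as character instead of turning them into factors
data_df[] <- lapply(data_df, type.convert, as.is = TRUE)

# Dates still need separate handling
data_df$TIMESTAMP <- as.POSIXct(data_df$TIMESTAMP, tz = "UTC")

str(data_df)
```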
Mycenae answered 5/3, 2017 at 14:13 Comment(1)
Note that if you use mutate_all(type.convert, as.is=TRUE) instead of mutate_all(type.convert), you can remove/avoid mutate_if(is.factor, as.character) to shorten the command. as.is is an argument in type.convert() that indicates whether it should convert strings as characters or as factors. By default, as.is=FALSE in type.convert() (i.e., converts strings to factor class instead of character class).Behka
S
17

If you run into problems with:

as.numeric(as.character(dat$x))

Take a look at your decimal marks. If they are "," instead of "." (e.g. "5,3") the above won't work.

A potential solution is:

as.numeric(gsub(",", ".", dat$x))

I believe this is quite common in some non English speaking countries.
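
If the data also uses "." as a thousands separator (e.g. "1.234,5"), the separators have to be stripped before the comma is replaced. A minimal sketch, assuming that exact format and a made-up vector x:

```r
# Made-up values: decimal comma, "." as thousands separator
x <- c("5,3", "1.234,5", "7")

# fixed = TRUE treats "." and "," literally rather than as regex patterns
no_thousands <- gsub(".", "", x, fixed = TRUE)
res <- as.numeric(sub(",", ".", no_thousands, fixed = TRUE))

res
```

When reading from a file, read.csv2() or the dec = "," argument of read.table() handles the decimal comma at import time.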

Stanwood answered 15/7, 2015 at 14:12 Comment(0)
S
15

Tim is correct, and Shane has an omission. Here are additional examples:

R> df <- data.frame(a = as.character(10:15), stringsAsFactors = TRUE)  # factor column; this was the default before R 4.0.0
R> df <- data.frame(df, num = as.numeric(df$a), 
                        numchr = as.numeric(as.character(df$a)))
R> df
   a num numchr
1 10   1     10
2 11   2     11
3 12   3     12
4 13   4     13
5 14   5     14
6 15   6     15
R> summary(df)
  a          num           numchr    
 10:1   Min.   :1.00   Min.   :10.0  
 11:1   1st Qu.:2.25   1st Qu.:11.2  
 12:1   Median :3.50   Median :12.5  
 13:1   Mean   :3.50   Mean   :12.5  
 14:1   3rd Qu.:4.75   3rd Qu.:13.8  
 15:1   Max.   :6.00   Max.   :15.0  
R> 

Our data.frame now has a summary of the factor column (counts), a numeric summary of as.numeric(), which is wrong because it picked up the integer factor codes, and the (correct) summary of as.numeric(as.character()).

Speak answered 18/2, 2010 at 14:41 Comment(1)
My pleasure. This is one of the more silly corners of the language, and I think it featured in the older 'R Gotchas' question here.Speak
R
14

Universal way using type.convert() and rapply():

convert_types <- function(x) {
    stopifnot(is.list(x))
    x[] <- rapply(x, utils::type.convert, classes = "character",
                  how = "replace", as.is = TRUE)
    return(x)
}
d <- data.frame(char = letters[1:5], 
                fake_char = as.character(1:5), 
                fac = factor(1:5), 
                char_fac = factor(letters[1:5]), 
                num = 1:5, stringsAsFactors = FALSE)
sapply(d, class)
#>        char   fake_char         fac    char_fac         num 
#> "character" "character"    "factor"    "factor"   "integer"
sapply(convert_types(d), class)
#>        char   fake_char         fac    char_fac         num 
#> "character"   "integer"    "factor"    "factor"   "integer"
Roomette answered 10/10, 2015 at 5:35 Comment(3)
This is the most flexible solution--deserves some upvotes!Stubbs
Should be a top answer. Just remove as.is = TRUE if you want to convert your character to either numeric or factorsOnslaught
trying to change bunch of columns in a data.frame that has type matrix to numeric changes classes=matrix errored out first argument must be of mode characterDanger
E
6

To convert a data frame column to numeric you just have to do:-

factor to numeric:-

data_frame$column <- as.numeric(as.character(data_frame$column))
Emptyheaded answered 18/4, 2015 at 7:25 Comment(2)
Again, this answer doesn't add anything to the current set of answers. Also, it's not the preferred way to convert a factor to numeric. See https://mcmap.net/q/63494/-how-to-convert-a-factor-to-integer-numeric-without-loss-of-information for the preferred way.Euromarket
A better answer was: sapply(data_frame,function(x) as.numeric(as.character(x)))Hothouse
M
2

Though others have covered the topic pretty well, I'd like to add this additional quick thought/hint. You could use regexp to check in advance whether characters potentially consist only of numerics.

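# flag the columns of d whose values contain no letters
potential_numcol <- logical(ncol(d))
for (i in seq_along(d)) {
     potential_numcol[i] <- all(!grepl("[a-zA-Z]", d[[i]]))
}
# and now convert only the flagged columns, leaving the rest of d intact
d[potential_numcol] <- lapply(d[potential_numcol], function(x) as.numeric(as.character(x)))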

For more sophisticated regular expressions and a neat way to learn/experience their power see this really nice website: http://regexr.com/

Mariammarian answered 24/10, 2014 at 8:53 Comment(0)
M
1

Considering there might be character columns, this is based on @Abdou's answer to "Get column types of excel sheet automatically":

makenumcols<-function(df){
  df<-as.data.frame(df)
  df[] <- lapply(df, as.character)
  cond <- apply(df, 2, function(x) {
    x <- x[!is.na(x)]
    all(suppressWarnings(!is.na(as.numeric(x))))
  })
  numeric_cols <- names(df)[cond]
  df[,numeric_cols] <- sapply(df[,numeric_cols], as.numeric)
  return(df)
}
df<-makenumcols(df)
Marcomarconi answered 15/6, 2017 at 14:32 Comment(0)
F
1

If the data frame has columns of multiple types, some character and some numeric, try the following to convert just the columns that contain numeric values:

for (i in 1:length(data[1,])){
  if(length(as.numeric(data[,i][!is.na(data[,i])])[!is.na(as.numeric(data[,i][!is.na(data[,i])]))])==0){}
  else {
    data[,i]<-as.numeric(data[,i])
  }
}
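
A stricter variant can be sketched with a small helper: a column is converted only if every non-missing value parses as a number, whereas the loop above converts a column as soon as any value parses. The helper is_convertible and the sample dat are hypothetical.

```r
# TRUE when every non-missing value of x can be parsed as a number
is_convertible <- function(x) {
  x <- as.character(x)
  x <- x[!is.na(x)]
  length(x) > 0 && !anyNA(suppressWarnings(as.numeric(x)))
}

# Hypothetical data frame with numeric-looking and text columns
dat <- data.frame(id    = c("1", "2", "3"),
                  name  = c("a", "b", "c"),
                  score = c("1.5", NA, "3"),
                  stringsAsFactors = FALSE)

# Convert only the columns that pass the check; NAs stay NA
cols <- vapply(dat, is_convertible, logical(1))
dat[cols] <- lapply(dat[cols], function(x) as.numeric(as.character(x)))

sapply(dat, class)
```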
Forehand answered 11/1, 2018 at 22:8 Comment(0)
L
1

with hablar::convert

To easily convert multiple columns to different data types you can use hablar::convert. Simple syntax: df %>% convert(num(a)) converts the column a from df to numeric.

Detailed example

Let's convert all columns of mtcars to character.

df <- mtcars %>% mutate_all(as.character) %>% as_tibble()

> df
# A tibble: 32 x 11
   mpg   cyl   disp  hp    drat  wt    qsec  vs    am    gear  carb 
   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1 21    6     160   110   3.9   2.62  16.46 0     1     4     4    
 2 21    6     160   110   3.9   2.875 17.02 0     1     4     4    
 3 22.8  4     108   93    3.85  2.32  18.61 1     1     4     1    

With hablar::convert:

library(hablar)

# Convert columns to integer, numeric and factor
df %>% 
  convert(int(cyl, vs),
          num(disp:wt),
          fct(gear))

results in:

# A tibble: 32 x 11
   mpg     cyl  disp    hp  drat    wt qsec     vs am    gear  carb 
   <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr> <fct> <chr>
 1 21        6  160    110  3.9   2.62 16.46     0 1     4     4    
 2 21        6  160    110  3.9   2.88 17.02     0 1     4     4    
 3 22.8      4  108     93  3.85  2.32 18.61     1 1     4     1    
 4 21.4      6  258    110  3.08  3.22 19.44     1 0     3     1   
Lianeliang answered 4/11, 2018 at 11:2 Comment(0)
G
1

If you don't care about preserving the factors and want to apply the conversion to any column that can be converted to numeric, you can use the script below, where df is your original data frame.

df[] <- lapply(df, as.character)
df <- data.frame(lapply(df, function(x) ifelse(!is.na(as.numeric(x)), as.numeric(x),  x)))

I referenced Shane's and Joran's solution btw

Galipot answered 18/5, 2020 at 7:16 Comment(0)
D
0

On my PC (R v.3.2.3), apply and sapply give an error. lapply works well.

dt[,2:4] <- lapply(dt[,2:4], function (x) as.factor(as.numeric(x)))
Dialogism answered 11/3, 2016 at 4:13 Comment(0)
J
0

To convert a categorical character column to numeric, you have to convert it into a factor first by applying

BankFinal1 <- transform(BankLoan,   LoanApproval=as.factor(LoanApproval))
BankFinal1 <- transform(BankFinal1, LoanApp=as.factor(LoanApproval))

You have to make two columns with the same data, because a character column cannot be converted into numeric directly. If you try a direct conversion, it gives the warning below:

transform(BankData, LoanApp=as.numeric(LoanApproval))
Warning message:
  In eval(substitute(list(...)), `_data`, parent.frame()) :
  NAs introduced by coercion

so, after creating the two columns with the same data, apply

BankFinal1 <- transform(BankFinal1, LoanApp      = as.numeric(LoanApp), 
                                    LoanApproval = as.numeric(LoanApproval))

This transforms the character column to numeric; the resulting numbers are the factor's integer level codes.
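
Since the numbers come from the factor levels, it can help to set the level order explicitly so the coding is predictable. A hedged sketch follows; the "Yes"/"No" values are an assumption about what LoanApproval contains.

```r
# Assumed contents of the LoanApproval column
BankLoan <- data.frame(LoanApproval = c("Yes", "No", "Yes", "No"),
                       stringsAsFactors = FALSE)

# Explicit levels: "No" gets code 1 and "Yes" gets code 2
approval <- factor(BankLoan$LoanApproval, levels = c("No", "Yes"))

# Subtract 1 to get a 0/1 indicator
BankLoan$LoanApp <- as.numeric(approval) - 1

BankLoan
```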

Jewbaiting answered 27/7, 2017 at 9:33 Comment(0)
D
0

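df is your data frame and x is the column of df that you want to convert. Note that this returns the integer codes of the factor levels rather than the original values: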

as.numeric(factor(df$x))
Dory answered 4/12, 2019 at 16:57 Comment(0)
B
0

Convert to numeric only columns with digits with or without decimal separator

# detect which columns have numeric characters (digits) with or without decimal separator "."
columns_with_digits <- sapply(df, function(x) 
  all(grepl("^\\d+\\.?\\d*$", x))  
)

# run as.numeric only in the detected columns 
df[, columns_with_digits] <- data.frame(lapply(df[, columns_with_digits], as.numeric))

See an example with iris below

library(dplyr) # for glimpse

# get example data
df <- iris

# convert the numeric columns to character
df$Sepal.Length <- as.character(df$Sepal.Length)
df$Sepal.Width <- as.character(df$Sepal.Width)
df$Petal.Length <- as.character(df$Petal.Length)
df$Petal.Width <- as.character(df$Petal.Width)

glimpse(df)

Check the data with glimpse()

>glimpse(df)
Rows: 150
Columns: 5
$ Sepal.Length <chr> "5.1", "4.9", "4.7", "4.6", "5",…
$ Sepal.Width  <chr> "3.5", "3", "3.2", "3.1", "3.6",…
$ Petal.Length <chr> "1.4", "1.4", "1.3", "1.5", "1.4…
$ Petal.Width  <chr> "0.2", "0.2", "0.2", "0.2", "0.2…
$ Species      <fct> setosa, setosa, setosa, setosa, …

Detect which columns have numeric characters (digits) with or without decimal separator point (.) using regular expressions (regex)

# detect which columns have numeric characters (digits) with or without decimal separator (.)
columns_with_digits <- sapply(df, function(x) 
  all(grepl("^\\d+\\.?\\d*$", x))
)
# where: 
# ^ indicates the beginning of the string
# \\d+ corresponds to a sequence of one or more digits 
# \\.? indicates that the point is optional (it can appear zero or one time, due to the ?)
# \\d* corresponds to zero or more digits after the 'optional' point 
# $ indicates the end of the string

Proceed to convert with lapply

# run as.numeric only in the detected columns 
df[, columns_with_digits] <- data.frame(lapply(df[, columns_with_digits], as.numeric))

Check final output

# check again 
glimpse(df)
Columns: 5
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.…
$ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.…
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.…
$ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.…
$ Species      <fct> setosa, setosa, setosa, setosa, …
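
The pattern above rejects negative numbers, values such as ".5", and scientific notation. A possible extension is sketched below; the pattern is my own assumption about which formats should count as numeric, and perl = TRUE is used so that \\d is interpreted the same way everywhere.

```r
# Optional sign, digits with optional decimals (or a leading point), optional exponent
pattern <- "^[+-]?(\\d+\\.?\\d*|\\.\\d+)([eE][+-]?\\d+)?$"

x <- c("12", "-3.5", ".5", "1e-3", "abc", "1.2.3")
is_num <- grepl(pattern, x, perl = TRUE)

is_num
as.numeric(x[is_num])
```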
Bronchopneumonia answered 28/2 at 11:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.