Change column class based on other dataframe

I have a data frame, dt, and I am trying to convert the class of each of its columns based on a second data frame, col_type.

See the example below for more detail.

id <- c(1,2,3,4)
a <- c(1,4,5,6)
b <- as.character(c(0,1,1,4))
c <- as.character(c(0,1,1,0))
d <- c(0,1,1,0)
dt <- data.frame(id,a,b,c,d, stringsAsFactors = FALSE)

> str(dt)
'data.frame':   4 obs. of  5 variables:
 $ id: num  1 2 3 4
 $ a : num  1 4 5 6
 $ b : chr  "0" "1" "1" "4"
 $ c : chr  "0" "1" "1" "0"
 $ d : num  0 1 1 0

Now I am trying to convert the class of each column based on the data frame below.

var <- c("id","a","b","c","d")
type <- c("character","numeric","numeric","integer","character")
col_type <- data.frame(var,type, stringsAsFactors = FALSE)

> col_type
  var      type
1  id character
2   a   numeric
3   b   numeric
4   c   integer
5   d character

I want to convert id to the class given in the col_type data frame, and likewise for all the other columns.

My attempt:

library(data.table)
setDT(dt)
for (i in seq_len(ncol(dt))) {
  v <- colnames(dt)[i]
  if (v %in% col_type$var) {
    # look up the target type by column name rather than by position
    type_i <- col_type$type[col_type$var == v]
    dt[, (v) := eval(parse(text = paste0("as.", type_i, "(", v, ")")))]
  }
}

Note: my solution works, but it is really slow, and I am wondering whether I can do it more efficiently and cleanly.

Suggestions will be appreciated.

Covenanter answered 27/3, 2018 at 16:58 Comment(0)

I would read the data in with the colClasses argument derived from the col_type table:

library(data.table)
library(magrittr)
setDT(col_type)

res = capture.output(fwrite(dt)) %>% paste(collapse="\n") %>% 
  fread(colClasses = col_type[, setNames(type, var)])

str(res)
Classes ‘data.table’ and 'data.frame':  4 obs. of  5 variables:
 $ id: chr  "1" "2" "3" "4"
 $ a : num  1 4 5 6
 $ b : num  0 1 1 4
 $ c : int  0 1 1 0
 $ d : chr  "0" "1" "1" "0"
 - attr(*, ".internal.selfref")=<externalptr> 

If you can do this when the data is read in initially, it simplifies to...

 res = fread("file.csv", colClasses = col_type[, setNames(type, var)])

It's straightforward to do all of this without data.table.
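As a rough sketch of the base R route, read.csv() also accepts a named colClasses vector. The in-memory csv_text below is a hypothetical stand-in for the real file, used only to keep the example self-contained:

```r
# Target classes, as in the question
col_type <- data.frame(
  var  = c("id", "a", "b", "c", "d"),
  type = c("character", "numeric", "numeric", "integer", "character"),
  stringsAsFactors = FALSE
)

# Hypothetical CSV content standing in for the file on disk
csv_text <- "id,a,b,c,d\n1,1,0,0,0\n2,4,1,1,1\n3,5,1,1,1\n4,6,4,0,0"

# A named colClasses vector is matched to columns by name
res <- read.csv(text = csv_text,
                colClasses = setNames(col_type$type, col_type$var))

str(res)
```

With a real file, the text = argument would be replaced by the file path.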


If the data never passes through a text file on its way into R (e.g., it arrives as an RDS file), there's:

setDT(dt)
res = dt[, Map(as, .SD, col_type$type), .SDcols=col_type$var]

str(res)
Classes ‘data.table’ and 'data.frame':  4 obs. of  5 variables:
 $ id: chr  "1" "2" "3" "4"
 $ a : num  1 4 5 6
 $ b : num  0 1 1 4
 $ c : int  0 1 1 0
 $ d : chr  "0" "1" "1" "0"
 - attr(*, ".internal.selfref")=<externalptr> 

See showMethods("coerce") for the available conversions, as some might fail, e.g. as(letters[1:3], "factor").
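The Map(as, ...) call above returns a new table. If memory is a concern, one possible variation is to overwrite each column in place with data.table's set(). This is only a sketch under the assumption that every type in col_type is something as() can coerce to; the loop variable names k and v are made up for illustration:

```r
library(data.table)

# Sample data from the question, built directly as a data.table
dt <- data.table(
  id = c(1, 2, 3, 4),
  a  = c(1, 4, 5, 6),
  b  = c("0", "1", "1", "4"),
  c  = c("0", "1", "1", "0"),
  d  = c(0, 1, 1, 0)
)
col_type <- data.frame(
  var  = c("id", "a", "b", "c", "d"),
  type = c("character", "numeric", "numeric", "integer", "character"),
  stringsAsFactors = FALSE
)

# Replace one column at a time by reference, so the table is never copied as a whole
for (k in seq_len(nrow(col_type))) {
  v <- col_type$var[k]
  set(dt, j = v, value = as(dt[[v]], col_type$type[k]))
}

str(dt)
```

Because set() modifies dt by reference, there is no result to assign; the original object now has the new column classes.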

Munguia answered 27/3, 2018 at 17:59 Comment(6)
I know about the colClasses parameter, but I can't write the data out and read it back in. My data is really large, so it's not feasible. - Covenanter
Is there any other alternative to make my solution efficient? - Covenanter
@Rushabh You only have the data in R stored with the wrong column classes; not on disk prior to reading into R? Not terribly efficient, but you can use the ?as function for conversion, at least to the classes you've identified. It's pretty weak, though, e.g. as("2018-01-01", "Date") fails. Anyway, I'll edit it in. - Munguia
Your Map solution is working, but I am getting the warning "In asMethod(object) : NAs introduced by coercion". - Covenanter
@Rushabh I guess the warning is just there for you to recognize cases like as.numeric(c(1, 2, "zip")), since the last element cannot be converted to a number. - Munguia
Alright, got it. I really appreciate your help. Learned something new. - Covenanter

Consider base R's get() inside Map: get() retrieves a function from its name given as a string, here one of the as.* functions. Then bind the resulting list of vectors into a data frame.

vec_list <- Map(function(v, t) get(paste0("as.", t))(dt[[v]]), col_type$var, col_type$type)

dt_new <- data.frame(vec_list, stringsAsFactors = FALSE)

str(dt_new)
# 'data.frame': 4 obs. of  5 variables:
# $ id: chr  "1" "2" "3" "4"
# $ a : num  1 4 5 6
# $ b : num  0 1 1 4
# $ c : int  0 1 1 0
# $ d : chr  "0" "1" "1" "0"

Consider wrapping the get() call in tryCatch if conversions can fail, for example when no as.* function exists for a given type.
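A minimal sketch of that tryCatch idea follows. The helper name convert_one and the policy of keeping the original column when a conversion warns or errors are assumptions made for illustration, not part of the answer above:

```r
dt <- data.frame(
  id = c(1, 2, 3, 4),
  a  = c(1, 4, 5, 6),
  b  = as.character(c(0, 1, 1, 4)),
  c  = as.character(c(0, 1, 1, 0)),
  d  = c(0, 1, 1, 0),
  stringsAsFactors = FALSE
)
col_type <- data.frame(
  var  = c("id", "a", "b", "c", "d"),
  type = c("character", "numeric", "numeric", "integer", "character"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert x with as.<type>(), or return x unchanged
# (with a message) if the conversion warns or errors
convert_one <- function(x, type) {
  tryCatch(
    get(paste0("as.", type), mode = "function")(x),
    warning = function(w) {
      message("Conversion to ", type, " gave a warning: ", conditionMessage(w))
      x
    },
    error = function(e) {
      message("Conversion to ", type, " failed: ", conditionMessage(e))
      x
    }
  )
}

vec_list <- Map(function(v, t) convert_one(dt[[v]], t),
                col_type$var, col_type$type)
dt_new <- data.frame(vec_list, stringsAsFactors = FALSE)

str(dt_new)
```

Treating warnings as failures is a deliberate choice here: it leaves a column untouched rather than silently introducing NAs. Dropping the warning handler would keep the partially converted result instead.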

Crumpton answered 27/3, 2018 at 18:33 Comment(0)
