Convert data.frame columns from factors to characters

I have a data frame. Let's call him bob:

> head(bob)
                 phenotype                         exclusion
GSM399350 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399351 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399352 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399353 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399354 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399355 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-

I'd like to concatenate the rows of this data frame (this will be another question). But look:

> class(bob$phenotype)
[1] "factor"

Bob's columns are factors. So, for example:

> as.character(head(bob))
[1] "c(3, 3, 3, 6, 6, 6)"       "c(3, 3, 3, 3, 3, 3)"      
[3] "c(29, 29, 29, 30, 30, 30)"

I don't begin to understand this, but I guess these are indices into the levels of the factors of the columns (of the court of king caractacus) of bob? Not what I need.

Strangely I can go through the columns of bob by hand, and do

bob$phenotype <- as.character(bob$phenotype)

which works fine. And, after some typing, I can get a data.frame whose columns are characters rather than factors. So my question is: how can I do this automatically? How do I convert a data.frame with factor columns into a data.frame with character columns without having to manually go through each column?

Bonus question: why does the manual approach work?

Mason answered 17/5, 2010 at 16:52 Comment(1)
would be nice if you would make the question reproducible, so include the structure of bob.Gretta
C
393

Just following on Matt and Dirk. If you want to recreate your existing data frame without changing the global option, you can recreate it with an apply statement:

bob <- data.frame(lapply(bob, as.character), stringsAsFactors=FALSE)

This will convert all variables to class "character"; if you want to convert only factors, see Marek's solution below.

As @hadley points out, the following is more concise.

bob[] <- lapply(bob, as.character)

In both cases, lapply outputs a list. In the second case, however, assigning into bob[] replaces the columns of the existing object rather than the object itself, so bob keeps its data.frame class (and its row names). This eliminates the need to convert back to a data.frame using as.data.frame with the argument stringsAsFactors = FALSE.
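
A small sketch of the difference, using a made-up two-row data frame (the column values and row names below are illustrative, not the real data from the question):

```r
# Toy data frame with factor columns and custom row names (illustrative only)
bob <- data.frame(
  phenotype = factor(c("25-", "25+")),
  exclusion = factor(c("Gr1-", "Gr1-")),
  row.names = c("GSM1", "GSM2")
)

as_list <- lapply(bob, as.character)   # a plain list: no data.frame class, no row names
bob[] <- lapply(bob, as.character)     # replaces the columns in place via `[<-.data.frame`

class(as_list)       # "list"
class(bob)           # "data.frame"
rownames(bob)        # "GSM1" "GSM2"
sapply(bob, class)   # both columns are now "character"
```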

Cognate answered 17/5, 2010 at 17:21 Comment(9)
Shane, that'll also turn numerical columns into character.Prolific
@Dirk: That's true, although it isn't clear whether that's a problem here. Clearly, creating things correctly up front is the best solution. I don't think that it's easy to automatically convert data types across a data frame. One option is to use the above but then use type.convert after casting everything to character, then recast factors back to character again.Cognate
This seems to discard row names.Glaucous
To better understand lapply() and friends, you may want to see this useful summaryAlmaalmaata
@Glaucous did you use bob[] <- in the example or bob <- ?; the first keeps the data.frame; the second changes the data.frame to a list, dropping the rownames. I will update the answerCamisole
@david you are correct, not sure what evidence I had for my first commentGlaucous
A variant that only converts factor columns to character using an anonymous function: iris[] <- lapply(iris, function(x) if (is.factor(x)) as.character(x) else {x})Mohandas
iris[] <- sapply(iris, function(x) if (is.factor(x)) as.character(x) else x) Sapply also gives the same output on presence of []Reception
How did you figure out that [] <- lapply returns a data frame?Magnifico
D
348

To replace only factors:

i <- sapply(bob, is.factor)
bob[i] <- lapply(bob[i], as.character)

Version 0.5.0 of the dplyr package introduced the function mutate_if:

library(dplyr)
bob %>% mutate_if(is.factor, as.character) -> bob

...and in version 1.0.0 it was superseded by across:

library(dplyr)
bob %>% mutate(across(where(is.factor), as.character)) -> bob

Package purrr from RStudio gives another alternative:

library(purrr)
bob %>% modify_if(is.factor, as.character) -> bob
Defecate answered 17/5, 2010 at 22:8 Comment(5)
Not working for me, sadly. Don't know why. Probably because I have colnames?Questionary
@mohawkjohn That shouldn't be an issue. Did you get an error, or were the results not what you expected?Defecate
Note: The purrr line returns a list, not a data.frame!Fraley
This also works if you already have an i that is a vector of colnames().Marvamarve
@Fraley should be modify_if instead of map_if from the very beginning :)Defecate
P
43

The global option

stringsAsFactors: The default setting for arguments of data.frame and read.table.

may be something you want to set to FALSE in your startup files (e.g. ~/.Rprofile). Please see help(options).
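
If you would rather not depend on a global option at all, a hedged alternative is to pass the argument explicitly in each call. The sketch below uses a made-up one-column data frame; note that since R 4.0.0 the default is already FALSE, so the explicit argument mostly matters for code that must also run on older versions:

```r
# Passing the argument explicitly makes the result independent of any
# global option or R version default.
with_factors    <- data.frame(x = c("a", "b"), stringsAsFactors = TRUE)
without_factors <- data.frame(x = c("a", "b"), stringsAsFactors = FALSE)

class(with_factors$x)      # "factor"
class(without_factors$x)   # "character"
```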

Prolific answered 17/5, 2010 at 17:2 Comment(2)
The problem with this is that when you execute your code in an environment where that .Rprofile file is missing you'll get bugs!Abutment
I tend to call it at the beginning of scripts rather than setting it in the .Rprofile.Glossator
M
26

If you understand how factors are stored, you can avoid using apply-based functions to accomplish this. Which isn't at all to imply that the apply solutions don't work well.

Factors are structured as numeric indices tied to a list of 'levels'. This can be seen if you convert a factor to numeric. So:

> fact <- as.factor(c("a","b","a","d"))
> fact
[1] a b a d
Levels: a b d

> as.numeric(fact)
[1] 1 2 1 3

The numbers returned in the last line correspond to the levels of the factor.

> levels(fact)
[1] "a" "b" "d"

Notice that levels() returns a character vector. You can use this fact to easily and compactly convert factors to strings or numerics like this:

> fact_character <- levels(fact)[as.numeric(fact)]
> fact_character
[1] "a" "b" "a" "d"

This also works for numeric values, provided you wrap your expression in as.numeric().

> num_fact <- factor(c(1,2,3,6,5,4))
> num_fact
[1] 1 2 3 6 5 4
Levels: 1 2 3 4 5 6
> num_num <- as.numeric(levels(num_fact)[as.numeric(num_fact)])
> num_num
[1] 1 2 3 6 5 4
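
The example above happens to have levels 1 to 6, so the integer codes and the values coincide. A short sketch with made-up values where they differ shows why the lookup through levels() matters:

```r
f <- factor(c("10", "20", "10", "5"))

as.numeric(f)                  # integer codes into levels(f), not the values
as.numeric(as.character(f))    # the original numbers
as.numeric(levels(f))[f]       # same result, converting only the levels
```

The levels are sorted as strings ("10", "20", "5"), so as.numeric(f) alone would return 1 2 1 3 rather than 10 20 10 5.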
Midsection answered 21/3, 2013 at 17:40 Comment(1)
This answer does not address the problem, which is how do I convert all of the factor columns in my data frame to character. as.character(f) is better in both readability and efficiency than levels(f)[as.numeric(f)]. If you wanted to be clever, you could use levels(f)[f] instead. Note that when converting a factor with numeric values, you do get some benefit from as.numeric(levels(f))[f] over, e.g., as.numeric(as.character(f)), but this is because you only have to convert the levels to numeric and then subset. as.character(f) is just fine as it is.Addend
S
22

If you want a new data frame bobc where every factor vector in bobf is converted to a character vector, try this:

bobc <- rapply(bobf, as.character, classes="factor", how="replace")

If you then want to convert it back, you can create a logical vector of which columns are factors, and use that to selectively apply factor:

f <- sapply(bobf, is.factor)
bobc[f] <- lapply(bobc[f], factor)
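
One caveat with converting back, sketched here on a made-up factor: calling factor() on a character vector rebuilds the levels from the sorted unique values, so a custom level order and any unused levels are lost unless you keep the original levels around.

```r
f <- factor(c("low", "high", "low"), levels = c("low", "mid", "high"))

round_trip <- factor(as.character(f))

levels(f)            # "low" "mid" "high"
levels(round_trip)   # "high" "low": sorted, and the unused "mid" is gone

# Keeping the original levels lets you restore the factor exactly
restored <- factor(as.character(f), levels = levels(f))
identical(restored, f)   # TRUE
```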
Stillas answered 5/1, 2012 at 6:4 Comment(3)
+1 for doing only what was necessary (i.e. not converting the entire data.frame to character). This solution is robust to a data.frame that contains mixed types.Mycology
This example should be in the `Examples' section for rapply, like at: stat.ethz.ch/R-manual/R-devel/library/base/html/rapply.html . Anyone know how to request that that be so?Doha
If you want to end up with a data frame, simply wrap the rapply in a data.frame call (with the stringsAsFactors argument set to FALSE)Nicker
N
16

I typically make this function a part of all my projects. Quick and easy.

unfactorize <- function(df) {
  # convert every factor column (including ordered factors) to character
  for (i in which(sapply(df, is.factor))) df[[i]] <- as.character(df[[i]])
  return(df)
}
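
One thing to watch when detecting factors by class, shown on a made-up ordered factor: ordered factors carry two classes, so an equality test on class() gives a length-two result, whereas is.factor() or inherits() give a single TRUE.

```r
o <- factor(c("low", "high"), levels = c("low", "high"), ordered = TRUE)

class(o)                 # "ordered" "factor": a length-2 vector
class(o) == "factor"     # FALSE TRUE, not usable directly inside if()
is.factor(o)             # TRUE
inherits(o, "factor")    # TRUE
```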
Nietzsche answered 10/1, 2013 at 22:25 Comment(0)
L
11

Another way is to convert it using apply

bob2 <- apply(bob,2,as.character)

And a better one (the previous is of class 'matrix')

bob2 <- as.data.frame(as.matrix(bob), stringsAsFactors = FALSE)
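
A quick check of what the two calls return, on a made-up data frame that contains only factor columns (mixed-type data frames may behave differently when pushed through a matrix, so treat this as a sketch):

```r
bob <- data.frame(a = factor(c("x", "y")), b = factor(c("p", "q")))

m  <- apply(bob, 2, as.character)                              # character matrix
df <- as.data.frame(as.matrix(bob), stringsAsFactors = FALSE)  # data frame again

class(m)            # "matrix" "array" in R >= 4.0, just "matrix" before
sapply(df, class)   # both "character"
```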
Loftus answered 17/5, 2010 at 17:15 Comment(1)
Following @Shane's comment: in order to get data.frame, do as.data.frame(lapply(...Intervention
L
9

Update: Here's an example of something that doesn't work. I thought it would, but I think that the stringsAsFactors option only works on character strings - it leaves the factors alone.

Try this:

bob2 <- data.frame(bob, stringsAsFactors = FALSE)

Generally speaking, whenever you're having problems with factors that should be characters, there's a stringsAsFactors setting somewhere to help you (including a global setting).
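
A minimal sketch of the behaviour described in the update, using a made-up one-column data frame whose column is deliberately created as a factor:

```r
bob  <- data.frame(phenotype = c("25-", "25+"), stringsAsFactors = TRUE)
bob2 <- data.frame(bob, stringsAsFactors = FALSE)

class(bob2$phenotype)   # still "factor": only character vectors are affected
```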

Lipinski answered 17/5, 2010 at 17:0 Comment(2)
This does work, if he sets it when creating bob to begin with (but not after the fact).Cognate
Right. Just wanted to be clear that this doesn't solve the problem, per se - but thanks for noting that it does prevent it.Lipinski
I
8

Or you can try transform:

newbob <- transform(bob, phenotype = as.character(phenotype))

Just be sure to list every factor you'd like to convert to character.

Or you can do something like this and kill all the pests with one blow:

newbob_char <- as.data.frame(lapply(bob[sapply(bob, is.factor)], as.character), stringsAsFactors = FALSE)
newbob_rest <- bob[!(sapply(bob, is.factor))]
newbob <- cbind(newbob_char, newbob_rest)

It's not a good idea to shove the data in code like this; I could do the sapply part separately (actually, it's much easier to do it like that), but you get the point... I haven't checked the code, 'cause I'm not at home, so I hope it works! =)

This approach, however, has a downside... you must reorganize the columns afterwards, while with transform you can do whatever you like, but at the cost of "pedestrian-style code writing"...
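
For the reordering step, one possible sketch is to index the combined result by the original column names. The data frame below is made up for illustration, and is_fac is just a helper name introduced here:

```r
bob <- data.frame(
  id        = 1:2,
  phenotype = factor(c("25-", "25+")),
  score     = c(0.5, 1.5),
  exclusion = factor(c("Gr1-", "Gr1-"))
)

is_fac <- sapply(bob, is.factor)
newbob_char <- as.data.frame(lapply(bob[is_fac], as.character), stringsAsFactors = FALSE)
newbob <- cbind(newbob_char, bob[!is_fac])

names(newbob)                  # factor columns first: phenotype, exclusion, id, score
newbob <- newbob[names(bob)]   # index by the original names to restore the order
names(newbob)                  # id, phenotype, score, exclusion
```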

So there... =)

Intervention answered 17/5, 2010 at 17:49 Comment(0)
P
7

When you create your data frame, pass stringsAsFactors = FALSE to avoid all misunderstandings.

Pasquale answered 16/1, 2016 at 15:21 Comment(0)
G
6

If you use the data.table package for operations on your data, the problem does not arise.

library(data.table)
dt = data.table(col1 = c("a","b","c"), col2 = 1:3)
sapply(dt, class)
#       col1        col2 
#"character"   "integer" 

If you already have factor columns in your dataset and want to convert them to character, you can do the following.

library(data.table)
dt = data.table(col1 = factor(c("a","b","c")), col2 = 1:3)
sapply(dt, class)
#     col1      col2 
# "factor" "integer" 
upd.cols = sapply(dt, is.factor)
dt[, names(dt)[upd.cols] := lapply(.SD, as.character), .SDcols = upd.cols]
sapply(dt, class)
#       col1        col2 
#"character"   "integer" 
Gretta answered 9/12, 2015 at 20:55 Comment(1)
DT circumvents the sapply fix proposed by Marek: In [<-.data.table(*tmp*, sapply(bob, is.factor), : Coerced 'character' RHS to 'double' to match the column's type. Either change the target column to 'character' first (by creating a new 'character' vector length 1234 (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to 'double' (e.g. 1L, NA_[real|integer]_, as.*, etc) to make your intent clear and for speed. Or, set the column type correctly up front when you create the table and stick to it, please. It's easier to fix the DF and recreate the DT.Headwater
S
3

This works for me - I finally figured a one liner

df <- as.data.frame(lapply(df, function(y) if (is.factor(y)) as.character(y) else y), stringsAsFactors = FALSE)
Synoptic answered 24/10, 2014 at 16:0 Comment(0)
V
3

The function across was introduced in dplyr version 1.0.0. It supersedes the scoped variants (_if, _at, _all). See the official documentation.

library(dplyr)
bob <- bob %>% 
       mutate(across(where(is.factor), as.character))
Vituperation answered 13/8, 2020 at 13:37 Comment(2)
I included this change in my answer. Thanks for bringing it to my attention.Defecate
no problem. I tried to edit your answer but it got rejected by review team.Vituperation
D
2

You should use convert in hablar which gives readable syntax compatible with tidyverse pipes:

library(dplyr)
library(hablar)

df <- tibble(a = factor(c(1, 2, 3, 4)),
             b = factor(c(5, 6, 7, 8)))

df %>% convert(chr(a:b))

which gives you:

  a     b    
  <chr> <chr>
1 1     5    
2 2     6    
3 3     7    
4 4     8   
Disclaimer answered 10/6, 2019 at 21:25 Comment(0)
L
2

With the dplyr-package loaded use

bob <- bob %>% mutate_at("phenotype", as.character)

if you only want to change the phenotype-column specifically.

Lobelia answered 10/2, 2020 at 12:16 Comment(0)
I
1

This function does the trick

df <- stacomirtools::killfactor(df)
Irairacund answered 13/11, 2017 at 16:46 Comment(0)
K
0

Maybe a newer option?

library("tidyverse")

bob <- bob %>% group_by_if(is.factor, as.character)
Karinekariotta answered 14/8, 2019 at 16:9 Comment(0)
R
0

This works by converting every column to character and then converting the numeric-looking columns back to numeric:

makenumcols<-function(df){
  df<-as.data.frame(df)
  df[] <- lapply(df, as.character)
  cond <- apply(df, 2, function(x) {
    x <- x[!is.na(x)]
    all(suppressWarnings(!is.na(as.numeric(x))))
  })
  numeric_cols <- names(df)[cond]
  df[,numeric_cols] <- sapply(df[,numeric_cols], as.numeric)
  return(df)
}
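
A shorter alternative in the same spirit, sketched on a made-up data frame, is to let base R's type.convert guess each column's type after the conversion to character. This is a swapped-in technique, not the function above, and its guesses may differ from makenumcols in edge cases:

```r
df <- data.frame(n = factor(c("1", "2", "3")),
                 s = factor(c("a", "b", "c")))

df[] <- lapply(df, as.character)                 # everything to character first
df[] <- lapply(df, type.convert, as.is = TRUE)   # then let R guess each column's type

sapply(df, class)   # n is "integer", s stays "character"
```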

Adapted from: Get column types of excel sheet automatically

Rustyrut answered 27/8, 2019 at 19:23 Comment(0)
