Why does as.factor return a character when used inside apply?
R

I want to convert variables into factors using apply():

a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a","b"), 100, replace = TRUE),
                x3 = factor(c(rep("a",50) , rep("b",50))))

a2 <- apply(a, 2,as.factor)
apply(a2, 2,class)

results in:

         x1          x2          x3 
"character" "character" "character" 

I don't understand why this results in character vectors instead of factor vectors.

Rhadamanthus answered 6/3, 2010 at 10:57 Comment(0)

apply converts your data.frame to a character matrix before it does anything else. To check the column classes, use lapply:

lapply(a, class)
# $x1
# [1] "numeric"
# $x2
# [1] "factor"
# $x3
# [1] "factor"

In your second command, apply coerces the result to a character matrix as well. Use lapply instead:

a2 <- lapply(a, as.factor)
lapply(a2, class)
# $x1
# [1] "factor"
# $x2
# [1] "factor"
# $x3
# [1] "factor"

For a quick look at the column types, you can simply use str:

str(a)
# 'data.frame':   100 obs. of  3 variables:
#  $ x1: num  -1.79 -1.091 1.307 1.142 -0.972 ...
#  $ x2: Factor w/ 2 levels "a","b": 2 1 1 1 2 1 1 1 1 2 ...
#  $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...

Additional explanation, in response to the comments:

Why does the lapply work while apply doesn't?

The first thing apply does is convert its argument to a matrix, so apply(a, 2, f) is equivalent to apply(as.matrix(a), 2, f) for any function f. Because a contains non-numeric columns, as.matrix(a) is a character matrix, as str(as.matrix(a)) shows:

chr [1:100, 1:3] " 0.075124364" "-1.608618269" "-1.487629526" ...
- attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:3] "x1" "x2" "x3"

There are no factors left, so class returns "character" for every column.
lapply works on the columns of the data.frame directly, so it gives you what you want (it does something like class(a$column_name) for each column).

The help page for apply explains why apply with as.factor doesn't work:

In all cases the result is coerced by as.vector to one of the basic vector types before the dimensions are set, so that (for example) factor results will be coerced to a character array.
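
The coercion described in that passage can be sketched as follows. The toy data frame df is hypothetical (it is not the data from the question), and stringsAsFactors is set explicitly because its default changed in R 4.0.0:

```r
# Hypothetical toy data, not the data frame from the question.
df <- data.frame(x1 = c(1.5, 2.5, 3.5),
                 x2 = c("a", "b", "a"),
                 stringsAsFactors = TRUE)

# as.vector() turns a factor into a plain character vector.
f <- as.factor(c("a", "b", "a"))
class(as.vector(f))

# apply() coerces the per-column results the same way, so the
# factors returned by as.factor() do not survive.
res <- apply(df, 2, as.factor)
is.matrix(res)
typeof(res)
```

Here res should be a character matrix rather than a collection of factors, which matches the output shown in the question.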

The help page for sapply explains why sapply with as.factor doesn't work either:

Value (...) An atomic vector or matrix or list of the same length as X (...) If simplification occurs, the output type is determined from the highest type of the return values in the hierarchy NULL < raw < logical < integer < real < complex < character < list < expression, after coercion of pairlists to lists.

Either way, you never get a matrix of factors or a data.frame.
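
As a rough illustration of the difference, the sketch below compares sapply and lapply on a small hypothetical data frame df (an assumption for the example, not the question's data):

```r
# Hypothetical toy data, not the data frame from the question.
df <- data.frame(x1 = c(1.5, 2.5, 3.5),
                 x2 = c("a", "b", "a"),
                 stringsAsFactors = TRUE)

# sapply() simplifies the list of factors into a matrix,
# and the factor class is lost along the way.
s <- sapply(df, as.factor)
is.matrix(s)
is.character(s)

# lapply() does not simplify, so each element stays a factor.
l <- lapply(df, as.factor)
vapply(l, is.factor, logical(1))
```

Passing simplify = FALSE to sapply skips the simplification step, at which point it behaves like lapply.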

How to convert output to data.frame?

Simply use as.data.frame, as you wrote in your comment:

a2 <- as.data.frame(lapply(a, as.factor))
str(a2)
# 'data.frame':   100 obs. of  3 variables:
#  $ x1: Factor w/ 100 levels "-2.49629293159922",..: 60 6 7 63 45 93 56 98 40 61 ...
#  $ x2: Factor w/ 2 levels "a","b": 1 1 2 2 2 2 2 1 2 2 ...
#  $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...

If you want to replace only selected character columns with factors, there is a trick:

a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
str(a3)
# 'data.frame':   26 obs. of  3 variables:
#  $ x1: chr  "a" "b" "c" "d" ...
#  $ x2: chr  "A" "B" "C" "D" ...
#  $ x3: chr  "A" "B" "C" "D" ...

columns_to_change <- c("x1","x2")
a3[columns_to_change] <- lapply(a3[columns_to_change], as.factor)
str(a3)
# 'data.frame':   26 obs. of  3 variables:
#  $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
#  $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
#  $ x3: chr  "A" "B" "C" "D" ...
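
The same idea can be applied by column type instead of by name. This is only a sketch on a hypothetical data frame df; the is_chr helper vector is an illustrative name, not something from the answer above:

```r
# Hypothetical data: one integer column and two character columns.
df <- data.frame(id = 1:3,
                 g1 = c("a", "b", "a"),
                 g2 = c("x", "x", "y"),
                 stringsAsFactors = FALSE)

# Logical vector marking the character columns.
is_chr <- vapply(df, is.character, logical(1))

# Replace only those columns; the integer column is left alone.
df[is_chr] <- lapply(df[is_chr], as.factor)

vapply(df, class, character(1))
```

Selecting by type avoids turning numeric columns into factors with one level per distinct value, as happened to x1 in the as.data.frame(lapply(a, as.factor)) example.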

You can use the same trick to replace all columns:

a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
a3[, ] <- lapply(a3, as.factor)
str(a3)
# 'data.frame':   26 obs. of  3 variables:
#  $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
#  $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
#  $ x3: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
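
To see why the assignment goes into an indexed data frame rather than overwriting the variable, here is a minimal sketch on a hypothetical data frame df (the names as_list and kept are illustrative):

```r
# Hypothetical toy data.
df <- data.frame(x1 = letters[1:3],
                 x2 = LETTERS[1:3],
                 stringsAsFactors = FALSE)

# Assigning the lapply() result directly gives a plain list.
as_list <- lapply(df, as.factor)
class(as_list)

# Assigning into kept[] replaces the columns but keeps the
# data.frame structure (class, row names) intact.
kept <- df
kept[] <- lapply(kept, as.factor)
class(kept)
```

With empty brackets on the left-hand side, only the contents are replaced, so no as.data.frame call is needed afterwards.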
Loaning answered 6/3, 2010 at 11:42 Comment(3)
Great, Marek, thanks. I see that the remaining thing to do is to use as.data.frame on the output. I do wonder, though: why does lapply work while apply doesn't? Thanks, Tal - Rhadamanthus
Yup... if you want a data.frame, use as.data.frame(lapply(dtf, fun)). sapply will do the same thing as apply. Don't know why, but maybe it has something to do with the fact that a data.frame is actually a list... lapply returns a list, so it's easily convertible to a data.frame. If you do that on sapply or apply output, you're trying to coerce numeric to data.frame, hence you mess things up... it is strange, but not an "unforeseen" behaviour, I must admit! - Middling
Or do a[] <- lapply(a, as.factor) - Ourself
