Classification - Usage of factor levels

I am currently working on a predictive model for a churn problem.
Whenever I try to run the model below, I get this error:

At least one of the class levels is not a valid R variable name. This will cause errors when class probabilities are generated because the variables names will be converted to X0, X1. Please use factor levels that can be used as valid R variable names.

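library(caret)

# Combine the two-class metrics (ROC, Sens, Spec) with the defaults (Accuracy, Kappa)
fivestats <- function(...) c(twoClassSummary(...), defaultSummary(...))
fitControl.default <- trainControl(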
    method  = "repeatedcv"
  , number  = 10
  , repeats = 1 
  , verboseIter = TRUE
  , summaryFunction  = fivestats
  , classProbs = TRUE
  , allowParallel = TRUE)
set.seed(1984)

rpartGrid             <-  expand.grid(cp = seq(from = 0, to = 0.1, by = 0.001))
rparttree.fit.roc <- train( 
    churn ~ .
  , data      = training.dt  
  , method    = "rpart"
  , trControl = fitControl.default
  , tuneGrid  = rpartGrid
  , metric = 'ROC'
  , maximize = TRUE
)

The attached picture shows my data. I have already converted some columns from chr to factor.

DATA OVERVIEW

I do not understand what the problem is. If I converted all of the data to factors, a variable such as total_airtime_out would probably end up with around 9000 levels.

Thanks for any kind of help!

Ingaingaberg answered 20/5, 2017 at 10:24 Comment(1)
Can you please add dummy data or a sample, plus the code (incl. packages), so that one can reproduce your error message? Thanks. (Danelledanete)
36

I can't reproduce your error without your data, but my educated guess is that the error message tells you everything you need to know:

At least one of the class levels is not a valid R variable name. This will cause errors when class probabilities are generated because the variables names will be converted to X0, X1. Please use factor levels that can be used as valid R variable names.

Emphasis mine. The levels of your response variable are "0" and "1", and these aren't valid variable names in R (you can't do 0 <- "my value"). Presumably the problem will go away if you rename the levels of the response variable with something like

levels(training.dt$churn) <- c("first_class", "second_class")

as per this Q.
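As a minimal sketch of the renaming above, using base R only: the toy `churn` vector and the helper `is_valid_name` are made up for illustration and are not part of the question's data or of caret. The helper treats a level as valid when `make.names()` leaves it unchanged.

```r
# Toy stand-in for the response column; the values are invented for illustration.
churn <- factor(c("0", "1", "1", "0", "1"))

# Hypothetical helper: a string is a valid R name if make.names() leaves it unchanged.
is_valid_name <- function(x) x == make.names(x)

# Check the original levels "0" and "1".
is_valid_name(levels(churn))

# Rename the levels in place; the order follows levels(churn), i.e. "0" then "1".
levels(churn) <- c("no", "yes")

# Check again after renaming, and confirm the counts per class are unchanged.
is_valid_name(levels(churn))
table(churn)
```

Assigning to `levels()` only relabels the classes; the observations keep their class membership, so the new names should be listed in the same order as the old levels.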

Ogive answered 20/5, 2017 at 12:57 Comment(0)
9

How about this base function:

 make.names(churn) ~ .,

to "make syntactically valid names out of character vectors"?

Source
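To see what `make.names()` does before putting it in a formula, here is a small sketch on a few candidate level names. The input strings are invented for illustration.

```r
# Candidate level names; the strings are invented for illustration.
candidates <- c("0", "1", "yes", "no churn")

# make.names() prepends "X" to names that start with a digit and
# replaces invalid characters such as spaces with ".".
fixed <- make.names(candidates)

# Side-by-side view of what changed.
data.frame(original = candidates, fixed = fixed, changed = candidates != fixed)
```

Names that are already valid pass through untouched, so applying it to levels that are fine should be harmless.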

Pretense answered 5/12, 2018 at 12:26 Comment(0)
1

I had the same issue and fixed it by setting classProbs = FALSE in trainControl(). This solved the issue and kept the levels 0 and 1.

Theism answered 14/12, 2019 at 3:5 Comment(1)
But why does it work? That is a nice question. (Stook)
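One plausible reading of the error message: class probabilities are returned as one column per class level, and column names such as "0" and "1" get rewritten to X0 and X1, so they no longer match the levels. With classProbs = FALSE no such columns are produced, which would explain why the check is skipped. The sketch below only illustrates the renaming with a plain data frame; the `probs` table is hypothetical and this is not caret's internal code.

```r
# Hypothetical class-probability table with one column per class level.
probs <- data.frame("0" = c(0.8, 0.3), "1" = c(0.2, 0.7))

# data.frame() checks names by default, so the columns no longer match the levels.
names(probs)

# With check.names = FALSE the original names survive.
probs_raw <- data.frame("0" = c(0.8, 0.3), "1" = c(0.2, 0.7), check.names = FALSE)
names(probs_raw)
```

Note that the setup in the question uses twoClassSummary with metric = 'ROC', which relies on class probabilities, so turning classProbs off is unlikely to be a drop-in fix there; renaming the levels is the safer route.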
0

Adding to the correct answer of @einar, here's the dplyr syntax for converting the factor levels:

training.dt  %>% 
  mutate(churn = factor(churn, 
          levels = make.names(levels(churn))))

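I prefer to change only the labels of the factor levels, like this. Passing the new names via levels changes the underlying data: the existing values "0" and "1" are matched against levels that do not occur in the data, so they become NA, whereas labels only renames the existing levels: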

training.dt  %>% 
  mutate(churn = factor(churn, 
          labels = make.names(levels(churn))))
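
The difference between the two arguments can be checked in base R, without dplyr. This is a small sketch on an invented `churn` vector rather than on the question's training.dt.

```r
# Toy response with the problematic levels; the values are invented for illustration.
churn <- factor(c("0", "1", "1", "0"))
new_names <- make.names(levels(churn))

# levels = ... matches the existing values against the new names and finds none.
via_levels <- factor(churn, levels = new_names)

# labels = ... keeps the existing levels and only renames them.
via_labels <- factor(churn, labels = new_names)

# Compare the two results.
as.character(via_levels)
as.character(via_labels)
```

A quick `anyNA()` on the converted column is a cheap guard against the first variant slipping into a pipeline.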
Reconnoiter answered 30/5, 2019 at 7:2 Comment(0)
0

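I got the same problem:

class(iris$Species); levels(iris$Species)
# factor() needs the column rather than the whole data frame;
# labels renames the three existing levels to "1", "2", "3"
iris.lvls <- factor(iris$Species, labels = c("1", "2", "3"))
class(iris.lvls); levels(iris.lvls)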
Traps answered 4/11, 2020 at 9:49 Comment(0)
