R Caret Package Error - At least one of the class levels is not a valid R variable name
I am receiving the following error in R when stacking using the caret package.

"Error: At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to not5, X5sets . Please use factor levels that can be used as valid R variable names (see ?make.names for help)."

Below is the code I am trying to run.

library(caretEnsemble)
control <- trainControl(method="repeatedcv", number=10, repeats=3, savePredictions=TRUE, classProbs=TRUE)
algorithmList <- c('rpart', 'knn', 'svmRadial')
set.seed(222)
models <- caretList(Tsets ~ MatchSurface + MatchRound + AgeDiff + SameHand + HeightDiff, data=up_sample, trControl=control, methodList=algorithmList)
results <- resamples(models)

When I remove classProbs=TRUE, the code runs, but I want to keep it because further code that I run after this requires it. All of my variables are factors or integers, and I have changed all classes so that they do not contain "0"s and "1"s, so I can't figure out why the code won't run.

I have attached a picture of the data structure below. Would be great if anyone had some advice.

Data Structure

Atthia answered 25/6, 2018 at 11:17 Comment(2)
Change the names of the levels in the Tsets column so they do not start with a number. - Fierro
Did you look at ?make.names like the error message suggests? It explains what is required for a column name to be valid. The error message also says specifically that "5sets" will not be a valid column name; run make.names(c("not5", "5sets")) to see this for yourself. - Hortenciahortensa
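
The check suggested in the comment above can be sketched in a few lines of base R. The level names are copied from the error message in the question; the variable names `lvls` and `fixed` are only illustrative.

```r
# Level names taken from the error message in the question.
lvls <- c("not5", "5sets")

# make.names() returns a syntactically valid version of each name.
fixed <- make.names(lvls)
print(fixed)

# A level passes caret's check only if make.names() leaves it unchanged.
print(lvls == fixed)
```

Here "not5" comes back unchanged, while "5sets" is turned into "X5sets", which matches the names quoted in the error message.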
Q
16

Try changing your target variable to "yes"/"no" instead of 1/0.

Quintessa answered 28/3, 2019 at 12:19 Comment(0)
P
5

With classProbs = TRUE, caret generates the class probabilities in columns named after the factor levels of the outcome, as the error message describes. This applies to every model in the list, not only to rpart. Those column names must be valid R variable names, so the levels may not start with a number or contain spaces. You can convert the level names of the outcome to valid labels with the following code.

library(dplyr)

# Assign the result back, otherwise up_sample is left unchanged
up_sample <- up_sample %>%
  mutate(Tsets = factor(Tsets,
                        labels = make.names(levels(Tsets))))
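
As a rough illustration of the same idea without dplyr, the sketch below renames the levels in base R and then checks the result. The data frame `toy` is a hypothetical stand-in for up_sample, reduced to the outcome column.

```r
# Hypothetical stand-in for up_sample; only the outcome column matters here.
toy <- data.frame(Tsets = factor(c("not5", "5sets", "5sets", "not5")))

# Rename the levels in place; each observation keeps its class membership.
levels(toy$Tsets) <- make.names(levels(toy$Tsets))
print(levels(toy$Tsets))

# Every level should now pass through make.names() unchanged.
stopifnot(all(levels(toy$Tsets) == make.names(levels(toy$Tsets))))
```

Running the final check before calling trainControl() is a cheap way to confirm that the outcome will no longer trigger the error.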
Positronium answered 30/5, 2019 at 6:47 Comment(0)
M
4

You have to change your trainControl options. Either set

classProbs = FALSE

or change the levels of the outcome variable to "Yes"/"No" instead of "1"/"0":

# levels are renamed in order; this assumes levels(var) is c("0", "1")
levels(var) <- c("No", "Yes")
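
Because assigning to levels() renames by position, it may be safer to spell out the mapping. The following is a minimal sketch on a hypothetical 0/1 outcome `y`; the labels "No" and "Yes" are just one valid choice.

```r
# Hypothetical 0/1 outcome, not the data from the question.
y <- factor(c(0, 1, 1, 0))
print(levels(y))

# Pair each old level with its new label explicitly.
y2 <- factor(y, levels = c("0", "1"), labels = c("No", "Yes"))

# Cross-tabulate old against new values to confirm the mapping.
print(table(old = y, new = y2))
```

The cross-tabulation makes it easy to spot a swapped mapping before the model is trained.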
Mosira answered 21/12, 2020 at 12:20 Comment(0)
A
0

The error lies in the Tsets column of up_sample. I see there are two levels named "not5" and "5sets", and I'm assuming the column contains 0s and 1s.

The 0s and 1s must be converted to a factor whose labels are valid R names, such as "yes"/"no" or "not5"/"sets5" ("5sets" itself is rejected because it starts with a digit), with a line such as

up_sample$Tsets <- factor(ifelse(up_sample$Tsets == 1, "sets5", "not5"))

before your control <- trainControl(meth... line.

Appreciate answered 8/7 at 16:14 Comment(0)
