The Problem
The problem is that the tree-based algorithm underneath (randomForest in the error below) can only handle factors with a limited number of levels. So you may have a variable that has been set to a factor with more than 53 categories:
> rf.1 <- randomForest(x = rf.train.2,
+ y = rf.label,
+ ntree = 1000)
Error in randomForest.default(x = rf.train.2, y = rf.label, ntree = 1000) :
Can not handle categorical predictors with more than 53 categories.
At the root of your problem, caret is calling that same function, so make sure you fix any categorical variables with more than 53 levels.
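To find the offending columns before training, something like the following may help. It is only a minimal sketch: the data frame `toy` and its column names are made up for illustration, and the 53 threshold is taken from the error message above.

```r
# Toy data frame standing in for the real training data (hypothetical columns).
toy <- data.frame(
  small = factor(rep(letters[1:5], 40)),           # 5 levels
  big   = factor(rep(sprintf("%05d", 1:100), 2)),  # 100 levels
  price = seq_len(200)                             # numeric, not a factor
)

# Count the levels of each factor column; non-factor columns count as 0.
n_levels <- sapply(toy, function(col) if (is.factor(col)) nlevels(col) else 0L)

# Names of the factor columns that exceed the limit from the error message.
too_many <- names(n_levels)[n_levels > 53]
print(too_many)
```

Running the same `sapply()` call on your own training data frame should list the variables that need attention.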
Here is where my problem lay (notice zipcode being converted to a factor):
# ------------------------------- #
# RANDOM FOREST WITH CV 10 FOLDS #
# ------------------------------- #
rf.train.2 <- df_train[, c("v1",
                           "v2",
                           "v3",
                           "v4",
                           "v5",
                           "v6",
                           "v7",
                           "v8",
                           "zipcode",
                           "price",
                           "made_purchase")]
rf.train.2 <- data.frame(v1 = as.factor(rf.train.2$v1),
                         v2 = as.factor(rf.train.2$v2),
                         v3 = as.factor(rf.train.2$v3),
                         v4 = as.factor(rf.train.2$v4),
                         v5 = as.factor(rf.train.2$v5),
                         v6 = as.factor(rf.train.2$v6),
                         v7 = as.factor(rf.train.2$v7),
                         v8 = as.factor(rf.train.2$v8),
                         zipcode = as.factor(rf.train.2$zipcode),
                         price = rf.train.2$price,
                         made_purchase = as.factor(rf.train.2$made_purchase))
rf.label <- rf.train.2[,"made_purchase"]
The Solution
Remove every categorical variable that has more than 53 levels, or stop treating it as a factor.
Here is my fixed-up code, which leaves zipcode as it is instead of converting it to a factor. You could even wrap it in a numeric conversion like this: as.numeric(rf.train.2$zipcode).
# ------------------------------- #
# RANDOM FOREST WITH CV 10 FOLDS #
# ------------------------------- #
rf.train.2 <- df_train[, c("v1",
                           "v2",
                           "v3",
                           "v4",
                           "v5",
                           "v6",
                           "v7",
                           "v8",
                           "zipcode",
                           "price",
                           "made_purchase")]
rf.train.2 <- data.frame(v1 = as.factor(rf.train.2$v1),
                         v2 = as.factor(rf.train.2$v2),
                         v3 = as.factor(rf.train.2$v3),
                         v4 = as.factor(rf.train.2$v4),
                         v5 = as.factor(rf.train.2$v5),
                         v6 = as.factor(rf.train.2$v6),
                         v7 = as.factor(rf.train.2$v7),
                         v8 = as.factor(rf.train.2$v8),
                         zipcode = rf.train.2$zipcode,
                         price = rf.train.2$price,
                         made_purchase = as.factor(rf.train.2$made_purchase))
rf.label <- rf.train.2[,"made_purchase"]
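One caveat with the numeric wrapper: if zipcode is already a factor when you convert it, as.numeric() returns the internal level codes rather than the zip codes themselves. The sketch below uses a small made-up vector to show the difference. It also shows one possible alternative that I did not use above: keeping the variable categorical but coarser, by taking only the first digit. Treat that part as an assumption about what might suit your data.

```r
# Hypothetical zip codes stored as a factor, as in the failing version.
zipcode <- factor(c("10001", "94105", "60601", "10001"))

# as.numeric() on a factor returns the internal level codes, not the labels.
codes <- as.numeric(zipcode)

# Going through as.character() first recovers the actual values.
values <- as.numeric(as.character(zipcode))

# Alternative (an assumption, not part of the fix above): keep it categorical
# but coarser by using only the first digit, which gives at most 10 levels.
region <- factor(substr(as.character(zipcode), 1, 1))

print(codes)
print(values)
print(nlevels(region))
```

Either numeric form gets past the 53-category check. The coarser factor is only worth trying if you want the model to treat zip codes as groups rather than as a number.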