How to use all features in rpart?

Asked 23/9, 2014 at 19:24 Answered 19/12, 2015 at 4:0

r decision-tree document-classification rpart

I'm using the rpart package for decision tree classification. I have a data frame with around 4000 features (columns), and I want to use all of them in rpart() for my model. How can I do that? As far as I can tell, rpart() expects the features to be listed in the formula like this:

dt <- rpart(class ~ feature1 + feature2 + ....)

My features are words in documents, one feature per word, so I have more than 4k of them. Is there a way to use all features without writing them out?

Tinworks answered 23/9, 2014 at 19:24

I figured it out:

dt <- rpart(class ~ ., data)

In the formula, "." stands for every column of data other than the response (class), so all features are used.
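A minimal sketch of the same idea on a built-in data set, since the original data frame is not shown. Here iris stands in for the document-term data and Species plays the role of class (both are assumptions for illustration). It also shows that the dot formula is equivalent to spelling out the predictors, which reformulate() can do from the column names. The method = "class" argument asks for a classification tree explicitly; rpart would also infer this when the response is a factor.

```r
library(rpart)

# iris is a stand-in for the real data; Species plays the role of `class`
fit_dot <- rpart(Species ~ ., data = iris, method = "class")

# The same model with the formula built from the column names
predictors <- setdiff(names(iris), "Species")
f <- reformulate(predictors, response = "Species")
fit_explicit <- rpart(f, data = iris, method = "class")

# Both fits are based on the same set of predictor columns
dot_terms <- attr(fit_dot$terms, "term.labels")
explicit_terms <- attr(fit_explicit$terms, "term.labels")
print(dot_terms)
print(explicit_terms)
```

Building the formula by hand is mostly useful when only a subset of columns should be used; for "everything except the response", the dot is shorter.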

Tinworks answered 23/9, 2014 at 19:53

The caret package is useful here because it makes it easy to fit different models and compare their performance. It can call rpart, but with a slightly different syntax: you pass the predictors as x and the response as y, so every column of x is used.

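library(caret)
library(data.table)

mt <- data.table(mtcars)

# x is every column except hp; y is the response (hp)
tr <- train(x = mt[, -'hp', with = FALSE], y = mt[, hp], method = 'rpart')

# Plot the tree that caret selected and label its nodes
plot(tr$finalModel)
text(tr$finalModel)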

Using all 4000 features for a decision tree could result in overfitting, especially if your number of observations is not large. caret has built-in resampling, including cross-validation through trainControl(method = 'cv'), which helps you check for this. You might also want to look at method = 'rf' for random forests.
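As a rough sketch of the cross-validation point, the following fits rpart through caret with 5-fold cross-validation. It uses iris as a stand-in data set and caret's formula interface, so "." again means all other columns; the fold count, seed, and tuneLength value are arbitrary choices for illustration, not recommendations.

```r
library(caret)

set.seed(42)  # make the fold assignment reproducible

# 5-fold cross-validation instead of caret's default bootstrap resampling
ctrl <- trainControl(method = "cv", number = 5)

# Formula interface: "." includes every column except the response
tr_cv <- train(Species ~ ., data = iris,
               method = "rpart",
               trControl = ctrl,
               tuneLength = 5)  # try 5 candidate values of the cp parameter

# Cross-validated accuracy for each candidate cp value
print(tr_cv$results)
```

Swapping method = "rpart" for method = "rf" should give the random forest variant with the same resampling setup, provided the randomForest package is installed.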

Waterford answered 19/12, 2015 at 4:0
