ROC curve in R using rpart package?

Asked 13/6, 2015 at 11:34 Answered 3/10, 2016 at 12:13

I split my data into a training set (Train) and a test set (Test).

I fitted a CART classification tree with the rpart package in R, using only the training set, and I want to carry out a ROC analysis with the ROCR package.

The response variable is n.use (1 = yes, 0 = no):

> Pred2 = prediction(Pred.cart, Test$n.use)
Error in prediction(Pred.cart, Test$n.use) : 
  **Format of predictions is invalid.**

This is my code. What is the problem? And which type is the right one, "class" or "prob"?

library(rpart)
train.cart = rpart(n.use~., data=Train, method="class")

Pred.cart = predict(train.cart, newdata = Test, type = "class")

Pred2 = prediction(Pred.cart, Test$n.use)
roc.cart = performance(Pred2, "tpr", "fpr")

Sonia asked 13/6, 2015 at 11:34 Comment(0)

The prediction() function from the ROCR package expects the predicted "success" probabilities and the observed factor of failures vs. successes. To obtain the former, apply predict(..., type = "prob") to the rpart object (i.e., not "class", which returns a factor of predicted labels rather than numeric scores). However, as this returns a matrix of probabilities with one column per response class, you need to select the "success" class column.

As your example, unfortunately, is not reproducible, I'm using the kyphosis data from the rpart package for illustration:

library("rpart")
data("kyphosis", package = "rpart")
rp <- rpart(Kyphosis ~ ., data = kyphosis)

Then you can apply the prediction() function from ROCR. Here, I'm using the in-sample (training) data but the same can be applied out of sample (test data):

library("ROCR")
pred <- prediction(predict(rp, type = "prob")[, 2], kyphosis$Kyphosis)
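To see why type = "class" does not work in the question, it can help to compare what the two predict() types return. This is a minimal sketch on the same kyphosis fit; the names cls and prb are only illustrative:

```r
library("rpart")
data("kyphosis", package = "rpart")
rp <- rpart(Kyphosis ~ ., data = kyphosis)

# type = "class" gives one predicted label per row (a factor),
# which prediction() cannot rank by cutoff
cls <- predict(rp, type = "class")
is.factor(cls)

# type = "prob" gives a numeric matrix with one column per class;
# each row sums to 1
prb <- predict(rp, type = "prob")
colnames(prb)
head(prb)
```

Only a single numeric column of prb (the one for the class treated as "success") should be passed on to prediction().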

And you can visualize the ROC curve:

plot(performance(pred, "tpr", "fpr"))
abline(0, 1, lty = 2)

Or the accuracy across cutoffs:

plot(performance(pred, "acc"))

Or any of the other plots and summaries supported by ROCR.

ROCR plots
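One of those summaries is the area under the ROC curve. A short sketch, assuming the same in-sample kyphosis setup as above (the variable name auc is my own choice):

```r
library("rpart")
library("ROCR")
data("kyphosis", package = "rpart")

rp <- rpart(Kyphosis ~ ., data = kyphosis)
pred <- prediction(predict(rp, type = "prob")[, 2], kyphosis$Kyphosis)

# performance() returns an S4 object; for "auc" the value sits in
# the first element of the y.values slot
auc <- performance(pred, "auc")@y.values[[1]]
auc
```

Because this is computed on the training data, the value is likely to be optimistic compared with what you would see on a held-out test set.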

Sigil answered 14/6, 2015 at 8:55 Comment(3)

Awesome answer! Where do you get all this knowledge? – Gherardi 1/9, 2017 at 14:51

Also, what is the 'success' class? For the kyphosis data set it is fairly obvious (the "present" case), but what if we are working with outcomes a and b? Which one should be designated as success? – Gherardi 1/9, 2017 at 15:25

Your choice. Both would work and you can choose which one you find easier to interpret. – Sigil 1/9, 2017 at 18:27
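To illustrate the point about choosing the "success" class: a sketch, again on kyphosis, that selects the probability column by name and uses the label.ordering argument of prediction() (negative class first, positive class second) to flip which class counts as positive. The object names here are made up for the example:

```r
library("rpart")
library("ROCR")
data("kyphosis", package = "rpart")

rp <- rpart(Kyphosis ~ ., data = kyphosis)
prob <- predict(rp, type = "prob")

# "present" as the success class (the default ordering for this factor)
pred_present <- prediction(prob[, "present"], kyphosis$Kyphosis)

# "absent" as the success class: pass its probability column and
# declare the ordering explicitly as c(negative, positive)
pred_absent <- prediction(prob[, "absent"], kyphosis$Kyphosis,
                          label.ordering = c("present", "absent"))

auc_present <- performance(pred_present, "auc")@y.values[[1]]
auc_absent <- performance(pred_absent, "auc")@y.values[[1]]

# the two areas should agree up to floating-point error
all.equal(auc_present, auc_absent)
```

Selecting the column by name rather than by position makes it harder to pair the wrong probability column with the chosen positive class.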

library("ROCR")
Pred.cart = predict(train.cart, newdata = Test, type = "prob")[,2] 
Pred2 = prediction(Pred.cart, Test$n.use) 
plot(performance(Pred2, "tpr", "fpr"))
abline(0, 1, lty = 2)

This snippet should work for you.

For more details, refer to this link: Classification Trees (R)
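Since the Train and Test objects from the question are not available, here is a reproducible out-of-sample sketch that follows the same steps. It assumes the kyphosis data as a stand-in and a roughly 70/30 stratified split, which is my own choice and not something from the question:

```r
library("rpart")
library("ROCR")
data("kyphosis", package = "rpart")

set.seed(1)
# stratified hold-out: sample about 30% of the row indices within each
# class so that the test set is guaranteed to contain both classes
test_idx <- unlist(
  lapply(split(seq_len(nrow(kyphosis)), kyphosis$Kyphosis),
         function(i) sample(i, size = round(0.3 * length(i)))),
  use.names = FALSE
)
Train <- kyphosis[-test_idx, ]
Test <- kyphosis[test_idx, ]

# fit on the training set only
train.cart <- rpart(Kyphosis ~ ., data = Train, method = "class")

# predicted probability of the "present" class on the test set
Pred.cart <- predict(train.cart, newdata = Test, type = "prob")[, "present"]

Pred2 <- prediction(Pred.cart, Test$Kyphosis)
roc.cart <- performance(Pred2, "tpr", "fpr")
plot(roc.cart)
abline(0, 1, lty = 2)
```

With a data set this small the test ROC curve will have only a few steps, so treat the plot as a check that the pipeline runs rather than as a performance estimate.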

Mayamayakovski answered 3/10, 2016 at 12:13 Comment(0)