How to compute the ROC curve and its AUC after training with caret in R?
I have used the caret package's train function with 10-fold cross-validation. I have also obtained class probabilities for the predicted classes by setting classProbs = TRUE in trainControl(), as follows:

myTrainingControl <- trainControl(method = "cv", 
                                  number = 10, 
                                  savePredictions = TRUE, 
                                  classProbs = TRUE, 
                                  verboseIter = TRUE)

randomForestFit = train(x = input[3:154], 
                        y = as.factor(input$Target), 
                        method = "rf", 
                        trControl = myTrainingControl, 
                        preProcess = c("center","scale"), 
                        ntree = 50)

The output predictions I am getting are as follows:

  pred obs    0    1 rowIndex mtry Resample
1    0   1 0.52 0.48       28   12   Fold01
2    0   0 0.58 0.42       43   12   Fold01
3    0   1 0.58 0.42       51   12   Fold01
4    0   0 0.68 0.32       55   12   Fold01
5    0   0 0.62 0.38       59   12   Fold01
6    0   1 0.92 0.08       71   12   Fold01

Now I want to calculate the ROC curve and the AUC under it using this data. How would I achieve this?

Warton answered 21/5, 2015 at 6:30 Comment(4)
Have you done a search? There seems to be an easy example for this. – Galasyn
@Galasyn that link is dead – Caravan
@Caravan That was four years ago... Google will still find many relevant examples. – Galasyn
This is a straightforward way of doing that and more: cran.r-project.org/web/packages/MLeval/index.html. See the answer below for more detail. – Humanism
A worked example for computing the AUC:

library(randomForest)
library(ROCR)

rf_output = randomForest(x = predictor_data, y = target, importance = TRUE,
                         ntree = 10001, proximity = TRUE, sampsize = sampsizes)

predictions = as.vector(rf_output$votes[, 2])
pred = prediction(predictions, target)

perf_AUC = performance(pred, "auc")  # calculate the AUC value
AUC = perf_AUC@y.values[[1]]

perf_ROC = performance(pred, "tpr", "fpr")  # compute the ROC curve
plot(perf_ROC, main = "ROC plot")
text(0.5, 0.5, paste("AUC = ", format(AUC, digits = 5, scientific = FALSE)))

Or, using pROC and caret:

library(caret)
library(pROC)

data(iris)

# Keep only two classes and drop the unused "setosa" level from the factor
iris <- iris[iris$Species == "virginica" | iris$Species == "versicolor", ]
iris$Species <- factor(iris$Species)

set.seed(123)  # for a reproducible train/test split
samples <- sample(NROW(iris), NROW(iris) * .5)
data.train <- iris[samples, ]
data.test <- iris[-samples, ]
forest.model <- train(Species ~ ., data = data.train)

result.predicted.prob <- predict(forest.model, data.test, type = "prob")  # class probabilities

result.roc <- roc(data.test$Species, result.predicted.prob$versicolor)  # compute the ROC curve
plot(result.roc, print.thres = "best", print.thres.best.method = "closest.topleft")

result.coords <- coords(result.roc, "best", best.method = "closest.topleft",
                        ret = c("threshold", "accuracy"))
print(result.coords)  # best threshold and its accuracy
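The same pROC recipe also works directly on the cross-validated predictions that caret stores when savePredictions = TRUE, which is what the question asks for. A minimal sketch on a toy data frame shaped like the question's randomForestFit$pred (the probabilities below are made up for illustration; in practice pass randomForestFit$pred$obs and the positive-class probability column):

```r
library(pROC)

# Toy stand-in for the cross-validated predictions that caret saves
# (randomForestFit$pred in the question): obs is the true class and
# the "1" column is the predicted probability of class 1.
# The numbers are illustrative only.
cv_pred <- data.frame(
  obs = factor(c(1, 0, 1, 0, 0, 1)),
  `1` = c(0.48, 0.42, 0.42, 0.32, 0.38, 0.08),
  check.names = FALSE
)

# levels = c(control, case); direction = "<" means cases are expected
# to receive the higher predicted probabilities
roc_cv <- roc(cv_pred$obs, cv_pred$`1`, levels = c("0", "1"), direction = "<")
auc_cv <- as.numeric(auc(roc_cv))  # AUC under the cross-validated ROC curve
plot(roc_cv)
```

Because these are the held-out fold predictions, the resulting AUC is a cross-validated estimate rather than a resubstitution one.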
Regulation answered 21/5, 2015 at 6:48 Comment(4)
train() in forest.model <- train(Species ~., data.train) doesn't work; it fails with Error: package e1071 is required (R version 3.5) – Caravan
Install the package 'e1071' – Reste
@RUser is there any way I can calculate the AUC with the caret package itself? I am using twoClassSummary and have already set classProbs to TRUE, with ROC as the metric; both my predicted values and the labels are either 0 or 1. How can I calculate the AUC of my predictions? – Harebrained
Protip: instead of using interim variables, you can format this all into one long dplyr pipe, like this: library(randomForest); library(ROCR); library(dplyr); library(magrittr); rf_output %>% extract2("votes") %>% extract(, 2) %>% as.vector() %>% prediction(target) %>% performance("auc") %>% slot("y.values") %>% extract2(1) %>% print() – Passerby
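Regarding the twoClassSummary comment above: caret can report the cross-validated AUC itself when trainControl uses summaryFunction = twoClassSummary and train uses metric = "ROC". One catch with 0/1 labels: classProbs = TRUE requires the factor levels to be valid R variable names, so 0/1 must be recoded first. A sketch on synthetic data (the data frame, its Target column, and the make.names recoding are illustrative assumptions, not from the original post):

```r
library(caret)

# Toy two-class data with 0/1 labels, as described in the comment above
set.seed(1)
df <- data.frame(matrix(rnorm(200 * 5), ncol = 5))
df$Target <- factor(sample(c(0, 1), 200, replace = TRUE))

# classProbs = TRUE requires factor levels that are valid R names,
# so recode "0"/"1" to "X0"/"X1" before training
levels(df$Target) <- make.names(levels(df$Target))

ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary,
                     savePredictions = TRUE)

fit <- train(Target ~ ., data = df,
             method = "rf",
             metric = "ROC",
             trControl = ctrl)

fit$results  # the ROC column holds the cross-validated AUC for each mtry
```

With this setup no post-hoc ROC package is needed for the AUC estimate, although pROC or ROCR are still useful for plotting the curve.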

Update 2019: this is what MLeval was written for (https://cran.r-project.org/web/packages/MLeval/index.html). It works with the caret train output object to make ROC curves, precision-recall curves, and calibration curves, and to calculate metrics such as ROC-AUC, sensitivity, and specificity. It does all of this in one line, which is helpful for my analyses and may be of interest.

library(caret)
library(MLeval)
library(mlbench)  # provides the Sonar data set

data(Sonar)

myTrainingControl <- trainControl(method = "cv", 
                                  number = 10, 
                                  savePredictions = TRUE, 
                                  classProbs = TRUE, 
                                  verboseIter = TRUE)

randomForestFit = train(x = Sonar[,1:60], 
                        y = as.factor(Sonar$Class), 
                        method = "rf", 
                        trControl = myTrainingControl, 
                        preProcess = c("center","scale"), 
                        ntree = 50)

x <- evalm(randomForestFit)

## ROC curve plotted in ggplot2
x$roc

## AUC and other metrics
x$stdres
Humanism answered 2/12, 2019 at 8:2 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.