R logistic regression area under curve

I am performing logistic regression using this page. My code is as below.

mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mylogit <- glm(admit ~ gre, data = mydata, family = "binomial")
summary(mylogit)
prob=predict(mylogit,type=c("response"))
mydata$prob=prob

After running this code, the mydata data frame contains the columns 'admit' and 'prob' (among others). Shouldn't those two columns be sufficient to get the ROC curve?

How can I get the ROC curve?

Secondly, by looking at mydata, it seems that the model is predicting the probability of admit=1.

Is that correct?

How can I find out which particular event the model is predicting?

Thanks

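UPDATE: The three commands below turned out to be very useful. They give a "best" cut-off from the roc object g (by default pROC picks it with Youden's J statistic, which is not necessarily the cut-off with maximum accuracy), classify the observations at that cut-off, and tabulate the result in a confusion matrix.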

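library(pROC)   # coords(); g is the roc object built with roc()
library(caret)  # confusionMatrix()
coords(g, "best")
mydata$prediction <- ifelse(prob >= 0.3126844, 1, 0)
# recent caret versions expect factors with the same levels
confusionMatrix(factor(mydata$prediction), factor(mydata$admit))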
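
On the "maximum accuracy" cut-off: as far as I know, coords(g, "best") in pROC maximises Youden's J (sensitivity + specificity - 1) by default, and that cut-off can differ from the one with the highest accuracy. The following is a minimal base-R sketch of the difference, not the pROC implementation. The simulated data and all names in it (x, y, cutoffs, stats, best_acc, best_youden) are assumptions made for illustration.

```r
# Simulated stand-in for the admissions data (assumption, not the real file)
set.seed(42)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 1.2 * x))
prob <- fitted(glm(y ~ x, family = binomial))

# Evaluate every distinct fitted probability as a candidate cut-off
cutoffs <- sort(unique(prob))
stats <- t(sapply(cutoffs, function(th) {
  pred <- as.integer(prob >= th)
  sens <- mean(pred[y == 1] == 1)   # true positive rate
  spec <- mean(pred[y == 0] == 0)   # true negative rate
  c(cutoff = th, accuracy = mean(pred == y), youden = sens + spec - 1)
}))

best_acc    <- stats[which.max(stats[, "accuracy"]), ]  # maximises accuracy
best_youden <- stats[which.max(stats[, "youden"]), ]    # maximises Youden's J
print(rbind(best_acc, best_youden))
```

With unbalanced classes the two rows usually show different cut-offs, so it is worth deciding which criterion you actually want before fixing a threshold.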
Warily answered 26/8, 2013 at 16:49 Comment(3)
Wouldn't it be very simple to test your uncertainty about what is being predicted with a small dataset? Or just look at the results of with(mydata, table(admit,gre))? Logistic regression is just estimating over a bunch of tables.)Phototonus
Yes, we can do it that way, and I followed the same method to conclude that in this case it is predicting admit=1, but I thought R would have a shortcut to confirm this. Any comment on finding the threshold that gives maximum accuracy from the roc object?Warily
Regarding "Any comment on finding the threshold that gives maximum accuracy from the roc object?": I think the answer is coords(g, "best").Warily

The ROC curve depends only on how the predicted probabilities rank the observations relative to the true outcome, so the 'admit' and 'prob' columns are all you need. You can compute and plot it with the pROC package as follows:

mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mylogit <- glm(admit ~ gre, data = mydata, family = "binomial")
summary(mylogit)
prob=predict(mylogit,type=c("response"))
mydata$prob=prob
library(pROC)
g <- roc(admit ~ prob, data = mydata)
plot(g)    
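
On the follow-up about which event is modelled: with family = binomial, glm treats a numeric 0/1 response as P(y = 1), and for a factor response the first level is the "failure", so the fitted values are the probability of not being the first level. A minimal base-R sketch to check this on simulated data follows. The data and the names x, y, f, f_rev, fit_num, fit_fac and fit_rev are assumptions made for illustration.

```r
# Simulated data in which y = 1 becomes more likely as x grows (assumption)
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(1.5 * x))

# Numeric 0/1 response: the fitted values are P(y = 1)
fit_num <- glm(y ~ x, family = binomial)
coef(fit_num)["x"]   # positive slope, matching how y was generated

# Factor response: the first level ("no") is treated as failure
f <- factor(ifelse(y == 1, "yes", "no"), levels = c("no", "yes"))
fit_fac <- glm(f ~ x, family = binomial)
levels(f)[1]         # the level that is NOT being predicted

# Making "yes" the first level flips the modelled event and the coefficient signs
f_rev <- relevel(f, ref = "yes")
fit_rev <- glm(f_rev ~ x, family = binomial)
rbind(original = coef(fit_fac), releveled = coef(fit_rev))
```

For the admissions data, admit is coded 0/1, so prob is the estimated probability of admit = 1.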
Volotta answered 26/8, 2013 at 17:0 Comment(4)
That makes sense. If possible, please also answer "Secondly, by looking at mydata, it seems that the model is predicting the probability of admit=1. Is that correct? How can I find out which particular event the model is predicting?". I looked at the roc object and understand that g$sensitivities and g$specificities will give me specific values, but if I want to find the threshold that gives maximum accuracy, can I get that number from the roc object?Warily
@Volotta Is the "admit" variable the predicted class or the actual class?Anitraaniweta
That URL to get the data now seems to be out of date. For anyone else interested in reproducing this example, what seems to work now is mydata <- read.csv("stats.idre.ucla.edu/stat/data/binary.csv") (using https:// prepended tho' that doesn't want to appear in the comment)Scrimshaw
How can you calculate the area under the curve of two linear regressions estimated by a lmer model? Thanks stats.stackexchange.com/questions/570145/…Aqualung

Another way to plot the ROC curve, using the Deducer package:

library(Deducer)
modelfit <- glm(formula=admit ~ gre + gpa, family=binomial(), data=mydata, na.action=na.omit)
rocplot(modelfit)
Victualage answered 10/9, 2014 at 13:7 Comment(1)
You'll need Java installed for this or you'll get an error, just FYI. Error: .onLoad failed in loadNamespace() for 'rJava', details: call: fun(libname, pkgname) error: JAVA_HOME cannot be determined from the RegistryInconsiderable
#Another way to plot ROC

mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mylogit <- glm(admit ~ gre, data = mydata, family = "binomial")
summary(mylogit)
prob <- predict(mylogit, type = "response")  # fitted probabilities
library("ROCR")
pred <- prediction(prob, mydata$admit)  # pair the scores with the true labels
perf <- performance(pred, measure = "tpr", x.measure = "fpr")  # ROC: TPR against FPR
# The x-axis is the false positive rate, i.e. 1 - specificity
plot(perf, col = rainbow(7), main = "ROC curve Admissions",
     xlab = "1 - Specificity", ylab = "Sensitivity")
abline(0, 1)  # add a 45 degree line (chance level)
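
The title asks for the area under the curve rather than the curve itself. Because the ROC curve only depends on ranks, the AUC can also be obtained without any package from the Mann-Whitney rank-sum formula. The sketch below is a base-R illustration under that assumption, not what ROCR or pROC run internally; the helper name auc_from_ranks and the toy vectors are made up for the example.

```r
# Rank-based AUC: the probability that a random positive scores higher than
# a random negative (ties count as one half via average ranks)
auc_from_ranks <- function(labels, scores) {
  stopifnot(length(labels) == length(scores), all(labels %in% c(0, 1)))
  r <- rank(scores)            # average ranks handle ties
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data (hypothetical): 3 of the 4 positive/negative pairs are ordered correctly
labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)
auc_toy <- auc_from_ranks(labels, scores)
auc_toy                                        # 0.75

# Any monotone transformation of the scores leaves the AUC unchanged
auc_log <- auc_from_ranks(labels, log(scores))
```

With the objects from the code above, auc_from_ranks(mydata$admit, prob) should agree with the package-based AUC.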
Vestavestal answered 24/11, 2015 at 17:25 Comment(3)
Can you add some explanation to your answer?Persis
@Vestavestal AUC can be calculated as auc = performance(pred, "auc")Leann
@SIslam Thank you for your comment! question's title is AUC and instead everybody is talking about ROC. They are related concepts, but not the same.Swap