How to get p value after ROC analysis with pRoc package?
Asked Answered
L

1

5

After ROC analysis of a set of data, how to calculate p-value? With the same statistics, I saw that the p-value can be output in SPSS. The sample code is as follows:

library(pROC)
data(aSAH)
head(aSAH)
#    gos6 outcome gender age wfns s100b  ndka
# 29    5    Good Female  42    1  0.13  3.01
# 30    5    Good Female  37    1  0.14  8.54
# 31    5    Good Female  42    1  0.10  8.09
# 32    5    Good Female  27    1  0.04 10.42
# 33    1    Poor Female  42    3  0.13 17.40
# 34    1    Poor   Male  48    2  0.10 12.75

(rr <- roc(aSAH$outcome, aSAH$s100b, plot=T))
# Setting levels: control = Good, case = Poor
# Setting direction: controls < cases
# 
# Call:
#   roc.default(response = aSAH$outcome, predictor = aSAH$s100b,     plot = F)
# 
# Data: aSAH$s100b in 72 controls (aSAH$outcome Good) < 41 cases (aSAH$outcome Poor).
# Area under the curve: 0.7314

Edit:

The p value calculated in SPSS is 0.000007, but the p-value calculated by verification::roc.area() is 0.000022546, is the calculation method of roc.area() and SPSS inconsistent?

levels(aSAH$outcome) <- c(0, 1)
library(verification)
ra <- roc.area(as.numeric(as.vector(aSAH$outcome)), rr$predictor)
ra$p.value
# [1] 0.00002254601
Loreanloredana answered 25/5, 2020 at 7:12 Comment(2)
You shouldn't modify your question significantly, especially after you received an answer. If you have a followup question about the difference between roc.area and SPSS, that's a new follow-up question, so you should ask a new question.Sororicide
@Sororicide Thank you for your concern about the question's quality. However, actually the OP made the edit for valid further clarification and announced it to me in a comment to my answer. Thereafter I tried to triage them to CV. Since their question there was downvoted an closed as off-topic I have taken the liberty of roll back your rollback and answering the question here.Hanus
H
7

There's no option to get the p-value in pROC::roc, you may set option ci=TRUE to get confidence intervals instead. pROC::roc yields an invisible output that you may grab by assigning it to an object.

library(pROC)
data(aSAH)
rr <- pROC::roc(aSAH$outcome, aSAH$s100b, ci=TRUE)

Using str(rr) reveals how to access the ci:

rr$ci
# 95% CI: 0.6301-0.8326 (DeLong)

So you have already a confidence interval.

Moreover, you can also get the variance, using pROC::var*, from which you could calculate a standard error manually.

(v <- var(rr))
# [1] 0.002668682
b <- rr$auc - .5
se <- sqrt(v)
(se <- sqrt(v))
# [1] 0.05165929

* Note, that there is also a bootstrap option pROC::var(rr, method="bootstrap").

This is identical to the one calculated by Stata,

# . roctab outcome_num s100b, summary
# 
# ROC                    -Asymptotic Normal--
#   Obs       Area     Std. Err.      [95% Conf. Interval]
# ------------------------------------------------------------
#   113     0.7314       0.0517        0.63012     0.83262
# .
# . display r(se)
# .05165929

where Stata Base Reference Manual 14 - roctab (p. 2329) states:

By default, roctab calculates the standard error for the area under the curve by using an algorithm suggested by DeLong, DeLong, and Clarke-Pearson (1988) and asymptotic normal confidence intervals.

Once we have the standard error, we also may calculate a p-value based on the z-distribution (Ref.).

z <- (b / se)
2 * pt(-abs(z), df=Inf)  ## two-sided test
# [1] 0.000007508474

This p-value is close to your SPSS value, so it's likely that it's calculated with an algorithm similar to Stata (compare: IBM SPSS Statistics 24 Algorithms, p. 888:889).

However, the calculation of the p value of a ROC analysis might be controversial. E.g. the method you show in your edit (see also first link below) is based on a Mann–Whitney U-statistic.

You might want to dig a little deeper into the subject before you decide which method is best suited for your analysis. I provide you with some reading suggestions here:

Hanus answered 25/5, 2020 at 7:51 Comment(3)
Thanks for jay.sf, but the p-value I got from roc.area () in the verification package is inconsistent with the p-value in SPSS. The p-value calculated in SPSS is 0.000007, but the p-value calculated by roc.area () is 0.000022546, is the calculation method of roc.area () and SPSS inconsistent?Loreanloredana
@Loreanloredana This is indeed an interesting question, probably SPSS and roc.area() are using different methods to calculate the p-value, can't tell you more. Since this is now mainly a statistical issue, I suggest to ask on Cross Validated a question about the reasons for the difference in p-values.Hanus
Thank you very much for your patience and the related information you provided. This is the best answer I have obtained so far. Maybe I can write a function to get p value by the above code.Loreanloredana

© 2022 - 2024 — McMap. All rights reserved.