How to plot a ROC curve using ROCR package in r, *with only a classification contingency table*

You cannot generate the full ROC curve with a single contingency table because a contingency table provides only a single sensitivity/specificity pair (for whatever predictive cutoff was used to generate the contingency table).

If you had many contingency tables that were generated with different cutoffs, you would be able to approximate the ROC curve (basically it will be a linear interpolation between the sensitivity/specificity values in your contingency tables). As an example, let's consider predicting whether a flower is versicolor in the iris dataset using logistic regression:

iris$isv <- as.numeric(iris$Species == "versicolor")
mod <- glm(isv~Sepal.Length+Sepal.Width, data=iris, family="binomial")

We could use the standard ROCR code to compute the ROC curve for this model:

library(ROCR)
pred1 <- prediction(predict(mod), iris$isv)
perf1 <- performance(pred1,"tpr","fpr")
plot(perf1)

enter image description here

Now let's assume that instead of mod all we have is contingency tables with a number of cutoffs values for predictions:

tables <- lapply(seq(0, 1, .1), function(x) table(iris$isv, factor(predict(mod, type="response") >= x, levels=c(F, T))))

# Predict TRUE if predicted probability at least 0
tables[[1]]
#     FALSE TRUE
#   0     0  100
#   1     0   50

# Predict TRUE if predicted probability at least 0.5
tables[[6]]
#     FALSE TRUE
#   0    86   14
#   1    29   21

# Predict TRUE if predicted probability at least 1
tables[[11]]
#     FALSE TRUE
#   0   100    0
#   1    50    0

From one table to the next some predictions changed from TRUE to FALSE due to the increased cutoff, and by comparing column 1 of the successive table we can determine which of these represent true negative and false negative predictions. Iterating through our ordered list of contingency tables we can create fake predicted value/outcome pairs that we can pass to ROCR, ensuring that we match the sensitivity/specificity for each contingency table.

fake.info <- do.call(rbind, lapply(1:(length(tables)-1), function(idx) {
  true.neg <- tables[[idx+1]][1,1] - tables[[idx]][1,1]
  false.neg <- tables[[idx+1]][2,1] - tables[[idx]][2,1]
  if (true.neg <= 0 & false.neg <= 0) {
    return(NULL)
  } else {
    return(data.frame(fake.pred=idx,
                      outcome=rep(c(0, 1), times=c(true.neg, false.neg))))
  }
}))

Now we can pass the faked predictions to ROCR as usual:

pred2 <- prediction(fake.info$fake.pred, fake.info$outcome)
perf2 <- performance(pred2,"tpr","fpr")
plot(perf2)

enter image description here

Basically what we have done is a linear interpolation of the points that we do have on the ROC curve. If you had contingency tables for many cutoffs you could more closely approximate the true ROC curve. If you don't have a wide range of cutoffs you can't hope to accurately reproduce the full ROC curve.

Recommended topics

Hot tags