Is there a good implementation of McNemar's test in Python? I don't see it anywhere in scipy.stats or scikit-learn. I may have overlooked some other good packages, so please recommend one.
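In case it helps clarify what I'm after, here is a minimal hand-rolled sketch using only numpy and scipy; the function name `mcnemar_exact` and the boolean arrays `correct_a`/`correct_b` (per-sample "was this prediction correct?" flags for the two models) are just placeholder names I made up:

```python
# Minimal hand-rolled sketch (mcnemar_exact, correct_a, correct_b are placeholder names).
import numpy as np
from scipy.stats import binom, chi2

def mcnemar_exact(correct_a, correct_b):
    """McNemar's test from per-sample boolean 'prediction was correct' arrays."""
    correct_a = np.asarray(correct_a, dtype=bool)
    correct_b = np.asarray(correct_b, dtype=bool)
    # Off-diagonal cells of the 2x2 disagreement table
    b = int(np.sum(correct_a & ~correct_b))   # A correct, B wrong
    c = int(np.sum(~correct_a & correct_b))   # A wrong, B correct
    n = b + c
    if n == 0:
        return 1.0, 1.0  # the two models never disagree
    # Exact two-sided p-value: under H0 the disagreements split as Binomial(n, 0.5)
    p_exact = min(1.0, 2.0 * binom.cdf(min(b, c), n, 0.5))
    # Chi-square approximation with continuity correction, for comparison
    stat = (abs(b - c) - 1.0) ** 2 / n
    p_chi2 = float(chi2.sf(stat, df=1))
    return p_exact, p_chi2

# Example with fabricated correctness flags
print(mcnemar_exact([1, 1, 0, 1, 0, 1], [1, 0, 0, 1, 1, 1]))
```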
For background: McNemar's test is essentially THE test for comparing two classification algorithms/models evaluated on a single holdout test set (as opposed to K-fold or other resampling schemes used to mimic a test set). Two common alternatives are t-tests that directly compare the true positive proportions $p_A$ and $p_B$ of two algorithms/models $A$ and $B$, either 1) assuming the variances follow binomial distributions, or 2) estimating the variances by repeatedly resampling the training and test sets.
The latter two, however, have been shown to have high Type I error rates (they declare the models statistically different when in essence they are the same). McNemar's test is still considered the best choice when comparing two classification algorithms or models on a single test set; see Dietterich (1998).
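For concreteness, this is the kind of single-holdout-set comparison I have in mind; the dataset, the two models, and all variable names below are arbitrary choices for illustration, and the exact p-value is computed from the two off-diagonal disagreement counts only:

```python
# The kind of single-holdout-set comparison I have in mind.
# Dataset, models, and names are arbitrary; only the 2x2 disagreement counts matter.
import numpy as np
from scipy.stats import binom
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pred_a = LogisticRegression(max_iter=5000).fit(X_train, y_train).predict(X_test)
pred_b = RandomForestClassifier(random_state=0).fit(X_train, y_train).predict(X_test)

correct_a = pred_a == y_test
correct_b = pred_b == y_test
b = int(np.sum(correct_a & ~correct_b))  # model A correct, model B wrong
c = int(np.sum(~correct_a & correct_b))  # model A wrong, model B correct

# Exact two-sided McNemar p-value, computed on the disagreements only
n = b + c
p_value = min(1.0, 2.0 * binom.cdf(min(b, c), n, 0.5)) if n else 1.0
print(f"b = {b}, c = {c}, exact McNemar p-value = {p_value:.4f}")
```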
Or, as an alternative: if not by McNemar's test, how do people statistically compare two classification models in practice?