MNLogit in statsmodels returning nan

I'm trying to use statsmodels' MNLogit function on the famous iris data set. When I try to fit a model, the optimizer reports "Current function value: nan". Here is the code I am using:

import statsmodels.api as st

iris = st.datasets.get_rdataset('iris', 'datasets')
y = iris.data.Species
x = iris.data.iloc[:, 0:4]   # the four measurement columns (.ix was removed from pandas)
x = st.add_constant(x, prepend=False)
mdl = st.MNLogit(y, x)
mdl_fit = mdl.fit()
print(mdl_fit.summary())
Belie answered 20/7, 2015 at 0:29

In the iris example we can predict Setosa perfectly. This means the data exhibit (partial) perfect separation, which causes problems in Logit and MNLogit.

Perfect separation is good for prediction, but it sends the parameters of the logit model off to infinity. In this case I get a Singular Matrix error instead of NaNs with a relatively recent version of statsmodels master (on Windows).
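
It is easy to see the separation directly in the data. A minimal check (a sketch, using the column names of the R iris dataset as loaded in the question):

# Petal.Length alone separates setosa from the other two species:
# setosa's maximum (~1.9) is well below the others' minimum (~3.0),
# so membership in setosa can be predicted perfectly.
print(iris.data.groupby('Species')['Petal.Length'].agg(['min', 'max']))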

The default optimizer for the discrete models is Newton, which fails when the Hessian becomes singular. Other optimizers that don't use the information from the Hessian are able to finish the optimization. For example, using 'bfgs', I get

>>> mdl_fit = mdl.fit(method='bfgs')
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.057112
         Iterations: 35
         Function evaluations: 37
         Gradient evaluations: 37
e:\josef\eclipsegworkspace\statsmodels-git\statsmodels-all-new2_py27\statsmodels\statsmodels\base\model.py:471: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  "Check mle_retvals", ConvergenceWarning)

The predicted probabilities for Setosa are essentially (1, 0, 0); that is, Setosa is predicted perfectly:

>>> fitted = mdl_fit.predict()
>>> fitted[y=='setosa'].min(0)
array([  9.99497636e-01,   2.07389867e-11,   1.71740822e-38])
>>> fitted[y=='setosa'].max(0)
array([  1.00000000e+00,   5.02363854e-04,   1.05778255e-20])

However, because of the perfect separation the parameters are not identified; their values are determined mostly by the stopping criterion of the optimizer, and the standard errors are very large.

>>> print(mdl_fit.summary())
                          MNLogit Regression Results                          
==============================================================================
Dep. Variable:                Species   No. Observations:                  150
Model:                        MNLogit   Df Residuals:                      140
Method:                           MLE   Df Model:                            8
Date:                Mon, 20 Jul 2015   Pseudo R-squ.:                  0.9480
Time:                        04:08:04   Log-Likelihood:                -8.5668
converged:                      False   LL-Null:                       -164.79
                                        LLR p-value:                 9.200e-63
=====================================================================================
Species=versicolor       coef    std err          z      P>|z|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------
Sepal.Length          -1.4959    444.817     -0.003      0.997      -873.321   870.330
Sepal.Width           -8.0560    282.766     -0.028      0.977      -562.267   546.155
Petal.Length          11.9301    374.116      0.032      0.975      -721.323   745.184
Petal.Width            1.7039    759.366      0.002      0.998     -1486.627  1490.035
const                  1.6444   1550.515      0.001      0.999     -3037.309  3040.597
--------------------------------------------------------------------------------------
Species=virginica       coef    std err          z      P>|z|      [95.0% Conf. Int.]
-------------------------------------------------------------------------------------
Sepal.Length         -8.0348    444.835     -0.018      0.986      -879.896   863.827
Sepal.Width         -15.8195    282.793     -0.056      0.955      -570.083   538.444
Petal.Length         22.1797    374.155      0.059      0.953      -711.152   755.511
Petal.Width          14.0603    759.384      0.019      0.985     -1474.304  1502.425
const                -6.5053   1550.533     -0.004      0.997     -3045.494  3032.483
=====================================================================================

About the implementation in statsmodels

Logit checks specifically for perfect separation and raises an exception that can optionally be weakened to a warning. For other models like MNLogit, there is not yet an explicit check for perfect separation, largely because of the lack of good test cases and easily identifiable general conditions. (Several issues like https://github.com/statsmodels/statsmodels/issues/516 are still open.)

My strategy in general:

When there is a convergence failure, try different optimizers and different starting values (start_params). If some optimizers succeed, then it might be a difficult optimization problem, caused for example by the curvature of the objective function or badly scaled explanatory variables. A useful check is to use the parameter estimates of robust optimizers like 'nm' or 'powell' as starting values for stricter optimizers like 'newton' or 'bfgs', as in the sketch below.
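
A minimal sketch of that check (the order='F' ravel reflects how MNLogit reshapes its flat parameter vector into a (k_exog, J-1) array; worth verifying against your statsmodels version):

import numpy as np

# Robust, derivative-free optimizer first ...
res_nm = mdl.fit(method='nm', maxiter=5000)
# ... then hand its estimates to a stricter gradient-based optimizer.
start = np.asarray(res_nm.params).ravel(order='F')   # back to a flat vector
res_bfgs = mdl.fit(method='bfgs', start_params=start)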

If the results are still not good after some optimizers converge, then it might be an inherent problem with the data, like perfect separation in Logit, Probit and several other models, or a singular or near-singular design matrix. In that case the model has to be changed; recommendations for handling perfect separation can be found with an internet search. One common remedy, a penalized fit, is sketched below.
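
statsmodels' discrete models offer an L1 penalty through fit_regularized, which keeps the parameters finite under separation. A minimal sketch (the penalty weight alpha=1.0 is an arbitrary illustrative value, not a recommendation):

# L1-penalized estimation; tune alpha for real use.
res_l1 = mdl.fit_regularized(method='l1', alpha=1.0)
print(res_l1.summary())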

Rambutan answered 20/7, 2015 at 8:28
Got it. Not a problem with Matlab's mnrfit, but something I need to handle with MNLogit. Thanks for helping me understand and bound the problem. – Belie
I don't know what Matlab does, but returning random or arbitrary unidentified results is not a good approach for a statistics package, although statsmodels also still does it in some cases. I only found this related to complete separation in Matlab: mathworks.com/matlabcentral/newsreader/view_thread/303156 (Hiding the problem doesn't mean it has gone away.) – Rambutan
But why does it return params for only two classes instead of 3? – Lorenelorens
@Lorenelorens That's a different question. Probabilities have to add up to 1, so if we know the probabilities of all but one class, the probability of the remaining class is just 1 minus the sum of the others. We cannot have separate params for the last class because they are already determined by this restriction. Note that predict still returns the probabilities for all classes. – Rambutan
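
A quick shape check illustrates this (a sketch using the bfgs fit from the answer; 5 regressors including the constant, 3 species):

>>> mdl_fit.params.shape     # (5, 2): parameters only for J-1 = 2 classes
(5, 2)
>>> mdl_fit.predict().shape  # (150, 3): predicted probabilities for all 3 classes
(150, 3)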
