Statsmodels logistic regression convergence problems
I'm trying to fit a logistic regression in statsmodels on a large design matrix (~200 columns). The features include a number of interactions, categorical features, and semi-sparse (70%) integer features. Although the design matrix is not actually ill-conditioned, it seems to be close: according to numpy.linalg.matrix_rank, it is full-rank with tol=1e-3 but not with tol=1e-2. As a result, I'm struggling to get the logistic regression to converge with any of the methods in statsmodels. Here's what I've tried so far:

  • method='newton': Did not converge after 1000 iterations, then raised a singular-matrix LinAlgError while trying to invert the Hessian.

  • method='bfgs': Warned of possible precision loss and claimed convergence after 0 iterations; it obviously had not actually converged.

  • method='nm': Claimed convergence, but the model had a negative pseudo-R-squared and many coefficients were still zero (and very different from the values they had converged to in better-conditioned submodels). Cranking xtol down to 1e-8 didn't help.

  • fit_regularized(method='l1'): Reported "Inequality constraints incompatible" (exit mode 4), then raised a singular-matrix LinAlgError while trying to compute the restricted Hessian inverse.
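For reference, the tolerance-dependent rank check described above can be reproduced with NumPy alone. The matrix below is synthetic (the real design matrix is proprietary), built so that one column is nearly, but not exactly, a linear combination of the others:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 5
X = rng.normal(size=(n, p))

# Append a column that is almost a linear combination of the existing
# ones, mimicking a near-rank-deficient design matrix.
X = np.column_stack([X, X @ np.ones(p) + 1e-4 * rng.normal(size=n)])

# matrix_rank treats singular values below `tol` as zero, so the
# reported rank depends on the tolerance, just as in the question.
print(np.linalg.matrix_rank(X, tol=1e-3))  # full rank (6)
print(np.linalg.matrix_rank(X, tol=1e-2))  # rank-deficient (5)
print(np.linalg.cond(X))                   # large condition number
```

A large condition number like this is exactly the regime where Newton-type solvers start failing on near-singular Hessians.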

Treasury asked 11/12, 2014 at 1:24
Can you share your data somewhere? – Advanced
Alas, no; it's proprietary. – Treasury
I found that standardizing the data helped with the convergence issues. It's an acceptable workaround: I can't use formulas with it (centering each column inside a formula is a pain), and it makes the coefficients harder to interpret, but it at least gets the model to converge. – Treasury
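A minimal sketch of that standardization step in plain NumPy, assuming constant columns (such as an intercept) should be left untouched so we don't divide by zero:

```python
import numpy as np

def standardize(X):
    """Center each column to mean 0 and scale to standard deviation 1.

    Constant columns (e.g. an intercept) are returned unchanged, since
    centering would zero them out and scaling would divide by zero.
    """
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    varying = sd > 0
    Z = X.copy()
    Z[:, varying] = (X[:, varying] - X[:, varying].mean(axis=0)) / sd[varying]
    return Z

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(5.0, 2.0, size=(100, 3))])
Z = standardize(X)
```

A coefficient fitted on a standardized column measures the effect of a one-standard-deviation change; dividing it by the column's original standard deviation recovers the coefficient on the raw scale.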
The above "solution" still failed once I added an 18-level categorical feature. There's some chance this was due to actual collinearity, although I doubt it. I'll try to create example (random) data that exhibits the problem tomorrow. – Treasury
Not wholly surprised. We have some code to do this internally, but it's not hooked up by default yet. – Advanced
Ah, excellent! That would make life a lot easier. Thanks for your excellent work on statsmodels; I know I'm asking a lot of it! – Treasury
Do you use the parameters of the smaller model as starting values for the larger model when you add variables/terms? – Fascista
It would be helpful if you have a ready example that fails and that standardization makes work. We could add it to the documentation. – Advanced
I tried to reproduce the problem with publicly available code and data. I didn't get all the way there in the time I allotted, but I at least managed to reproduce some parts of it here. Namely, I found that with a spline basis the matrix appeared full-rank only at very low tolerances, and that logistic regression appeared to converge but gave meaningless confidence intervals and p-values. However, standardization didn't make this one work any better. – Treasury
A similar question on Cross Validated. – Jarvisjary
