I'm trying to run a logistic regression in statsmodels on a large design matrix (~200 columns). The features include a number of interactions, categorical features, and semi-sparse (70%) integer features. Although my design matrix is not actually ill-conditioned, it seems to be somewhat close: according to `numpy.linalg.matrix_rank`, it is full-rank with `tol=1e-3` but not with `tol=1e-2`. As a result, I'm struggling to get the logistic regression to converge with any of the methods in statsmodels.
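For reference, the rank check looks something like this. The matrix here is a synthetic stand-in for my real design matrix (with one column made nearly collinear), just to make the snippet runnable:

```python
import numpy as np

# Stand-in for my real design matrix: ~200 integer-valued columns,
# with the last column made nearly collinear to mimic the conditioning issue.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(10_000, 200)).astype(float)
X[:, -1] = X[:, 0] + 5e-5 * rng.standard_normal(10_000)

# On my actual data: full rank (200) with tol=1e-3, but rank-deficient with tol=1e-2.
print(np.linalg.matrix_rank(X, tol=1e-3))
print(np.linalg.matrix_rank(X, tol=1e-2))
```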
Here's what I've tried so far (a rough sketch of the actual calls follows the list):

- `method='newton'`: Did not converge after 1000 iterations, then raised a singular-matrix `LinAlgError` while trying to invert the Hessian.
- `method='bfgs'`: Warned of possible precision loss and claimed convergence after 0 iterations; it obviously had not actually converged.
- `method='nm'`: Claimed convergence, but the model had a negative pseudo-R-squared and many coefficients were still zero (and very different from the values they had converged to in better-conditioned submodels). Cranking `xtol` down to `1e-8` did not help.
- `fit_regularized(method='l1')`: Reported `Inequality constraints incompatible (Exit mode 4)`, then raised a singular-matrix `LinAlgError` while trying to compute the restricted Hessian inverse.
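For concreteness, here is roughly what those calls look like (continuing with the synthetic `X` and `rng` from the snippet above; `y` is a placeholder response, and the `alpha` value for the L1 fit is illustrative -- the comments describe what happens on my real data):

```python
import statsmodels.api as sm

# Placeholder binary response; the real y is my actual outcome variable.
Xc = sm.add_constant(X)
y = (rng.random(Xc.shape[0]) < 0.5).astype(int)

model = sm.Logit(y, Xc)

# 1) Newton: on my data, raises LinAlgError ("Singular matrix") inverting the Hessian.
res_newton = model.fit(method='newton', maxiter=1000)

# 2) BFGS: warns about possible precision loss and stops after 0 iterations.
res_bfgs = model.fit(method='bfgs')

# 3) Nelder-Mead: "converges" even with a tight xtol, but to a bad optimum
#    (negative pseudo-R-squared, many zero coefficients).
res_nm = model.fit(method='nm', xtol=1e-8)

# 4) L1 regularization: reports "Inequality constraints incompatible (Exit mode 4)",
#    then raises LinAlgError computing the restricted Hessian inverse.
res_l1 = model.fit_regularized(method='l1', alpha=1.0)
```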