"NaN, inf or invalid value detected in weights" error when training a statsmodels GLM model

I am training a GLM (Poisson family) on my data using the Python statsmodels package. The data contains both numeric and categorical values. I standardized the numeric values and one-hot-encoded the categorical values (dropping the first level). When I fit the data to the model, I get the following exception:

~/miniconda3/envs/losscost/lib/python3.7/site-packages/insite/losscost/losscost.py in evaluate(self, x, control, peril_descs)
    271                     family=sm.families.Poisson(link=sm.families.links.log()),
    272                 )
--> 273                 freq_fitted = freq_glm.fit()
    274                 freq_results[name].append(freq_fitted)
    275 

~/miniconda3/envs/losscost/lib/python3.7/site-packages/statsmodels/genmod/generalized_linear_model.py in fit(self, start_params, maxiter, method, tol, scale, cov_type, cov_kwds, use_t, full_output, disp, max_start_irls, **kwargs)
   1025             return self._fit_irls(start_params=start_params, maxiter=maxiter,
   1026                                   tol=tol, scale=scale, cov_type=cov_type,
-> 1027                                   cov_kwds=cov_kwds, use_t=use_t, **kwargs)
   1028         else:
   1029             self._optim_hessian = kwargs.get('optim_hessian')

~/miniconda3/envs/losscost/lib/python3.7/site-packages/statsmodels/genmod/generalized_linear_model.py in _fit_irls(self, start_params, maxiter, tol, scale, cov_type, cov_kwds, use_t, **kwargs)
   1163             wls_mod = reg_tools._MinimalWLS(wlsendog, wlsexog,
   1164                                             self.weights, check_endog=True,
-> 1165                                             check_weights=True)
   1166             wls_results = wls_mod.fit(method=wls_method)
   1167             lin_pred = np.dot(self.exog, wls_results.params)

~/miniconda3/envs/losscost/lib/python3.7/site-packages/statsmodels/regression/_tools.py in __init__(self, endog, exog, weights, check_endog, check_weights)
     46         if check_weights:
     47             if not np.all(np.isfinite(w_half)):
---> 48                 raise ValueError(self.msg.format('weights'))
     49 
     50         if check_endog:

ValueError: NaN, inf or invalid value detected in weights, estimation infeasible.

I tried training on only the numeric values and it works fine. What could be causing this issue?
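
For reference, a minimal sketch of the preprocessing and fit described above (assuming a pandas DataFrame df; the column names are hypothetical, not from my actual data):

import pandas as pd
import statsmodels.api as sm

# Standardize the numeric columns (hypothetical names).
num = df[["num_a", "num_b"]]
num = (num - num.mean()) / num.std()

# One-hot encode the categorical columns, dropping the first level of each.
cats = pd.get_dummies(df[["cat_a", "cat_b"]], drop_first=True).astype(float)

# Assemble the design matrix with an intercept and fit the Poisson GLM.
X = sm.add_constant(pd.concat([num, cats], axis=1))
y = df["target_count"]
result = sm.GLM(y, X, family=sm.families.Poisson()).fit()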

Hypotaxis answered 8/7, 2020 at 17:38 Comment(1)
Hi there, this looks like a question for Stack Overflow (the programming site) rather than Cross Validated (the statistics site). Being on both sites, I know from experience that when this is asked on Stack Overflow, they will ask for both 1. the data you are using (a subset will be fine, or a toy version) and 2. the commands you used. – Cadge

Can you add a keyword argument to your fit call like this and see if it helps:

# method="lbfgs" switches from the default IRLS to a gradient-based scipy optimizer
model = sm.GLM(...)
result = model.fit(method="lbfgs")

I think the Inf/NaN is in the IRLS weights. IRLS is slightly less robust than direct optimization.
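
It may also be worth confirming that the model inputs themselves are finite before blaming the optimizer (a quick check, assuming model is the unfitted sm.GLM instance):

import numpy as np

# Both should print True; False means NaN/inf is already present in the raw data.
print(np.isfinite(model.endog).all())
print(np.isfinite(model.exog).all())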

Also, make sure your design matrix is not singular:

import numpy

model = sm.GLM(...)
u, s, vt = numpy.linalg.svd(model.exog, full_matrices=False)
print(s)

All elements of s (the singular values) should be strictly positive.
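
Continuing from the SVD above, a rough summary is the condition number; the tolerance below is an arbitrary illustrative choice, not a statsmodels default:

# Large condition numbers indicate a nearly singular design matrix.
print("condition number:", s.max() / s.min())

# Count singular values that are effectively zero relative to the largest one.
tol = 1e-8 * s.max()
print("near-zero singular values:", int((s < tol).sum()))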

If you continue to have trouble, what are the sample size and dimension of your model?

Tell answered 9/7, 2020 at 2:27 Comment(2)
Thank you for your answer! If I change to the lbfgs method, the model throws convergence warnings and never seems to converge. The design matrix is not singular, but two singular values are very small (e.g. 1.24707074e-10). BTW, the data size is (271976, 86). – Hypotaxis
Your design matrix is nearly singular. You could use model.fit_regularized(L1_wt=0, alpha=0.1) or something similar to fit with a ridge penalty. Or perhaps there are a few columns with nearly zero variance that could be dropped. – Tell
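
A minimal sketch of both suggestions, assuming y and X are the response and design matrix from the question; the alpha value and variance threshold are illustrative, not tuned:

import numpy as np
import statsmodels.api as sm

# Option 1: ridge fit; L1_wt=0 turns the elastic-net penalty into a pure L2 (ridge) penalty.
ridge_result = sm.GLM(y, X, family=sm.families.Poisson()).fit_regularized(alpha=0.1, L1_wt=0)

# Option 2: drop (near-)constant columns, then refit the unpenalized model.
exog = np.asarray(X, dtype=float)
keep = exog.std(axis=0) > 1e-8              # note: this also drops an explicit intercept column
X_reduced = sm.add_constant(exog[:, keep])  # re-add the intercept
result = sm.GLM(y, X_reduced, family=sm.families.Poisson()).fit()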
