statsmodels logistic regression odds ratio

I'm wondering how I can get odds ratios from a fitted logistic regression model in Python statsmodels.

>>> import statsmodels.api as sm
>>> import numpy as np
>>> X = np.random.normal(0, 1, (100, 3))
>>> y = np.random.choice([0, 1], 100)
>>> res = sm.Logit(y, X).fit()
Optimization terminated successfully.
         Current function value: 0.683158
         Iterations 4
>>> res.summary()
<class 'statsmodels.iolib.summary.Summary'>
"""
                           Logit Regression Results                           
==============================================================================
Dep. Variable:                      y   No. Observations:                  100
Model:                          Logit   Df Residuals:                       97
Method:                           MLE   Df Model:                            2
Date:                Sun, 05 Jun 2016   Pseudo R-squ.:                0.009835
Time:                        23:25:06   Log-Likelihood:                -68.316
converged:                       True   LL-Null:                       -68.994
                                        LLR p-value:                    0.5073
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -0.0033      0.181     -0.018      0.985        -0.359     0.352
x2             0.0565      0.213      0.265      0.791        -0.362     0.475
x3             0.2985      0.216      1.380      0.168        -0.125     0.723
==============================================================================
"""
>>> 
Multiplex answered 5/6, 2016 at 22:26 Comment(9)
According to the site, OR = np.exp(res.params). I'm not 100% sure that formula is right. – Multiplex
Is your question about the math of how to get the odds ratio, or the programming of how to get it from statsmodels? See for instance the very end of this page, which says "The end result of all the mathematical manipulations is that the odds ratio can be computed by raising e to the power of the logistic coefficient". – Stronghold
The point is that I'm not sure this is true in multivariate regression, i.e. if more than one input variable is used. – Multiplex
If your question is about the stats involved, you're probably better off asking on Cross Validated. – Stronghold
I asked stats.stackexchange.com/questions/208136/… some time ago. This is why I think the formula is wrong. – Multiplex
@Multiplex I'm not sure what that answer means. Odds ratios are exp(params) in Logit, and you can get the confidence interval for the odds ratios by endpoint transformation, i.e. just using exp(conf_int()), where conf_int() gives the interval for the estimated parameters. – Neukam
See for example Stata's eform (stata.com/manuals14/rglm.pdf), which gives the interpretation for Logit and Poisson; similar reasoning applies to a few other models that are based on an exp transformation, e.g. hazard ratios, IIRC. – Neukam
Can you confirm OR = exp(coef) in multivariate logistic regression? – Multiplex
Yes, that's what I'm saying, confirmed (because exp makes the model multiplicative, so the other terms cancel in the ratio). However, the odds ratio is usually used for binary 0-1 regressors; otherwise you have to look at the interpretation of the effect of a unit change or of the slope effect of a continuous variable. – Neukam
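
To see this numerically, here is a minimal sketch (the data and the baseline point x0 are just illustrative) that fits the same kind of model as in the question and checks that exp(coef) equals the odds ratio for a one-unit change in one regressor, with the other regressors held fixed:

import numpy as np
import statsmodels.api as sm

np.random.seed(0)
X = np.random.normal(0, 1, (100, 3))
y = np.random.choice([0, 1], 100)
res = sm.Logit(y, X).fit(disp=0)

def odds(x):
    # in a logit model p/(1-p) = exp(x'beta), so the odds are exp of the linear predictor
    return np.exp(res.params @ x)

x0 = np.array([0.0, 0.5, -0.5])        # arbitrary baseline point
x1 = x0 + np.array([1.0, 0.0, 0.0])    # same point after a one-unit increase in x1

print(odds(x1) / odds(x0))             # odds ratio for a unit change in x1
print(np.exp(res.params[0]))           # same number: exp of the x1 coefficient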

You can get the odds ratios with:

np.exp(res.params)

To also get the confidence intervals (source):

params = res.params
conf = res.conf_int()                            # 95% confidence intervals for the coefficients
conf['Odds Ratio'] = params
conf.columns = ['2.5%', '97.5%', 'Odds Ratio']
print(np.exp(conf))                              # exponentiate bounds and coefficients to get the odds-ratio scale

Disclaimer: I've just put together the comments on your question.
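
Note that res.conf_int() is only a DataFrame when the model was fit on pandas objects. With plain numpy arrays, as in the question, it returns an array, so the column assignment above won't work; a minimal sketch building the same table in that case (the column names are just labels):

import numpy as np
import pandas as pd

ci = res.conf_int()                              # ndarray of shape (n_params, 2) with numpy inputs
table = pd.DataFrame(ci, columns=['2.5%', '97.5%'])
table['Odds Ratio'] = res.params
print(np.exp(table))                             # exponentiate CI bounds and coefficients together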

Crowe answered 10/12, 2017 at 16:22 Comment(2)
I think you forgot to use np.exp(res.params) when assigning params as odds ratios in your code block. – Linnette
Hi, the exponent is applied in the print. – Accompaniment

Not sure about statsmodels; here is how to do it in sklearn:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# x is a pandas DataFrame of features, y the binary target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)

logisticRegr = LogisticRegression()
logisticRegr.fit(x_train, y_train)

# coef_ has shape (1, n_features); exponentiate it to get one odds ratio per feature
df = pd.DataFrame({'odds_ratio': np.exp(logisticRegr.coef_[0]), 'variable': x.columns.tolist()})

df = df.sort_values('odds_ratio', ascending=False)
df
Terpineol answered 8/1, 2021 at 14:7 Comment(1)
Watch out! sklearn uses a regularized regression by default (an L2 penalty), which biases the coef_ numbers. Best to use statsmodels if your primary interest is the model coefficients as opposed to the model predictions. – Upkeep
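
If you do want to read the sklearn coefficients as log odds ratios, one workaround, assuming a reasonably recent scikit-learn, is to switch the default penalty off so the fit is plain maximum likelihood; a minimal sketch reusing the names from the answer above:

import numpy as np
from sklearn.linear_model import LogisticRegression

# penalty=None requires scikit-learn >= 1.2; on older versions use penalty='none'
logisticRegr = LogisticRegression(penalty=None, max_iter=1000)
logisticRegr.fit(x_train, y_train)
odds_ratios = np.exp(logisticRegr.coef_[0])   # now comparable to statsmodels' np.exp(res.params)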

As an option basically equivalent to lincolnfrias' one, but maybe handier (and directly usable in stargazer tables), consider the following:

from stargazer.utils import LogitOdds

odds = LogitOdds(original_logit_model)

See this stargazer issue for more background.

Loreeloreen answered 13/10, 2021 at 14:54 Comment(0)
