statsmodels logistic regression odds ratio

I'm wondering how I can get odds ratios from a fitted logistic regression model in Python statsmodels.

>>> import statsmodels.api as sm
>>> import numpy as np
>>> X = np.random.normal(0, 1, (100, 3))
>>> y = np.random.choice([0, 1], 100)
>>> res = sm.Logit(y, X).fit()
Optimization terminated successfully.
         Current function value: 0.683158
         Iterations 4
>>> res.summary()
<class 'statsmodels.iolib.summary.Summary'>
"""
                           Logit Regression Results                           
==============================================================================
Dep. Variable:                      y   No. Observations:                  100
Model:                          Logit   Df Residuals:                       97
Method:                           MLE   Df Model:                            2
Date:                Sun, 05 Jun 2016   Pseudo R-squ.:                0.009835
Time:                        23:25:06   Log-Likelihood:                -68.316
converged:                       True   LL-Null:                       -68.994
                                        LLR p-value:                    0.5073
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -0.0033      0.181     -0.018      0.985        -0.359     0.352
x2             0.0565      0.213      0.265      0.791        -0.362     0.475
x3             0.2985      0.216      1.380      0.168        -0.125     0.723
==============================================================================
"""
>>> 
Multiplex answered 5/6, 2016 at 22:26 Comment(9)
According to the site, OR = np.exp(res.params). I'm not 100% sure that formula is right. – Multiplex
Is your question about the math of how to get the odds ratio, or the programming of how to get it from statsmodels? See for instance the very end of this page, which says "The end result of all the mathematical manipulations is that the odds ratio can be computed by raising e to the power of the logistic coefficient". – Stronghold
The point is that I'm not sure this is true in multivariate regression, i.e. if more than one input variable is used. – Multiplex
If your question is about the stats involved, you're probably better off asking on Cross Validated. – Stronghold
I asked stats.stackexchange.com/questions/208136/… some time ago. This is why I think the formula is wrong. – Multiplex
@Multiplex I'm not sure what that answer means. Odds ratios are exp(params) in Logit, and you can get the confidence interval for the odds ratios by endpoint transformation, i.e. just using exp(conf_int()), where conf_int() gives the interval for the estimated parameters. – Neukam
See for example Stata's eform (stata.com/manuals14/rglm.pdf), which gives the interpretation for Logit and Poisson; similar reasoning applies to a few other models that are based on an exp transformation, e.g. hazard ratios, IIRC. – Neukam
Can you confirm OR = exp(coef) in multivariate logistic regression? – Multiplex
Yes, that's what I'm saying, confirmed (because exp makes the model multiplicative, so the other terms cancel in the ratio). However, the odds ratio is usually used for binary 0-1 regressors; otherwise you have to look at the interpretation of the effect of a unit change or of the slope effect of a continuous variable. – Neukam
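
To see this numerically, here is a minimal sketch (the data and the baseline point x0 are just illustrative) that fits the same kind of model as in the question and checks that exp(coef) equals the odds ratio for a one-unit change in one regressor, with the other regressors held fixed:

import numpy as np
import statsmodels.api as sm

np.random.seed(0)
X = np.random.normal(0, 1, (100, 3))
y = np.random.choice([0, 1], 100)
res = sm.Logit(y, X).fit(disp=0)

def odds(x):
    # in a logit model p/(1-p) = exp(x'beta), so the odds are exp of the linear predictor
    return np.exp(res.params @ x)

x0 = np.array([0.0, 0.5, -0.5])        # arbitrary baseline point
x1 = x0 + np.array([1.0, 0.0, 0.0])    # same point after a one-unit increase in x1

print(odds(x1) / odds(x0))             # odds ratio for a unit change in x1
print(np.exp(res.params[0]))           # same number: exp of the x1 coefficient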

You can get the odds ratios with:

np.exp(res.params)

To also get the confidence intervals (source):

params = res.params
conf = res.conf_int()                            # 95% confidence intervals for the coefficients
conf['Odds Ratio'] = params
conf.columns = ['2.5%', '97.5%', 'Odds Ratio']
print(np.exp(conf))                              # exponentiate bounds and coefficients to get the odds-ratio scale

Disclaimer: I've just put together the comments on your question.
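
Note that res.conf_int() is only a DataFrame when the model was fit on pandas objects. With plain numpy arrays, as in the question, it returns an array, so the column assignment above won't work; a minimal sketch building the same table in that case (the column names are just labels):

import numpy as np
import pandas as pd

ci = res.conf_int()                              # ndarray of shape (n_params, 2) with numpy inputs
table = pd.DataFrame(ci, columns=['2.5%', '97.5%'])
table['Odds Ratio'] = res.params
print(np.exp(table))                             # exponentiate CI bounds and coefficients together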

Crowe answered 10/12, 2017 at 16:22 Comment(2)
I think you forgot to use np.exp(res.params) when assigning params as odds ratios in your code block. – Linnette
Hi, the exponent is applied in the print. – Accompaniment

Not sure about statsmodels; here is how to do it in sklearn:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# x is a pandas DataFrame of features, y the binary target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)

logisticRegr = LogisticRegression()
logisticRegr.fit(x_train, y_train)

# coef_ has shape (1, n_features); exponentiate it to get one odds ratio per feature
df = pd.DataFrame({'odds_ratio': np.exp(logisticRegr.coef_[0]), 'variable': x.columns.tolist()})

df = df.sort_values('odds_ratio', ascending=False)
df
Terpineol answered 8/1, 2021 at 14:7 Comment(1)
Watch out! sklearn uses a regularized regression by default (an L2 penalty), which biases the coef_ numbers. Best to use statsmodels if your primary interest is the model coefficients as opposed to the model predictions. – Upkeep
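
If you do want to read the sklearn coefficients as log odds ratios, one workaround, assuming a reasonably recent scikit-learn, is to switch the default penalty off so the fit is plain maximum likelihood; a minimal sketch reusing the names from the answer above:

import numpy as np
from sklearn.linear_model import LogisticRegression

# penalty=None requires scikit-learn >= 1.2; on older versions use penalty='none'
logisticRegr = LogisticRegression(penalty=None, max_iter=1000)
logisticRegr.fit(x_train, y_train)
odds_ratios = np.exp(logisticRegr.coef_[0])   # now comparable to statsmodels' np.exp(res.params)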

As an option basically equivalent to lincolnfrias' one, but maybe handier (and directly usable in stargazer tables), consider the following:

from stargazer.utils import LogitOdds

odds = LogitOdds(original_logit_model)

See this stargazer issue for more background.

Loreeloreen answered 13/10, 2021 at 14:54 Comment(0)
