Calculating marginal effects in Python with statsmodels' get_margeff command in logit model with interaction terms
Asked Answered
G

0

11

I encountered a problem when working with statsmodels' get_margeff command for a logit model with interaction terms. While in a main effects models the effects are correctly calculated and correspond to Stata and R results, this is not the case when interaction terms are involved. Here the effects are wrong and also a marginal effect for the interaction term is reported which does not make sense. The following code illustrates that:

import pandas as pd
import statsmodels.formula.api as sm
import statsmodels.api as sm2

df=sm2.datasets.heart.load_pandas().data

regression = sm.logit(formula='censors~survival+age', data=df).fit()   
#only for illustration purposes; does not make real sense 

print(regression.get_margeff().summary()) 
# the calculation of marginal effects here is corrects and corresponds to Stata and R results

                dy/dx    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
survival      -0.0004   7.95e-05     -4.672      0.000      -0.001      -0.000
age            0.0148      0.005      3.262      0.001       0.006       0.024
==============================================================================
regression = sm.logit(formula='censors~survival+age+survival*age', data=df).fit() 
print(regression.get_margeff().summary()) 
## effects for survival and age are not correct and a marginal effect for survival:age is reported which does not make sense
================================================================================
                  dy/dx    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------
survival        -0.0009      0.001     -1.040      0.298      -0.003       0.001
age              0.0120      0.006      1.857      0.063      -0.001       0.025
survival:age   1.08e-05    1.8e-05      0.599      0.549   -2.45e-05    4.61e-05
================================================================================

Does anybody know how this can be fixed, so that the marginal effects for survival and age [used only for illustration purposes here] in the second model correspond to Stata and R results?

EDIT, Apr. 11:

In response to user "StupidWolf" here are the respective Stata results:

use "heart.dta"
qui logit censors survival age
margins, dydx(*)


Average marginal effects                        Number of obs     =         69
Model VCE    : OIM

Expression   : Pr(censors), predict()
dy/dx w.r.t. : survival age

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    survival |  -.0003716   .0000795    -4.67   0.000    -.0005275   -.0002157
         age |    .014813   .0045409     3.26   0.001     .0059131    .0237129
------------------------------------------------------------------------------


qui logit censors survival age c.survival#c.age
margins, dydx(*)


Average marginal effects                        Number of obs     =         69
Model VCE    : OIM

Expression   : Pr(censors), predict()
dy/dx w.r.t. : survival age

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    survival |  -.0003816   .0000814    -4.68   0.000    -.0005412   -.0002219
         age |   .0162289   .0051163     3.17   0.002     .0062012    .0262567
------------------------------------------------------------------------------

There is a broad discussion on why marginal effects should not be calculated for interaction terms, c.f. for instance: https://www3.nd.edu/~rwilliam/stats/Margins01.pdf https://www.stata.com/statalist/archive/2013-01/msg00293.html

Georgenegeorges answered 10/4, 2020 at 20:34 Comment(3)
how is it different from R and why do you think it is wrong in statsmodel? In any case without the comparison or the data.. it's hard to imagine the problemAntidisestablishmentarianism
I added the information on the Stata results aboveGeorgenegeorges
statsmodels margins currently do not handle effects coming from several columns of the design matrix, only effects of individual columns are computed.Ceasefire

© 2022 - 2024 — McMap. All rights reserved.