Anova test for GLM in python

S

3

7

I am trying to get the F-statistic and p-value for each of the covariates in GLM. In Python I am using the stats mode.formula.api to conduct the GLM.

formula = 'PropNo_Pred ~ Geography + log10BMI + Cat_OpCavity + CatLes_neles + CatRural_urban + \
        CatPred_Control + CatNative_Intro + Midpoint_of_study'

mod1 = smf.glm(formula=formula, data=A2, family=sm.families.Binomial()).fit()
mod1.summary()

After that I am trying to do the ANOVA test for this model using the anova in statsmodels.stats

table1 = anova_lm(mod3)
print table1

However I am getting an error saying: 'GLMResults' object has no attribute 'ssr'

Looks like this anova_lm function only applies to linear model is there a module in python that does anova test for GLMs?

Sundaysundberg answered 6/12, 2014 at 5:28 Comment(1)

a preliminary answer for ANOVA type 3 kind of result groups.google.com/d/msg/pystatsmodels/qQxWdSi_fQk/0O3eAgINYhkJ – Othaothe 7/12, 2014 at 14:13

T

3

There isn't, unfortunately. However, you can roll your own by using the model's hypothesis testing methods on each of the terms. In fact, some of their ANOVA methods do not even use the attribute ssr (which is the model's sum of squared residuals, thus obviously undefined for a binomial GLM). You could probably modify this code to do a GLM ANOVA.

Tiddlywinks answered 7/12, 2014 at 3:40 Comment(1)

It would be awesome if you could add an example of how to use the hypothesis testing methods for the data in the experiment. – Congener 24/2, 2015 at 11:27

S

6

Here is my attempt to roll your own.

The F-statistic for nested models is defined as:

(D_s - D_b ) / (addtl_parameters * phi_b)

Where:

D_s is deviance of small model
D_b is deviance of larger ("big)" model
addtl_parameters is the difference in degrees of freedom between models.
phi_b is the estimate of dispersion parameter for the larger model'

"Statistical theory says that the F-statistic follows an F distribution, with a numerator degrees of freedom equal to the number of added parameters and a denominator degrees of freedom equal to n - p_b, or the number of records minus the number of parameters in the big model."

We translate this into code with:

from scipy import stats

def calculate_nested_f_statistic(small_model, big_model):
    """Given two fitted GLMs, the larger of which contains the parameter space of the smaller, return the F Stat and P value corresponding to the larger model adding explanatory power"""
    addtl_params = big_model.df_model - small_model.df_model
    f_stat = (small_model.deviance - big_model.deviance) / (addtl_params * big_model.scale)
    df_numerator = addtl_params
    # use fitted values to obtain n_obs from model object:
    df_denom = (big_model.fittedvalues.shape[0] - big_model.df_model)
    p_value = stats.f.sf(f_stat, df_numerator, df_denom)
    return (f_stat, p_value)

Here is a reproducible example, following the gamma GLM example in statsmodels (https://www.statsmodels.org/stable/glm.html):

import numpy as np
import statsmodels.api as sm
data2 = sm.datasets.scotland.load()
data2.exog = sm.add_constant(data2.exog, prepend=False)

big_model = sm.GLM(data2.endog, data2.exog, family=sm.families.Gamma()).fit()
# Drop one covariate (column):
smaller_model = sm.GLM(data2.endog, data2.exog[:, 1:], family=sm.families.Gamma()).fit()

# Using function defined in answer:
calculate_nested_f_statistic(smaller_model, big_model)
# (9.519052917304652, 0.004914748992474178)

Source: https://www.casact.org/pubs/monographs/papers/05-Goldburd-Khare-Tevet.pdf

Susi answered 20/3, 2020 at 6:8 Comment(0)

T

3

There isn't, unfortunately. However, you can roll your own by using the model's hypothesis testing methods on each of the terms. In fact, some of their ANOVA methods do not even use the attribute ssr (which is the model's sum of squared residuals, thus obviously undefined for a binomial GLM). You could probably modify this code to do a GLM ANOVA.

Tiddlywinks answered 7/12, 2014 at 3:40 Comment(1)

It would be awesome if you could add an example of how to use the hypothesis testing methods for the data in the experiment. – Congener 24/2, 2015 at 11:27

D

-1

Thanks for the answer but i think there is a small finger mistake the function should be

def calculate_nested_f_statistic(small_model, big_model):

"""Given two fitted GLMs, the larger of which contains the parameter space of the smaller, return the F Stat and P value corresponding to the larger model adding explanatory power"""

   addtl_params = small_model.df_model - big_model.df_model 

   f_stat = (small_model.deviance - big_model.deviance) / (addtl_params * big_model.scale)

   df_numerator = addtl_params

   # use fitted values to obtain n_obs from model object:

   df_denom = (big_model.fittedvalues.shape[0] - big_model.df_model)

   p_value = stats.f.sf(f_stat, df_numerator, df_denom)

   return (f_stat, p_value)

Dumuzi answered 24/5, 2023 at 15:8 Comment(3)

did you run this? this should raise an exception – Othaothe 24/5, 2023 at 16:20

Hello, please don't just submit code in your answer(s), add some details as to why you think this is the optimal solution. – Zeta 26/5, 2023 at 17:29

Yes i run it but i ll again later, the answer aboce was good I just inversed the 2 parts of the equation inin addtl_param – Dumuzi 27/5, 2023 at 17:46

Recommended topics

Hot tags