Does 'statsmodels' or another Python package offer an equivalent to R's 'step' function?

Is there a statsmodels or other Python equivalent for R's step functionality for selecting a formula-based model using AIC?

Housekeeper answered 15/3, 2014 at 19:26 Comment(0)

I suspect you are taking the same online course as I am; the following gets you the right answers. If the task at hand is not computationally heavy (and it isn't in the course), we can sidestep the finer details of the step function and simply try every subset of the predictors.

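For each subset we can calculate AIC as AIC = 2*nvars - 2*result.llf, where nvars is the number of fitted coefficients (intercept included) and result.llf is the log-likelihood of the fitted model.
Then we just pick the subset with the minimal AIC: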

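import itertools
import statsmodels.api as sm

# Assumes train (DataFrame), target (Series) and predictorcols (list of column names) already exist
AICs = {}
for k in range(1, len(predictorcols) + 1):
    for variables in itertools.combinations(predictorcols, k):
        # Copy the selection so that adding a column does not modify a slice of train
        predictors = train[list(variables)].copy()
        predictors['Intercept'] = 1
        res = sm.OLS(target, predictors).fit()
        # k slopes plus the intercept
        AICs[variables] = 2 * (k + 1) - 2 * res.llf
# Subset (tuple of column names) with the lowest AIC
min(AICs, key=AICs.get)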
Armoury answered 15/3, 2014 at 22:49 Comment(1)
Good approach. In the end I just did it in R and copied the model over into Python. - Housekeeper

The first answer didn't work for me, but the version below did. It is heavily copied from Kostya and uses the formula interface instead.

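import itertools
import statsmodels.formula.api as smf

# Assumes train (DataFrame) and predictorcols (list of column names) already exist
AICs = {}
for k in range(1, len(predictorcols) + 1):
    for variables in itertools.combinations(predictorcols, k):
        # Join the column names into the right-hand side of the formula
        independent = ' + '.join(variables)
        # Replace $DependentVariable$ with the name of the response column in train
        regresion = '$DependentVariable$ ~ {}'.format(independent)
        # The formula interface adds the intercept automatically
        res = smf.ols(regresion, data=train).fit()
        AICs[variables] = 2 * (k + 1) - 2 * res.llf
# Subset (tuple of column names) with the lowest AIC
min(AICs, key=AICs.get)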
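
When there are too many predictors to enumerate every subset, a greedy search is closer in spirit to what R's step does. The sketch below is a simplified forward-only selection, not a port of step: it starts from the intercept-only model and, in each round, adds the single predictor that lowers AIC the most, stopping when no addition helps. The function name forward_select and the synthetic data are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def forward_select(data, response, candidates):
    """Greedy forward selection by AIC (hypothetical helper, not a statsmodels API)."""
    selected = []
    remaining = list(candidates)
    # Start from the intercept-only model
    best_aic = smf.ols("{} ~ 1".format(response), data=data).fit().aic
    while remaining:
        # AIC of each model obtained by adding one more predictor
        trials = []
        for name in remaining:
            formula = "{} ~ {}".format(response, " + ".join(selected + [name]))
            trials.append((smf.ols(formula, data=data).fit().aic, name))
        aic, name = min(trials)
        if aic >= best_aic:
            break  # no single addition improves AIC
        best_aic = aic
        selected.append(name)
        remaining.remove(name)
    return selected, best_aic


# Synthetic data (made up): y depends on x1 and x2 only.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
df["y"] = 2.0 * df["x1"] - 3.0 * df["x2"] + rng.normal(scale=0.5, size=200)

selected, aic = forward_select(df, "y", ["x1", "x2", "x3"])
print(selected, aic)
```

Because the search is greedy, it can miss the subset that the exhaustive loop would find, but it needs far fewer fits when the number of predictors is large.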
Caustic answered 29/4, 2016 at 2:50 Comment(0)
