I ran into the same problem, so I had to find a workaround. I don't have much experience, and this doesn't fix the root issue with OLSInfluence, but it does give you summary_frame.
I will use pandas DataFrames as the source of the data. Even if you have it in other objects (like arrays), you can transform them into a DataFrame with relative ease. To show how it works, I will import the Boston housing prices data set from sklearn.datasets (note that load_boston was removed in scikit-learn 1.2, so this needs an older version):
import pandas as pd
from sklearn.datasets import load_boston

# load the dataset
boston = load_boston()
# build the DataFrame bos, using the feature names as column names
bos = pd.DataFrame(boston.data, columns=boston.feature_names)
# add the target as column 'PRICE'
bos['PRICE'] = boston.target
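If your data starts out as a NumPy array rather than a DataFrame, the conversion mentioned above could look roughly like this. The numbers and the column names here are made up purely for illustration; they are not taken from the Boston data.

```python
import numpy as np
import pandas as pd

# Hypothetical values: 5 rows, 2 columns (illustration only)
arr = np.array([
    [6.5, 24.0],
    [6.4, 21.6],
    [7.2, 34.7],
    [7.0, 33.4],
    [7.1, 36.2],
])

# Wrap the array in a DataFrame and name the columns in one step
df = pd.DataFrame(arr, columns=["RM", "PRICE"])
print(df)
```

Once the columns have names, they can be used directly in a formula string.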
Now let us consider the relation between the column 'RM' and the column 'PRICE', with 'RM' as the independent variable. For simplicity, let us consider simple OLS. Here comes the actual answer:
from statsmodels.formula.api import ols
m = ols('PRICE ~ RM',bos).fit()
infl = m.get_influence()
sm_fr = infl.summary_frame()
sm_fr has the columns cooks_d and dffits that you are looking for.
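As a rough sketch of what you can do with those two columns, the snippet below flags observations with a large Cook's distance. It uses synthetic data in place of the Boston set so that it runs on its own, and the 4/n cutoff is only a common rule of thumb that I am assuming here, not something statsmodels prescribes.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-in data (an assumption, not the Boston set)
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(3, 9, size=n)
y = 9.0 * x - 34.0 + rng.normal(0, 2.0, size=n)
# Plant one deliberately unusual observation at row 0
x[0], y[0] = 9.0, 0.0
data = pd.DataFrame({"RM": x, "PRICE": y})

m = ols("PRICE ~ RM", data).fit()
sm_fr = m.get_influence().summary_frame()

# Rule-of-thumb cutoff (assumed): Cook's distance above 4/n
threshold = 4 / n
flagged = sm_fr[sm_fr["cooks_d"] > threshold]
print(flagged[["cooks_d", "dffits"]])
```

The planted row should show up among the flagged ones; on real data you would inspect the flagged rows rather than drop them automatically.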
get_influence is easier. In the case here, the call argument is wrong: results should be passed to OLSInfluence and not to summary, i.e. st_inf.OLSInfluence(results).summary_frame() should work. – Kitchenware
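For what it's worth, here is a small sketch comparing the two routes from the comment. I am assuming st_inf stands for statsmodels.stats.outliers_influence, and I import OLSInfluence from there directly; the data is synthetic and only there to make the snippet runnable.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.outliers_influence import OLSInfluence

# Synthetic stand-in data (an assumption, for illustration only)
rng = np.random.default_rng(1)
data = pd.DataFrame({"RM": rng.uniform(3, 9, size=30)})
data["PRICE"] = 9.0 * data["RM"] - 34.0 + rng.normal(0, 2.0, size=30)

results = ols("PRICE ~ RM", data).fit()

# Route 1: pass the fitted results to OLSInfluence, then call summary_frame()
direct = OLSInfluence(results).summary_frame()
# Route 2: let the results object build the influence object itself
via_method = results.get_influence().summary_frame()

# Check whether both routes produce the same table
same = np.allclose(direct.values, via_method.values)
print(same)
```

Either way, the key point is that the fitted results go into OLSInfluence (or you call get_influence on them), and summary_frame() is called without arguments.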