I am running an analysis that would benefit from clustering standard errors by BEA region. I have not used the clustered standard error option in statsmodels before, so I am unsure whether I am getting the syntax wrong or the option is broken. Any help would be greatly appreciated.
Here is the relevant section of code (note that the values in the `topline_specs` dict are Patsy-style formula strings):
```python
import statsmodels.formula.api as smf

# Capture topline specs (spec_dict and data_d are defined elsewhere)
topline_specs = {'GO': spec_dict['PC_GO']['Total']['TYPE']['BOTH'],
                 'RV': spec_dict['PC_RV']['Total']['TYPE']['BOTH'],
                 'ISSUER': spec_dict['PROP']['ISSUER']['TYPE']['BOTH'],
                 'PURPOSE': spec_dict['PROP']['PURPOSE']['TYPE']['BOTH']}

# Estimate each model with standard errors clustered on BEA region
topline_mods = {'GO': smf.ols(formula=topline_specs['GO'], data=data_d).fit(
    cov_type='cluster',
    cov_kwds={'groups': data_d['BEA_INT']})}
topline_mods['GO']
```
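A minimal way to see what is going on, sketched with synthetic data because `spec_dict` and `data_d` are not shown. The columns `y`, `x1` and `x2` are hypothetical placeholders; only `BEA_INT` comes from the question. The sketch does not call `.fit()`; it just compares the number of rows the formula model actually uses with the length of the `groups` Series.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for data_d: 'y', 'x1' and 'x2' are made-up columns;
# 'BEA_INT' mimics the integer region code used for clustering.
rng = np.random.default_rng(0)
n = 40
df = pd.DataFrame({
    "y": rng.normal(size=n),
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "BEA_INT": rng.integers(1, 9, size=n),
})
df.loc[[3, 10, 25], "x1"] = np.nan  # a few missing values, as in real data

model = smf.ols("y ~ x1 + x2", data=df)

# The formula machinery drops rows with NaN in any model variable...
n_used = int(model.nobs)
# ...but the groups Series still has one entry per row of the full frame.
n_groups = len(df["BEA_INT"])
print(n_used, n_groups)  # 37 40
```

If the two numbers differ for the real `data_d`, the clustered fit is being handed more group labels than observations.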
The traceback stems from a NumPy call and ends with:
```
ValueError: The weights and list don't have the same length.
```
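That message comes from NumPy's `np.bincount`, which raises it when its `weights` array is not the same length as the integer labels. Cluster-robust covariance sums each observation's score contributions within clusters, and statsmodels appears to do this with a bincount-style group sum (an assumption about its internals, which may differ between versions), so a `groups` array longer than the estimation sample would plausibly end up here. This standalone sketch reproduces only the NumPy error:

```python
import numpy as np

# np.bincount sums `weights` into bins given by the integer labels in the
# first argument; the two arrays must have the same length.
labels = np.array([0, 1, 1, 2])      # e.g. cluster ids for 4 observations
scores = np.array([0.5, -1.0, 2.0])  # but only 3 values to sum

try:
    np.bincount(labels, weights=scores)
except ValueError as exc:
    message = str(exc)

print(message)  # The weights and list don't have the same length.
```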
Everything I could find on clustered standard errors suggested that the `cov_kwds` argument can take a Series from the DataFrame holding the model data. What am I missing?
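One likely answer, sketched with the same kind of hypothetical data (`y`, `x1`, `x2` are placeholders): a Series is accepted, but it must line up row for row with the observations the model actually uses. The formula interface drops rows with missing values in any model variable, while `data_d['BEA_INT']` keeps every row. Dropping incomplete rows before fitting, over the model variables and the cluster column together, keeps the two aligned:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: 'y', 'x1', 'x2' are placeholders; 'BEA_INT' is the cluster id.
rng = np.random.default_rng(1)
n = 60
df = pd.DataFrame({
    "y": rng.normal(size=n),
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "BEA_INT": rng.integers(1, 9, size=n),
})
df.loc[[2, 7, 30, 41], "x2"] = np.nan

formula = "y ~ x1 + x2"
# Drop incomplete rows over every model variable AND the cluster column first,
# so the formula has nothing left to drop and the rows stay aligned.
needed = ["y", "x1", "x2", "BEA_INT"]
clean = df.dropna(subset=needed)

res = smf.ols(formula, data=clean).fit(
    cov_type="cluster",
    cov_kwds={"groups": clean["BEA_INT"]},
)
print(int(res.nobs), len(clean))  # 56 56
```

In the question's setup, the `needed` list would have to come from whichever variables each formula in `topline_specs` uses; that step is not shown here.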
`formula="GO ~ RV + ISSUER + PURPOSE"`. Otherwise you can use the data directly: `OLS(data_d['GO'], sm.add_constant(data_d[['RV', ....]])).fit(...)` – Fugitive

'groups' have matching entries if missing values were removed from the data by the formula/data handling in `ols`. – Fugitive