I have the following panel stored in `df`:
|   | state | district | year | y | constant | x1 | x2 | time |
|---|---|---|---|---|---|---|---|---|
| 0 | 01 | 01001 | 2009 | 12 | 1 | 0.956007 | 639673 | 1 |
| 1 | 01 | 01001 | 2010 | 20 | 1 | 0.972175 | 639673 | 2 |
| 2 | 01 | 01001 | 2011 | 22 | 1 | 0.988343 | 639673 | 3 |
| 3 | 01 | 01002 | 2009 | 0 | 1 | 0 | 33746 | 1 |
| 4 | 01 | 01002 | 2010 | 1 | 1 | 0.225071 | 33746 | 2 |
| 5 | 01 | 01002 | 2011 | 5 | 1 | 0.450142 | 33746 | 3 |
| 6 | 01 | 01003 | 2009 | 0 | 1 | 0 | 45196 | 1 |
| 7 | 01 | 01003 | 2010 | 5 | 1 | 0.427477 | 45196 | 2 |
| 8 | 01 | 01003 | 2011 | 9 | 1 | 0.854955 | 45196 | 3 |
- `y` is the number of protests in each district
- `constant` is a column full of ones
- `x1` is the proportion of the district's area covered by a mobile network provider
- `x2` is the population count in each district (note that it is fixed in time)
How can I run the following model in Python?
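The model itself is not reproduced here. Judging only from the comments in the attempt below (`x2` multiplied by `time`, district effects `delta`, state-by-year effects `eta`), it presumably has roughly this form, with $d$ the district, $t$ the `time` counter and $s(d)$ the state that contains district $d$:

$$
y_{dt} = \alpha + \beta_1\, x_{1,dt} + \beta_2 \left( x_{2,d} \cdot t \right) + \delta_d + \eta_{s(d),t} + \varepsilon_{dt}
$$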
Here's what I tried:

```python
import pandas as pd
from linearmodels.panel import PanelOLS

# Transform `x2` to match the model (population times the time counter)
df['x2'] = df['x2'].multiply(df['time'], axis=0)

# District fixed effects
df['delta'] = pd.Categorical(df['district'])

# State-by-year fixed effects (assumes `state` is stored as a string, e.g. '01')
df['eta'] = pd.Categorical(df['state'] + df['year'].astype(str))

# Set the (entity, time) index; set_index returns a new frame, so reassign it
df = df.set_index(['district', 'year'])

m = PanelOLS(dependent=df['y'], exog=df[['constant', 'x1', 'x2', 'delta', 'eta']])
```
This raises:

```
ValueError: exog does not have full column rank. If you wish to proceed with model estimation irrespective of the numerical accuracy of coefficient estimates, you can set rank_check=False.
```
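The message means that some column of `exog` is an exact linear combination of others. A plausible source here is the pair of fixed-effect blocks: within any state, the district dummies add up to that state's indicator, and so do that state's state-year dummies, so once the data contain more than one state (presumably the full data set has more states than the excerpt shown) the two blocks overlap. The sketch below checks this on a small hypothetical panel, assuming the categorical columns get expanded into dummies with the first level dropped; the toy data and the names `X`, `X_ok`, `eta_ok` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel (not the real data): 2 states x 2 districts x 3 years.
rows = [
    (state, f"{state}{d:03d}", year)
    for state in ["01", "02"]
    for d in (1, 2)
    for year in (2009, 2010, 2011)
]
df = pd.DataFrame(rows, columns=["state", "district", "year"])
df["constant"] = 1.0

# District and state-year dummies, first level dropped in each block.
delta = pd.get_dummies(df["district"], prefix="delta", drop_first=True, dtype=float)
eta = pd.get_dummies(df["state"] + df["year"].astype(str),
                     prefix="eta", drop_first=True, dtype=float)

X = pd.concat([df[["constant"]], delta, eta], axis=1)
n_cols, rank = X.shape[1], np.linalg.matrix_rank(X.to_numpy())
print(n_cols, rank)  # 9 columns but rank 8: one exact dependency per extra state

# One remedy: within each state, drop the base-year state-year dummy, so these
# terms only capture deviations from that state's first year.
first_year = df.groupby("state")["year"].transform("min")
state_year = (df["state"] + "_" + df["year"].astype(str)).where(df["year"] != first_year)
eta_ok = pd.get_dummies(state_year, prefix="eta", dtype=float)  # NaN rows become all zeros

X_ok = pd.concat([df[["constant"]], delta, eta_ok], axis=1)
print(X_ok.shape[1], np.linalg.matrix_rank(X_ok.to_numpy()))  # 8 columns, rank 8
```

With $S$ states the first design is short by $S - 1$ ranks, which is why the check fails even though `x1` and the `x2` term themselves are fine.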
What am I doing wrong?