Panel regression in Python

I'm trying to run a panel regression on pandas DataFrames.

Currently I have two DataFrames, each containing 52 rows (dates) and 99 columns (stocks): Markdown file with data representation

When running:

est=sm.OLS(Stockslist,averages).fit()
est.summary()

I get the ValueError: shapes (52,99) and (52,99) not aligned: 99 (dim 1) != 52 (dim 0)

Can somebody point out what I am doing wrong? The model is simply y(i,t)=x(i,t)+error term, so there is no intercept. However, I would like to add time effects in the future.

Kind regards, Jeroen

Lubricous answered 17/4, 2016 at 21:16 Comment(6)
statsmodels OLS is for a univariate dependent variable. You need to stack, np.ravel, or reshape the individual time series. Do you want a single slope parameter for all stocks? - Calipee
I have twice 52 individual time series. Instead of running 52 individual OLS regressions, I want a panel regression that captures all the stocks in a single regression. So yes, I want a single slope instead of 52 different ones. - Lubricous
That case is just equivalent to a single OLS regression in long form. So just reshape both DataFrames to 52 * 99 rows. Dummy variables for fixed effects can be created, for example, from firm labels or indices. - Calipee
What do you mean by 'long form' (code-wise)? - Lubricous
stack might do it: pandas.pydata.org/pandas-docs/stable/… In numpy I would just use ravel or reshape with order='F' for stacking by columns. - Calipee
Hey, I thought that would do the trick; however, I now get a different error: ValueError: The indices for endog and exog are not aligned. Yet as far as I can see, the indices are exactly the same and follow the exact same structure: link - Lubricous
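The long-form reshaping suggested in the comments could look roughly like the sketch below. The data here is randomly generated as a stand-in for the two 52 x 99 frames (the names `averages` and `stockslist`, the stock labels, and the true slope of 0.5 are assumptions for illustration only). Concatenating the two stacked Series on their shared (date, stock) index is one way to avoid the "indices are not aligned" error, and the single no-intercept slope is computed directly with the closed-form OLS formula rather than through any particular library.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two wide frames (52 dates x 99 stocks).
rng = np.random.default_rng(0)
dates = pd.date_range("2015-04-03", periods=52, freq="W-FRI")
stocks = [f"S{i:02d}" for i in range(99)]
averages = pd.DataFrame(rng.normal(size=(52, 99)), index=dates, columns=stocks)
stockslist = 0.5 * averages + rng.normal(scale=0.1, size=(52, 99))

# stack() turns each wide frame into a long Series indexed by (date, stock).
y = stockslist.stack().rename("y")
x = averages.stack().rename("x")

# Concatenating on the shared MultiIndex keeps y and x aligned row by row.
long = pd.concat([y, x], axis=1).dropna()

# Single-slope OLS without an intercept: beta = sum(x*y) / sum(x*x).
beta = float((long["x"] * long["y"]).sum() / (long["x"] ** 2).sum())
print(long.shape, round(beta, 3))
```

Passing `long["y"]` and `long["x"]` to `sm.OLS` without adding a constant should give the same single slope, since statsmodels does not add an intercept on its own.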
L
3

As you mentioned above, I changed my code in the following way:

  1. I transformed the stacks into two DataFrames
  2. I concatenated them into a single MultiIndex DataFrame
  3. I ran the regression and added time effects

    <class 'pandas.core.frame.DataFrame'>
    MultiIndex: 5096 entries, (2015-04-03 00:00:00, AB INBEV) to (25/03/16, ZC.PA)
    Data columns (total 2 columns):
    indvalues    5096 non-null float64
    avgvalues    5096 non-null float64
    dtypes: float64(2)
    memory usage: 119.4+ KB
    
    # pandas.stats.plm was removed in pandas 0.20, so this import needs an older pandas
    from pandas.stats.plm import PanelOLS
    regression=PanelOLS(y=df["indvalues"], x=df[["avgvalues"]], time_effects=True)
    

The regression now works very nicely! Thank you, Stefan Jansen.
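For readers without access to the old pandas PanelOLS, here is a minimal sketch of what time effects amount to for the slope: subtracting each date's mean from both variables (the within transformation) gives the same slope as a regression with one dummy per date. The panel below is simulated, and the per-date shock and true slope of 0.5 are assumptions made purely for illustration; only the column names `indvalues` and `avgvalues` are taken from the frame above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.date_range("2015-04-03", periods=52, freq="W-FRI")
stocks = [f"S{i:02d}" for i in range(99)]
idx = pd.MultiIndex.from_product([dates, stocks], names=["date", "stock"])

# Hypothetical long panel: a common shock per date plus a true slope of 0.5.
# from_product puts dates in the outer level, so each shock repeats 99 times.
shock = np.repeat(rng.normal(size=len(dates)), len(stocks))
df = pd.DataFrame({"avgvalues": rng.normal(size=len(idx))}, index=idx)
df["indvalues"] = 0.5 * df["avgvalues"] + shock + rng.normal(scale=0.1, size=len(idx))

# Within transformation: subtract each date's mean from y and x.
demeaned = df - df.groupby(level="date").transform("mean")

# Slope on the demeaned data, matching a regression that includes date dummies.
beta_fe = float(
    (demeaned["avgvalues"] * demeaned["indvalues"]).sum()
    / (demeaned["avgvalues"] ** 2).sum()
)
print(round(beta_fe, 3))
```

The standard errors from a plain OLS on the demeaned data would still need a degrees-of-freedom correction for the 52 absorbed date means, so this sketch only covers the point estimate.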

Lubricous answered 19/4, 2016 at 0:7 Comment(2)
I was still wondering whether statsmodels offers any panel regression options. - Lubricous
For more serious econometrics you're better off with R, I'm afraid, or any of the commercial packages. Here's an attempt to implement something, but I'm not sure it has moved beyond the gist stage: gist.github.com/vincentarelbundock/5053686 - Acima
A
6

Try the below - I've copied the stock data from the above link and added random data for the x column. For a panel regression you need a 'MultiIndex' as mentioned in the comments.

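import numpy as np
import pandas as pd
# df holds the copied stock data: a 'dates' column plus one column per stock
df = pd.DataFrame(df.set_index('dates').stack())  # long form, (date, stock) MultiIndex
df.columns = ['y']
df['x'] = np.random.random(size=len(df.index))  # random regressor, for illustration only
df.info()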

MultiIndex: 100 entries, (2015-04-03 00:00:00, AB INBEV) to (2015-05-01 00:00:00, ZC.PA)
Data columns (total 2 columns):
y    100 non-null float64
x    100 non-null float64
dtypes: float64(2)
memory usage: 2.3+ KB

from pandas.stats.plm import PanelOLS  # only available in pandas versions before 0.20
regression = PanelOLS(y=df['y'], x=df[['x']])

regression

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         100
Number of Degrees of Freedom:   2

R-squared:         0.0042
Adj R-squared:    -0.0060

Rmse:              0.2259

F-stat (1, 98):     0.4086, p-value:     0.5242

Degrees of Freedom: model 1, resid 98

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x    -0.0507     0.0794      -0.64     0.5242    -0.2063     0.1048
     intercept     2.1952     0.0448      49.05     0.0000     2.1075     2.2829
---------------------------------End of Summary---------------------------------
Acima answered 18/4, 2016 at 22:4 Comment(0)
