Fixed Effects Regression with Interaction Term Causes Error

I am trying to estimate a model on a panel dataset with an interaction term between geographical areas (LoadArea, DischargeArea), which together signify a route. With the fixed effects specification, the interaction term (LoadArea * DischargeArea) causes trouble and produces the following error when you summarize the regression:

mult_fe <- plm(log(DayRate) ~ LoadArea * DischargeArea + factor(Laycan.Day.Diff) +
               CapUtil + Age + I(Age^2) + WFRDWT + lag_BDTI,
               data = mult_reg1, model = "within")


summary(mult_fe)
Error in crossprod(t(X), beta) : non-conformable arguments

This works fine as a normal OLS regression, replacing plm with the lm function. The question is: why isn't it working for my model?

Unclose answered 23/5, 2013 at 15:57 Comment(0)

This is a problem of collinearity among your variables.

The lm command automatically places NAs in the beta vector for variables that were dropped due to collinearity, but plm does not.

When you have LoadArea*DischargeArea, plm will add three variables to your model:

LoadArea + DischargeArea + LoadArea:DischargeArea

After that, plm will demean them (the within transformation).

In this case, and without further information on your data, my guess is that one of these variables is perfectly collinear with one of the factor levels in:

as.factor(Laycan.Day.Diff)

In your case I would try to estimate the model without the factor. If it works, you know the factor is causing the problem. If it comes to that, you can then convert each factor level to an explicit 0/1 dummy and add them one by one until you understand where the problem is coming from; a sketch follows below.
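
A minimal sketch of that dummy conversion, assuming mult_reg1 is the data frame from the question (model.matrix and make.names are just one convenient way to do it):

# One 0/1 column per level of Laycan.Day.Diff; the -1 drops the intercept so no level is omitted
dummies <- model.matrix(~ factor(Laycan.Day.Diff) - 1, data = mult_reg1)
colnames(dummies) <- make.names(colnames(dummies))  # syntactically valid column names
mult_reg1 <- cbind(mult_reg1, dummies)
# Then swap factor(Laycan.Day.Diff) for the dummy columns, adding them one at a time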

To determine which variables are collinear you could try something like:

library(data.table)

# Toy panel: two variables and a panel id with two units
tmp      <- data.table(var1 = 1:10, var2 = 55:64, userid = rep(c(1, 2), 5))
cols     <- c('var1', 'var2')
newnames <- c('demeaned_var1', 'demeaned_var2')
# Demean each variable within userid (the within transformation)
tmp[, (newnames) := .SD - lapply(.SD, mean), .SDcols = cols, by = userid]
# A correlation of 1 between demeaned variables flags perfect collinearity
cor(tmp[, newnames, with = FALSE])

The tmp[, (newnames) := ...] line does the demeaning. This other Stack Overflow post describes the data.table operations I used above in detail.

The output of the code above will be:

              demeaned_var1 demeaned_var2
demeaned_var1             1             1
demeaned_var2             1             1

This will tell you which demeaned vars are perfectly collinear.

Kosiur answered 10/6, 2013 at 9:48 Comment(3)
I am having the same problem, but in my model I have 41 independent variables. How can I know which ones are causing the multicollinearity?Barnet
If you have data.table (which is a great package), you can do it easily by demeaning all your vars manually and then calculating the correlation table, something like what I pasted above.Kosiur
For some time now, the plm package has had two functions to detect linear dependence: detect_lin_dep and alias. Be sure to read their documentation, because linear dependence after data transformations (e.g. the within/demeaning transformation) can be hard to spot.Strive
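
Following up on the last comment, a short sketch of those plm helpers applied to this model; it assumes mult_fe was estimated without error (as the next answer notes, plm() itself runs fine) and that detect_lin_dep is given the within-transformed model matrix:

library(plm)
# Columns of the within-transformed model matrix that are linearly dependent
detect_lin_dep(model.matrix(mult_fe))
# alias() on the plm model reports aliased (perfectly collinear) coefficients
alias(mult_fe)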

Please note that plm() is playing fine all along; it's the summary.plm() function that's breaking bad! Delving deeper into that function reveals the trouble in the part where it calculates R^2.

Read more on the same problem over at Stack Exchange.

Quick and not so elegant workarounds include:

(1) Replacing LoadArea*DischargeArea with LoadArea:DischargeArea (the : form adds only the interaction term, without the main effects)

(2) Manually creating a separate interaction variable:

LoadxDischarge <- LoadArea*DischargeArea
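
A sketch of workaround (2) applied to the question's model, assuming the data frame mult_reg1 from the question (mult_fe2 is just an illustrative name):

# Build the interaction by hand, then use it in place of LoadArea*DischargeArea
mult_reg1$LoadxDischarge <- mult_reg1$LoadArea * mult_reg1$DischargeArea
mult_fe2 <- plm(log(DayRate) ~ LoadxDischarge + factor(Laycan.Day.Diff) + CapUtil +
                Age + I(Age^2) + WFRDWT + lag_BDTI,
                data = mult_reg1, model = "within")
summary(mult_fe2)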
Seismograph answered 3/6, 2015 at 18:44 Comment(0)

A way to get at least the standard errors etc. (bypassing summary.plm()) is to use:

library("sandwich")
library("lmtest")
coeftest(mult_fe)
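
If robust standard errors are wanted rather than the defaults, plm's vcovHC method can be passed to coeftest; a sketch (type "HC1" is just one common choice, not something from the thread):

coeftest(mult_fe, vcov = vcovHC(mult_fe, type = "HC1"))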
Pothunter answered 14/6, 2016 at 21:17 Comment(0)
