When I run a panel specification with clustered standard errors in plm and in lfe, I get results that differ at the second significant figure. Does anyone know why the two packages differ in their calculation of the SEs?
set.seed(572015)
library(lfe)
library(plm)
library(lmtest)
# clustering example
x <- c(sapply(sample(1:20), rep, times = 1000)) + rnorm(20*1000, sd = 1)
y <- 5 + 10*x + rnorm(20*1000, sd = 10) + c(sapply(rnorm(20, sd = 10), rep, times = 1000))
facX <- factor(sapply(1:20, rep, times = 1000))
mydata <- data.frame(y=y,x=x,facX=facX, state=rep(1:1000, 20))
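# Within (fixed-effects) model on facX, estimated with plm
model <- plm(y ~ x, data = mydata, index = c("facX", "state"), effect = "individual", model = "within")
# Cluster-robust SEs by group, with the HC1 small-sample adjustment
plmTest <- coeftest(model, vcov = vcovHC(model, type = "HC1", cluster = "group"))
# Same model in lfe: facX absorbed as a fixed effect, no IV part, clustered on facX
lfeTest <- summary(felm(y ~ x | facX | 0 | facX, data = mydata))
# Element 2 of each one-row coefficient matrix is the standard error
data.frame(lfeClusterSE = lfeTest$coefficients[2],
           plmClusterSE = plmTest[2])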
lfeClusterSE plmClusterSE
1 0.06746538 0.06572588
In multiwayvcov::cluster.vcov it is easy to see the algebra used to obtain the Stata small-sample degrees-of-freedom correction, namely: (df$M/(df$M - 1)) * ((df$N - 1)/(df$N - df$K)). But what would be the equivalent df correction as used in sandwich(..., adjust=TRUE)? In this answer you explain that the difference between the two is that for Stata the division is by 1/(n - 1), and for sandwich it is by 1/(n - k). Yet I'm unsure how this translates into the appropriate algebra... Do I replace (df$N - 1) by (df$N - df$K) above? – Pah