plm: using fixef() to manually calculate fitted values for a fixed effects twoways model
Asked Answered
M

4

14

Please note: I am trying to get the code to work with both time & individual fixed effects, and an unbalanced dataset. The sample code below works with a balanced dataset.

See edit below too, please

I am trying to manually calculate the fitted values of a fixed effects model (with both individual and time effects) using the plm package. This is more of an exercise to confirm I understand the mechanics of the model and the package, I know I can get the fitted values themselves from the plm object, from the two related questions (here and here).

From the plm vignette (p.2), the underlying model is:

y_it = alpha + beta_transposed * x_it + (mu_i + lambda_t + epsilon_it)

where mu_i is the individual component of the error term (a.k.a. "individual effect"), and lambda_t is the "time effect".

The fixed effects can be extracted by using fixef() and I thought I could use them (together with the independent variables) to calculate the fitted values for the model, using (with two independent variables) in this way:

fit_it = alpha + beta_1 * x1 + beta_2 * x2 + mu_i + lambda_t

This is where I fail -- the values I get are nowhere near the fitted values (which I get as the difference between the actual values and the residuals in the model object). For one, I do not see alpha anywhere. I tried playing with the fixed effects being shown as differences from the first, from the mean, etc., with no success.

What I am missing? It could well be a misunderstanding of the model, or an error in the code, I am afraid... Thanks in advance.

PS: One of the related questions hints that pmodel.response() should be related to my issue (and the reason there is no plm.fit function), but its help page does not help me understand what this function actually does, and I cannot find any examples how to interpret the result it produces.

Thanks!

Sample code of what I did:

library(data.table); library(plm)

set.seed(100)
DT <- data.table(CJ(id=c("a","b","c","d"), time=c(1:10)))
DT[, x1:=rnorm(40)]
DT[, x2:=rnorm(40)]
DT[, y:=x1 + 2*x2 + rnorm(40)/10]
DT <- DT[!(id=="a" & time==4)] # just to make it an unbalanced panel
setkey(DT, id, time)

summary(plmFEit <- plm(data=DT, id=c("id","time"), formula=y ~ x1 + x2, model="within", effect="twoways"))

# Extract the fitted values from the plm object
FV <- data.table(plmFEit$model, residuals=as.numeric(plmFEit$residuals))
FV[, y := as.numeric(y)]
FV[, x1 := as.numeric(x1)]
FV[, x2 := as.numeric(x2)]

DT <- merge(x=DT, y=FV, by=c("y","x1","x2"), all=TRUE)
DT[, fitted.plm := as.numeric(y) - as.numeric(residuals)]

FEI <- data.table(as.matrix(fixef(object=plmFEit, effect="individual", type="level")), keep.rownames=TRUE) # as.matrix needed to preserve the names?
setnames(FEI, c("id","fei"))
setkey(FEI, id)
setkey(DT, id)
DT <- DT[FEI] # merge the fei into the data, each id gets a single number for every row

FET <- data.table(as.matrix(fixef(object=plmFEit, effect="time", type="level")), keep.rownames=TRUE) # as.matrix needed to preserve the names?
setnames(FET, c("time","fet"))
FET[, time := as.integer(time)] # fixef returns time as character
setkey(FET, time)
setkey(DT, time)
DT <- DT[FET] # merge the fet into the data, each time gets a single number for every row

# calculate the fitted values (called calc to distinguish from those from plm)
DT[, fitted.calc := as.numeric(coef(plmFEit)[1] * x1 + coef(plmFEit)[2]*x2 + fei + fet)]
DT[, diff := as.numeric(fitted.plm - fitted.calc)]

all.equal(DT$fitted.plm, DT$fitted.calc)

My session is as follows:

R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] plm_1.4-0           Formula_1.2-1       RJSONIO_1.3-0       jsonlite_0.9.17     readxl_0.1.0.9000   data.table_1.9.7    bit64_0.9-5         bit_1.1-12          RevoUtilsMath_3.2.2

loaded via a namespace (and not attached):
 [1] bdsmatrix_1.3-2  Rcpp_0.12.1      lattice_0.20-33  zoo_1.7-12       MASS_7.3-44      grid_3.2.2       chron_2.3-47     nlme_3.1-122     curl_0.9.3       rstudioapi_0.3.1 sandwich_2.3-4  
[12] tools_3.2.2  

Edit: (2015-02-22) Since this has attracted some interest, I will try to clarify further. I was trying to fit a "fixed effects" model (a.k.a. "within" or "least squares dummy variables", as the plm package vignette calls it on p.3, top paragraph) -- same slope(s), different intercepts.

This is the same as running an ordinary OLS regression after adding dummies for time and id. Using the code below I can duplicate the fitted values from the plm package using base lm(). With the dummies, it is explicit that the first elements of both id and time are the group to compare to. What I still cannot do is how to use the facilities of the plm package to do the same I can easily accomplish using lm().

# fit the same with lm() and match the fitted values to those from plm()
lmF <- lm(data = DT, formula = y ~ x1 + x2 + factor(time) + factor(id))
time.lm <- coef(lmF)[grep(x = names(coef(lmF)), pattern = "time", fixed = TRUE)]
time.lm <- c(0, unname(time.lm)) # no need for names, the position index corresponds to time

id.lm <- coef(lmF)[grep(x = names(coef(lmF)), pattern = "id", fixed = TRUE)]
id.lm <- c(0, unname(id.lm))
names(id.lm) <- c("a","b","c","d") # set names so that individual values can be looked up below when generating the fit

DT[, by=list(id, time), fitted.lm := coef(lmF)[["(Intercept)"]]  +  coef(lmF)[["x1"]] * x1  +  coef(lmF)[["x2"]] * x2  +  time.lm[[time]]  +  id.lm[[id]]]
all.equal(DT$fitted.plm, DT$fitted.lm)

Hope this is useful to others who might be interested. The issue might be something about how plm and fixef deal with the missing value I intentionally created. I tried playing with the type= parameter of fixef but to no effect.

Monogamy answered 6/10, 2013 at 14:2 Comment(3)
Are you estimating random slopes and intercepts or just random intercepts?Aquarist
Please note, your sample code won't work for balanced panels either. You would need DT[, fitted.calc := as.numeric(coef(plmFEit)[1] * x1 + coef(plmFEit)[2]*x2 + fei + fet - within_intercept(plmFEit))] to get the same values. within_intercept (currently, only in the dev version of plm) gives the overall (artifical) intercept of the FE model. Here, it accounts for the shared id/time effect.Sinclare
Thanks. I haven't used the package recently but the info on the development version is useful. We're getting closer to an answer :)Monogamy
S
0

Edit: adapted to two-ways unbalanced model, needs plm version >= 2.4-0

Is this what you wanted? Extract the fixed effects by fixef. Here is an example for the Grunfeld data on an unbalanced two-way model (works the same for the balanced two-way model):

gtw_u <- plm(inv ~ value + capital, data = Grunfeld[-200, ], effect = "twoways")
yhat <- as.numeric(gtw_u$model[ , 1] - gtw_u$residuals) # reference
pred_beta <- as.numeric(tcrossprod(coef(gtw_u), as.matrix(gtw_u$model[ , -1])))
pred_effs <- as.numeric(fixef(gtw_u, "twoways")) # sum of ind and time effects

all.equal(pred_effs + pred_beta, yhat) # TRUE -> matches fitted values (yhat)

If you want to split the sum of individual and time effects (given by effect = "twoways") in its components, you will need to choose a reference and two come naturally to mind which are both given below:

# Splits of summed up individual and time effects:
# use one "level" and one "dfirst"
ii <- index(gtw_u)[[1L]]; it <- index(gtw_u)[[2L]]
eff_id_dfirst <- c(0, as.numeric(fixef(gtw_u, "individual", "dfirst")))[ii]
eff_ti_dfirst <- c(0, as.numeric(fixef(gtw_u, "time",       "dfirst")))[it]
eff_id_level <- as.numeric(fixef(gtw_u, "individual"))[ii]
eff_ti_level <- as.numeric(fixef(gtw_u, "time"))[it]

all.equal(pred_effs, eff_id_level  + eff_ti_dfirst) # TRUE
all.equal(pred_effs, eff_id_dfirst + eff_ti_level)  # TRUE

(This is based on the man page of fixef, ?fixef. There it is also shown how the (balanced and unbalanced) one-way model is to be handled).

Sinclare answered 12/11, 2015 at 14:28 Comment(3)
Hi, and thanks for posting. Your code works, but in essence it is just a subset in the code in my original question -- (1) deals with a balanced dataset, and (2) has only one effect, "individual". If you can get it to work with both time & individual effects and an unbalanced dataset (e.g. drop a row from Grunfeld), I'd be quite happy to accept it as an answer.Monogamy
Thanks! I am seriously amazed people still find the energy & time to work on 5-year-old questions :)Monogamy
My initial answer was about 5 years old. Your initial question about 7 years! ;)Sinclare
A
3

This works for an unbalanced data with effect="individual" and time dummies y ~ x +factor(year):

fitted <- pmodel.response(plm.model)-residuals(plm.model)
Apiarian answered 18/10, 2017 at 0:3 Comment(0)
B
2

I found this that can help you, since the lm() solution was not working in my case (I got different coefficients comparing to the plm package)

Therefore, it is just about applying the suggestions by the authors of the plm package here http://r.789695.n4.nabble.com/fitted-from-plm-td3003924.html

So what I did is just to apply

plm.object <- plm(y ~ lag(y, 1) + z +z2, data = mdt, model= "within", effect="twoways")
fitted <- as.numeric(plm.object$model[[1]] - plm.object$residuals) 

where I need the as.numeric function since I need to use it as a vector to plug in for further manipulations. I also want to point out that if your model has a lagged dependent variable on the right hand side, the solution above with as.numeric provides a vector already NET of the missing values because of the lag. For me this is exactly what I needed to.

Bussell answered 20/4, 2015 at 16:8 Comment(2)
Hello Bob, thank you for your answer. It does not address what I needed though. I have used something identical to what you suggest (probably after reading the thread you linked to). What I wanted to do was to generate the estimates using only the independent variables and the estimated coefficients (incl. time/id effects). Something similar to what I added as an example with lm().Monogamy
This is what I wanted to do as well but I did not manage to. Also using your approach with lm() I get different estimates from the plm modelBussell
A
1

I'm getting pretty close with Helix123's suggestion to subtract the within_intercept (it gets included in each of the two fixed effects, so you need to correct for that).

There's a very suggestive pattern in my reconstruction errors: individual a is always off by -0.004858712 (for every time period). Individuals b, c, d are always off by 0.002839703 for every time period except in period 4 (where there is no observation for a), where they're off by -0.010981192.

Any ideas? It looks like the individual fixed effects are thrown off by unbalancing. Rerunning it balanced works correctly.

Full code:

DT <- data.table(CJ(id=c("a","b","c","d"), time=c(1:10)))
DT[, x1:=rnorm(40)]
DT[, x2:=rnorm(40)]
DT[, y:= x1 + 2*x2 + rnorm(40)/10]
DT <- DT[!(id=="a" & time==4)] # just to make it an unbalanced panel
setkey(DT, id, time)

plmFEit <- plm(formula=y ~ x1 + x2,
               data=DT,
               index=c("id","time"),
               effect="twoways",
               model="within")

summary(plmFEit)

DT[, resids := residuals(plmFEit)]

FEI <- data.table(as.matrix(fixef(plmFEit, effect="individual", type="level")), keep.rownames=TRUE) # as.matrix needed to preserve the names?
setnames(FEI, c("id","fei"))
setkey(FEI, id)
setkey(DT, id)
DT <- DT[FEI] # merge the fei into the data, each id gets a single number for every row

FET <- data.table(as.matrix(fixef(plmFEit, effect="time", type="level")), keep.rownames=TRUE) # as.matrix needed to preserve the names?
setnames(FET, c("time","fet"))
FET[, time := as.integer(time)] # fixef returns time as character
setkey(FET, time)
setkey(DT, time)
DT <- DT[FET] # merge the fet into the data, each time gets a single number for every row

DT[, fitted.calc := plmFEit$coefficients[[1]] * x1 + plmFEit$coefficients[[2]] * x2 +
     fei + fet - within_intercept(plmFEit)]

DT[, myresids := y - fitted.calc]
DT[, myerr := resids - myresids]
Anya answered 9/12, 2017 at 19:16 Comment(2)
Hi Toban, would you please post a complete code for this? I am failing to get a TRUE in the last line. Also, are you running this for a single variable X rather than two variables?Monogamy
Thanks for checking this. I thought I had this working before with my code... I just spent some time testing this, trying to get it working with your code. I updated my answer, I feel like it's very close but can't put my finger on what's wrong.Anya
S
0

Edit: adapted to two-ways unbalanced model, needs plm version >= 2.4-0

Is this what you wanted? Extract the fixed effects by fixef. Here is an example for the Grunfeld data on an unbalanced two-way model (works the same for the balanced two-way model):

gtw_u <- plm(inv ~ value + capital, data = Grunfeld[-200, ], effect = "twoways")
yhat <- as.numeric(gtw_u$model[ , 1] - gtw_u$residuals) # reference
pred_beta <- as.numeric(tcrossprod(coef(gtw_u), as.matrix(gtw_u$model[ , -1])))
pred_effs <- as.numeric(fixef(gtw_u, "twoways")) # sum of ind and time effects

all.equal(pred_effs + pred_beta, yhat) # TRUE -> matches fitted values (yhat)

If you want to split the sum of individual and time effects (given by effect = "twoways") in its components, you will need to choose a reference and two come naturally to mind which are both given below:

# Splits of summed up individual and time effects:
# use one "level" and one "dfirst"
ii <- index(gtw_u)[[1L]]; it <- index(gtw_u)[[2L]]
eff_id_dfirst <- c(0, as.numeric(fixef(gtw_u, "individual", "dfirst")))[ii]
eff_ti_dfirst <- c(0, as.numeric(fixef(gtw_u, "time",       "dfirst")))[it]
eff_id_level <- as.numeric(fixef(gtw_u, "individual"))[ii]
eff_ti_level <- as.numeric(fixef(gtw_u, "time"))[it]

all.equal(pred_effs, eff_id_level  + eff_ti_dfirst) # TRUE
all.equal(pred_effs, eff_id_dfirst + eff_ti_level)  # TRUE

(This is based on the man page of fixef, ?fixef. There it is also shown how the (balanced and unbalanced) one-way model is to be handled).

Sinclare answered 12/11, 2015 at 14:28 Comment(3)
Hi, and thanks for posting. Your code works, but in essence it is just a subset in the code in my original question -- (1) deals with a balanced dataset, and (2) has only one effect, "individual". If you can get it to work with both time & individual effects and an unbalanced dataset (e.g. drop a row from Grunfeld), I'd be quite happy to accept it as an answer.Monogamy
Thanks! I am seriously amazed people still find the energy & time to work on 5-year-old questions :)Monogamy
My initial answer was about 5 years old. Your initial question about 7 years! ;)Sinclare

© 2022 - 2024 — McMap. All rights reserved.