This is an old question, but I would like to point out something that is easy to miss and that can result in serious errors. The previous answer by dickoa is correct, but I thought I'd clarify why such a work-around is needed, since it might not be obvious.
While reading another thread I learned the following: As noted here, plm does not necessarily keep data in the same order it was given to the function. This means that simply using the residuals()
function on an plm-object and then joining it to your data can lead to the wrong residuals grouped to the wrong row of the data if you are not careful! As an example consider the following:
require(plm)
data("Gasoline") # The Gasoline dataset from the plm package
plm1 <- plm(lgaspcar ~ lincomep + lrpmg + lcarpcap, data=Gasoline, method = "within", index = c("country", "year"))
coef(plm1)
lincomep lrpmg lcarpcap
0.6622497 -0.3217025 -0.6404829
head(residuals(plm1))
1 2 3 4 5 6
-0.18814207 -0.19642727 -0.14874420 -0.12476346 -0.12114060 -0.08684045
Note the residuals that we were given. Now let us just change the order in which the dataset is ordered. This shouldn't change anything in the analysis.
set.seed(1234)
Gasoline2 <- Gasoline[order(runif(nrow(Gasoline))), ] # We just change the order of the rows.
plm2 <- plm(lgaspcar ~ lincomep + lrpmg + lcarpcap, data=Gasoline2, method = "within", index = c("country", "year"))
coef(plm2)
lincomep lrpmg lcarpcap
0.6622497 -0.3217025 -0.6404829
head(residuals(plm2))
258 7 64 73 268 186
-0.18814207 -0.19642727 -0.14874420 -0.12476346 -0.12114060 -0.08684045
At first glance this seems fine; the estimated coefficients are the same as before. However, note that the order that the residuals are presented in are the same as before we shifted the rows around. The only thing that's changed is that the names associated to the residuals now reflect their new position in the data. So the observation that post-reordering is on row 1 in the data, was pre-reordering on row 258.
Gasoline2[1, ]
country year lgaspcar lincomep lrpmg lcarpcap
258 SWEDEN 1970 3.989372 -7.73261 -2.733592 -8.164506
Gasoline[258, ]
country year lgaspcar lincomep lrpmg lcarpcap
258 SWEDEN 1970 3.989372 -7.73261 -2.733592 -8.164506
This means that if we had Gasoline2 as our dataset that we were working with, then using a function like cbind()
on Gasoline2
and residuals(plm2)
would lead to the wrong residuals being connected to the observations.
head(cbind(Gasoline, residuals(plm1)))
country year lgaspcar lincomep lrpmg lcarpcap residuals(plm1)
1 AUSTRIA 1960 4.173244 -6.474277 -0.3345476 -9.766840 -0.18814207
2 AUSTRIA 1961 4.100989 -6.426006 -0.3513276 -9.608622 -0.19642727
3 AUSTRIA 1962 4.073177 -6.407308 -0.3795177 -9.457257 -0.14874420
4 AUSTRIA 1963 4.059509 -6.370679 -0.4142514 -9.343155 -0.12476346
5 AUSTRIA 1964 4.037689 -6.322247 -0.4453354 -9.237739 -0.12114060
6 AUSTRIA 1965 4.033983 -6.294668 -0.4970607 -9.123903 -0.08684045
head(cbind(Gasoline2, residuals(plm2)))
country year lgaspcar lincomep lrpmg lcarpcap residuals(plm2)
258 SWEDEN 1970 3.989372 -7.732610 -2.7335921 -8.164506 -0.18814207
7 AUSTRIA 1966 4.047537 -6.252545 -0.4668377 -9.019822 -0.19642727
64 DENMARK 1966 4.233643 -5.851866 -0.3961885 -8.681541 -0.14874420
73 DENMARK 1975 4.033015 -5.612967 -0.3939543 -8.274632 -0.12476346
268 SWITZERL 1961 4.441330 -6.111640 -0.8655847 -9.158229 -0.12114060
186 JAPAN 1974 4.007964 -5.852553 -0.1909064 -8.846520 -0.08684045
As we see above, the residuals are assigned to the wrong row in the Gasoline2 example.
So what's going on? Well, as alluded to earlier, plm
does not preserve the order of the observations. Using the attr()
function dickoa pointed out in a previous answer, we can see that plm
reorganizes the data by country and year.
head( attr(residuals(plm2), "index") )
country year
1 AUSTRIA 1960
2 AUSTRIA 1961
3 AUSTRIA 1962
4 AUSTRIA 1963
5 AUSTRIA 1964
6 AUSTRIA 1965
This is how the original Gasoline data was structured, so that's why the residuals are presented in the same order.
We can thus use the fact that attr(residuals(plm2), "index")
gives us the residuals and their corresponding country and year indicators, in order to add the residuals to the original data. As pointed out here, the plyr
package is very helpful for this.
require(plyr)
resids2 <- data.frame(residual = residuals(plm2), attr(residuals(plm2), "index"))
Gasoline2$year <- factor(Gasoline2$year) # Needed since resids2$year is a factor, and Gasoline2$years was an integer. plyr does not accept them to be of different types.
Gasoline2 <- join(Gasoline2, resids2, by = c("country", "year"))
head(Gasoline2)
country year lgaspcar lincomep lrpmg lcarpcap residual
1 SWEDEN 1970 3.989372 -7.732610 -2.7335921 -8.164506 -0.02468148
2 AUSTRIA 1966 4.047537 -6.252545 -0.4668377 -9.019822 -0.02479759
3 DENMARK 1966 4.233643 -5.851866 -0.3961885 -8.681541 0.03175032
4 DENMARK 1975 4.033015 -5.612967 -0.3939543 -8.274632 -0.06575219
5 SWITZERL 1961 4.441330 -6.111640 -0.8655847 -9.158229 -0.05789130
6 JAPAN 1974 4.007964 -5.852553 -0.1909064 -8.846520 -0.21957156
Which gives us the correct result.
iris$residuals <- plm1$residuals
You can usedsplit
orplyr
to do your R^2 by group. – Airilycomplete.cases
to make sure I don't have missing observations in my data. – Airily