I'd like to create a variable containing the value of a variable in the previous year within a group.
id date value
1 1 1992 4.1
2 1 NA 4.5
3 1 1991 3.3
4 1 1990 5.3
5 1 1994 3.0
6 2 1992 3.2
7 2 1991 5.2
value_lagged
should be missing when the previous year is missing within a group - either because it is the first date within a group (as in row 4, 7), or because there are year gaps in the data (as in row 5). Also, value_lagged
should be missing when the current time is missing (as in row 2).
This gives:
id date value value_lagged
1 1 1992 4.1 3.3
2 1 NA 4.5 NA
3 1 1991 3.3 5.3
4 1 1990 5.3 NA
5 1 1994 3.0 NA
6 2 1992 3.2 5.2
7 2 1991 5.2 NA
For now, in R, I use the data.table
package
DT = data.table(id = c(1,1,1,1,1,2,2),
date = c(1992,NA,1991,1990,1994,1992,1991),
value = c(4.1,4.5,3.3,5.3,3.0,3.2,5.2)
)
setkey(DT, id, date)
DT[, value_lagged := DT[J(id, date-1), value], ]
DT[is.na(date), value_lagged := NA, ]
It's fast but it seems somewhat error prone to me. I'd like to know if there are better alternatives using data.table
, dplyr
, or any other package. Thanks a lot!
In Stata
, one would do:
tsset id date
gen value_lagged=L.value
is.na(date)
notis.na(value)
– SilbermanDT
multiple times), but I don't really see what else you want besides having a function that's specifically designed to do what you want (which what your stata solution really is):DT[J(id, date + 1, val = value), val_lag := i.val][is.na(date), val_lag := NA]
– WorstedD1=DT[J(id, date + 1, val = value)]
,D1[, val_lag:=val]
does not give the same thanDT[J(id, date + 1, val = value)][, val_lag:=val]
? – IowaDT[J(id, date + 1, val = value), val_lag := i.val]
andDT[,value_lagged :=DT[list(id,date-1),value],]
preferable for what data.table does? Or are they completely equivalent? – IowaDT[J(...), a := b]
does the assignments for the matching portion ofDT
from the merge, whereasDT[J(...)]
may be larger or smaller thanDT
itself. – WorstedDT[J(...), a := b]
syntax is better in that it avoids using thedata.table
name twice, which can be a source of errors; it's also going to be a tiny bit faster, but that's probably irrelevant for any task – Worsted