I'm trying to learn R and there are a few things I've done for 10+ years in SAS that I cannot quite figure out the best way to do in R. Take this data:
id class t count desired
-- ----- ---------- ----- -------
1 A 2010-01-15 1 1
1 A 2010-02-15 2 3
1 B 2010-04-15 3 3
1 B 2010-09-15 4 4
2 A 2010-01-15 5 5
2 B 2010-06-15 6 6
2 B 2010-08-15 7 13
2 B 2010-09-15 8 21
I want to calculate the column desired as a rolling sum by id, class, and within a 4 months rolling window. Notice that not all months are present for each combination of id and class.
In SAS I'd typically do this in one of 2 ways:
RETAIN
plus a by id & class.PROC SQL
with a left join from df as df1 to df as df2 on id, class and the df1.d-df2.d within the appropriate window
What is the best R approach to this type of problem?
t <- as.Date(c("2010-01-15","2010-02-15","2010-04-15","2010-09-15",
"2010-01-15","2010-06-15","2010-08-15","2010-09-15"))
class <- c("A","A","B","B","A","B","B","B")
id <- c(1,1,1,1,2,2,2,2)
count <- seq(1,8,length.out=8)
desired <- c(1,3,3,4,5,6,13,21)
df <- data.frame(id,class,t,count,desired)
d
? its definition isn't included in your setup code. – Dukeszoo
package. It can do these rolling summaries on time aligned data fairly easily. If you're comfortable withsql
, you can use thesqldf
package. – Poiseddf <- data.frame(t,class, id, count ,desired)
– Terresaterrestrialproc sql
then check videos #027 and #028 for an intro tosqldf
andmerge
with R :) – Piledzoo
anddata.table
together for thesplit-apply-combine
strategy when I have sparse dates... – Poised