sum over past window-size dates per group
This problem is similar to "How do I do a conditional sum which only looks between certain date criteria", but the answer there does not fit my case. The main difference is that the dates within each group are not necessarily complete (i.e., some dates may be missing).

Input:

input <- read.table(text="
2017-04-01     A     1
2017-04-02     B     2
2017-04-02     B     2
2017-04-02     C     2
2017-04-02     A     2
2017-04-03     C     3
2017-04-04     A     4
2017-04-05     B     5
2017-04-06     C     6
2017-04-07     A     7
2017-04-08     B     8
2017-04-09     C     9")
colnames(input) <- c("Date","Group","Score")

Rule: for each group at each date, look back 3 calendar days (including the current date) and calculate the sum of Score.

Expected output:

    Date Group 3DaysSumPerGroup
    2017-04-01     A                1 #1  previous two dates are not available. partial is allowed
    2017-04-02     A                3 #2+1 both 4-01 and 4-02 are in the range
    2017-04-04     A                6 #4+2
    2017-04-07     A                7 #7
    2017-04-02     B                4 # 2+2 at the same day
    2017-04-05     B                5
    2017-04-08     B                8
    2017-04-02     C                2
    2017-04-03     C                5
    2017-04-06     C                6
    2017-04-09     C                9

I tried to use rollapply with partial = TRUE, but the result doesn't seem correct.

 library(dplyr)
 library(zoo)
 input %>%
     group_by(Group) %>%
     arrange(Date) %>%
     mutate("3DaysSumPerGroup" = rollapply(data = Score, width = 3, align = "right", FUN = sum, partial = TRUE, fill = NA, na.rm = TRUE))
Osteitis answered 7/6, 2017 at 7:20 Comment(5)
Perhaps use cut to generate the window groups. - Protozoan
Why is the first value in group B 4? There are no previous values for that group. - Cratch
@DavidArenburg, because B has two records on the same day. - Osteitis
Possible duplicate of "How do I do a conditional sum which only looks between certain date criteria". - Humble
@mtoto, it's different. That question assumes sequential dates, but the current problem has some missing dates. - Osteitis

Here's a (supposedly efficient) solution using the new non-equi joins and the by = .EACHI features in data.table (v1.9.8+)

library(data.table) #v1.10.4

## Convert to a proper date class, and add another column in order to define the range
setDT(input)[, c("Date", "Date2") := {
  Date = as.IDate(Date)
  Date2 = Date - 2L
  .(Date, Date2)
}]

## Run a non-equi join against the unique Date/Group combination in input
## Sum the Scores on the fly
## You can ignore the second Date column 

input[unique(input, by = c("Date", "Group")), ## This removes the dupes
      on = .(Group, Date <= Date, Date >= Date2), ## The join condition
      .(Score = sum(Score)), ## sum the scores
      keyby = .EACHI] ## Run the sum by each row in unique(input, by = c("Date", "Group"))

#     Group       Date       Date Score
#  1:     A 2017-04-01 2017-03-30     1
#  2:     A 2017-04-02 2017-03-31     3
#  3:     A 2017-04-04 2017-04-02     6
#  4:     A 2017-04-07 2017-04-05     7
#  5:     B 2017-04-02 2017-03-31     4
#  6:     B 2017-04-05 2017-04-03     5
#  7:     B 2017-04-08 2017-04-06     8
#  8:     C 2017-04-02 2017-03-31     2
#  9:     C 2017-04-03 2017-04-01     5
# 10:     C 2017-04-06 2017-04-04     6
# 11:     C 2017-04-09 2017-04-07     9
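As a cross-check on the numbers above, here is a minimal base R sketch of the same rule (sum every score in the same group whose date falls within the current date and the two calendar days before it). It is not the data.table approach, just a slow but easy-to-read reference; the names `scores`, `keys` and `Sum3Days` are made up for illustration, and the data is retyped from the question.

```r
# Sample data retyped from the question (dates as offsets from 2017-04-01)
scores <- data.frame(
  Date  = as.Date("2017-04-01") + c(0, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8),
  Group = c("A", "B", "B", "C", "A", "C", "A", "B", "C", "A", "B", "C"),
  Score = c(1, 2, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9),
  stringsAsFactors = FALSE
)

# One row per unique Date/Group pair, ordered by group and then date
keys <- unique(scores[, c("Date", "Group")])
keys <- keys[order(keys$Group, keys$Date), ]

# For each pair, sum the scores of the same group that fall inside
# the 3-calendar-day window [Date - 2, Date]
keys$Sum3Days <- vapply(seq_len(nrow(keys)), function(i) {
  in_window <- scores$Group == keys$Group[i] &
    scores$Date <= keys$Date[i] &
    scores$Date >= keys$Date[i] - 2
  sum(scores$Score[in_window])
}, numeric(1))

print(keys, row.names = FALSE)
```

Because the window is defined on calendar dates rather than on row positions, missing dates and duplicate same-day records are handled without any special casing. The cost is one scan of the data per key, so this is only meant for checking small inputs.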
Cratch answered 7/6, 2017 at 8:22 Comment(6)
Can you explain a bit about keyby = .EACHI? - Osteitis
There is a comment near it. It basically means "sum all the joined records in input for each row in unique(input, by = c("Date", "Group"))". - Cratch
For the case of two grouping columns, Group1 and Group2, do you know how to add the second group to the current solution? - Osteitis
Yes, you can add as many groups as you want. Simply modify the on call, as in: on = .(Group, Group2, Date <= Date, Date >= Date2) - Cratch
You'll probably need to update the unique line to unique(input, by = c("Date", "Group", "Group2")) too. - Cratch
Thanks for adding the note. Yes, I did and it worked :) - Osteitis
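The two-group variant discussed in the comments above could look roughly like the sketch below. The toy table `dt`, its `Group2` values and its scores are invented for illustration and are not from the question. It also uses `by = .EACHI` instead of `keyby = .EACHI`, so rows come back in the order of the lookup table.

```r
library(data.table)

# Hypothetical toy data with a second grouping column
dt <- data.table(
  Date   = as.IDate(c("2017-04-01", "2017-04-02", "2017-04-02", "2017-04-04")),
  Group  = c("A", "A", "A", "A"),
  Group2 = c("x", "x", "y", "x"),
  Score  = c(1, 2, 10, 4)
)

# Lower bound of the 3-calendar-day window
dt[, Date2 := Date - 2L]

# Non-equi join against the unique Date/Group/Group2 combinations,
# summing the matched scores once per lookup row
res <- dt[unique(dt, by = c("Date", "Group", "Group2")),
          on = .(Group, Group2, Date <= Date, Date >= Date2),
          .(Score = sum(Score)),
          by = .EACHI]

print(res)
```

Both grouping columns have to appear in the `on` clause and in the `unique()` call. If `Group2` were left out of `unique()`, the two records on 2017-04-02 would collapse into a single lookup row and one combination would be lost.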
