cumsum by group [duplicate]
Asked Answered
C

2

7

Suppose data looks like

group1 group2 num
A      sg     1
A      sh     2
A      sg     4
B      at     3
B      al     7

a <- cumsum(data[,"num"]) # 1 3 7 10 17

I need something accumulated by groups. In reality,I have multiple columns as grouping indicators. I want to get the accumulated sum by the subgroup I define.

E.g

If I group by group1 only, then the output should be

group1 sum
A      1
A      3
A      7
B      3
B      10

If I group by two variables group1,group2 then the output is

group1 group2 sum
A      sg     1
A      sh     2
A      sg     5
B      at     3
B      al     7
Cassidycassie answered 16/5, 2015 at 14:51 Comment(8)
Try for instance: ave(df$num,df$group1,FUN=cumsum) for summing on just group1 or ave(df$num,df$group1,df$group2,FUN=cumsum) for both groups.Yoghurt
This should be basic group by operation. Using data.table setDT(data)[, cumsum(num), list(group1, group2)] or data %>% group_by(group1, group2) %>% mutate(sum=cumsum(num)) with dplyrHosfmann
@Yoghurt didn't notice ave has option for FUN!Cassidycassie
@akrun. I haven't heard of this package. Will check it out later. The syntax looks weird.Cassidycassie
@W_ee which one the data.table or dplyr. dplyr is created to make things easier to understand.Hosfmann
@Hosfmann dplyr, Well, its name is already difficult to understand. Haven't found out what dplyr stands for.Cassidycassie
@W_ee I read in some posts that it acts like the plierHosfmann
Let us continue this discussion in chat.Cassidycassie
C
9
library(data.table)

data <- data.table(group1=c('A','A','A','B','B'),sum=c(1,2,4,3,7))

data[,list(cumsum = cumsum(sum)),by=list(group1)]
Consol answered 16/5, 2015 at 21:56 Comment(0)
P
6

In addition to using data.table, tapply in base R works fine for both of these cases:

dta <- read.table(text="
group1 group2 num
A      sg     1
A      sh     2
A      sg     4
B      at     3
B      al     7", header=TRUE)

dta$cumsum <- do.call(c, tapply(dta$num, dta$group1, FUN=cumsum))

Calculating the cumulative sum by two groups requires some reordering:

dta <- dta[order(dta$group1, dta$group2, dta$num),]

dta$cumsum2 <- do.call(c, tapply(dta$num, 
                                 paste0(dta$group1, dta$group2), 
                                 FUN=cumsum))
dta
      group1 group2 num cumsum cumsum2
1      A     sg   1      1       1
3      A     sg   4      7       5
2      A     sh   2      3       2
5      B     al   7     10       7
4      B     at   3      3       3

And if you need the original order back:

dta[as.numeric(rownames(dta)),]
  group1 group2 num cumsum cumsum2
1      A     sg   1      1       1
2      A     sh   2      3       2
3      A     sg   4      7       5
4      B     at   3      3       3
5      B     al   7     10       7
Pairoar answered 17/5, 2015 at 0:18 Comment(1)
Nice! I would just add simplify = FALSE to that tapply() call for the sake of type safety.Xanthochroid

© 2022 - 2024 — McMap. All rights reserved.