Cumulative sum in a matrix

Asked 22/11, 2012 at 20:42 Answered 24/3, 2021 at 13:4

Solved arrays performance r matrix cumsum

I have a matrix like

A= [ 1 2 4
     2 3 1
     3 1 2 ]

and I would like to calculate its cumulative sum by row and by column, that is, I want the result to be

B = [ 1  3  7 
      3  8  13 
      6  12 19 ]

Any ideas of how to make this in R in a fast way? (Possibly using the function cumsum) (I have huge matrices)

Thanks!

Augean answered 22/11, 2012 at 20:42 Comment(0)

A one-liner:

t(apply(apply(A, 2, cumsum)), 1, cumsum))

The underlying observation is that you can first compute the cumulative sums over the columns and then the cumulative sum of this matrix over the rows.

Note: When doing the rows, you have to transpose the resulting matrix.

Your example:

> apply(A, 2, cumsum)
     [,1] [,2] [,3]
[1,]    1    2    4
[2,]    3    5    5
[3,]    6    6    7

> t(apply(apply(A, 2, cumsum), 1, cumsum))
     [,1] [,2] [,3]
[1,]    1    3    7
[2,]    3    8   13
[3,]    6   12   19

About performance: I have now idea how good this approach scales to big matrices. Complexity-wise, this should be close to optimal. Usually, apply is not that bad in performance as well.

Edit

Now I was getting curious - what approach is the better one? A short benchmark:

> A <- matrix(runif(1000*1000, 1, 500), 1000)
> 
> system.time(
+   B <- t(apply(apply(A, 2, cumsum), 1, cumsum))
+ )
       User      System     elapsed 
      0.082       0.011       0.093 
> 
> system.time(
+   C <- lower.tri(diag(nrow(A)), diag = TRUE) %*% A %*% upper.tri(diag(ncol(A)), diag = TRUE)
+ )
       User      System     elapsed 
      1.519       0.016       1.530

Thus: Apply outperforms matrix multiplication by a factor of 15. (Just for comparision: MATLAB needed 0.10719 seconds.) The results do not really surprise, as the apply-version can be done in O(n^2), while the matrix multiplication will need approx. O(n^2.7) computations. Thus, all optimizations that matrix multiplication offers should be lost if n is big enough.

Nekton answered 22/11, 2012 at 20:50 Comment(6)

+1 I actually can't think of anything better (despite what the deleted Answer said - serious brain fail there coupled with a error creating A). – Medlin 22/11, 2012 at 21:5

@Gavin: Brain fail happens all the time - at least in my case ;) - However, your solution made me think. Matrix multiplication with triangular matrices would work. In MATLAB: tril(ones(3,3)) * A * triu(ones(3,3)). R sadly does not offer good support for triangular matrices, so creating suitable matrices probably would kill all speed gains that could be archived by matrix multiplication. Great idea, though. – Nekton 22/11, 2012 at 21:10

@Nekton Yes, that's kind of what I had in mind before my brain failed and diag() snuck in there. – Medlin 22/11, 2012 at 21:20

@Nekton That would be lower.tri(A,T) %*% A %*% upper.tri(A,T) in R. – Rexanna 23/11, 2012 at 1:9

@Rexanna Neat one. There was the brain fail on my side - some time ago, I knew those functions... – Nekton 23/11, 2012 at 6:51

Final word, it should be lower.tri(diag(nrow(A)), diag = TRUE) %*% A %*% upper.tri(diag(ncol(A)), diag = TRUE) to handle non-square matrices. It does not scale as well as Thilo's double apply solution above though. – Mcfarlane 23/11, 2012 at 19:20

Here is a more efficient implementation using the matrixStats package and a larger example matrix:

library(matrixStats)
A <- matrix(runif(10000*10000, 1, 500), 10000)

# Thilo's answer
system.time(B <- t(apply(apply(A, 2, cumsum), 1, cumsum)))
user  system elapsed 
3.684   0.504   4.201

# using matrixStats
system.time(C <- colCumsums(rowCumsums(A)))
user  system elapsed 
0.164   0.068   0.233 

all.equal(B, C)
[1] TRUE

Sallyanne answered 22/6, 2016 at 15:54 Comment(0)

My solution: Function cumsum_row() (see below) takes matrix M and returns matrix of cumulative sums of M's rows. Function cumsum_col() does same thing for columns.

cumsum_row <- function(M) {
  M2 <- c()
  for (i in 1:nrow(M))
    M2 <- rbind(M2, cumsum(M[i,]))
  return (M2)
}

cumsum_col <- function(M) {
  return (t(cumsum_row(t(M))))
}

Example:

  > M <- matrix(rep(1, 9), nrow=3)
  > M
         [,1] [,2] [,3]
    [1,]    1    1    1
    [2,]    1    1    1
    [3,]    1    1    1

  > cumsum_row(M)
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    1    2    3
    [3,]    1    2    3

Ivanna answered 24/3, 2021 at 13:4 Comment(1)

Thank you for this code snippet, which might provide some limited, immediate help. A proper explanation would greatly improve its long-term value by showing why this is a good solution to the problem and would make it more useful to future readers with other, similar questions. Please edit your answer to add some explanation, including the assumptions you’ve made. – Precedence 24/3, 2021 at 13:13

Edit

Recommended topics

Hot tags