Efficiently replicate matrices in R
Asked Answered
L

6

18

I have a matrix and look for an efficient way to replicate it n times (where n is the number of observations in the dataset). For example, if I have a matrix A

A <- matrix(1:15, nrow=3)

then I want an output of the form

rbind(A, A, A, ...) #n times.

Obviously, there are many ways to construct such a large matrix, for example using a for loop or apply or similar functions. However, the call to the "matrix-replication-function" takes place in the very core of my optimization algorithm where it is called tens of thousands of times during one run of my program. Therefore, loops, apply-type of functions and anything similar to that are not efficient enough. (Such a solution would basically mean that a loop over n is performed tens of thousands of times, which is obviously inefficient.) I already tried to use the ordinary rep function, but haven't found a way to arrange the output of rep in a matrix of the desired format.

The solution do.call("rbind", replicate(n, A, simplify=F)) is also too inefficient because rbind is used too often in this case. (Then, about 30% of the total runtime of my program are spent performing the rbinds.)

Does anyone know a better solution?

Leena answered 23/10, 2012 at 16:13 Comment(2)
rbind is only used once in the do.call way. it's the replication that's probably bogging it down.Zante
I tested it with Rprofand rbind took about twice as much time as replicate. I was also surprised by that.Radish
P
26

Two more solutions:

The first is a modification of the example in the question

do.call("rbind", rep(list(A), n))

The second involves unrolling the matrix, replicating it, and reassembling it.

matrix(rep(t(A),n), ncol=ncol(A), byrow=TRUE)

Since efficiency is what was requested, benchmarking is necessary

library("rbenchmark")
A <- matrix(1:15, nrow=3)
n <- 10

benchmark(rbind(A, A, A, A, A, A, A, A, A, A),
          do.call("rbind", replicate(n, A, simplify=FALSE)),
          do.call("rbind", rep(list(A), n)),
          apply(A, 2, rep, n),
          matrix(rep(t(A),n), ncol=ncol(A), byrow=TRUE),
          order="relative", replications=100000)

which gives:

                                                 test replications elapsed
1                 rbind(A, A, A, A, A, A, A, A, A, A)       100000    0.91
3                   do.call("rbind", rep(list(A), n))       100000    1.42
5  matrix(rep(t(A), n), ncol = ncol(A), byrow = TRUE)       100000    2.20
2 do.call("rbind", replicate(n, A, simplify = FALSE))       100000    3.03
4                                 apply(A, 2, rep, n)       100000    7.75
  relative user.self sys.self user.child sys.child
1    1.000      0.91        0         NA        NA
3    1.560      1.42        0         NA        NA
5    2.418      2.19        0         NA        NA
2    3.330      3.03        0         NA        NA
4    8.516      7.73        0         NA        NA

So the fastest is the raw rbind call, but that assumes n is fixed and known ahead of time. If n is not fixed, then the fastest is do.call("rbind", rep(list(A), n). These were for a 3x5 matrix and 10 replications. Different sized matrices might give different orderings.

EDIT:

For n=600, the results are in a different order (leaving out the explicit rbind version):

A <- matrix(1:15, nrow=3)
n <- 600

benchmark(do.call("rbind", replicate(n, A, simplify=FALSE)),
          do.call("rbind", rep(list(A), n)),
          apply(A, 2, rep, n),
          matrix(rep(t(A),n), ncol=ncol(A), byrow=TRUE),
          order="relative", replications=10000)

giving

                                                 test replications elapsed
4  matrix(rep(t(A), n), ncol = ncol(A), byrow = TRUE)        10000    1.74
3                                 apply(A, 2, rep, n)        10000    2.57
2                   do.call("rbind", rep(list(A), n))        10000    2.79
1 do.call("rbind", replicate(n, A, simplify = FALSE))        10000    6.68
  relative user.self sys.self user.child sys.child
4    1.000      1.75        0         NA        NA
3    1.477      2.54        0         NA        NA
2    1.603      2.79        0         NA        NA
1    3.839      6.65        0         NA        NA

If you include the explicit rbind version, it is slightly faster than the do.call("rbind", rep(list(A), n)) version, but not by much, and slower than either the apply or matrix versions. So a generalization to arbitrary n does not require a loss of speed in this case.

Pily answered 23/10, 2012 at 16:53 Comment(1)
Wow, thanks a lot! However, my own little benchmark showed that the matrix versions are more efficient than the rbind calls for larger n, say n = 600. In that case, the version with the matrix(rep(t(...-call was most efficient.Radish
H
10

Probably this is more efficient:

apply(A, 2, rep, n)
Hurricane answered 23/10, 2012 at 16:19 Comment(4)
As I already said before, this method does not yield the correct result. Just try it out yourself: A <- matrix(1:15, nrow=3); n <- 2; rbind(A,A); matrix(rep(A, n), ncol = ncol(A), byrow = TRUE) not the same... edit: why can't I create linebreaks in a comment?Radish
@WolfgangPößnecker Sorry, my mistake. See the update of my answer.Hurricane
Thanks, that's already quite an improvement. I'm still looking for even faster solutions, though. ;)Radish
Great answer, very simplePapal
R
3

There's also this way:

rep(1, n) %x% A
Roanne answered 1/8, 2015 at 22:37 Comment(1)
also if you want to do an "each" version of the replication where every row of A is repeated n times then it's simply: A %x% rep(1, n).Roanne
P
2

You can use indexing

A[rep(seq(nrow(A)), n), ]
Poole answered 1/8, 2015 at 22:48 Comment(0)
I
2

I came here for the same reason as the original poster and ultimately updated @Brian Diggs comparison to include all of the other posted answers. Hopefully I did this correctly:

#install.packages("rbenchmark")
library("rbenchmark")
A <- matrix(1:15, nrow=3)
n <- 600

benchmark(do.call("rbind", replicate(n, A, simplify=FALSE)),
          do.call("rbind", rep(list(A), n)),
          apply(A, 2, rep, n),
          matrix(rep(t(A),n), ncol=ncol(A), byrow=TRUE),
          A[rep(seq(nrow(A)), n), ],
          rep(1, n) %x% A,
          apply(A, 2, rep, n),
          matrix(rep(as.integer(t(A)),n),nrow=nrow(A)*n,byrow=TRUE),
     order="relative", replications=10000)

#                                                                test replications elapsed relative user.self sys.self user.child sys.child
#5                                          A[rep(seq(nrow(A)), n), ]        10000    0.32    1.000      0.33     0.00         NA        NA
#8 matrix(rep(as.integer(t(A)), n), nrow = nrow(A) * n, byrow = TRUE)        10000    0.36    1.125      0.35     0.02         NA        NA
#4                 matrix(rep(t(A), n), ncol = ncol(A), byrow = TRUE)        10000    0.38    1.188      0.37     0.00         NA        NA
#3                                                apply(A, 2, rep, n)        10000    0.59    1.844      0.56     0.03         NA        NA
#7                                                apply(A, 2, rep, n)        10000    0.61    1.906      0.58     0.03         NA        NA
#6                                                    rep(1, n) %x% A        10000    1.44    4.500      1.42     0.02         NA        NA
#2                                  do.call("rbind", rep(list(A), n))        10000    1.67    5.219      1.67     0.00         NA        NA
#1                do.call("rbind", replicate(n, A, simplify = FALSE))        10000    5.03   15.719      5.02     0.01         NA        NA
Ivatts answered 4/4, 2021 at 21:49 Comment(0)
B
0

what about transforming it into an array, replicate the content and create a new matrix with the updated number of rows?

A <- matrix(...)
n = 2 # just a test

a = as.integer(A)
multi.a = rep(a,n)
multi.A = matrix(multi.a,nrow=nrow(A)*n,byrow=T)
Boggle answered 23/10, 2012 at 16:23 Comment(2)
I just tried this and it has the same problem as the answer above: your suggestion does unfortunately not yield the correct result.Radish
Use as.integer(t(A)). Then it works: matrix(rep(as.integer(t(A)),n),nrow=nrow(A)*n,byrow=TRUE)Ivatts

© 2022 - 2024 — McMap. All rights reserved.