do.call doesn't work with "+" as "what" and a list of 3+ elements

I can use do.call to sum two vectors elementwise:

do.call(what = "+", args = list(c(0,0,1), c(1,2,3)))
>[1] 1 2 4

However, if I'd like to call the same operator with a list of three vectors, it fails:

do.call(what = "+", args = list(c(0,0,1), c(1,2,3), c(9,1,2)))
>Error in `+`(c(0, 0, 1), c(1, 2, 3), c(9, 1, 2)): operator needs one or two arguments

I could use Reduce:

Reduce(f = "+", x = list(c(0,0,1), c(1,2,3), c(9,1,2)))
>[1] 10  3  6

but I am aware of the overhead of the Reduce operation compared with do.call, and in my REAL application it isn't tolerable: I need to sum not a 3-element list, but a 10^5-element list of vectors that are each 10^4 elements long.
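
For reference, the elementwise sum can also be written as a plain accumulation loop. A minimal sketch (psum is a made-up helper name here, and it assumes every vector in the list has the same length):

```r
# Hypothetical helper: accumulate the elementwise sum one vector at a time.
# Assumes all vectors in lst have equal length.
psum <- function(lst) {
  out <- lst[[1]]
  for (i in seq_along(lst)[-1]) {
    out <- out + lst[[i]]
  }
  out
}

psum(list(c(0, 0, 1), c(1, 2, 3), c(9, 1, 2)))
# [1] 10  3  6
```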

UPD: Reduce turned out to be the fastest method, after all...

lst <- list(1:10000, 10001:20000, 20001:30000)
lst2 <- lst[rep(seq.int(length(lst)), 1000)]
microbenchmark::microbenchmark(colSums(do.call(rbind, lst2)),
                               vapply(transpose(lst2), sum, 0),
                               Reduce(f = "+", x = lst2))

    Unit: milliseconds
                           expr      min       lq     mean   median       uq       max neval cld
   colSums(do.call(rbind, lst2)) 153.5086 194.9139 222.7954 198.1952 201.8152  915.6354   100  b 
 vapply(transpose(lst2), sum, 0) 398.9424 537.3834 732.4747 781.7255 813.7376 1538.4301   100   c
       Reduce(f = "+", x = lst2) 101.5618 105.5864 139.8651 108.1204 112.7861 2567.1793   100 a  
Zorina answered 5/8, 2020 at 11:49 Comment(8)
nothing to do with do.call; + only works like +x or x+y (Opulent)
if Reduce doesn't work for you, what's wrong with using a for loop for this case? unless your input is already in a matrix, in which case colSums/rowSums is what you want (Opulent)
if you want to build a call it'll have to be in "Polish" (prefix) form like +(x1, +(x2, ..., +(x[n-1], xn)...)). doable but a mess; a for loop should have the same performance (Opulent)
my concern with a loop was its (alleged) poor performance as compared to the classical apply-style functions (Angelineangelique)
For the sake of completeness, I would do as MichaelChirico suggested and benchmark a loop. It'll be more efficient than you expect (if done correctly). (Caseinogen)
@27ϕ9 I did this out of curiosity. Same benchmark as before (with vector length = 10000). My for loop is virtually identical in time to Reduce, both of which are faster than the other two methods. (Twotime)
questions of efficiency mean looking at the whole workflow. if your input is already a list of inputs, I actually think do.call(psum, inputs) would be best. apply works best on matrices, etc. (Opulent)
ah, I forgot psum is something I wrote 😅 keep an eye on this pull request; eventually data.table could handle your case directly: github.com/Rdatatable/data.table/pull/4448 (Opulent)
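
The nested prefix ("Polish"-form) call mentioned in the comments can be built programmatically rather than by hand; a sketch using Reduce and call(), which left-folds into (x1 + x2) + x3 and evaluates the expression once:

```r
lst <- list(c(0, 0, 1), c(1, 2, 3), c(9, 1, 2))

# Build the left-nested call (x1 + x2) + x3 as an unevaluated expression,
# then evaluate it in one step.
expr <- Reduce(function(acc, x) call("+", acc, x), lst)
eval(expr)
# [1] 10  3  6
```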

As your list gets larger, you might find that this starts to become fast:

# careful if you use the tidyverse: make sure purrr's transpose
# does not mask data.table's
library(data.table) 

lst <- list(c(0,0,1), c(1,2,3), c(9, 1, 2))

vapply(transpose(lst), sum, 0)
# [1] 10  3  6
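
To see why this works: data.table's transpose() regroups the list by position, so each element of the result collects the i-th entries of all input vectors, and vapply() then sums each group. For example:

```r
library(data.table)

lst <- list(c(0, 0, 1), c(1, 2, 3), c(9, 1, 2))

# The first element of the transposed list holds the first entry
# of every input vector, and so on.
transpose(lst)[[1]]
# [1] 0 1 9
```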

I have taken a few answers to compare speed, which seems to be what you want.

# make the list a bit bigger...
lst2 <- lst[rep(seq.int(length(lst)), 1000)]

microbenchmark::microbenchmark(Reduce(`+`, lst2),
                               colSums(do.call(rbind, lst2)),
                               vapply(transpose(lst2), sum, 0),
                               eval(str2lang(paste0(lst2, collapse = "+"))))

Unit: microseconds
                                         expr     min       lq      mean   median       uq     max neval
                            Reduce(`+`, lst2)   954.9  1088.10  1341.271  1191.05  1389.00  6923.2   100
                colSums(do.call(rbind, lst2))   402.2   474.80   761.473   538.85   843.75  7079.7   100
              vapply(transpose(lst2), sum, 0)    81.9    91.85   110.455   103.90   119.00   330.4   100
 eval(str2lang(paste0(lst2, collapse = "+"))) 17489.2 18466.65 20767.888 19572.25 20809.80 57770.4   100

Here it is again with longer vectors, matching your use case. This benchmark takes a minute or two to run; notice the unit is now milliseconds. Which method is fastest depends on how long both the list and its vectors are.

lst <- list(1:10000, 10001:20000, 20001:30000)
lst2 <- lst[rep(seq.int(length(lst)), 1000)]

microbenchmark::microbenchmark(colSums(do.call(rbind, lst2)),
                               vapply(transpose(lst2), sum, 0))

Unit: milliseconds
                            expr      min       lq     mean   median       uq      max neval
   colSums(do.call(rbind, lst2)) 141.7147 146.6305 188.5108 163.4915 228.7852 270.5679   100
 vapply(transpose(lst2), sum, 0) 261.8630 335.6093 348.6241 341.6958 348.6404 495.0994   100
Twotime answered 5/8, 2020 at 12:31 Comment(4)
Thanks for running these benchmarks, I'll go with the vapply solution, surely! But why is vapply so much faster anyway?! Would be interesting to find out... (Angelineangelique)
I think you need to consider the length of the vectors inside the list, as this will have a dramatic impact on the benchmarks. OP can clarify, but maybe these vectors are length 3, maybe they're 5000 or whatever. (Caseinogen)
they're length 10k, thanks for asking, added it to the main post (Angelineangelique)
Actually, it might not be. Just reran with longer vectors, about to update. (Twotime)

You could use:

colSums(do.call(rbind, lst))
#[1] 10  3  6

Or similarly:

rowSums(do.call(cbind, lst))

where lst is:

lst <- list(c(0,0,1), c(1,2,3), c(9, 1, 2))
Heisel answered 5/8, 2020 at 11:54 Comment(1)
Thanks, I was also considering something like this. I wonder about the overhead generated by rbind'ing or cbind'ing the vectors into a matrix first. I'll get back with some tests. (Angelineangelique)
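
One quick way to gauge that overhead is to time the binding step separately from the full bind-and-sum; a rough base-R sketch (timings are machine-dependent, and with vectors this small both will round toward zero, so use longer vectors to see a gap):

```r
lst <- list(c(0, 0, 1), c(1, 2, 3), c(9, 1, 2))

# Matrix-building step only, repeated to get a measurable duration.
system.time(for (i in 1:10000) do.call(rbind, lst))

# Full bind-and-sum, for comparison.
system.time(for (i in 1:10000) colSums(do.call(rbind, lst)))
```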

Another base R workaround:

rowSums(as.data.frame(lst))

or

eval(str2lang(paste0(lst, collapse = "+")))

which gives

[1] 10  3  6

Data

lst <- list(c(0,0,1), c(1,2,3), c(9, 1, 2))
Bosomy answered 5/8, 2020 at 12:34 Comment(0)
