do.call doesn't work with "+" as "what" and a list of 3+ elements

I can use do.call to sum two vectors elementwise:

do.call(what = "+", args = list(c(0,0,1), c(1,2,3)))
>[1] 1 2 4

However, if I'd like to call the same operator with a list of three vectors, it fails:

do.call(what = "+", args = list(c(0,0,1), c(1,2,3), c(9,1,2)))
>Error in `+`(c(0, 0, 1), c(1, 2, 3), c(9, 1, 2)): operator needs one or two arguments

I could use Reduce:

Reduce(f = "+", x = list(c(0,0,1), c(1,2,3), c(9,1,2)))
>[1] 10  3  6

but I am aware of the overhead of the Reduce operation compared with do.call, and in my REAL application it isn't tolerable: I need to sum not a 3-element list, but a 10^5-element list of vectors that are each 10^4 elements long.
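
For reference, the elementwise sum can also be written as a plain accumulation loop. A minimal sketch (psum is a made-up helper name here, and it assumes every vector in the list has the same length):

```r
# Hypothetical helper: accumulate the elementwise sum one vector at a time.
# Assumes all vectors in lst have equal length.
psum <- function(lst) {
  out <- lst[[1]]
  for (i in seq_along(lst)[-1]) {
    out <- out + lst[[i]]
  }
  out
}

psum(list(c(0, 0, 1), c(1, 2, 3), c(9, 1, 2)))
# [1] 10  3  6
```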

UPD: Reduce turned out to be the fastest method, after all...

lst <- list(1:10000, 10001:20000, 20001:30000)
lst2 <- lst[rep(seq.int(length(lst)), 1000)]
microbenchmark::microbenchmark(colSums(do.call(rbind, lst2)),
                               vapply(transpose(lst2), sum, 0),
                               Reduce(f = "+", x = lst2))

    Unit: milliseconds
                           expr      min       lq     mean   median       uq       max neval cld
   colSums(do.call(rbind, lst2)) 153.5086 194.9139 222.7954 198.1952 201.8152  915.6354   100  b 
 vapply(transpose(lst2), sum, 0) 398.9424 537.3834 732.4747 781.7255 813.7376 1538.4301   100   c
       Reduce(f = "+", x = lst2) 101.5618 105.5864 139.8651 108.1204 112.7861 2567.1793   100 a  
Zorina answered 5/8, 2020 at 11:49 Comment(8)
nothing to do with do.call; + only works like +x or x+y (Opulent)
if Reduce doesn't work for you, what's wrong with using a for loop for this case? unless your input is already in a matrix, in which case colSums/rowSums is what you want (Opulent)
if you want to build a call it'll have to be in "Polish" (prefix) form like +(x1, +(x2, ..., +(x[n-1], xn)...)). doable but a mess; a for loop should have the same performance (Opulent)
my concern with a loop was its (alleged) poor performance as compared to the classical apply-style functions (Angelineangelique)
For the sake of completeness, I would do as MichaelChirico suggested and benchmark a loop. It'll be more efficient than you expect (if done correctly). (Caseinogen)
@27ϕ9 I did this out of curiosity. Same benchmark as before (with vector length = 10000). My for loop is virtually identical in time to Reduce, both of which are faster than the other two methods. (Twotime)
questions of efficiency mean looking at the whole workflow. if your input is already a list of inputs, I actually think do.call(psum, inputs) would be best. apply works best on matrices, etc. (Opulent)
ah, I forgot psum is something I wrote 😅 keep an eye on this pull request; eventually data.table could handle your case directly: github.com/Rdatatable/data.table/pull/4448 (Opulent)
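
The nested prefix ("Polish"-form) call mentioned in the comments can be built programmatically rather than by hand; a sketch using Reduce and call(), which left-folds into (x1 + x2) + x3 and evaluates the expression once:

```r
lst <- list(c(0, 0, 1), c(1, 2, 3), c(9, 1, 2))

# Build the left-nested call (x1 + x2) + x3 as an unevaluated expression,
# then evaluate it in one step.
expr <- Reduce(function(acc, x) call("+", acc, x), lst)
eval(expr)
# [1] 10  3  6
```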

As your list gets larger, you might find that this starts to become fast:

# careful if you use the tidyverse: make sure purrr's transpose
# does not mask data.table's
library(data.table) 

lst <- list(c(0,0,1), c(1,2,3), c(9, 1, 2))

vapply(transpose(lst), sum, 0)
# [1] 10  3  6
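
To see why this works: data.table's transpose() regroups the list by position, so each element of the result collects the i-th entries of all input vectors, and vapply() then sums each group. For example:

```r
library(data.table)

lst <- list(c(0, 0, 1), c(1, 2, 3), c(9, 1, 2))

# The first element of the transposed list holds the first entry
# of every input vector, and so on.
transpose(lst)[[1]]
# [1] 0 1 9
```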

I have taken a few answers to compare speed, which seems to be what you want.

# make the list a bit bigger...
lst2 <- lst[rep(seq.int(length(lst)), 1000)]

microbenchmark::microbenchmark(Reduce(`+`, lst2),
                               colSums(do.call(rbind, lst2)),
                               vapply(transpose(lst2), sum, 0),
                               eval(str2lang(paste0(lst2, collapse = "+"))))

Unit: microseconds
                                         expr     min       lq      mean   median       uq     max neval
                            Reduce(`+`, lst2)   954.9  1088.10  1341.271  1191.05  1389.00  6923.2   100
                colSums(do.call(rbind, lst2))   402.2   474.80   761.473   538.85   843.75  7079.7   100
              vapply(transpose(lst2), sum, 0)    81.9    91.85   110.455   103.90   119.00   330.4   100
 eval(str2lang(paste0(lst2, collapse = "+"))) 17489.2 18466.65 20767.888 19572.25 20809.80 57770.4   100

Here it is again with longer vectors, matching your use case. This benchmark takes a minute or two to run; notice the unit is now milliseconds. Which method is fastest depends on how long both the list and its vectors are.

lst <- list(1:10000, 10001:20000, 20001:30000)
lst2 <- lst[rep(seq.int(length(lst)), 1000)]

microbenchmark::microbenchmark(colSums(do.call(rbind, lst2)),
                               vapply(transpose(lst2), sum, 0))

Unit: milliseconds
                            expr      min       lq     mean   median       uq      max neval
   colSums(do.call(rbind, lst2)) 141.7147 146.6305 188.5108 163.4915 228.7852 270.5679   100
 vapply(transpose(lst2), sum, 0) 261.8630 335.6093 348.6241 341.6958 348.6404 495.0994   100
Twotime answered 5/8, 2020 at 12:31 Comment(4)
Thanks for running these benchmarks, I'll go with the vapply solution, surely! But why is vapply so much faster anyway?! Would be interesting to find out... (Angelineangelique)
I think you need to consider the length of the vectors inside the list, as this will have a dramatic impact on the benchmarks. OP can clarify, but maybe these vectors are length 3, maybe they're 5000 or whatever. (Caseinogen)
they're length 10k, thanks for asking, added it to the main post (Angelineangelique)
Actually, it might not be. Just reran with longer vectors, about to update. (Twotime)

You could use:

colSums(do.call(rbind, lst))
#[1] 10  3  6

Or similarly:

rowSums(do.call(cbind, lst))

where lst is:

lst <- list(c(0,0,1), c(1,2,3), c(9, 1, 2))
Heisel answered 5/8, 2020 at 11:54 Comment(1)
Thanks, I was also considering something like this. I wonder about the overhead generated by rbind'ing or cbind'ing the vectors into a matrix first. I'll get back with some tests. (Angelineangelique)
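
One quick way to gauge that overhead is to time the binding step separately from the full bind-and-sum; a rough base-R sketch (timings are machine-dependent, and with vectors this small both will round toward zero, so use longer vectors to see a gap):

```r
lst <- list(c(0, 0, 1), c(1, 2, 3), c(9, 1, 2))

# Matrix-building step only, repeated to get a measurable duration.
system.time(for (i in 1:10000) do.call(rbind, lst))

# Full bind-and-sum, for comparison.
system.time(for (i in 1:10000) colSums(do.call(rbind, lst)))
```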

Another base R workaround:

rowSums(as.data.frame(lst))

or

eval(str2lang(paste0(lst, collapse = "+")))

which gives

[1] 10  3  6

Data

lst <- list(c(0,0,1), c(1,2,3), c(9, 1, 2))
Bosomy answered 5/8, 2020 at 12:34 Comment(0)
