I'm benchmarking the solutions given so far against my own (which simply uses lapply
and rbindlist
). I couldn't run the entire task because I run out of memory. That's why I choose a smaller dt:
library(data.table)
dt <- data.table(cbind(1:5000000, 5000000:1))
original <- function() {
dt2 <- dt[, c(.SD, .(year = 1:10)), by = 1:nrow(dt) ]
dt2
}
sb <- function() {
dt2 <- dt[CJ(V1, year = 1:10), on = "V1"]
}
gregor <- function() {
CJDT <- function(...) {
Reduce(function(DT1, DT2) cbind(DT1, DT2[rep(1:.N, each=nrow(DT1))]), list(...))
}
years = data.table(year = 1:10, key = "year")
setkey(dt)
dt3 = CJDT(dt, years)
dt3
}
bindlist <- function() {
dt3 <- rbindlist(lapply(1:10, function(x) {
dt$year <- x
dt
}))
# dt3 <- setcolorder(dt3, c("nrow", "V1", "V2", "year")) # to get exactly same dt
# dt3 <- dt3[order(nrow)]
dt3
}
Benchmark
library(bench)
res <- mark(
original = original(),
sb = sb(),
gregor = gregor(),
bindlist = bindlist(),
iterations = 1,
check = FALSE
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
res
#> # A tibble: 4 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 original 5.88s 5.88s 0.170 1.72GB 16.0
#> 2 sb 1.76s 1.76s 0.570 1.73GB 0.570
#> 3 gregor 1.87s 1.87s 0.536 972.86MB 0
#> 4 bindlist 558.69ms 558.69ms 1.79 1.12GB 0
summary(res, relative = TRUE)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 4 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 original 10.5 10.5 1 1.81 Inf
#> 2 sb 3.14 3.14 3.35 1.82 Inf
#> 3 gregor 3.34 3.34 3.15 1 NaN
#> 4 bindlist 1 1 10.5 1.18 NaN
Created on 2019-12-03 by the reprex package (v0.3.0)
Now the results are not exactly the same (see commented code in my solution for correcting it) but equivalent to what you are trying to do. My lapply
plus rbindlist
solution is suprisingly the fastet by a factor of more than 3. This might change on the full task but I doubt it.
rep
on sequence of rows – Need