I need to programmatically apply different functions to different columns and group by, using data.table
.
If the columns and functions were known, I would do like this:
library(data.table)
DT = data.table(id = rep(letters[1:3], each=3),
v1 = rep(c(2, 3, 4), each=3),
v2 = rep(c(5, 10, 15), each=3))
DT
#> id v1 v2
#> 1: a 2 5
#> 2: a 2 5
#> 3: a 2 5
#> 4: b 3 10
#> 5: b 3 10
#> 6: b 3 10
#> 7: c 4 15
#> 8: c 4 15
#> 9: c 4 15
DT[, .(v1=mean(v1), v2=sum(v2)), keyby=.(id)]
#> id v1 v2
#> 1: a 2 15
#> 2: b 3 30
#> 3: c 4 45
But I want to do this by passing the column names and their specific function:
aggregate_functions = list(v1=mean, v2=sum)
col_selection = c('v1', 'v2')
I wrote something like this by I can't figure out a way of passing the column name to lapply
:
DT[, lapply(.SD,
aggregate_functions[[col_name]] # some way of selecting the right function from aggregate_functions
),
.SDcols = col_selection,
by=id]
I have also tried with melt
and dcast
, but the latter applies all the functions to all the columns:
library(data.table)
DT = data.table(id = rep(letters[1:3], each=3),
v1 = rep(c(2, 3, 4), each=3),
v2 = rep(c(5, 10, 15), each=3))
DTm = melt(DT, meaure.vars=col_selection, id.vars='id')
DTm
#> id variable value
#> 1: a v1 2
#> 2: a v1 2
#> 3: a v1 2
#> 4: b v1 3
#> 5: b v1 3
#> 6: b v1 3
#> 7: c v1 4
#> 8: c v1 4
#> 9: c v1 4
#> 10: a v2 5
#> 11: a v2 5
#> 12: a v2 5
#> 13: b v2 10
#> 14: b v2 10
#> 15: b v2 10
#> 16: c v2 15
#> 17: c v2 15
#> 18: c v2 15
DTc = dcast(DTm, id~variable, fun.aggregate=list(sum, mean))
DTc
#> id value_sum_v1 value_sum_v2 value_mean_v1 value_mean_v2
#> 1: a 6 15 2 5
#> 2: b 9 30 3 10
#> 3: c 12 45 4 15
I could programmatically select and rename the relevant columns (3 and 4 in this case) but it doesn't look like an efficient approach.
Of course I could have a loop doing the job and merging the results, but I am looking for a data.table
way.
Thank you for your answer and thank you to the team at data.table
.
Created on 2019-11-26 by the reprex package (v0.3.0)