A related question has a partial answer here, but that question is too specific and I have not been able to apply its answer to my own problem.
I would like to skip a potentially heavy computation of the NA group when using by.
library(data.table)
DT = data.table(X = sample(10),
Y = sample(10),
g1 = sample(letters[1:2], 10, TRUE),
g2 = sample(letters[1:2], 10, TRUE))
set(DT, 1L, 3L, NA)
set(DT, 1L, 4L, NA)
set(DT, 6L, 3L, NA)
set(DT, 6L, 4L, NA)
DT[, mean(X*Y), by = .(g1,g2)]
Here we can see there are up to 5 groups, including the (NA, NA) group. Considering that (i) that group is useless, (ii) the groups can be very big, and (iii) the actual computation is more complex than mean(X*Y), can I skip the NA group in an efficient way, that is, without creating a copy of the remaining table? The following works, but it builds a second table first:
DT2 = na.omit(DT, cols = c("g1", "g2"))
DT2[, mean(X*Y), by = .(g1,g2)]
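For comparison, here is a minimal sketch of the only alternative I have come up with: filtering the NA rows in i instead of building DT2. The set.seed() call and the names res_copy and res_i are my own additions for illustration. It returns the same result as the na.omit version, but I do not know whether data.table still materialises the subset internally, which is exactly what I am asking about.

```r
library(data.table)

set.seed(1)  # added only so the example is reproducible
DT = data.table(X = sample(10),
                Y = sample(10),
                g1 = sample(letters[1:2], 10, TRUE),
                g2 = sample(letters[1:2], 10, TRUE))
# Put NA into both grouping columns for rows 1 and 6
set(DT, c(1L, 6L), "g1", NA)
set(DT, c(1L, 6L), "g2", NA)

# Version with an explicit intermediate table
res_copy = na.omit(DT, cols = c("g1", "g2"))[, mean(X*Y), by = .(g1, g2)]

# Version that filters in i, so no intermediate table is named
res_i = DT[!is.na(g1) & !is.na(g2), mean(X*Y), by = .(g1, g2)]

# Both approaches should give the same groups and values
all.equal(res_copy, res_i)
```

If the i-based form avoids the copy, that would be enough for my use case.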