As a follow-up to this oldie but goodie: efficient row-wise operations on a data.table
I have some data that (unfortunately) look like:
library('data.table')
set.seed(1234)
m <- 5
n <- 7
rb <- function() runif(m,1000,2000) * rbinom(m,1,0.5)
series_col_nms <- paste0('YearNo',1:n)
rev <- data.table(cust_id = paste0('CustNo',1:m),
other_stuff = sample(letters,m, replace=TRUE))
for(col in series_col_nms){
set(rev, j=col, value=rb())
}
setkey(rev, cust_id)
One row per customer with various columns including yearly revenue for year 1, 2, ...
I would like to get the year indices for the first and last years with any revenue from each customer.
I can produce the desired result, but with a somewhat hacky join:
years_active <- rev[, which(.SD>0), .SDcols = series_col_nms,
keyby=cust_id][, .(min_year_active = min(V1),
max_year_active = max(V1)), keyby=cust_id]
years_active[rev]
These attempts to get the minimum index fail:
rev[, apply(.SD, 1, function(x) min(which(x>0))), .SDcols=series_col_nms, by=cust_id] # returns data type error
rev[, do.call(pmin, lapply(.SD, function(x) which(x>0))), .SDcols=series_col_nms, by=cust_id] # returns empty
What is the data.table
way to do this?