I am trying to calculate the minimum across rows using the pmin function and data.table (similar to the post row-by-row operations and updates in data.table) but with a character list of columns using something like the with=FALSE
syntax, and with the na.rm=TRUE
argument.
DT <- data.table(x = c(1,1,2,3,4,1,9),
y = c(2,4,1,2,5,6,6),
z = c(3,5,1,7,4,5,3),
a = c(1,3,NA,3,5,NA,2))
> DT
x y z a
1: 1 2 3 1
2: 1 4 5 3
3: 2 1 1 NA
4: 3 2 7 3
5: 4 5 4 5
6: 1 6 5 NA
7: 9 6 3 2
I can calculate the minimum across rows using columns directly:
DT[,min_val := pmin(x,y,z,a,na.rm=TRUE)]
giving
> DT
x y z a min_val
1: 1 2 3 1 1
2: 1 4 5 3 1
3: 2 1 1 NA 1
4: 3 2 7 3 2
5: 4 5 4 5 4
6: 1 6 5 NA 1
7: 9 6 3 2 2
However, I am trying to do this over an automatically generated large set of columns, and I want to be able to do this across this arbitrary list of columns, stored in a col_names variable, col_names <- c("a","y","z')
I can do this:
DT[, col_min := do.call(pmin,DT[,col_names,with=FALSE])]
But it gives me NA values. I can't figure out how to pass the na.rm=TRUE
argument into the do.call. I've tried defining the function as
DT[, col_min := do.call(function(x) pmin(x,na.rm=TRUE),DT[,col_names,with=FALSE])]
but this gives me an error. I also tried passing in the argument as an additional element in a list, but I think pmin (or do.call) gets confused between the DT non-standard evaluation of column names and the argument.
Any ideas?