If you are choosing from 1:n
you could use sampl.int
which is faster. Also apply
ing on a matrix
is faster. Putting both in a function psamp
is even faster.
So, try this (I added dt[, 1:3]
so that it won't fail once the column is added):
psamp <- function(x) sample.int(n=3, size=1, prob=x)
dt[, sample :=apply(as.matrix(dt[, 1:3]), 1, psamp)]
To get somewhat rid of the apply
we could Vectorize
psamp
and use do.call
. Additionally—as @IRTFM suggests in his answer—we should make use of the .SD
symbol.
psampv <- Vectorize(function(p1, p2, p3) sample.int(n=3, size=1, replace=TRUE, prob=c(p1, p2, p3)))
dt[, sample := do.call(psampv, .SD), .SDcols=c('p1','p2','p3')]
To improve performance by even more than an order of magnitude, as suggested by @Henrik in comments we may use Rcpp
. I have slightly adapted the code from this answer and use the new Rcpp::sample
, which kindly gives identical results to base::sample
with the same set.seed
.
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::IntegerVector sample_matrix1(Rcpp::NumericMatrix x, Rcpp::IntegerVector choice_set) {
int n = x.nrow();
Rcpp::IntegerVector result(n);
for (int i = 0; i < n; ++i) {
Rcpp::NumericVector z(x(i, Rcpp::_));
result[i] = Rcpp::sample(choice_set, 1, false, z)[0];
}
return result;
}
Rcpp::sourceCpp("sample_matrix1.cpp")
dt[, sample := sample_matrix1(as.matrix(.SD), 1:3), .SDcols=c('p1','p2','p3')]
Benchmark, 100k*100 repetitions each:
Unit: milliseconds
expr min lq mean median uq max neval cld
psamp_:= 1195.16708 1259.06558 1327.19581 1311.17878 1349.98905 1515.1187 100 b
psamp_.SD 1225.90467 1257.37766 1318.74885 1289.27571 1335.07736 1522.3423 100 b
psamp_set 1181.44985 1256.73204 1320.29317 1301.75657 1335.22009 1491.3870 100 b
psamp_do.call 1181.93117 1251.45863 1316.23306 1285.85710 1337.06674 1476.8023 100 b
rcpp 60.73652 67.15291 72.76073 70.47052 73.91629 127.8278 100 a
apply
are (1)by = 1:nrow(dt)
, or (2) melt to long format. Described e.g. here: Efficient row-wise operations on a data.table; How to do row wise operations on .SD columns in data.table, posts that you should have found, even with a very poor google-fu - "R data.table rowwise" ;) – Korykorzybskipsampv
I actually was inspired from thepmin
solution of your first link. – Embody