I am trying to get a Rcpp version of pmvnorm to work at least as fast as mvtnorm::pmvnorm in R.
I have found https://github.com/zhanxw/libMvtnorm and created a Rcpp skeleton package with the relevant source files. I have added the following functions which make use of Armadillo (since I'm using it across other code I've been writing).
//[[Rcpp::export]]
arma::vec triangl(const arma::mat& X){
arma::mat LL = arma::trimatl(X, -1); // omit the main diagonal
return LL.elem(arma::find(LL != 0));
}
//[[Rcpp::export]]
double pmvnorm_cpp(arma::vec& bound, arma::vec& lowtrivec){
double error;
int n = bound.n_elem;
double* boundptr = bound.memptr();
double* lowtrivecptr = lowtrivec.memptr();
double result = pmvnorm_P(n, boundptr, lowtrivecptr, &error);
return result;
}
From R after building the package, this is a reproducible example:
set.seed(1)
covar <- rWishart(1, 10, diag(5))[,,1]
sds <- diag(covar) ^-.5
corrmat <- diag(sds) %*% covar %*% diag(sds)
triang <- triangl(corrmat)
bounds <- c(0.5, 0.9, 1, 4, -1)
rbenchmark::benchmark(pmvnorm_cpp(bounds, triang),
mvtnorm::pmvnorm(upper=bounds, corr = corrmat),
replications=1000)
Which shows that pmvnorm_cpp is much slower than mvtnorm::pmvnorm. and the result is different.
> pmvnorm_cpp(bounds, triang)
[1] 0.04300643
> mvtnorm::pmvnorm(upper=bounds, corr = corrmat)
[1] 0.04895361
which puzzles me because I thought the base fortran code was the same. Is there something in my code that makes everything go slow? Or should I try to port the mvtnorm::pmvnorm code directly? I have literally no experience with fortran.
Suggestions appreciated, excuse my incompetence othewise.
EDIT: to make a quick comparison with an alternative, this:
//[[Rcpp::export]]
NumericVector pmvnorm_cpp(NumericVector bound, NumericMatrix cormat){
Environment stats("package:mvtnorm");
Function f = stats["pmvnorm"];
NumericVector lower(bound.length(), R_NegInf);
NumericVector mean(bound.length());
NumericVector res = f(lower, bound, mean, cormat);
return res;
}
has essentially the same performance as an R call (the following on a 40-dimensional mvnormal):
> rbenchmark::benchmark(pmvnorm_cpp(bounds, corrmat),
+ mvtnorm::pmvnorm(upper=bounds, corr = corrmat),
+ replications=100)
test replications elapsed relative user.self sys.self
2 mvtnorm::pmvnorm(upper = bounds, corr = corrmat) 100 16.86 1.032 16.60 0.00
1 pmvnorm_cpp(bounds, corrmat) 100 16.34 1.000 16.26 0.01
so it seems to me there must be something going on in the previous code. either with how I'm handling things with Armadillo, or how the other things are connected. I would assume that there should be a performance gain compared to this last implementation.