I've noticed that base::chol() slows down severely when the matrix contains many small elements. Here is an example:
## disable multithreading (BLAS and OpenMP)
library(RhpcBLASctl)
blas_set_num_threads(1)
omp_set_num_threads(1)
Baseline: create a positive definite matrix and get the timing for chol().

loc <- expand.grid(1:60, 1:50)
covmat1 <- exp(-as.matrix(dist(loc)))
mean(c(covmat1))
# [1] 0.002076862
system.time(chol1 <- chol(covmat1))
#    user  system elapsed
#   0.313   0.024   0.337
Increase small values: create covmat2, a matrix with more small values.

covmat2 <- exp(-as.matrix(dist(loc))*10)
mean(c(covmat2))
# [1] 0.0003333937
system.time(chol2 <- chol(covmat2))
#    user  system elapsed
#   2.311   0.021   2.333
Compared to the baseline, this slows down the computation by a factor of about 7 (2.333 s vs. 0.337 s elapsed).
Set small values to zero: set the values of covmat2 that are smaller than 1e-13 to zero.

covmat3 <- covmat2
covmat3[covmat3 < 1e-13] <- 0
mean(c(covmat3))
# [1] 0.0003333937
system.time(chol3 <- chol(covmat3))
#    user  system elapsed
#   0.302   0.016   0.318
This version is fast again, with a timing similar to the baseline.
Why does this slowdown happen?
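To narrow this down, it may help to look at what covmat3 actually removes. The following is only a diagnostic sketch, not an explanation: it rebuilds covmat2 as above and counts how many entries are nonzero but below the 1e-13 cutoff, how many of those are also below .Machine$double.xmin (the smallest normalized double, so such entries are subnormal), and how many have underflowed to exactly zero. The variable names n_tiny, n_subnormal and n_zero are made up for this illustration, and whether any of these groups is responsible for the timing difference is an open assumption.

```r
# Rebuild covmat2 exactly as in the example above.
loc <- expand.grid(1:60, 1:50)
covmat2 <- exp(-as.matrix(dist(loc)) * 10)

# Smallest positive normalized double on this platform (about 2.2e-308).
xmin <- .Machine$double.xmin

# Entries that are nonzero but below the 1e-13 cutoff used for covmat3.
n_tiny <- sum(covmat2 > 0 & covmat2 < 1e-13)

# Subset of those entries that is also below xmin, i.e. subnormal doubles.
n_subnormal <- sum(covmat2 > 0 & covmat2 < xmin)

# Entries that already underflowed to exactly zero in exp().
n_zero <- sum(covmat2 == 0)

counts <- c(tiny = n_tiny, subnormal = n_subnormal, zero = n_zero)
counts
```

If n_subnormal is nonzero, a further experiment would be to zero only those entries (threshold xmin instead of 1e-13) and time chol() again.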
Notes:
Repeated evaluations of the timing experiments lead to similar results.
I know that for matrices with many values close to zero it might be more efficient to use a sparse matrix approach, e.g., the R package spam.
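For reference, the repeated timings mentioned in the first note can be reproduced along these lines. This is a minimal sketch: time_chol is a hypothetical helper written for this illustration (it is not part of base R or of any package), and the choice of three repetitions and of the median as summary is an assumption.

```r
# Hypothetical helper: median elapsed time of chol() over n_rep runs.
time_chol <- function(m, n_rep = 3) {
  elapsed <- replicate(n_rep, system.time(chol(m))[["elapsed"]])
  median(elapsed)
}

# Same matrices as in the example above.
loc <- expand.grid(1:60, 1:50)
d <- as.matrix(dist(loc))

covmat1 <- exp(-d)
covmat2 <- exp(-d * 10)
covmat3 <- covmat2
covmat3[covmat3 < 1e-13] <- 0

# Median elapsed seconds per matrix; absolute values depend on the
# machine and on the BLAS/LAPACK in use.
timings <- c(covmat1 = time_chol(covmat1),
             covmat2 = time_chol(covmat2),
             covmat3 = time_chol(covmat3))
timings
```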
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Linux Mint 19.2
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
?chol: the third line under Details suggests that there may be some early checks to determine whether the values are zero or negative. Just a guess without looking at the source code. – Stilt
covmat1 is shorter than those of covmat2. Hence, I don't think this could explain the timing difference. – Armoury
covmat2 has more small numbers than covmat1 and according to your explanation it should be faster. But it is slower... – Armoury