R: any faster R function than "tcrossprod" for symmetric dense matrix multiplication?

Let

x = matrix(rnorm(1000000), nrow = 5000)

I would like to multiply this matrix by its transpose: x %*% t(x).

After some searching, I found that a possibly faster way of doing this is

tcrossprod(x)

The time taken is

 user  system elapsed 
2.975   0.000   2.960
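
For reference, the two approaches can be compared with a minimal sketch like the one below. It assumes a smaller 1000 x 200 matrix and a fixed seed (both are my choices for a quick check, not part of the setup above), and it also confirms that both calls return the same result.

```r
set.seed(1)                                  # assumed seed, only for reproducibility
x <- matrix(rnorm(1000 * 200), nrow = 1000)  # smaller than the 5000 x 200 matrix above

# Time the explicit product and the dedicated function
t_mult <- system.time(a <- x %*% t(x))
t_tcp  <- system.time(b <- tcrossprod(x))

print(t_mult)
print(t_tcp)

# Both approaches should agree up to floating-point tolerance
same <- isTRUE(all.equal(a, b))
same
```

Timings will differ between machines and BLAS libraries, so only the relative difference is meaningful.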

Is there any other R function that can do this faster than tcrossprod?

Mae answered 16/9, 2016 at 13:1 Comment(3)
Do you have all doubles as in your example? It's rare that I have a matrix this size with no zeroes or integers. The time is drastically increased with the format you are showing. – Certification
@PierreLafortune The actual matrix I am dealing with consists of integers and fractions. The output matrix is a covariance matrix. – Mae
@ZheyuanLi I want to compute the eigenvalues and eigenvectors using the R function "eigen". – Mae

No. At the R level this is already the fastest option. Internally, however, it calls the level-3 BLAS routine dsyrk, so with a high-performance BLAS library it will be a lot faster. Try linking OpenBLAS to your R.
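
To see which BLAS your R session is currently using, here is a minimal sketch. It assumes R 3.4.0 or later, where sessionInfo() and extSoftVersion() report the BLAS in use; on older versions these fields may be missing.

```r
# Paths of the BLAS and LAPACK libraries used by this R session
si <- sessionInfo()
print(si$BLAS)
print(si$LAPACK)

# The BLAS entry is also available through extSoftVersion()
blas_path <- extSoftVersion()["BLAS"]
print(blas_path)
```

If the reported path points to R's own libRblas (or a system libblas), you are most likely on the reference BLAS; a path containing "openblas" suggests OpenBLAS is already linked.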

Linking a BLAS library does not require rebuilding R. For an overview, have a read of my question "linking R to BLAS library", which contains several links showing how to set up alternatives and then switch between the different BLAS libraries on your machine.

Alternatively, you can read my extremely long question and answer "Without root access, run R with tuned BLAS when it is linked with reference BLAS", which gives various ways to use an external BLAS library even if R is linked to the reference BLAS library.


As a side note, for a matrix with dimension m * n, dsyrk has a FLOP count of n * m ^ 2. (Note that this is the computational cost of tcrossprod; for crossprod it is m * n ^ 2.)
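
The difference between the two counts follows from the shape of the result, as this small sketch illustrates. The sizes m = 50 and n = 20 are hypothetical, chosen only to keep the example fast.

```r
m <- 50; n <- 20                     # hypothetical small sizes
x <- matrix(rnorm(m * n), nrow = m)  # an m x n matrix

dim_tcp <- dim(tcrossprod(x))        # x %*% t(x) is m x m
dim_cp  <- dim(crossprod(x))         # t(x) %*% x is n x n
print(dim_tcp)
print(dim_cp)

# FLOP counts quoted above for the two operations
flop_counts <- c(tcrossprod = n * m^2, crossprod = m * n^2)
print(flop_counts)
```

So when m is much larger than n, as in your case, tcrossprod is the more expensive of the two products.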

You have m = 5000 and n = 200, and the computation takes 2.96s. So the computation runs at (200 * 5000 ^ 2 / 2.96) * 1e-9 = 1.69 GFLOPs. This is an ordinary level of performance, so at the moment you are definitely using the reference BLAS. With OpenBLAS, performance can reach 10 GFLOPs or more, depending on your CPU. Good luck!
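
The arithmetic above can be reproduced in R, using the dimensions and the elapsed time from the question. The variable names are mine, and the expression is just the FLOP count divided by the elapsed time.

```r
m <- 5000; n <- 200    # matrix dimensions from the question
elapsed <- 2.96        # elapsed seconds reported in the question

flops  <- n * m^2                 # FLOP count used above for tcrossprod
gflops <- flops / elapsed * 1e-9  # achieved rate in GFLOPs
print(round(gflops, 2))
```

Plugging in your own elapsed time from system.time(tcrossprod(x)) after switching BLAS gives a quick before/after comparison.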

Teodoro answered 16/9, 2016 at 13:3 Comment(0)
