No. At the R level this is already the fastest option. Internally, however, it calls the level-3 BLAS routine dsyrk, so if you use a high-performance BLAS library it will be a lot faster. Try linking OpenBLAS to your R.
Linking a BLAS library does not require rebuilding R. For an overview, you can read my question "linking R to BLAS library". It contains several links showing how to set up aliases and then switch between the different BLAS libraries on the machine.
Alternatively, you can read my (extremely long) question and answer "Without root access, run R with tuned BLAS when it is linked with reference BLAS". It gives various ways to use an external BLAS library even if R is linked to the reference BLAS library.
As a side note, for an m * n matrix, dsyrk has a FLOP count of n * m ^ 2. (This is the computational cost for tcrossprod; for crossprod it is m * n ^ 2.)
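To make the arithmetic concrete, here is a small sketch that uses Python purely as a calculator. The function names `syrk_flops` and `gflops` are hypothetical helpers for illustration. It assumes the usual convention that one multiply and one add each count as a FLOP. It also uses the leading-order count, ignoring the small difference between m(m+1)/2 and m^2/2 stored entries.

```python
def syrk_flops(m: int, n: int, op: str = "tcrossprod") -> int:
    """Approximate FLOP count of dsyrk for an m x n matrix X.

    dsyrk only computes one triangle of the symmetric result. For
    tcrossprod(X) (an m x m result), that is about m^2 / 2 entries, each
    needing n multiply-adds (2n FLOPs), so roughly n * m^2 FLOPs in total.
    crossprod(X) gives an n x n result, so the roles swap: m * n^2.
    """
    if op == "tcrossprod":
        return n * m ** 2
    if op == "crossprod":
        return m * n ** 2
    raise ValueError(f"unknown op: {op!r}")


def gflops(flops: int, seconds: float) -> float:
    """Convert a FLOP count and an elapsed time into GFLOPS."""
    return flops / seconds * 1e-9


if __name__ == "__main__":
    # Dimensions and timing quoted in this answer.
    m, n, elapsed = 5000, 200, 2.96
    rate = gflops(syrk_flops(m, n, "tcrossprod"), elapsed)
    print(f"{rate:.2f} GFLOPS")  # prints "1.69 GFLOPS"
```

Plugging in your own timing (e.g. the elapsed time reported by R's `system.time()`) before and after switching to OpenBLAS gives a direct before/after comparison.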
You have m = 5000 and n = 200, and the computation takes 2.96s. The computation speed is therefore (200 * 5000 ^ 2 / 2.96) * 1e-9 ≈ 1.69 GFLOPS. That is an ordinary level of performance, so at the moment you are definitely using reference BLAS. With OpenBLAS, performance can reach 10 GFLOPS or more, depending on your CPU. Good luck!