I am curious as to why multiplying a sparse matrix by a dense matrix takes a different amount of time than the reverse. Are the algorithms significantly different?
Here's an example in MATLAB 2018a:
M = 4000;                 % also tested with M = 10000 below
a = sprand(M, M, 0.01);   % MxM sparse, ~1% density
b = rand(M);              % MxM dense
tic; ref1 = a*b; t_axb = toc   % sparse * dense
tic; ref2 = b*a; t_bxa = toc   % dense * sparse
Here's an example with Eigen 3 and C++ using 1 thread:
// prepare acol = MxM ColMajor Eigen sparse matrix with 0.01 density
...
// map the existing dense buffer PR as an MxM column-major matrix
Map<Matrix<double,Dynamic,Dynamic,ColMajor> > bcol(PR, M, M);
MatrixXd result(M, M);
double tic, toc;
tic = getHighResolutionTime();
result = acol * bcol;               // sparse * dense
toc = getHighResolutionTime();
printf("\nacol*bcol time: %f seconds", toc - tic);
tic = getHighResolutionTime();
result = bcol * acol;               // dense * sparse
toc = getHighResolutionTime();
printf("\nbcol*acol time: %f seconds\n", toc - tic);
When M=4000, the results are:
t_axb =
0.6877
t_bxa =
0.4803
acol*bcol time: 0.937590 seconds
bcol*acol time: 0.532622 seconds
When M=10000, the results are:
t_axb =
11.5649
t_bxa =
9.7872
acol*bcol time: 20.140380 seconds
bcol*acol time: 8.061626 seconds
In both cases, the sparse-dense product is slower than the dense-sparse product, for both MATLAB and Eigen. I am curious as to:

1. Why is this the case? Are the algorithms for sparse-dense significantly different from those for dense-sparse? The number of FLOPs is the same, right? (A sketch of how I picture the two loop orders follows after question 2.)

2. Why does Eigen match or exceed MATLAB's performance for the dense-sparse product but not for the sparse-dense product? A small difference in performance is normal, but a factor of ~1.4-1.8 seems strange given that both are highly optimized libraries. I am compiling Eigen with all the optimizations recommended in the documentation, i.e.
-fPIC -fomit-frame-pointer -O3 -DNDEBUG -fopenmp -march=native
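To make question 1 concrete, here is how I picture the two products being computed when the sparse matrix is stored column-major (CSC). This is only my own naive sketch, not Eigen's or MATLAB's actual kernel, and the function names are mine. Both loops perform the same 2*nnz(A)*M flops, but the memory access patterns differ:

#include <Eigen/Dense>
#include <Eigen/Sparse>
using namespace Eigen;

// C = A * B (sparse * dense): for each column j of B, scatter the nonzeros
// of A into column j of C; the writes into C jump around by the row index
// of each nonzero.
void sparse_times_dense(const SparseMatrix<double>& A, const MatrixXd& B, MatrixXd& C)
{
    C.setZero();
    for (int j = 0; j < B.cols(); ++j)
        for (int k = 0; k < A.outerSize(); ++k)
            for (SparseMatrix<double>::InnerIterator it(A, k); it; ++it)
                C(it.row(), j) += it.value() * B(k, j);   // scattered writes
}

// C = B * A (dense * sparse): each nonzero A(i,k) contributes a contiguous
// axpy  C.col(k) += A(i,k) * B.col(i), streaming through whole dense columns.
void dense_times_sparse(const MatrixXd& B, const SparseMatrix<double>& A, MatrixXd& C)
{
    C.setZero();
    for (int k = 0; k < A.outerSize(); ++k)
        for (SparseMatrix<double>::InnerIterator it(A, k); it; ++it)
            C.col(k) += it.value() * B.col(it.row());     // contiguous axpy
}

If this sketch is right, the flop count is identical for both orders (about 2*0.01*M^3, i.e. roughly 2e10 for M=10000), so I would expect any timing gap to come from memory access rather than arithmetic.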
Comment: with M=10000 I get t_bxa about 10% greater, not smaller. – Assumed