I would like to compare the speed of Matlab in matrix multiplication with the speed of Eigen 3 on an Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz. The code including Eigen:
#include <iostream>
#include "Eigen/Dense"
#include <chrono>
#include <omp.h>
using namespace std;
using namespace Eigen;
const int dim=100;
int main()
{
    std::chrono::time_point<std::chrono::system_clock> start, end;
    int n;
    n = Eigen::nbThreads();
    cout<<n<<"\n";
    Matrix<double, Dynamic, Dynamic> m1(dim,dim);
    Matrix<double, Dynamic, Dynamic> m2(dim,dim);
    Matrix<double, Dynamic, Dynamic> m_res(dim,dim);
    start = std::chrono::system_clock::now();
    for (int i = 0 ; i <100000; ++i) {
        m1.setRandom(dim,dim);
        m2.setRandom(dim,dim);
        m_res=m1*m2;
    }
    end = std::chrono::system_clock::now();
    std::chrono::duration<double> elapsed_seconds = end-start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";
    return 0;
}
It is compiled with g++ -O3 -std=c++11 -fopenmp and executed with OMP_NUM_THREADS=8 ./prog.
In Matlab I'm using
function mat_test(N,dim)
%
% N: how many tests
% dim: dimension of the matrices
tic
parfor i=1:N
    A = rand(dim);
    B = rand(dim);
    C = A*B;
end
toc
The result is: 9s for Matlab, 36s for Eigen. What am I doing wrong in the Eigen case? I can rule out the dynamic allocation of the matrices as the cause. Also, only 3 threads are used instead of 8.
EDIT:
Maybe I didn't state it clearly enough: the task is to multiply double-valued matrices of dim=100 a total of 100000 times, with both matrices randomly refilled each time, not only once, and to do it as fast as possible with Eigen. If Eigen cannot compete with Matlab, what alternative would you suggest?
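One difference between the two listings above is that parfor distributes independent iterations across workers, while the C++ loop runs its iterations one after another. A minimal sketch of splitting the iterations across threads follows. It assumes plain std::vector storage and a naive triple loop in place of Eigen, so it illustrates the work split only and says nothing about Eigen's speed. The names Mat, fill_random, multiply, trace and run_parallel are hypothetical.

```cpp
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

// Hypothetical stand-in for a dense matrix: row-major values in a flat vector.
using Mat = std::vector<double>;

// Fill m with uniform random values in [-1, 1).
void fill_random(Mat& m, std::mt19937_64& rng) {
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    for (double& x : m) x = dist(rng);
}

// Naive product c = a * b for dim x dim matrices (not a tuned kernel).
void multiply(const Mat& a, const Mat& b, Mat& c, std::size_t dim) {
    c.assign(dim * dim, 0.0);
    for (std::size_t i = 0; i < dim; ++i)
        for (std::size_t k = 0; k < dim; ++k)
            for (std::size_t j = 0; j < dim; ++j)
                c[i * dim + j] += a[i * dim + k] * b[k * dim + j];
}

// Sum of the diagonal entries of a dim x dim matrix.
double trace(const Mat& m, std::size_t dim) {
    double s = 0.0;
    for (std::size_t i = 0; i < dim; ++i) s += m[i * dim + i];
    return s;
}

// Run `iterations` independent fill-and-multiply steps, split across
// `workers` threads. Returns the sum of all traces so that every product
// is actually used.
double run_parallel(std::size_t iterations, std::size_t dim, unsigned workers) {
    if (workers == 0) workers = 1;
    std::vector<double> partial(workers, 0.0);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&partial, iterations, dim, workers, w] {
            std::mt19937_64 rng(12345u + w);  // separate generator per worker
            Mat a(dim * dim), b(dim * dim), c;
            double acc = 0.0;
            // Worker w handles iterations w, w + workers, w + 2*workers, ...
            for (std::size_t i = w; i < iterations; i += workers) {
                fill_random(a, rng);
                fill_random(b, rng);
                multiply(a, b, c, dim);
                acc += trace(c, dim);
            }
            partial[w] = acc;  // each worker writes only its own slot
        });
    }
    for (std::thread& t : pool) t.join();
    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```

For the setup in the question this would be called as run_parallel(100000, 100, 8). Each worker owns its matrices and generator, so the threads share nothing except the slots of the result vector.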
[...] parfor? Lastly, why do you think you're doing something wrong, just because Eigen is slower? – Bionics
[...] A*B until after C is used, and C is never used, so it could eliminate ... well, the multiplication. You can emulate this in C++ by doing auto res = std::async( std::launch::deferred, [&]{return m1*m2;} );. Well, using immutable matrices in shared pointers and implementing * on them using lazy evaluation (as well as on lazy matrices)? In short, you have to do something to compare meaningfully. – Featherston
[...] vec(i)=trace(A*B) into the loop and display(sum(vec)) below it boils down to ~10s. So it must have generated the matrices A and B and computed their product. Using the same computer. – Pelt
[...] rand()? – Featherston
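The timed loop in the question covers both the random fill and the product, so the 36s figure does not say which of the two dominates. A minimal sketch of timing the two phases separately, using a monotonic clock, is given here. The names time_seconds, PhaseTimes and time_phases are hypothetical, and the helper takes arbitrary callables, so it is not tied to Eigen.

```cpp
#include <chrono>
#include <cstddef>

// Seconds taken by one call of work(), measured with a monotonic clock.
template <class F>
double time_seconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Accumulated wall-clock time per phase, in seconds.
struct PhaseTimes {
    double generate = 0.0;
    double multiply = 0.0;
};

// Call generate() and then multiply() `iterations` times,
// summing the time spent in each phase separately.
template <class Gen, class Mul>
PhaseTimes time_phases(std::size_t iterations, Gen&& generate, Mul&& multiply) {
    PhaseTimes t;
    for (std::size_t i = 0; i < iterations; ++i) {
        t.generate += time_seconds(generate);
        t.multiply += time_seconds(multiply);
    }
    return t;
}
```

With the code from the question, the two callables could be something like [&]{ m1.setRandom(); m2.setRandom(); } and [&]{ m_res.noalias() = m1 * m2; checksum += m_res.trace(); }, where checksum is a hypothetical double that is printed afterwards so the product is actually used, as with the trace trick in the comments. That pairing is untested here.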