Preamble
Some time ago I asked a question about performance of Matlab vs Python (Performance: Matlab vs Python). I was surprised that Matlab is faster than Python, especially in meshgrid
. In the discussion of that question, it was pointed to me that I should use a wrapper in Python to call my C++ code because C++ code is also available to me. I have the same code in C++, Matlab and Python.
While doing that, I was surprised once again to find that Matlab is faster than C++ in matrix assembly and computation.I have a slightly larger code, from which I am investigating a segment of matrix-vector multiplication. The larger code performs such multiplications at multiple instances. Overall the code in C++ is much much faster than Matlab (because function calling in Matlab has an overhead etc.), but Matlab seems to be outperforming C++ in the matrix-vector multiplication (code snippet at the bottom).
Results
The table below shows the comparison of time it takes to assemble the kernel matrix and the time it takes to multiply the matrix with the vector. The results are compiled for a matrix size NxN
where N
varies from 10,000 to 40,000. Which is not that large. But the interesting thing is that Matlab outperforms C++ the larger the N
gets. Matlab is 3.8 - 5.8 times faster in total time. Moreover it is also faster in both matrix assembly and computation.
___________________________________________
|N=10,000 Assembly Computation Total |
|MATLAB 0.3387 0.031 0.3697 |
|C++ 1.15 0.24 1.4 |
|Times faster 3.8 |
___________________________________________
|N=20,000 Assembly Computation Total |
|MATLAB 1.089 0.0977 1.187 |
|C++ 5.1 1.03 6.13 |
|Times faster 5.2 |
___________________________________________
|N=40,000 Assembly Computation Total |
|MATLAB 4.31 0.348 4.655 |
|C++ 23.25 3.91 27.16 |
|Times faster 5.8 |
-------------------------------------------
Question
Is there a faster way of doing this in C++? Am I missing something? I understand that C++ is using for
loops but my understanding is that Matlab will also be doing something similar in meshgrid
.
Code Snippets
Matlab Code:
%% GET INPUT DATA FROM DATA FILES ------------------------------------------- %
% Read data from input file
Data = load('Input/input.txt');
location = Data(:,1:2);
charges = Data(:,3:end);
N = length(location);
m = size(charges,2);
%% EXACT MATRIX VECTOR PRODUCT ---------------------------------------------- %
kex1=ex1;
tic
Q = kex1.kernel_2D(location , location);
fprintf('\n Assembly time: %f ', toc);
tic
potential_exact = Q * charges;
fprintf('\n Computation time: %f \n', toc);
Class (Using meshgrid):
classdef ex1
methods
function [kernel] = kernel_2D(obj, x,y)
[i1,j1] = meshgrid(y(:,1),x(:,1));
[i2,j2] = meshgrid(y(:,2),x(:,2));
kernel = sqrt( (i1 - j1) .^ 2 + (i2 - j2) .^2 );
end
end
end
C++ Code:
EDIT
Compiled using a make file with following flags:
CC=g++
CFLAGS=-c -fopenmp -w -Wall -DNDEBUG -O3 -march=native -ffast-math -ffinite-math-only -I header/ -I /usr/include
LDFLAGS= -g -fopenmp
LIB_PATH=
SOURCESTEXT= src/read_Location_Charges.cpp
SOURCESF=examples/matvec.cpp
OBJECTSF= $(SOURCESF:.cpp=.o) $(SOURCESTEXT:.cpp=.o)
EXECUTABLEF=./exec/mykernel
mykernel: $(SOURCESF) $(SOURCESTEXT) $(EXECUTABLEF)
$(EXECUTABLEF): $(OBJECTSF)
$(CC) $(LDFLAGS) $(KERNEL) $(INDEX) $(OBJECTSF) -o $@ $(LIB_PATH)
.cpp.o:
$(CC) $(CFLAGS) $(KERNEL) $(INDEX) $< -o $@
`
# include"environment.hpp"
using namespace std;
using namespace Eigen;
class ex1
{
public:
void kernel_2D(const unsigned long M, double*& x, const unsigned long N, double*& y, MatrixXd& kernel) {
kernel = MatrixXd::Zero(M,N);
for(unsigned long i=0;i<M;++i) {
for(unsigned long j=0;j<N;++j) {
double X = (x[0*N+i] - y[0*N+j]) ;
double Y = (x[1*N+i] - y[1*N+j]) ;
kernel(i,j) = sqrt((X*X) + (Y*Y));
}
}
}
};
int main()
{
/* Input ----------------------------------------------------------------------------- */
unsigned long N = 40000; unsigned m=1;
double* charges; double* location;
charges = new double[N * m](); location = new double[N * 2]();
clock_t start; clock_t end;
double exactAssemblyTime; double exactComputationTime;
read_Location_Charges ("input/test_input.txt", N, location, m, charges);
MatrixXd charges_ = Map<MatrixXd>(charges, N, m);
MatrixXd Q;
ex1 Kex1;
/* Process ------------------------------------------------------------------------ */
// Matrix assembly
start = clock();
Kex1.kernel_2D(N, location, N, location, Q);
end = clock();
exactAssemblyTime = double(end-start)/double(CLOCKS_PER_SEC);
//Computation
start = clock();
MatrixXd QH = Q * charges_;
end = clock();
exactComputationTime = double(end-start)/double(CLOCKS_PER_SEC);
cout << endl << "Assembly time: " << exactAssemblyTime << endl;
cout << endl << "Computation time: " << exactComputationTime << endl;
// Clean up
delete []charges;
delete []location;
return 0;
}
m
is one, usingVectorXd
may be faster. – EquityMap
, butVectorXd
directly due to alignment. – EquityZero
takes about 1 second in the total assembly of 5.5 seconds in theN
=20,000 case. That is something I have tried. Do you have any alternate toZero
? – BarrVectorXd
it improves the computation time. ForN=20,000
case, it goes down from ~1 second to 0.27 seconds. Alas in the larger scheme of thingsm
may not necessarily be1
. – Barr-msse2
compiler flag (or a newer sse version) may improve the speed – Equity-ffast-math
≠"accurate-math"
, a compromise MATLAB does not make – Slush-msse -msse2 -msse3 -mssse3 -msse4 -msse4.1 -msse4.2 -mavx
did not help @yakoudbz. The matrix is square. – Barr//kernel = MatrixXd::Zero(M,N);
I get a segmentation fault – Barrresize
function. – Equitymex
functions if I recall correctly – Pigeonwingmex
runs C++ code and provides a MATLAB function that runs in MATLAB. If the OP requires an actual runnable executable outside of MATLAB, then the MATLAB Coder Toolbox is required. – Filibeg