Matlab mex-file with mexCallMATLAB is almost 300 times slower than the corresponding m-file
S

3

9

I started implementing a few m-files in C++ in order to reduce run times. The m-files produce n-dimensional points and evaluate function values at these points. The functions are user-defined and they are passed to m-files and mex-files as function handles. The mex-files use mexCallMATLAB with feval for finding function values.

I constructed the example below, where a function handle fn created in the Matlab command line is passed to the matlabcallingmatlab.m and mexcallingmatlab.cpp routines. With a freshly opened Matlab, mexcallingmatlab evaluates this function 200000 times in 241.5 seconds, while matlabcallingmatlab does the same in 0.81522 seconds, so the mex implementation is about 296 times slower. These times are from the second runs; the first runs take longer, probably because of overhead associated with loading the program for the first time.

I have spent many days searching online for this problem and tried some of the suggestions I found. I tried different mex compiler flags to optimize the mex file, but there was almost no difference in performance. An earlier Stack Overflow post stated that upgrading Matlab was the solution, but I am already using what is probably the latest version, MATLAB Version: 8.1.0.604 (R2013a), on Mac OS X Version: 10.8.4. I compiled the mex file with and without the -largeArrayDims flag, but this didn't make any difference either. Some suggested coding the content of the function handle directly in the cpp file, but this is impossible because I would like to provide this code to any user, with any type of function that takes a vector input and returns a real number.

As far as I found out, mex files need to go through feval function for using a function handle whereas m-files can directly call function handles provided that Matlab version is newer than some version.

Any help would be greatly appreciated.

simple function handle created in the Matlab command line:

fn = @(x) x'*x 

matlabcallingmatlab.m :

function matlabcallingmatlab( fn )
x = zeros(2,1); 
for i = 0 : 199999
    x(2) = i; 
    f = fn( x ); 
end

mexcallingmatlab.cpp:

#include "mex.h"
#include <cstring>

void mexFunction( int nlhs, mxArray *plhs[],
                  int nrhs, const mxArray *prhs[] )
{
    mxArray *lhs[1], *rhs[2]; //parameters to be passed to feval
    double f, *xptr, x[] = {0.0, 0.0}; // x: input to f and f=f(x)
    int n = 2, nbytes = n * sizeof(double);  // n: dimension of input x to f

    // prhs[0] is the function handle as first argument to feval
    rhs[0] = const_cast<mxArray *>( prhs[0] );

    // rhs[1] contains input x to the function
    rhs[1] = mxCreateDoubleMatrix( n, 1, mxREAL);
    xptr = mxGetPr( rhs[1] );

    for (int i = 0; i < 200000; ++i)
    {
        x[1] = double(i);   // change input 
        memcpy( xptr, x, nbytes );  // now rhs[1] has new x
        mexCallMATLAB(1, lhs, 2, rhs, "feval");
        f = *mxGetPr( lhs[0] );
    }
}

Compilation of mex file:

>> mex -v -largeArrayDims mexcallingmatlab.cpp
Saleswoman answered 6/9, 2013 at 14:42 Comment(7)
So, you are using C++ to call a Matlab function that does "x * x"? I wouldn't be surprised if Matlab does this better than your C++ solution, because the Matlab code doesn't have to jump through a whole range of hoops to get the data from C++ to Matlab format and back into C++ format. - Sauder
Are you sure you haven't just measured the overhead of calling mexCallMATLAB 200000 times? - Terracotta
If you want to improve performance here, then vectorize your functions to make them work on entire vectors/matrices rather than one input at a time. Unfortunately in MATLAB, function calls have a high overhead compared to other languages, so the idea is to minimize the number of times you evaluate fh. For instance, the above can be made: x = [zeros(1,199999);1:199999]; fh = @(x) dot(x,x); out = fh(x); where the dot function (vector dot product) is already vectorized and works on the columns of the input arguments. - Axe
@Amro: If I am the user I could optimize the function, but I cannot assume other users have carefully optimized functions. - Saleswoman
@Mats: I put this example function for simplicity. I tried many other more involved functions and the results were similar. - Saleswoman
@Meteor: My comment is a less detailed variant of Peter's answer. - Sauder
@Meteor: while what others said is true, the real issue is that you were relying on the automatic freeing of memory in the MEX-file, which can be really slow for thousands of mxArrays. So if you explicitly clean up after your calls, the problem goes away. Please see my answer below... - Axe
A
18

So I tried to implement this myself, and I think I found the reason for the slowness.

Basically your code has a small memory leak: you are not freeing the lhs mxArray returned from each call to mexCallMATLAB. It is not exactly a memory leak, seeing that the MATLAB memory manager takes care of freeing the memory when the MEX-file exits:

MATLAB allocates dynamic memory to store the mxArrays in plhs. MATLAB automatically deallocates the dynamic memory when you clear the MEX-file. However, if heap space is at a premium, call mxDestroyArray when you are finished with the mxArrays plhs points to.

Still explicit is better than implicit... So your code is really stressing the deallocator of the MATLAB memory manager :)
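To make the bookkeeping concrete, here is a toy model in plain C++ (no MATLAB involved) that only counts how many result objects are alive at once. FakeArray and peak_live are hypothetical names made up for this illustration, and the model says nothing about how MATLAB's memory manager is actually implemented or why its automatic cleanup is slow; it merely shows that deferring cleanup leaves one outstanding array per call, while destroying inside the loop keeps a single one alive.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a result array. This is NOT a real mxArray.
struct FakeArray {
    static std::size_t live;   // instances currently allocated
    static std::size_t peak;   // highest value 'live' has reached
    double value;
    explicit FakeArray(double v) : value(v) { if (++live > peak) peak = live; }
    ~FakeArray() { --live; }
};
std::size_t FakeArray::live = 0;
std::size_t FakeArray::peak = 0;

// Hypothetical helper: run 'calls' iterations and report the peak number
// of results that were alive at the same time.
std::size_t peak_live(std::size_t calls, bool destroy_each_iteration) {
    FakeArray::live = 0;
    FakeArray::peak = 0;
    // Results parked here are freed only when this function returns,
    // loosely analogous to leaving lhs[0] for automatic cleanup.
    std::vector<std::unique_ptr<FakeArray>> deferred;
    for (std::size_t i = 0; i < calls; ++i) {
        std::unique_ptr<FakeArray> result(new FakeArray(static_cast<double>(i)));
        if (!destroy_each_iteration) {
            deferred.push_back(std::move(result));
        }
        // Otherwise 'result' is destroyed at the end of this iteration,
        // analogous to calling mxDestroyArray(lhs[0]) inside the loop.
    }
    return FakeArray::peak;
}
```

With 200000 calls, the deferred variant of this model has 200000 objects waiting for cleanup at the end, whereas the per-iteration variant never holds more than one.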

mexcallingmatlab.cpp

#include "mex.h"

#ifndef N
#define N 100
#endif

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // validate input/output arguments
    if (nrhs != 1) {
        mexErrMsgTxt("One input argument required.");
    }
    if (mxGetClassID(prhs[0]) != mxFUNCTION_CLASS) {
        mexErrMsgTxt("Input must be a function handle.");
    }
    if (nlhs > 1) {
        mexErrMsgTxt("Too many output arguments.");
    }

    // allocate output
    plhs[0] = mxCreateDoubleMatrix(N, 1, mxREAL);
    double *out = mxGetPr(plhs[0]);

    // prepare for mexCallMATLAB: val = feval(@fh, zeros(2,1))
    mxArray *lhs, *rhs[2];
    rhs[0] = mxDuplicateArray(prhs[0]);
    rhs[1] = mxCreateDoubleMatrix(2, 1, mxREAL);
    double *xptr = mxGetPr(rhs[1]) + 1;

    for (int i=0; i<N; ++i) {
        *xptr = i;
        mexCallMATLAB(1, &lhs, 2, rhs, "feval");
        out[i] = *mxGetPr(lhs);
        mxDestroyArray(lhs);
    }

    // cleanup
    mxDestroyArray(rhs[0]);
    mxDestroyArray(rhs[1]);
}

MATLAB

fh = @(x) x'*x;
N = 2e5;

% MATLAB
tic
out = zeros(N,1);
for i=0:N-1
    out(i+1) = feval(fh, [0;i]);
end
toc

% MEX
mex('-largeArrayDims', sprintf('-DN=%d',N), 'mexcallingmatlab.cpp')
tic
out2 = mexcallingmatlab(fh);
toc

% check results
assert(isequal(out,out2))

Running the above benchmark a couple of times (to warm it up), I get the following consistent results:

Elapsed time is 0.732890 seconds.    % pure MATLAB
Elapsed time is 1.621439 seconds.    % MEX-file

Nowhere near the slow times you initially had! Still, the pure MATLAB part is about twice as fast, probably because of the overhead of calling an external MEX-function.

(My system: Win8 running 64-bit R2013a)

Axe answered 6/9, 2013 at 17:6 Comment(1)
Thank you very much. Adding the line mxDestroyArray(lhs[0]); inside the for loop of my code solved the problem. Now I have almost the same run times, both around 0.8 seconds. This performance is surely acceptable. I would like to thank the others too for their informative comments. - Saleswoman
H
4

There's absolutely no reason to expect that a MEX file is, in general, faster than an M file. The only reason that this is often true is that many loops in MATLAB incur a lot of function call overhead, along with parameter checking and such. Rewriting that in C eliminates the overhead, and gives your C compiler a chance to optimize the code.

In this case, there's nothing for the C compiler to optimize... it MUST make the MATLAB interface call for every iteration. In fact, the MATLAB optimizer will do a better job, since it can, in some cases "see" into the function.

In other words, forget using MEX to speed up this program.

Hett answered 6/9, 2013 at 15:15 Comment(1)
This code represents only a small part of the whole program. The C compiler does very well on all the other parts, but this part was the bottleneck. - Saleswoman
P
1

There is some overhead cost in calls from mex to Matlab and vice versa. The overhead per call is small, but it really adds up in a tight loop like this. As your testing indicates, pure Matlab can be much faster in this case! Your other option is to eliminate the mexCallMATLAB call and do everything in pure C++.
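As a rough sketch of what "everything in pure C++" could look like for the example in the question, assuming the objective can be written natively (which the asker says is not possible for arbitrary user-supplied handles): the names Objective, squared_norm and evaluate_all are hypothetical and not part of any MATLAB API.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical callback type: maps an n-dimensional point to a real value.
using Objective = std::function<double(const std::vector<double>&)>;

// Native equivalent of the MATLAB handle fn = @(x) x'*x.
double squared_norm(const std::vector<double>& x) {
    double s = 0.0;
    for (double v : x) {
        s += v * v;
    }
    return s;
}

// Evaluate f at x = [0; i] for i = 0 .. count-1, mirroring the loop
// in the question but without any call back into MATLAB.
std::vector<double> evaluate_all(const Objective& f, std::size_t count) {
    std::vector<double> out(count);
    std::vector<double> x(2, 0.0);
    for (std::size_t i = 0; i < count; ++i) {
        x[1] = static_cast<double>(i);
        out[i] = f(x);
    }
    return out;
}
```

Calling evaluate_all(squared_norm, 200000) would produce the same values as the MATLAB loop for this particular function. The trade-off is that every objective has to be compiled in, so this only helps when the set of functions is known ahead of time.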

Pitfall answered 6/9, 2013 at 15:10 Comment(0)
