Why is Eigen's mean() method so much faster than sum()?

This is a rather theoretical question, but I'm quite interested in it and would be glad if someone with expert knowledge on this is willing to share it.

I have a matrix of floats with 2000 rows and 600 columns and want to subtract the column means from each row. I have tested the following two lines and compared their runtimes:

MatrixXf centered = data.rowwise() - (data.colwise().sum() / data.rows());
MatrixXf centered = data.rowwise() - data.colwise().mean();

I thought mean() would do nothing more than divide the sum of each column by the number of rows, but while the first line takes 12.3 seconds on my computer, the second finishes in 0.09 seconds.

I'm using Eigen version 3.2.6, which is currently the latest version, and my matrices are stored in row-major order.

Does anyone know anything about Eigen's internals that could explain this huge performance difference?


Edit: I should add that data in the code above is actually of type Eigen::Map<Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>> and maps Eigen's functionality onto a raw buffer.
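
For reference, here is a minimal sketch of how such a mapping can be set up (the buffer and its dimensions are made up for illustration):

#include <vector>
#include <Eigen/Core>

// Row-major dynamic-size float matrix, matching the layout described above.
using RowMajorMatrixXf = Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;

int main()
{
    // Hypothetical raw buffer standing in for the real data source.
    std::vector<float> buffer(2000 * 600);
    // Wrap the buffer without copying: 2000 rows, 600 columns.
    Eigen::Map<RowMajorMatrixXf> data(buffer.data(), 2000, 600);
    return 0;
}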


Edit 2: As suggested by GuyGreer, I'll provide some sample code to reproduce my findings:

#include <iostream>
#include <chrono>
#include <Eigen/Core>
using namespace std;
using namespace std::chrono;
using namespace Eigen;

int main(int argc, char * argv[])
{
    MatrixXf data(10000, 1000), centered;
    data.setRandom();
    auto start = high_resolution_clock::now();
    if (argc > 1)
        centered = data.rowwise() - data.colwise().mean();                 // mean() version
    else
        centered = data.rowwise() - (data.colwise().sum() / data.rows()); // sum() version
    auto stop = high_resolution_clock::now();
    cout << duration_cast<milliseconds>(stop - start).count() << " ms" << endl;
    return 0;
}

Compile with:

g++ -O3 -std=c++11 -o test test.cc

Running the resulting program without arguments, so that it uses sum(), takes 126 seconds on my machine, while running it with an argument, so that it uses mean(), takes only 0.03 seconds!


Edit 3: As it turned out (see the comments), it is not sum() itself that takes so long, but the division of the resulting vector by the number of rows. So the new question is: why does Eigen take more than 2 minutes to divide a row vector with 1000 entries by a single scalar?
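
To isolate the two steps, one can split the expression into named temporaries (a sketch in the spirit of the test program above; the variable names are mine):

RowVectorXf sums = data.colwise().sum();  // one pass over the data: fast
RowVectorXf means = sums / data.rows();   // 1000 scalar divisions: fast
centered = data.rowwise() - means;        // means is a plain vector: fast

Each step is fast once the sum has been materialized into a concrete vector, so the cost must come from using the combined, unevaluated expression.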

Kerplunk asked 27/10/2015 at 19:53. Comments (7):
Are you running the two calculations one after the other in the same run? If so, it could be a caching issue; try swapping the order around. – Degraded
No, I've tested them separately. – Kerplunk
I've never used Eigen, so I won't be able to help you, but I think it would be helpful if you provided an end-to-end test that demonstrates what you are asking about. That way people can better check for themselves what's going on. – Ringo
For a fair comparison, I think you should replace (data.colwise().sum() / data.rows()) with a simple data.colwise().sum(). What does that yield? (I can't test it right now.) – Ascension
Could be an inlining decision made by GCC. I would try to remove the branch and test both versions individually. Profiling should also give you a good idea of what is going on. – Couloir
@Ascension Surprisingly, that makes the sum() version execute as fast as the one using mean(). But in both cases I want to compute the mean, for which I have to divide the sum by the number of rows. mean() must somehow do something equivalent, but much faster. This turns the question into: why does Eigen take so long to divide a vector by a scalar? – Kerplunk
Imo it is not easy to figure out what the expression template machinery does under the hood. For example, it could lead to the division being executed inside the sum rather than on the total result... but that is just to give the idea; it alone should not cause such huge deviations in speed. So to answer your question, here is my guess: mean() is explicitly arranged to perform the division outside, i.e. on the result of the sum. – Ascension

Somehow, both the partial reduction (sum) and the division are recomputed every time, because crucial information about the evaluation cost of the partial reduction is wrongly lost by operator/... Explicitly evaluating the mean fixes the issue:

centered = data.rowwise() - (data.colwise().sum() / data.rows()).eval();

Of course, this evaluation should be done by Eigen for you, as fixed in changeset 42ab43a. This fix will be part of the upcoming 3.2.7 and 3.3 releases.
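
To make the cost concrete, what seems to happen without that evaluation is roughly the following (a conceptual model only, not Eigen's actual evaluator):

// Conceptual model: with the cost information lost, every coefficient
// access to the nested (sum / scalar) expression re-runs the reduction,
// i.e. each subtraction recomputes an entire column sum.
for (int i = 0; i < data.rows(); ++i)
    for (int j = 0; j < data.cols(); ++j)
        centered(i, j) = data(i, j) - data.col(j).sum() / data.rows();
// On the order of rows * cols column reductions instead of cols, i.e.
// O(rows^2 * cols) work in total; .eval() restores the single pass.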

Queenqueena answered 27/10/2015 at 21:18. Comments (4):
Perfect, thanks! And thumbs up for the reference to the changeset. – Kerplunk
@Kerplunk You realize that ggael committed the changeset right around the time that he answered your question, yes? I guess we all have to thank you for finding the bug. – Hypertensive
@AviGinsburg I did not realize that, actually. Thanks for pointing me to this. – Kerplunk
Yes, thanks to Callidor for finding this shortcoming. – Queenqueena
