Why is Eigen's Cholesky decomposition much faster on Linux than on Windows?

Asked 7/5, 2014 at 16:45 Answered 7/5, 2014 at 19:33

I've noticed a significant performance difference regarding Cholesky decomposition using the Eigen library.

I'm using the latest version of Eigen (3.2.1) with the following benchmark code:

#include <iostream>
#include <chrono>
#include <Eigen/Core>
#include <Eigen/Cholesky>
using namespace std;
using namespace std::chrono;
using namespace Eigen;

int main()
{
    const MatrixXd::Index size = 4200;
    MatrixXd m = MatrixXd::Random(size, size);
    m = (m + m.transpose()) / 2.0 + 10000 * MatrixXd::Identity(size, size);

    LLT<MatrixXd> llt;
    auto start = high_resolution_clock::now();
    llt.compute(m);
    if (llt.info() != Success)
        cout << "Cholesky decomposition failed!" << endl;
    auto stop = high_resolution_clock::now();

    cout << "Cholesky decomposition in "
         << duration_cast<milliseconds>(stop - start).count()
         << " ms." << endl;

    return 0;
}

I compile this benchmark with g++ -std=c++11 -Wall -O3 -o bench bench.cc and run it twice on the same machine: first on Windows (using MinGW, GCC 4.8.1) and then on Linux (GCC 4.8.1).

On Windows, it gives me:

Cholesky decomposition in 10114 ms.

But on Linux I get:

Cholesky decomposition in 3258 ms.

That is less than a third of the time needed on Windows.

Is there something available on Linux systems that Eigen uses to achieve this speed-up?
And if so, how may I accomplish the same on Windows?

Burner answered 7/5, 2014 at 16:45 Comment(4)

Just a thought but I would look into the precision of the clock, it may not be the same on both systems? – Lowercase 7/5, 2014 at 19:41

@jcoder: The difference is 7 seconds. The clock doesn't even need to be very accurate. – Burner 7/5, 2014 at 19:45

Instead of using Eigen you could use my code. On my system it took 1.36 s to decompose a 5790x5790 matrix. If Eigen's Cholesky decomposition supports OpenMP (I know it does for matrix multiplication) you should enable OpenMP and see what the performance is. It's difficult to parallelize Cholesky decomposition but that's what I did (both for MIMD and SIMD). – Allergic 8/5, 2014 at 6:54

Right, sorry I misread the timing, ignore my comment please. – Lowercase 8/5, 2014 at 11:49

Make sure you are using a 64-bit system. If that's not the case, don't forget to enable SSE2 instructions (-msse2); even then, performance won't match that of a 64-bit system, because fewer SSE registers are available.
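
A quick way to see which of these cases applies is to ask the compiler itself. The following is only a minimal sketch and not part of the original answer: the helper names pointer_bits, sse2_enabled and describe_build are made up for illustration, and it assumes a GCC- or Clang-style compiler, which predefines __SSE2__ when SSE2 code generation is enabled (other compilers may not define that macro).

```cpp
#include <climits>
#include <cstddef>
#include <sstream>
#include <string>

// Width of a pointer in bits: 32 for a 32-bit build, 64 for a 64-bit build.
constexpr std::size_t pointer_bits()
{
    return sizeof(void*) * CHAR_BIT;
}

// True if the compiler was allowed to generate SSE2 instructions.
// Assumption: GCC/Clang-style predefined macro; it is always set on x86-64
// and set on 32-bit x86 only when a flag such as -msse2 is given.
constexpr bool sse2_enabled()
{
#if defined(__SSE2__)
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: one-line summary of the build configuration.
// std::ostringstream is used instead of std::to_string because some older
// MinGW releases do not provide std::to_string.
std::string describe_build()
{
    std::ostringstream out;
    out << pointer_bits() << "-bit build, SSE2 "
        << (sse2_enabled() ? "enabled" : "disabled");
    return out.str();
}
```

Printing describe_build() from main, in a binary compiled with the same flags as the benchmark, should show whether the build is 32-bit and whether -msse2 actually took effect.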

Southernly answered 7/5, 2014 at 19:33 Comment(1)

That's it. I completely forgot that my MinGW compiler produces 32-bit code, while g++ on Linux generates native 64-bit executables. Using the compiler of the MinGW-64 project to compile 64-bit Windows binaries, I get the same timing results as on Linux. Your hint about SSE is correct too: enabling it for the 32-bit binary brings the execution time down to 6.8 seconds (compared to 10.1 s). – Burner 7/5, 2014 at 21:25

See Eigen's main page here.

Quote

Eigen is being successfully used with the following compilers:

GCC, version 4.1 and newer. Very good performance with GCC 4.2 and newer.

MSVC (Visual Studio), 2008 and newer (the old 2.x versions of Eigen support MSVC 2005, but without vectorization).

Intel C++ compiler. Very good performance.

LLVM/CLang++ (2.8 and newer).

MinGW, recent versions. Very good performance when using GCC 4.

QNX's QCC compiler.

Argument

On Linux you may have a more recent version of GCC (>= 4.2) than the one your version of MinGW uses...

Note

Just as a side-note, you may even have a MinGW version that is not "recent", as the link also says:

Eigen is standard C++98 and so should theoretically be compatible with any compliant compiler. Whenever we use some non-standard feature, that is optional and can be disabled.

So maybe your version of GCC on Linux uses a newer optimization feature that MinGW doesn't possess, in which case MinGW would fall back to another, slower alternative.

Of course, in the end, it could be a completely different thing, this is an experimental guess based on theory...
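
One way to test this guess is to have each toolchain report its own version. The sketch below is an illustration only: compiler_version is a hypothetical helper name, and it relies on the __GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__ macros that GCC (including MinGW's g++) predefines. Clang defines them as well for compatibility, so the result is labelled "GCC-compatible" rather than "GCC".

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: report the compiler version from predefined macros.
// Assumption: a GCC-compatible compiler; anything else is reported as unknown.
std::string compiler_version()
{
#if defined(__GNUC__)
    std::ostringstream out;
    out << "GCC-compatible " << __GNUC__ << '.'
        << __GNUC_MINOR__ << '.' << __GNUC_PATCHLEVEL__;
    return out.str();
#else
    return "unknown compiler";
#endif
}
```

Building this on both systems and comparing the two strings shows whether the compilers really differ. Running g++ --version on each system gives the same information without compiling anything.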

Foliated answered 7/5, 2014 at 17:7 Comment(2)

That's a good thought in general, thank you! But after checking the version of GCC on both systems it turns out that both use the identical version of GCC, which is 4.8.1. – Burner 7/5, 2014 at 17:25

@Burner Well, sorry for my incorrect guess, yet I think my answer could open up your thinking in similar situations. ;) – Foliated 7/5, 2014 at 17:48
