Is armadillo solve() thread safe?
In my code I have a loop in which I construct an over-determined linear system and try to solve it:

#pragma omp parallel for
for (int i = 0; i < n[0]+1; i++) {
    for (int j = 0; j < n[1]+1; j++) {
        for (int k = 0; k < n[2]+1; k++) {
            arma::mat A(max_points, 2);
            arma::mat y(max_points, 1);
            // initialize A and y

            arma::vec solution = solve(A,y);
        }
    }
}

Sometimes, quite randomly, the program hangs or the results in the solution vector are NaN. But if I do this instead:

arma::vec solution;
#pragma omp critical 
{
    solution = solve(weights*A,weights*y);
}

then these problems don't seem to happen anymore.

When it hangs, it does so because some threads are waiting at the OpenMP barrier:

Thread 2 (Thread 0x7fe4325a5700 (LWP 39839)):
#0  0x00007fe44d3c2084 in gomp_team_barrier_wait_end () from /usr/lib64/gcc-4.9.2/lib64/gcc/x86_64-redhat-linux-gnu/4.9.2/libgomp.so.1
#1  0x00007fe44d3bf8c2 in gomp_thread_start () at ../.././libgomp/team.c:118
#2  0x0000003f64607851 in start_thread () from /lib64/libpthread.so.0
#3  0x0000003f642e890d in clone () from /lib64/libc.so.6

And the other threads are stuck inside Armadillo:

Thread 1 (Thread 0x7fe44afe2e60 (LWP 39800)):
#0  0x0000003ee541f748 in dscal_ () from /usr/lib64/libblas.so.3
#1  0x00007fe44c0d3666 in dlarfp_ () from /usr/lib64/atlas/liblapack.so.3
#2  0x00007fe44c058736 in dgelq2_ () from /usr/lib64/atlas/liblapack.so.3
#3  0x00007fe44c058ad9 in dgelqf_ () from /usr/lib64/atlas/liblapack.so.3
#4  0x00007fe44c059a32 in dgels_ () from /usr/lib64/atlas/liblapack.so.3
#5  0x00007fe44f09fb3d in bool arma::auxlib::solve_ud<double, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times> >(arma::Mat<double>&, arma::Mat<double>&, arma::Base<double, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times> > const&) () at /usr/include/armadillo_bits/lapack_wrapper.hpp:677
#6  0x00007fe44f0a0f87 in arma::Col<double>::Col<arma::Glue<arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times>, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times>, arma::glue_solve> >(arma::Base<double, arma::Glue<arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times>, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times>, arma::glue_solve> > const&) ()
at /usr/include/armadillo_bits/glue_solve_meat.hpp:39

As you can see from the stack trace, my version of Armadillo uses ATLAS. And according to this documentation, ATLAS seems to be thread safe: ftp://lsec.cc.ac.cn/netlib/atlas/faq.html#tsafe

Update 9/11/2015

I finally got some time to run more tests, based on the suggestions of Vladimir F.

When I compile Armadillo with ATLAS's BLAS, I'm still able to reproduce the hangs and the NaNs. When it hangs, the only thing that changes in the stack trace is the call into BLAS:

#0  0x0000003fa8054718 in ATL_dscal_xp1yp0aXbX@plt () from /usr/lib64/atlas/libatlas.so.3
#1  0x0000003fb05e7666 in dlarfp_ () from /usr/lib64/atlas/liblapack.so.3
#2  0x0000003fb0576a61 in dgeqr2_ () from /usr/lib64/atlas/liblapack.so.3
#3  0x0000003fb0576e06 in dgeqrf_ () from /usr/lib64/atlas/liblapack.so.3
#4  0x0000003fb056d7d1 in dgels_ () from /usr/lib64/atlas/liblapack.so.3
#5  0x00007ff8f3de4c34 in void arma::lapack::gels<double>(char*, int*, int*, int*, double*, int*, double*, int*, double*, int*, int*) () at /usr/include/armadillo_bits/lapack_wrapper.hpp:677
#6  0x00007ff8f3de1787 in bool arma::auxlib::solve_od<double, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times> >(arma::Mat<double>&, arma::Mat<double>&, arma::Base<double, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times> > const&) () at /usr/include/armadillo_bits/auxlib_meat.hpp:3434

Compiling without ATLAS, only with netlib BLAS and LAPACK, I was able to reproduce the NaNs but not the hangs.

In both cases, after surrounding solve() with #pragma omp critical, I have no problems at all.

Demisemiquaver answered 5/5, 2015 at 21:5 Comment(11)
Is /usr/lib64/libblas.so.3 part of ATLAS? Why isn't it located in /usr/lib64/atlas? (Microcrystalline)
No, in openSUSE it's part of the package libblas3, and in Red Hat it's part of the package blas. (Demisemiquaver)
Then you cannot rely on any guarantees of ATLAS when you use the default BLAS. (Microcrystalline)
This is very strange indeed, Good Sir; I think I have to find out why ATLAS LAPACK is using an external BLAS first. Thanks for pointing that out. (Demisemiquaver)
Have you solved this? If not, which packages are installed, and could you please post the command you used to compile the program? (Jarad)
You can also try using OpenBLAS instead of ATLAS. (Piranha)
No, it can be a really risky procedure. (Jolly)
Why would it be risky? It's simply a matter of linking with OpenBLAS instead of ATLAS. (Piranha)
So what about installing debug info and seeing exactly where they are stuck with gdb? The command is: sudo debuginfo-install blas (Irreproachable)
You don't quite provide an MCVE, but I tried to reproduce your problem and could not (mine runs with no problems). (Beneficent)
You can try to use Mozilla's rr (rr-project.org) to obtain a valid stack trace when things go wrong. I don't know if it will work on your problem, because they claim: "emulates a single-core machine. So, parallel programs incur the slowdown of running on a single core. This is an inherent feature of the design." It is worth a shot, though. (Bespeak)

Are you sure your systems are over-determined? solve_ud in your stack trace says otherwise. You have solve_od too, and that probably has nothing to do with the issue, but it doesn't hurt to find out why solve_ud is being called and fix it if you think the systems should be over-determined.

Is armadillo solve() thread safe?

That, I think, depends on your LAPACK version (also see this). Looking at the code of solve_od, all the variables accessed seem to be local. Note the warning in the code:

NOTE: the dgels() function in the lapack library supplied by ATLAS 3.6 seems to have problems

Thus it seems only lapack::gels can cause trouble for you. If fixing LAPACK is not possible, a workaround is to stack your systems and solve a single large system. That would probably be even more efficient if your individual systems are small.

Alisaalisan answered 28/10, 2015 at 21:33 Comment(0)

The thread safety of Armadillo's solve() function depends (only) on the BLAS library that you use: the LAPACK implementations are thread safe when BLAS is. solve() is not thread safe when linking against the reference BLAS library, but it is thread safe when using OpenBLAS. ATLAS also provides a BLAS implementation that is documented as thread safe, and Intel MKL is thread safe as well, but I have no experience with Armadillo linked against those libraries.

Of course, this only applies when you run solve() from multiple threads with different data.
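For reference, the BLAS backend is a link-time choice, so trying OpenBLAS is just a matter of linking against it. A hypothetical compile line (the file name and flags are assumptions, and some distributions instead switch the system-wide BLAS via update-alternatives) might look like:

```shell
# Link against OpenBLAS instead of the reference BLAS (illustrative paths);
# -fopenmp enables the parallel loop from the question.
g++ -O2 -fopenmp solve_loop.cpp -o solve_loop -larmadillo -lopenblas
```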

Trowbridge answered 12/1, 2018 at 10:10 Comment(0)
