I am using Eigen to set up a sparse linear system as follows (slightly pseudocode):
Eigen::SparseQR<Eigen::SparseMatrix<real_t>, Eigen::COLAMDOrdering<int>> solver;
Eigen::SparseMatrix<real_t> P(rows, cols);
P.setFromTriplets(triplet_list.begin(), triplet_list.end());
P.makeCompressed();
solver.compute(P);
This code is within a small library. I am compiling with -mavx -mfma -O2. If I build a simple executable using this library, everything runs fine. If I instead link into another library (in which the C++ sources are built with the same compiler flags, but which also includes CUDA), I get a segmentation fault in Eigen::SparseQR<Eigen::SparseMatrix<real_t>, Eigen::COLAMDOrdering<int>>::factorize. If I compile with -O0, the segmentation fault disappears.
I have not been able to isolate this into a minimal working example; I would appreciate suggestions on how I could describe the problem better, or ideas as to what might be going wrong. While vectorization is not critical for this solve, I do need it elsewhere in the library, so simply removing the AVX flags is not a good option.
EDIT: adding some context as requested.
If I compile with -g and run in gdb, the exact crash line is line 98 in Core/util/Memory.h
  95 /** \internal Frees memory allocated with handmade_aligned_malloc */
  96 inline void handmade_aligned_free(void *ptr)
  97 {
> 98   if (ptr) std::free(*(reinterpret_cast<void**>(ptr) - 1));
  99 }
with stack trace
#0 0x00007ffff12e94dc in free () from /lib64/libc.so.6
#1 0x00007fffe3dadb1f in Eigen::internal::handmade_aligned_free (ptr=<optimized out>) at include/eigen3/Eigen/src/Core/util/Memory.h:98
#2 Eigen::internal::aligned_free (ptr=<optimized out>) at include/eigen3/Eigen/src/Core/util/Memory.h:179
#3 Eigen::aligned_allocator<float>::deallocate (this=<optimized out>, p=<optimized out>) at include/eigen3/Eigen/src/Core/util/Memory.h:763
#4 std::allocator_traits<Eigen::aligned_allocator<float> >::deallocate (__a=..., __n=<optimized out>, __p=<optimized out>) at include/c++/7.3.0/bits/alloc_traits.h:328
#5 std::_Vector_base<float, Eigen::aligned_allocator<float> >::_M_deallocate (this=<optimized out>, __n=<optimized out>, __p=<optimized out>) at include/c++/7.3.0/bits/stl_vector.h:180
#6 std::vector<float, Eigen::aligned_allocator<float> >::_M_default_append (this=0x7fffe3fefc20 <lse_helper_t::singleton()::helper>, __n=<optimized out>) at include/c++/7.3.0/bits/vector.tcc:592
#7 0x00007fffe3dae688 in std::vector<float, Eigen::aligned_allocator<float> >::resize (__new_size=10, this=0x7fffe3fefc20 <lse_helper_t::singleton()::helper>) at include/c++/7.3.0/bits/stl_vector.h:692
If I run with valgrind, I see errors of the form below. However, the program no longer crashes (the same code run outside of valgrind does still segfault).
==16218== Invalid read of size 8
==16218== at 0x19049B16: handmade_aligned_free (Memory.h:98)
==16218== by 0x19049B16: aligned_free (Memory.h:179)
==16218== by 0x19049B16: deallocate (Memory.h:763)
==16218== by 0x19049B16: deallocate (alloc_traits.h:328)
==16218== by 0x19049B16: _M_deallocate (stl_vector.h:180)
==16218== by 0x19049B16: std::vector<float, Eigen::aligned_allocator<float> >::_M_default_append(unsigned long) (vector.tcc:592)
==16218== by 0x1904A687: resize (stl_vector.h:692)
==16218== Address 0x3e195558 is 8 bytes before a block of size 8 alloc'd
==16218== at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==16218== by 0x123B7326: Eigen::internal::aligned_malloc(unsigned long) (in /gdn/centos7/0001/x3/prefixes/desmond-dependencies/2.14c7__dc4688ce01c7/lib/libminimax.so)
==16218== by 0x19049B73: allocate (Memory.h:758)
==16218== by 0x19049B73: allocate (alloc_traits.h:301)
==16218== by 0x19049B73: _M_allocate (stl_vector.h:172)
==16218== by 0x19049B73: std::vector<float, Eigen::aligned_allocator<float> >::_M_default_append(unsigned long) (vector.tcc:571)
==16218== by 0x1904A687: resize (stl_vector.h:692)
==16218== Invalid free() / delete / delete[] / realloc()
==16218== at 0x4C2ACDD: free (vg_replace_malloc.c:530)
==16218== by 0x19049B1E: handmade_aligned_free (Memory.h:98)
==16218== by 0x19049B1E: aligned_free (Memory.h:179)
==16218== by 0x19049B1E: deallocate (Memory.h:763)
==16218== by 0x19049B1E: deallocate (alloc_traits.h:328)
==16218== by 0x19049B1E: _M_deallocate (stl_vector.h:180)
==16218== by 0x19049B1E: std::vector<float, Eigen::aligned_allocator<float> >::_M_default_append(unsigned long) (vector.tcc:592)
==16218== by 0x1904A687: resize (stl_vector.h:692)
==16218== Invalid read of size 8
==16218== at 0x1905327B: handmade_aligned_free (Memory.h:98)
==16218== by 0x1905327B: aligned_free (Memory.h:179)
==16218== by 0x1905327B: conditional_aligned_free<true> (Memory.h:230)
==16218== by 0x1905327B: conditional_aligned_delete_auto<double, true> (Memory.h:416)
==16218== by 0x1905327B: ~DenseStorage (DenseStorage.h:542)
==16218== by 0x1905327B: ~PlainObjectBase (PlainObjectBase.h:98)
==16218== by 0x1905327B: ~Matrix (Matrix.h:178)
==16218== by 0x1905327B: Eigen::SparseQR<Eigen::SparseMatrix<double, 0, int>, Eigen::COLAMDOrdering<int> >::factorize(Eigen::SparseMatrix<double, 0, int> const&) (SparseQR.h:360)
==16218== by 0x19047A28: compute (SparseQR.h:118)
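For readers unfamiliar with the line at Memory.h:98, the following is a minimal sketch of the general "handmade" aligned allocation scheme it belongs to. The names sketch_aligned_malloc and sketch_aligned_free are hypothetical and this is not Eigen's actual source; it only assumes the common technique of over-allocating with malloc and storing the original pointer in the slot just before the aligned address. Under that assumption, the free side reads 8 bytes before the pointer it is given, which matches the "8 bytes before a block of size 8" address in the valgrind report. One possible reading (not confirmed) is that the block was obtained from plain malloc by the aligned_malloc in libminimax.so, but released through the handmade path, so the slot read here was never written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch of a handmade aligned allocator (not Eigen's code).
// Assumes alignment is a power of two and at least sizeof(void*).
inline void* sketch_aligned_malloc(std::size_t size, std::size_t alignment) {
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
    // Over-allocate: payload + alignment slack + one slot for the original pointer.
    void* original = std::malloc(size + alignment + sizeof(void*));
    if (!original) return nullptr;
    // Skip the pointer slot, then round up to the next multiple of alignment.
    std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(original) + sizeof(void*);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    void* aligned = reinterpret_cast<void*>((raw + mask) & ~mask);
    // Store what malloc returned immediately before the aligned block.
    *(reinterpret_cast<void**>(aligned) - 1) = original;
    return aligned;
}

inline void sketch_aligned_free(void* ptr) {
    // Same shape as the crashing line: read the slot just before ptr and free it.
    // This is only valid for pointers produced by sketch_aligned_malloc.
    if (ptr) std::free(*(reinterpret_cast<void**>(ptr) - 1));
}
```

Passing a pointer that came straight from malloc to sketch_aligned_free would read memory outside the allocation and hand an arbitrary value to free, which is the kind of mismatch the "Invalid read" and "Invalid free()" messages describe.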
I am attempting to turn this into a minimal reproducible example.
… -g and run it in a debugger to isolate the precise line triggering the segfault. Running it in a memory debugger like valgrind might also help to better identify the root of the problem. – Shondrashone
… -D EIGEN_DONT_VECTORIZE. – Outguard
… -DEIGEN_MAX_ALIGN_BYTES=16 or -DEIGEN_MAX_ALIGN_BYTES=32 to every compilation unit (16 is the default without AVX, 32 the default with AVX). – Outguard
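As a way to check whether two libraries were really built with matching vectorization settings, a small diagnostic along these lines might help. It is only a sketch: expected_default_align_bytes and built_with_avx are hypothetical helpers, they do not include or query Eigen, and the 16/32 values simply restate the rule given in the last comment above (other instruction sets are not covered).

```cpp
#include <cstddef>

// Anonymous namespace gives internal linkage, so every translation unit keeps
// its own copy and reports the flags it was actually compiled with.
namespace {

// True if this translation unit was compiled with AVX enabled (e.g. -mavx).
constexpr bool built_with_avx() {
#if defined(__AVX__)
    return true;
#else
    return false;
#endif
}

// Assumption taken from the comment thread: 32 bytes with AVX, 16 without.
constexpr std::size_t expected_default_align_bytes() {
    return built_with_avx() ? 32 : 16;
}

}  // namespace
```

Each library could expose this value through its own exported function and the two results could be compared at startup; if they differ, passing an explicit -DEIGEN_MAX_ALIGN_BYTES value to every compilation unit, as suggested above, would be the thing to try.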