I am hitting a segmentation fault only when both using AVX and linking to other code that does

I am using Eigen to set up a sparse linear system as follows (slightly pseudocode):

Eigen::SparseQR<Eigen::SparseMatrix<real_t>, Eigen::COLAMDOrdering<int>> solver;
Eigen::SparseMatrix<real_t> P(rows, cols);
P.setFromTriplets(triplet_list.begin(), triplet_list.end());
P.makeCompressed();
solver.compute(P);

This code is within a small library. I am compiling with -mavx -mfma -O2. If I build a simple executable using this library, everything runs fine. If I instead link into another library (in which the C++ sources are built with the same compiler flags, but which also includes CUDA), I get a segmentation fault in Eigen::SparseQR<Eigen::SparseMatrix<real_t>, Eigen::COLAMDOrdering<int>>::factorize. If I compile with -O0 the segmentation fault disappears.

I have not been able to isolate this into a minimum working example; I would appreciate suggestions on how I could describe the problem better or ideas as to what might be going wrong. While vectorization is not critical for this solve, I do need it elsewhere in the library so simply removing the AVX flags is not a good option.


EDIT: adding some context as requested.

If I compile with -g and run in gdb, the exact crash line is line 98 in Core/util/Memory.h

   95      /** \internal Frees memory allocated with handmade_aligned_malloc */
   96      inline void handmade_aligned_free(void *ptr)
   97      {
  >98        if (ptr) std::free(*(reinterpret_cast<void**>(ptr) - 1));
   99      }

with stack trace

#0  0x00007ffff12e94dc in free () from /lib64/libc.so.6
#1  0x00007fffe3dadb1f in Eigen::internal::handmade_aligned_free (ptr=<optimized out>) at include/eigen3/Eigen/src/Core/util/Memory.h:98
#2  Eigen::internal::aligned_free (ptr=<optimized out>) at include/eigen3/Eigen/src/Core/util/Memory.h:179
#3  Eigen::aligned_allocator<float>::deallocate (this=<optimized out>, p=<optimized out>) at include/eigen3/Eigen/src/Core/util/Memory.h:763
#4  std::allocator_traits<Eigen::aligned_allocator<float> >::deallocate (__a=..., __n=<optimized out>, __p=<optimized out>) at include/c++/7.3.0/bits/alloc_traits.h:328
#5  std::_Vector_base<float, Eigen::aligned_allocator<float> >::_M_deallocate (this=<optimized out>, __n=<optimized out>, __p=<optimized out>) at include/c++/7.3.0/bits/stl_vector.h:180
#6  std::vector<float, Eigen::aligned_allocator<float> >::_M_default_append (this=0x7fffe3fefc20 <lse_helper_t::singleton()::helper>, __n=<optimized out>) at include/c++/7.3.0/bits/vector.tcc:592
#7  0x00007fffe3dae688 in std::vector<float, Eigen::aligned_allocator<float> >::resize (__new_size=10, this=0x7fffe3fefc20 <lse_helper_t::singleton()::helper>) at include/c++/7.3.0/bits/stl_vector.h:692

If I run with valgrind, I see errors of the form below. However, the program no longer crashes (the same code run outside of valgrind does still segfault).

==16218== Invalid read of size 8
==16218==    at 0x19049B16: handmade_aligned_free (Memory.h:98)
==16218==    by 0x19049B16: aligned_free (Memory.h:179)
==16218==    by 0x19049B16: deallocate (Memory.h:763)
==16218==    by 0x19049B16: deallocate (alloc_traits.h:328)
==16218==    by 0x19049B16: _M_deallocate (stl_vector.h:180)
==16218==    by 0x19049B16: std::vector<float, Eigen::aligned_allocator<float> >::_M_default_append(unsigned long) (vector.tcc:592)
==16218==    by 0x1904A687: resize (stl_vector.h:692)
==16218==  Address 0x3e195558 is 8 bytes before a block of size 8 alloc'd
==16218==    at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==16218==    by 0x123B7326: Eigen::internal::aligned_malloc(unsigned long) (in /gdn/centos7/0001/x3/prefixes/desmond-dependencies/2.14c7__dc4688ce01c7/lib/libminimax.so)
==16218==    by 0x19049B73: allocate (Memory.h:758)
==16218==    by 0x19049B73: allocate (alloc_traits.h:301)
==16218==    by 0x19049B73: _M_allocate (stl_vector.h:172)
==16218==    by 0x19049B73: std::vector<float, Eigen::aligned_allocator<float> >::_M_default_append(unsigned long) (vector.tcc:571)
==16218==    by 0x1904A687: resize (stl_vector.h:692)
==16218== Invalid free() / delete / delete[] / realloc()
==16218==    at 0x4C2ACDD: free (vg_replace_malloc.c:530)
==16218==    by 0x19049B1E: handmade_aligned_free (Memory.h:98)
==16218==    by 0x19049B1E: aligned_free (Memory.h:179)
==16218==    by 0x19049B1E: deallocate (Memory.h:763)
==16218==    by 0x19049B1E: deallocate (alloc_traits.h:328)
==16218==    by 0x19049B1E: _M_deallocate (stl_vector.h:180)
==16218==    by 0x19049B1E: std::vector<float, Eigen::aligned_allocator<float> >::_M_default_append(unsigned long) (vector.tcc:592)
==16218==    by 0x1904A687: resize (stl_vector.h:692)
==16218== Invalid read of size 8
==16218==    at 0x1905327B: handmade_aligned_free (Memory.h:98)
==16218==    by 0x1905327B: aligned_free (Memory.h:179)
==16218==    by 0x1905327B: conditional_aligned_free<true> (Memory.h:230)
==16218==    by 0x1905327B: conditional_aligned_delete_auto<double, true> (Memory.h:416)
==16218==    by 0x1905327B: ~DenseStorage (DenseStorage.h:542)
==16218==    by 0x1905327B: ~PlainObjectBase (PlainObjectBase.h:98)
==16218==    by 0x1905327B: ~Matrix (Matrix.h:178)
==16218==    by 0x1905327B: Eigen::SparseQR<Eigen::SparseMatrix<double, 0, int>, Eigen::COLAMDOrdering<int> >::factorize(Eigen::SparseMatrix<double, 0, int> const&) (SparseQR.h:360)
==16218==    by 0x19047A28: compute (SparseQR.h:118)

I am attempting to turn this into a minimal reproducible example.

Hodge asked 8/8, 2019 at 23:15 Comments (4)
Compile with -g and run it in a debugger to isolate the precise line triggering the segfault. Running it under a memory debugger like valgrind might also help to identify the root of the problem. - Shondrashone
Read the instructions on how to provide a minimal reproducible example. One possibility is to keep removing source code until the error disappears, then continue removing other source. If you think vectorization is the issue, you can disable it by compiling with -D EIGEN_DONT_VECTORIZE. - Outguard
This looks like some parts are compiled with AVX, other parts without. If you want to keep it that way, you can add -DEIGEN_MAX_ALIGN_BYTES=16 or -DEIGEN_MAX_ALIGN_BYTES=32 to every compilation unit (16 is the default without AVX, 32 the default with AVX). - Outguard
@Outguard you were quite correct; this was simply a different library I was linking that I hadn't noticed was being compiled without -mavx. That does make this into a less interesting problem. If you'd like to post your comment as an answer I'll accept it - thank you. - Hodge

The described problem usually occurs if compilation units with different memory-alignment options are linked together. By default Eigen aligns memory to 16 bytes, unless AVX is enabled, in which case memory is aligned to 32 bytes (or 64 bytes for AVX512 -- I think).
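To illustrate why mismatched alignment settings can crash inside free(), here is a minimal sketch of the "store the original pointer just before the aligned block" scheme that the quoted handmade_aligned_free relies on. The names sketch_aligned_malloc and sketch_aligned_free are hypothetical, and this is a simplified model rather than Eigen's actual implementation, which differs in details.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not Eigen's code): over-allocate, round up to the
// requested alignment, and stash the pointer returned by std::malloc in the
// slot immediately before the aligned block.
inline void* sketch_aligned_malloc(std::size_t size, std::size_t alignment) {
    // Assumption of this sketch: alignment is a power of two and large
    // enough to leave room for the stashed pointer.
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);

    void* original = std::malloc(size + alignment);
    if (original == nullptr) return nullptr;

    std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(original);
    // Round down to a multiple of alignment, then step up by one alignment,
    // so the result is strictly greater than raw and the slot fits.
    std::uintptr_t aligned_addr =
        (raw & ~(static_cast<std::uintptr_t>(alignment) - 1)) + alignment;

    void* aligned = reinterpret_cast<void*>(aligned_addr);
    *(reinterpret_cast<void**>(aligned) - 1) = original;
    return aligned;
}

// Mirrors the shape of the line quoted from Memory.h: read the stashed
// pointer from just before the block and free that.
inline void sketch_aligned_free(void* ptr) {
    if (ptr) std::free(*(reinterpret_cast<void**>(ptr) - 1));
}
```

The two functions only work as a pair. If a pointer that came straight from std::malloc is handed to sketch_aligned_free, the read at ptr - 1 lands just before the allocated block and the value passed to std::free is garbage. That appears consistent with the valgrind report in the question ("8 bytes before a block of size 8 alloc'd", followed by an invalid free), where the allocation goes through an aligned_malloc in one shared library and the deallocation through handmade_aligned_free in another.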

Ideally, you should compile all compilation units with the same target architecture. If you only plan to run on your local machine, -march=native is the simplest choice (it also enables tuning for the local architecture).

If you need to have some parts compiled with AVX enabled and others without, you can manually override the memory-alignment of Eigen using -DEIGEN_MAX_ALIGN_BYTES=16 or -DEIGEN_MAX_ALIGN_BYTES=32 (for consistency, either one should be added to all compilation units, even though some would be redundant).

Outguard answered 12/8, 2019 at 17:19 Comment(0)
