What is causing this memory access violation error (0xC0000005) when using Eigen with "-march=native"?
Asked Answered
I

1

5

I am rewriting some c++ code (originally written in Matlab as a MEX function) in codeblocks so that I can use debugging and profiling tools designed for c++. The code I am rewriting uses Eigen and SIMD intrinsic instructions, so I need to compile with the -march=native flag. I was getting a memory access violation error when running my main project. Here is a slimmed down version of the code that causes the issue:

#include <iostream>
#include <fstream>
#include <string>
#include <sys/stat.h>
#include <immintrin.h>
#include <Eigen/Dense>
#include "Parameters.h"

using namespace std;

int main()
{
    
    Parameters p;
    p.na = 16;
    p.TXangle = Eigen::VectorXd::LinSpaced(p.na,0,p.na-1);

    cout << p.TXangle << endl;

    cout << "Hello world!" << endl;
    return 0;
}

where Parameters is a custom class defined with the following two files:

Parameters.h

#ifndef PARAMETERS_H_INCLUDED
#define PARAMETERS_H_INCLUDED

class Parameters
{
    public:
        int na;
        Eigen::VectorXd TXangle;

        Parameters();

};

#endif // PARAMETERS_H_INCLUDED

Parameters.cpp

#include <string>
#include <Eigen/Dense>
#include "Parameters.h"

Parameters::Parameters()
{
    //ctor
}

The line that's breaking is when p.TXangle is initialized. At that point, the program throws the (0xC0000005) error. If I don't compile with '-march=native' then the error doesn't happen and the program runs fine. When building with '-march=native' I also get several alignment warnings. My computer supports upto AVX2 instructions and I'm compiling with MinGW GCC (not sure how to check the version of gcc on codeblocks).

gcc version is 8.1.0

Update: Is this the value @Sedenion was asking about in the comments?

enter image description here

This is the exact line the debugger stops at: enter image description here

**Update: ** Based on the discussion in the comments, the disassembler shows that the code is at this assembly instruction when it fails:

enter image description here

I'm struggling to interpret this, reading assembly is still a bit new to me. Here are the registers at that same point:

enter image description here

Inculpable answered 13/4, 2022 at 15:23 Comment(25)
I don't have Eigen on a Windows machine available to try and compile this so let me ask for clarification, if you run only this and not the larger example you showed before the edit, this also causes it to crash?Farro
Yes, what is currently shown and what was shown before the edit both cause the crash. I simply deleted portions of the code that were commented out or weren't used in the main function.Inculpable
I can't reproduce this. godbolt.org/z/7sr1bz11cBradfield
It very likely has something to do with the GCC version (not mentioned, what is it?) and Windows. There are some warnings about MinGW GCC on Eigen's website but they only relate to fixed size types, which is not the case here.Farro
This seems to be some issue related to -march=native. I don't know how godbolt would handle that flag since it's compiled on a serverInculpable
Did you compile both main and parameters.cpp with -march=native?Posse
How can I check the version of gcc being used by codeblocks?Inculpable
@Posse yes, they are in the same project and I've set the build options when building the project to have the -march=native flagInculpable
@Inculpable A quick Google search leads to some suggestions, e.g. superuser.com/questions/206157/… -- note I haven't tried so I have no clue if this works for whatever version you have but I'm sure it can't be too hard.Farro
Hmm I saw that link and it wasn't working when I tried that last night... But it looks like my gcc version is 8.1.0.Inculpable
"I also get several alignment warnings" Those sound like a pretty serious problem if they happen when enabling AVX2 support, as x86 SIMD (MMX, SSE, AVX) instructions do not silently correct misalignment, they simply crash.Avaavadavat
This looks relevant: eigen.tuxfamily.org/dox/group__TopicWrongStackAlignment.html Although it applies to a very different version of gcc, it might still be worth trying those compiler options.Avaavadavat
@BenVoigt This is probably not relevant for dynamically allocated arrays though.Farro
@Inculpable Using a debugger, try to step into the problematic Eigen call. I.e. step into the source code of Eigen, and see which functions get called. Especially the malloc-functions in Memory.h are of interest. The key to watch out for is if the allocated memory is aligned to 32 in case AVX is enabled (which should be the case here). Also check the value of the EIGEN_IDEAL_MAX_ALIGN_BYTES macro in ConfigureVectorization.h: Is it 32?Sherfield
Oh, and what Eigen version are you using? Have you tried the latest one? Also, try to enable C++17 (-std=c++17) since it comes with some improvements regarding alignment features that Eigen uses.Sherfield
Since he uses gcc 8.1 might be worth checking if that version even supports those features.Farro
I'm using Eigen 3.4.0 with c++17Inculpable
Can't reproduce on MSYS2 MinGW.Doty
@Qubit: If Windows GCC is still broken about spilling/reloading 32-byte vectors to the stack without setting up 32-byte alignment, that could be a real problem. It's not stack arrays that are the problem (alignas(32) does work on them), but GCC mis-compiling code that uses __m256i temporaries at all if it ever spills/reloads any (which will use vmovaps or vmovdqa; alignment required). Windows clang doesn't have this bug. But it's also possible that misalignment of data arrays is a problem though, if Eigen uses aligned stores.Physician
@drakon101: Use a debugger to check what asm instruction faults. Most debuggers have a "disassembly" view that ignores higher-level source code even if it's available. That'll tell you whether it's a stack access or not.Physician
@PeterCordes are you saying that if I see a vmovaps (vmovapd) or vmovdqa command on the line where the program fails, then this bug in GCC you mention is what's tripping me up?Inculpable
Or better yet, are you essentially referring to the issue discussed here and here ?Inculpable
Yes, exactly, those Q&As are about that bug. If it's a vmovaps/d or vmovdqa with a memory operand using [rsp +- something] then it's this bug. (Or [rbp +- something] in a function that set up RBP as a frame pointer and is using it for local vars.) If you manually used _mm256_store_ps on an array without doing alignas(32) on it, you could in theory get that code, but on non-Windows GCC notices when you do that and over-aligns the array, IIRC.Physician
It seems like the AVX instructions that are faulting are from inside Eigen. If I didn't want to go mucking around with Eigen's source code, is my best bet to switch compilers, i.e. something like CLang on windows? (assuming I wanted to maintain AVX support and stick with windows)Inculpable
Well yes, I expect it uses intrinsics. The question is whether it's on your data, with your source code failing to align data as much as Eigen expected, or whether it's stack space which should have gotten automatically aligned by the compiler. But yes, the best bet is to use clang; unlike GCC, it's not broken on Windows.Physician
S
9

I managed to reproduce this problem using Eigen 3.4.0 and mingw (gcc 8.1.0 with -mavx -m64 -std=c++17 -g) on Windows using AVX (-mavx, also enabled by -march=native for the OP). As already suspected by people in the comments, it is certainly the issue that mingw-gcc fails to align stack variables correctly to 32 bytes, which is required by AVX (compare the bug issue for gcc, also see e.g. this post).

The crash is not related to the use of VectorXd (which should not suffer from it since it uses dynamic memory allocations). Rather, the Eigen::VectorXd::LinSpaced() call is the issue. In Eigen, this eventually calls the following function involving AVX instructions in PacketMath.h:

template<> EIGEN_STRONG_INLINE Packet4d plset<Packet4d>(const double& a) {
 return _mm256_add_pd(_mm256_set1_pd(a), _mm256_set_pd(3.0,2.0,1.0,0.0)); 
}

In this call, temporary stack variables are involved which are not 32 byte aligned by mingw. At one point, an aligned mov vmovapd is attempted to such a non-aligned address:

mov    rax,QWORD PTR [rbp+0x10]
vmovapd YMMWORD PTR [rax],ymm0

For example, in one run, I got rax=0x67f890 which is only 16 byte but not 32 byte aligned. A minimal reproducible example that captures the behavior is the following (https://godbolt.org/z/qbE6z1nb8, note that mingw is not supported on godbolt, so the problem does not appear there):

#include <iostream>
#include <immintrin.h>

__m256d Set(const double&) {
   __m256d temp = _mm256_setzero_pd(); // Crashes on mingw
   return temp;
}

int main() {
    Set(2);
    std::cerr << "End" << std::endl;
}

The unused parameter to Set() is just there to get an offset on the stack to trigger the issue. It crashes using mingw-gcc (with -mavx -m64) but runs fine when compiled with clang or MSVC on Windows. It also runs fine on all Linux compilers I have tried.

So, in short, your code is correct and the crash occurs in Eigen. There is no "magic switch" for gcc to fix this. Hence, I guess you have 3 options:

  • Wait for the mingw problem to get fixed. According to the posts, it still persists in gcc 11.2.0. But considering the long history, I doubt it will get fixed soon.
  • Do not compile with AVX or higher (or -march=native) and stick to <=SSE4.2 instead. Of course, this might impact performance. I'd advise to profile to be sure if this is the case for you.
  • Use another compiler such as clang or MSVC.
Sherfield answered 15/4, 2022 at 14:0 Comment(3)
Thanks for verifying this. One question, why do you suggest SSE2 instead of SSE4.2?Inculpable
@Inculpable sorry, that was a mistake. You are of course right, everything up to SSE4.2 should work since it requires only 16 byte alignment. I corrected my answer.Sherfield
Note that mov rax,QWORD PTR [rbp+0x10] is loading a pointer from the stack (assuming RBP being used as a frame pointer) but that asm isn't enough context to know the pointer is also pointing to a local on the stack. Those same instructions could appear is a non-buggy program. Something like vmovapd [rsp+0x10], ymm0 would be a lot more suspicious. But if that's where it first faults, then so be it.Physician

© 2022 - 2024 — McMap. All rights reserved.