MinGW64 Is Incapable of 32-Byte Stack Alignment (Required for AVX on Windows x64): Easy Workaround or Switch Compilers?

Doesn't MinGW-w64 have a 32-bit compilation option? – Mesomorphic 19/6, 2015 at 1:2

An extra 32 bytes and some simple pointer math isn't costly when compared to anything you would need 256-bit AVX instructions for. – Bouilli 19/6, 2015 at 1:4
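
The "extra 32 bytes and some simple pointer math" idea might look roughly like the C sketch below. The helper names `align_up` and `sum8_aligned_scratch` are made up for illustration and do not come from any library. Note that this only aligns a buffer the programmer controls; it does nothing for values the compiler itself decides to spill to the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round p up to the next multiple of `align`.
   `align` must be a non-zero power of two. */
static void *align_up(void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t addr = (uintptr_t)p;
    return (void *)((addr + (align - 1)) & ~((uintptr_t)align - 1));
}

/* Hypothetical example: sum eight floats through a 32-byte-aligned
   scratch area carved out of an over-sized local array. */
static float sum8_aligned_scratch(const float *src)
{
    /* A float array is at least 4-byte aligned, so at most 28 bytes
       (7 floats) of slack are needed to reach a 32-byte boundary. */
    float raw[8 + 7];
    float *scratch = (float *)align_up(raw, 32);

    float total = 0.0f;
    for (size_t i = 0; i < 8; ++i) {
        scratch[i] = src[i];   /* scratch[0..7] stays inside raw */
        total += scratch[i];
    }
    return total;
}
```

In real AVX code, `scratch` would be the address handed to the aligned load/store intrinsics; the scalar loop here only keeps the sketch compilable without AVX support.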

@VermillionAzure 64-bit is pretty important to my application – Alonzoaloof 19/6, 2015 at 1:5

@RossRidge: That's not really relevant to this question. The underlying problem is that it's not safe to use AVX instructions in mingw-w64, because it apparently can't align the stack to 32 bytes because it isn't supported by the Windows x64 ABI. Therefore, if you use a __m256 type and the compiler has to spill it onto the stack, you can end up with segmentation faults because it will try to use aligned instructions to move it to/from the stack. It seems like one fix on the compiler side would be to use unaligned moves in this case, but I don't know how feasible that change would be. – Riddle 19/6, 2015 at 1:37

@JasonR You've apparently completely misunderstood what I wrote. – Bouilli 19/6, 2015 at 1:54

@JasonR Even if you wrapped __m256 in hacky code to get correct alignment, the AVX intrinsics still return __m256. So in any code that needs temporaries, there's always a chance that a __m256 temporary spills out of registers onto the stack, and then the segfault happens, right? So this isn't even a real solution. – Alonzoaloof 19/6, 2015 at 2:6

@RossRidge: I'm not sure what you meant, then. It sounds like you're advocating for some kind of manual implementation of the required alignment. Such hacks aren't really feasible in this case (as there are numerous manipulations of __m256 instances, like temporaries, that only the compiler has control over), but if I'm misunderstanding your recommendation, perhaps you could clarify it. – Riddle 19/6, 2015 at 2:13

@Ragdoll: Exactly; there's no good solution to this problem achievable by just working around the issue in your source code. You would need some level of support at the compiler level to make this feasible. One potential solution would be for the compiler to emit unaligned move instructions when moving to/from the stack. That's essentially what the Python script that you linked does. Unfortunately, contemporary processors have a performance penalty for unaligned 256-bit moves (although 128-bit unaligned moves have been full-speed since the Nehalem architecture). – Riddle 19/6, 2015 at 2:16

I assume this is undesirable for other reasons, but one obvious workaround would be to pass the affected variables by reference rather than by value. – Thanasi 19/6, 2015 at 2:32
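
A minimal sketch of the pass-by-reference idea, assuming the data lives in plain float arrays and that unaligned loads and stores are acceptable. The function name `add8` is hypothetical. The AVX path is only compiled when the compiler defines `__AVX__` (for example with `-mavx`); otherwise a scalar fallback is used. This avoids passing __m256 values by value, but it does not guarantee that the compiler never spills the local `va`/`vb` temporaries with aligned moves.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical kernel: out[i] = a[i] + b[i] for 8 floats.
   All operands are passed by pointer, never as __m256 by value. */
static void add8(const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__AVX__)
    /* Unaligned loads/stores: no 32-byte alignment is required
       of the caller's arrays. */
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(out, _mm256_add_ps(va, vb));
#else
    /* Scalar fallback so the sketch builds without AVX enabled. */
    for (size_t i = 0; i < 8; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```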

@JasonR I was only pointing out an inconsistency in the question. If he has a real use for AVX instructions, then the costs he was bitterly complaining about are insignificant. Indeed, pretty much those same costs will be paid by having the compiler align the stack automatically. – Bouilli 19/6, 2015 at 2:32

@RossRidge: Agreed. If there were a robust way to implement the manual alignment, then it would certainly be a viable solution for any proper application of SIMD instructions. – Riddle 19/6, 2015 at 2:34

Both the Microsoft and Intel compilers manually align the stack at the start of each function call that uses AVX. Why GCC doesn't do this might be related to exception handling. – Bile 19/6, 2015 at 14:0

@Bile Any idea where clang stands on the issue? – Alonzoaloof 19/6, 2015 at 19:0

@JasonR, when you say

"That's not really relevant to this question. The underlying problem is that it's not safe to use AVX instructions in mingw-w64, because it apparently can't align the stack to 32 bytes because it isn't supported by the Windows x64 ABI."

do you mean AVX isn't available on Windows? Because it is. Also see Ross' answer:

"Despite what Kai Tietz said in the bug report you linked, Microsoft's x64 ABI does allow a compiler to give variables a greater than 16-byte alignment on the stack."

– Varistor 8/4, 2018 at 13:0

@Varistor Coming from the same GCC bug, MSVC and ICC for Windows don't actually align the stack itself. Instead, they clobber an extra register that points to an aligned portion on the stack. (r13 in the case of ICC.) All local variables (as well as spilled ymm/zmm values) that require >16-byte alignment are then placed in this section. This also has nothing to do with MSVC and ICC using unaligned load/stores. They do that for a completely different reason (they unconditionally use unaligned access for everything). – Bile 9/4, 2018 at 22:17

@Mysticial, I really think your comment should go here - gcc.gnu.org/bugzilla/show_bug.cgi?id=54412 and here - github.com/Alexpux/MSYS2-packages/issues/1209 (maybe also here - sourceforge.net/p/mingw-w64/mailman/message/34485783). People trying to fix it could use your knowledge (I'm not an expert in those areas). Thank you. – Varistor 10/4, 2018 at 4:13
