I'm trying to work with AVX instructions on 64-bit Windows. I'm comfortable with the g++ compiler, so I've been using that; however, there is a major bug reported here, and only very rough workarounds were presented here.
Basically, a `__m256` variable needs 32-byte alignment to work properly with AVX instructions, and g++ on 64-bit Windows can't guarantee that alignment on the stack.
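To make the 32-byte requirement concrete, here is a minimal sketch. It deliberately avoids AVX intrinsics so it compiles anywhere; `Vec8f` is a hypothetical stand-in for `__m256` (8 floats, 32-byte alignment), and `is_aligned` is an illustrative helper, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when the address p is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Stand-in for __m256 (an assumption for illustration): 8 floats that the
// type system says must live on a 32-byte boundary.
struct alignas(32) Vec8f {
    float lane[8];
};

static_assert(sizeof(Vec8f) == 32, "8 floats occupy 32 bytes");
static_assert(alignof(Vec8f) == 32, "type demands 32-byte alignment");
```

Checking `is_aligned(&local, 32)` on a stack variable of such a type is a cheap way to see whether a given toolchain actually honours the requested alignment.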
The solutions presented in the other Stack Overflow question I linked are really terrible, especially if you have performance in mind: either a Python program that you have to run every time you want to debug, which replaces the aligned instructions with their sub-optimal unaligned counterparts, or over-allocating and doing a bunch of costly, hacky pointer math in code to get proper alignment. Even with the pointer-math solution, I think there is still a chance of a segfault, because you can't control the allocation of r-values / temporaries.
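For reference, the over-allocate-and-align workaround looks roughly like the sketch below. It uses `std::align` from `<memory>`; the helper name `align_up_32` is hypothetical. Note that it only covers buffers you allocate yourself, so it does nothing for values the compiler spills on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: returns the first 32-byte-aligned address inside
// [raw, raw + raw_size) that still leaves room for `bytes` bytes,
// or nullptr if the buffer is too small.
inline void* align_up_32(void* raw, std::size_t raw_size, std::size_t bytes) {
    void* p = raw;
    std::size_t space = raw_size;
    return std::align(32, bytes, p, space);
}
```

A caller would over-allocate by 31 bytes, for example `unsigned char raw[32 + 31];`, and then use `align_up_32(raw, sizeof raw, 32)` as the storage for one 8-float vector.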
I'm looking for an easier and cheaper solution. I'd prefer not to switch compilers, but I will if that's the best solution. However, my very poor understanding of the bug is that it is intrinsic to 64-bit Windows, so would switching compilers help, or do other compilers have the same issue?
… `__m256` type and the compiler has to spill it onto the stack, you can end up with segmentation faults because it will try to use aligned instructions to move it to/from the stack. It seems like one fix on the compiler side would be to use unaligned moves in this case, but I don't know how feasible that change would be. – Riddle
… `__m256` instances, like temporaries, that only the compiler has control over), but if I'm misunderstanding your recommendation, perhaps you could clarify it. – Riddle
That's not really relevant to this question. The underlying problem is that it's not safe to use AVX instructions in mingw-w64, because it apparently can't align the stack to 32 bytes because it isn't supported by the Windows x64 ABI.
… do you mean AVX isn't available with Windows? As it does? Also see Ross' answer.
Despite what Kai Tietz said in the bug report you linked, Microsoft's x64 ABI does allow a compiler to give variables a greater than 16-byte alignment on the stack.
– Varistor
… `r13` in the case of ICC.) All local variables (as well as spilled ymm/zmm values) that require >16-byte alignment are then placed in this section. This also has nothing to do with MSVC and ICC using unaligned load/stores. They do that for a completely different reason (they unconditionally use unaligned access for everything). – Bile