On recent versions of Visual Studio and the Intel Compiler (recent as post-2013?), the compiler rarely ever generates aligned SIMD load/stores anymore.
When compiling for AVX or higher:
- The Microsoft compiler (>VS2013?) doesn't generate aligned loads. But it still generates aligned stores.
- The Intel compiler (> Parallel Studio 2012?) doesn't do it at all anymore. But you'll still see them in ICC-compiled binaries inside their hand-optimized libraries like memset().
- As of GCC 6.1, it still generates aligned load/stores when you use the aligned intrinsics.
The compiler is allowed to do this because it's not a loss of functionality when the code is written correctly. All processors starting from Nehalem have no penalty for unaligned load/stores when the address is aligned.
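To make the "written correctly" point concrete, here is a minimal sketch in C using the SSE intrinsics (the same reasoning applies to the AVX ones). The helper names `is_aligned16`, `sum4_aligned` and `sum4_unaligned` are made up for illustration. When the pointer really is 16-byte aligned, both functions return the same result no matter which instruction the compiler picks for `_mm_load_ps`.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_load_ps, _mm_loadu_ps, _mm_storeu_ps */

/* Hypothetical helper: nonzero if p is 16-byte aligned. */
static int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Adds up the four lanes of v. */
static float hsum4(__m128 v) {
    float out[4];
    _mm_storeu_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
}

/* Contract: p must be 16-byte aligned.
 * The compiler may emit movaps (faults on a misaligned address) or
 * movups/vmovups (does not fault), so the assert states the contract
 * explicitly instead of relying on the instruction choice. */
static float sum4_aligned(const float *p) {
    assert(is_aligned16(p));
    return hsum4(_mm_load_ps(p));
}

/* No alignment requirement on p. */
static float sum4_unaligned(const float *p) {
    return hsum4(_mm_loadu_ps(p));
}
```

Calling `sum4_aligned` on a misaligned pointer is the "incorrect code" case: whether it crashes depends on which instruction the compiler chose, which is exactly what gets hidden.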
Microsoft's stance on this issue is that it "helps the programmer by not crashing". Unfortunately, I can't find the original source for this statement from Microsoft anymore. In my opinion, this achieves the exact opposite of that because it hides misalignment penalties. From the correctness standpoint, it also hides incorrect code.
Whatever the case is, unconditionally using unaligned load/stores does simplify the compiler a bit.
New Revelations:
- Starting Parallel Studio 2018, the Intel Compiler no longer generates aligned moves at all - even for pre-Nehalem targets.
- Starting from Visual Studio 2017, the Microsoft Compiler also no longer generates aligned moves at all - even when targeting pre-AVX hardware.
Both cases result in inevitable performance degradation on older processors. But it seems that this is intentional as both Intel and Microsoft no longer care about old processors.
The only load/store intrinsics that are immune to this are the non-temporal load/stores. There is no unaligned equivalent of them, so the compiler has no choice.
So if you want to just test for correctness of your code, you can substitute in the load/store intrinsics for non-temporal ones. But be careful not to let something like this slip into production code since NT load/stores (NT-stores in particular) are a double-edged sword that can hurt you if you don't know what you're doing.
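A rough sketch of that substitution in C, under a few assumptions of mine: the wrapper names `checked_load_ps` and `checked_store_ps` are hypothetical, the switch is the usual `NDEBUG` convention (so the NT versions only exist in debug builds), and the NT load is only used when SSE4.1 is available, since `_mm_stream_load_si128` (movntdqa) requires it. Without SSE4.1, the load falls back to a plain alignment assert.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>   /* SSE: _mm_load_ps, _mm_store_ps, _mm_stream_ps */
#if defined(__SSE4_1__) || defined(__AVX__)
#include <smmintrin.h>   /* SSE4.1: _mm_stream_load_si128 (movntdqa) */
#define HAVE_NT_LOAD 1   /* hypothetical macro, local to this sketch */
#endif

/* Aligned load. Debug builds use a form that cannot be silently
 * turned into an unaligned load. */
static inline __m128 checked_load_ps(const float *p) {
#if defined(NDEBUG)
    return _mm_load_ps(p);                 /* production: normal load */
#elif defined(HAVE_NT_LOAD)
    /* NT load has no unaligned form, so a misaligned p faults. */
    return _mm_castsi128_ps(_mm_stream_load_si128((__m128i *)p));
#else
    assert(((uintptr_t)p & 15u) == 0);     /* no SSE4.1: check by hand */
    return _mm_load_ps(p);
#endif
}

/* Aligned store. Debug builds use the NT store (movntps), which
 * faults on a misaligned address. */
static inline void checked_store_ps(float *p, __m128 v) {
#if defined(NDEBUG)
    _mm_store_ps(p, v);                    /* production: normal store */
#else
    _mm_stream_ps(p, v);                   /* debug only: NT store */
#endif
}
```

Keeping the NT versions behind the debug switch is the point of the wrapper: the release build goes back to the ordinary intrinsics, so the NT-store performance pitfalls never reach production.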
[...] movaps? – Swithbart
alignas isn't perfect or a guarantee, memcpy can put these structs anywhere (including unaligned locations), malloc won't always give you aligned memory, etc. See the dupe - you generally need to write your own allocator using _aligned_malloc. – Actinomycosis
[...] __declspec(align(#)), but since VS2015 alignas support is implemented as veneer for same). – Actinomycosis
[...] movaps to implement _mm_load_ps (regardless of actual alignment), it just apparently didn't happen – Swithbart
movaps will certainly cause an exception with an unaligned address. – Actinomycosis
_mm_load_ps is allowed to do that too, though it doesn't have to – Swithbart