The following code produces assembly that conditionally executes SIMD in GCC 12.3 when compiled with -O3. For completeness, the code always executes SIMD in GCC 13.2 and never executes SIMD in clang 17.0.1.
#include <array>
#include <cstddef>

__attribute__((noinline)) void fn(std::array<int, 4>& lhs, const std::array<int, 4>& rhs)
{
    for (std::size_t idx = 0; idx != 4; ++idx) {
        lhs[idx] = lhs[idx] + rhs[idx];
    }
}
Here is the link in godbolt.
Here is the actual assembly from GCC 12.3 (with -O3):
fn(std::array<int, 4ul>&, std::array<int, 4ul> const&):
        lea     rdx, [rsi+4]
        mov     rax, rdi
        sub     rax, rdx
        cmp     rax, 8
        jbe     .L2
        movdqu  xmm0, XMMWORD PTR [rsi]
        movdqu  xmm1, XMMWORD PTR [rdi]
        paddd   xmm0, xmm1
        movups  XMMWORD PTR [rdi], xmm0
        ret
.L2:
        mov     eax, DWORD PTR [rsi]
        add     DWORD PTR [rdi], eax
        mov     eax, DWORD PTR [rsi+4]
        add     DWORD PTR [rdi+4], eax
        mov     eax, DWORD PTR [rsi+8]
        add     DWORD PTR [rdi+8], eax
        mov     eax, DWORD PTR [rsi+12]
        add     DWORD PTR [rdi+12], eax
        ret
I am very interested to know a) the purpose of the first five assembly instructions and b) whether anything can be done to make GCC 12.3 emit the same code as GCC 13.2 (ideally, without manually writing SSE).
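For reference, here is how I read those first five instructions when transcribed literally into C++. This is only a sketch: the helper name takes_scalar_path is hypothetical, it takes raw pointers instead of std::array references so it can be exercised on a single buffer, and it assumes the System V x86-64 calling convention (rdi holds the address of lhs, rsi the address of rhs).

```cpp
#include <cstdint>

// Hypothetical helper: literal transcription of the five-instruction prologue.
// Returns true when the generated code would jump to .L2 (the scalar path).
bool takes_scalar_path(const int* lhs, const int* rhs)
{
    const auto lhs_addr = reinterpret_cast<std::uintptr_t>(lhs); // rdi
    const auto rhs_addr = reinterpret_cast<std::uintptr_t>(rhs); // rsi

    // lea rdx, [rsi+4] ; mov rax, rdi ; sub rax, rdx
    // Unsigned subtraction, so it wraps around when lhs_addr < rhs_addr + 4.
    const std::uintptr_t diff = lhs_addr - (rhs_addr + 4);

    // cmp rax, 8 ; jbe .L2   (jbe is an unsigned "below or equal")
    return diff <= 8;
}
```

Read this way, the branch to .L2 is taken only when lhs starts 4, 8, or 12 bytes after rhs, that is, when a store to lhs[i] would change a later rhs element. For two distinct std::array<int, 4> objects, whose addresses differ by at least 16 bytes, the expression is always false.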
std::arrays can't partially overlap, but I'm not sure. – Transported
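If the two arrays are indeed never expected to partially overlap, one way to state that to the compiler is the GNU __restrict qualifier on the reference parameters. The sketch below is an assumption rather than a confirmed fix: fn_restrict is a hypothetical name, __restrict is a compiler extension (accepted by GCC and clang, not standard C++), and whether GCC 12.3 then drops the runtime check has not been verified here.

```cpp
#include <array>
#include <cstddef>

// Hypothetical variant of fn. __restrict promises the compiler that lhs and
// rhs do not alias; calling it with overlapping arrays is undefined behaviour.
__attribute__((noinline)) void fn_restrict(std::array<int, 4>& __restrict lhs,
                                           const std::array<int, 4>& __restrict rhs)
{
    for (std::size_t idx = 0; idx != 4; ++idx) {
        lhs[idx] = lhs[idx] + rhs[idx];
    }
}
```

The observable behaviour for non-overlapping arguments is the same as fn; only the aliasing promise differs, so the generated assembly would need to be checked on godbolt for each compiler version.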