Here is a bounded loop that increments two locations of the same buffer.
unsigned int getid();
void foo(unsigned int *counter, unsigned int n) {
unsigned int A = getid();
unsigned int B = getid();
for (unsigned int i = 0; i < n; i++) {
++counter[A];
++counter[B];
}
}
According to the result from compiler explorer (https://godbolt.org/z/b1sjf5669), even at -O3 optimization level, the program still has a loop that involves add 1
command.
In contrast, if ++counter[B]
is removed, the compiler can reduce the loop and optimize the program into counter[A] += n
. (https://godbolt.org/z/4YoPcz5rT)
Why the compiler is conservative, i.e., not transforming the codes into counter[A] += n, counter[B] += n
in this case?
I know that some alias issues would lead to the failure of loop reduction. However, there is only one buffer here and even A=B
, optimizing the codes would not change the results.
counter[A]+=n
is not generic enough to catch this case. My uneducated guess would be that this rule requires no other access tocounter[A]
and maybe the compiler cannot "branch" and evaluate its decisions based on the relationship betweenA
andB
. – Enactcounter[B]
faults or something, because that's not something GCC optimizations preserve in general. – Outbalancecounter[A] += n, counter[B] += n;
is correct in C terms even in the A==B case. (I'd have used;
, but,
is a sequence point.) – Outbalance[A%4 + 0]
and[A%4 + 16]
. godbolt.org/z/PbeaW8nnE . (+16 is the next cache line, but it doesn't know howcounter
is aligned so even+1
could be into the next page. And it chooses to use one 64-bit SIMD increment if we use+1
, so as we expect it's assuming no UB, not preserving fault behaviour.) But surprisinglyif (A==B) return;
doesn't help GCC or clang. Clang even unrolls but sticks to strictly alternating. – Outbalance