Is `-ftree-loop-vectorize` not enabled by `-O2` in GCC v12?

Example: https://www.godbolt.org/z/ahfcaj7W8

From https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Optimize-Options.html

It says

-ftree-loop-vectorize
     Perform loop vectorization on trees. This flag is enabled by default at -O2 and by -ftree-vectorize, -fprofile-use, and -fauto-profile.

However, it seems I have to pass the flag explicitly to turn on loop unrolling and SIMD. Did I misunderstand something here? It is enabled at -O3, though.

Propitious answered 23/12, 2022 at 10:30 Comment(2)
SIMD vectorization is enabled by -o2. Loop unrolling makes the code much larger, so it is a separate flag in -o2 and included in -o3. Many times -o2 will outperform -o3. Sorcerer
@Strom: -O3 does not imply -funroll-loops in GCC, and hasn't for well over a decade I think. That's only on with -fprofile-use, so GCC knows which loops are actually hot and worth spending i-cache footprint on. (-O3 can be more aggressive about code size, like maybe more willing to fully peel a loop with like 16 iterations or something, especially depending on -mtune options.) Also, -o3 sets the output filename to 3, very different from -O3. Palaeontography

It is enabled at -O2 in GCC 12, but with a much more conservative cost model than at -O3 (-fvect-cost-model=very-cheap at -O2 versus dynamic at -O3). In practice it often vectorizes only when the loop trip count is a compile-time constant and known to be a multiple of the vector width (e.g. 8 for 32-bit elements with AVX2 vectors). See https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=2b8453c401b699ed93c085d0413ab4b5030bcdb8

https://godbolt.org/z/3xjdrx6as shows some loops at -O2 vs. -O3: a sum over an array of integers vectorizes only with a constant count, not with a runtime variable. Even writing `for (int i=0 ; i < (len&-16) ; i++) sum += arr[i]` to make the length a multiple of 16 doesn't make gcc -O2 auto-vectorize it.
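For reference, here is a minimal sketch of the three loop shapes discussed above. The function names and the constant `N` are hypothetical, chosen for illustration; they are not copied from the linked Godbolt example. Whether a given loop is actually vectorized depends on the GCC version and target options, so treat the comments as what the answer describes rather than a guarantee.

```c
/* Hypothetical example; names and N are illustrative assumptions. */
#define N 1024  /* compile-time constant, a multiple of common vector widths */

/* Constant trip count: the kind of loop GCC 12 may vectorize at -O2. */
int sum_const(const int *arr)
{
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += arr[i];
    return sum;
}

/* Runtime trip count: per the answer, typically vectorized at -O3 but not -O2. */
int sum_runtime(const int *arr, int len)
{
    int sum = 0;
    for (int i = 0; i < len; i++)
        sum += arr[i];
    return sum;
}

/* Length rounded down to a multiple of 16: per the answer, this still
   does not get auto-vectorized at -O2. */
int sum_masked(const int *arr, int len)
{
    int sum = 0;
    for (int i = 0; i < (len & -16); i++)
        sum += arr[i];
    return sum;
}
```

Compiling this with something like `gcc -O2 -march=x86-64-v3 -fopt-info-vec -S` and again with `-O3` should let you compare which loops GCC reports as vectorized; `-fopt-info-vec-missed` prints the loops it declined to vectorize.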

Before GCC12, -ftree-vectorize wasn't enabled at all by -O2.

Palaeontography answered 24/12, 2022 at 9:15 Comment(0)
