Example: https://www.godbolt.org/z/ahfcaj7W8
From https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Optimize-Options.html
It says:
"-ftree-loop-vectorize
Perform loop vectorization on trees. This flag is enabled by default at -O2 and by -ftree-vectorize, -fprofile-use, and -fauto-profile."
However, it seems I have to pass a flag explicitly to turn on loop unrolling and SIMD. Did I misunderstand something here? It is enabled at -O3, though.
-O3 does not imply -funroll-loops in GCC, and hasn't for well over a decade, I think. That's only on with -fprofile-use, so GCC knows which loops are actually hot and worth spending i-cache footprint on. (-O3 can be more aggressive about code size, like maybe more willing to fully peel a loop with like 16 iterations or something, especially depending on -mtune options.) Also, -o3 sets the output filename to 3, very different from -O3. – Palaeontography
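
One way to check the points in the comment above is to compile a small loop under a few option sets and compare the generated assembly. The sketch below is a hypothetical example: the function name add_arrays and the file name example.c are made up here, and this is not the code behind the Godbolt link. What GCC actually emits will depend on the GCC version, the target, and any -march/-mtune settings, so treat the commands as things to try rather than a statement of what you will see.

```c
#include <stddef.h>

/* Hypothetical test function (not taken from the Godbolt link).
   Adds b[i] into a[i] for i in [0, n). The restrict qualifiers
   promise the compiler that the two arrays do not overlap, which
   removes one common obstacle to vectorization. */
void add_arrays(float *restrict a, const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] += b[i];
}

/* Commands to compare (note the capital O; a lowercase -o3 would
   instead name the output file "3"):

     gcc -O2 -S example.c                  -ftree-loop-vectorize is on per the GCC 12 docs
     gcc -O3 -S example.c                  higher optimization level, still no -funroll-loops
     gcc -O3 -funroll-loops -S example.c   unrolling requested explicitly
*/
```

Adding -fopt-info-vec to any of these commands makes GCC report which loops it vectorized, which is easier than reading the assembly for SIMD instructions.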