Clang optimization levels

For gcc, the manual explains what -O3, -Os, etc. translate to in terms of specific optimisation arguments (-funswitch-loops, -fcompare-elim, etc.)

I'm looking for the same info for clang.

I've looked online and in man clang which only gives general information (-O2 optimises more aggressively than -O1, -Os optimises for size, ...) and also looked here on Stack Overflow and found this, but I haven't found anything relevant in the cited source files.

Edit: I found an answer but I'm still interested if anyone has a link to a user-manual documenting all optimisation passes and the passes selected by -Ox. Currently I just found this list of passes, but nothing on optimisation levels.

Killifish answered 21/3, 2013 at 12:48 Comment(1)
Comment by Metacenter: gcc-12 -c -Q -O2 --help=optimizers

I found this related question.

To sum it up, to find out about compiler optimization passes:

llvm-as < /dev/null | opt -O3 -disable-output -debug-pass=Arguments
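The raw output of that command repeats many passes (the same analysis can be requested several times), so a small filter helps when comparing levels. This is only a sketch: unique_passes is a made-up helper name, and it assumes the output consists of lines beginning with "Pass Arguments:", as in the listings quoted in this thread.

```shell
# unique_passes (hypothetical helper): read "Pass Arguments:" lines on stdin
# and print each pass flag once, in order of first appearance.
unique_passes() {
  sed -n 's/^Pass Arguments:[[:space:]]*//p' \
    | tr -s ' ' '\n' \
    | awk 'NF && !seen[$0]++'
}

# Possible usage (assumes llvm-as and an opt that still accepts
# -debug-pass=Arguments; the listing is expected on stderr, hence 2>&1):
#   llvm-as < /dev/null | opt -O3 -disable-output -debug-pass=Arguments 2>&1 | unique_passes
```

Deduplicating loses the information about how often and where a pass is rerun, so keep the raw output if the ordering matters to you.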

As pointed out in Geoff Nixon's answer (+1), clang additionally runs some higher level optimizations, which we can retrieve with:

echo 'int;' | clang -xc -O3 - -o /dev/null -\#\#\#

Documentation of individual passes is available here.

You can compare the effect of changing high-level flags such as -O like this:

diff -wy --suppress-common-lines  \
  <(echo 'int;' | clang -xc     - -o /dev/null -\#\#\# 2>&1 | tr " " "\n" | grep -v /tmp) \
  <(echo 'int;' | clang -xc -O0 - -o /dev/null -\#\#\# 2>&1 | tr " " "\n" | grep -v /tmp)
# will tell you that -O0 is indeed the default.
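The "adds" / "drops" lists below were produced by this kind of comparison. A minimal sketch of the idea in plain POSIX shell, assuming each flag list has already been flattened to a single space-separated line; flag_diff is a made-up name, not an existing tool.

```shell
# flag_diff OLD NEW (hypothetical helper): print flags that appear only in
# NEW prefixed with "+", then flags that appear only in OLD prefixed with "-".
# Both arguments are single-line, space-separated flag lists.
flag_diff() {
  printf '%s\n' "$1" "$2" | awk '
    NR == 1 {                      # first line: the old flag list
      n = NF
      for (i = 1; i <= NF; i++) { base[$i] = 1; order[i] = $i }
    }
    NR == 2 {                      # second line: the new flag list
      for (i = 1; i <= NF; i++) {
        other[$i] = 1
        if (!($i in base) && !shown[$i]++) print "+ " $i
      }
    }
    END {                          # report flags that disappeared
      for (i = 1; i <= n; i++)
        if (!(order[i] in other) && !shown[order[i]]++) print "- " order[i]
    }'
}

# Possible usage with clang (flattening the -### output to one line first):
#   o1=$(echo 'int;' | clang -xc -O1 - -o /dev/null '-###' 2>&1 | tr '\n' ' ')
#   o2=$(echo 'int;' | clang -xc -O2 - -o /dev/null '-###' 2>&1 | tr '\n' ' ')
#   flag_diff "$o1" "$o2"
```

Temporary file paths in the clang output differ between runs, so expect a few spurious lines unless you filter them out as the diff command above does with grep -v /tmp.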

With version 6.0 the passes are as follows:

  • baseline (-O0):

    • opt sets: -tti -verify -ee-instrument -targetlibinfo -assumption-cache-tracker -profile-summary-info -forceattrs -basiccg -always-inline -barrier
    • clang adds : -mdisable-fp-elim -mrelax-all
  • -O1 is based on -O0

    • opt adds: -targetlibinfo -tti -tbaa -scoped-noalias -assumption-cache-tracker -profile-summary-info -forceattrs -inferattrs -ipsccp -called-value-propagation -globalopt -domtree -mem2reg -deadargelim -basicaa -aa -loops -lazy-branch-prob -lazy-block-freq -opt-remark-emitter -instcombine -simplifycfg -basiccg -globals-aa -prune-eh -always-inline -functionattrs -sroa -memoryssa -early-cse-memssa -speculative-execution -lazy-value-info -jump-threading -correlated-propagation -libcalls-shrinkwrap -branch-prob -block-freq -pgo-memop-opt -tailcallelim -reassociate -loop-simplify -lcssa-verification -lcssa -scalar-evolution -loop-rotate -licm -loop-unswitch -indvars -loop-idiom -loop-deletion -loop-unroll -memdep -memcpyopt -sccp -demanded-bits -bdce -dse -postdomtree -adce -barrier -rpo-functionattrs -globaldce -float2int -loop-accesses -loop-distribute -loop-vectorize -loop-load-elim -alignment-from-assumptions -strip-dead-prototypes -loop-sink -instsimplify -div-rem-pairs -verify -ee-instrument -early-cse -lower-expect
    • clang adds : -momit-leaf-frame-pointer
    • clang drops : -mdisable-fp-elim -mrelax-all
  • -O2 is based on -O1

    • opt adds: -inline -mldst-motion -gvn -elim-avail-extern -slp-vectorizer -constmerge
    • opt drops: -always-inline
    • clang adds: -vectorize-loops -vectorize-slp
  • -O3 is based on -O2

    • opt adds: -callsite-splitting -argpromotion
  • -Ofast is based on -O3, valid in clang but not in opt

    • clang adds: -fno-signed-zeros -freciprocal-math -ffp-contract=fast -menable-unsafe-fp-math -menable-no-nans -menable-no-infs -mreassociate -fno-trapping-math -ffast-math -ffinite-math-only
  • -Os is similar to -O2

    • opt drops: -libcalls-shrinkwrap and -pgo-memop-opt
  • -Oz is based on -Os

    • opt drops: -slp-vectorizer

With version 3.8 the passes are as follows:

  • baseline (-O0):

    • opt sets : -targetlibinfo -tti -verify
    • clang adds : -mdisable-fp-elim -mrelax-all
  • -O1 is based on -O0

    • opt adds: -globalopt -demanded-bits -branch-prob -inferattrs -ipsccp -dse -loop-simplify -scoped-noalias -barrier -adce -deadargelim -memdep -licm -globals-aa -rpo-functionattrs -basiccg -loop-idiom -forceattrs -mem2reg -simplifycfg -early-cse -instcombine -sccp -loop-unswitch -loop-vectorize -tailcallelim -functionattrs -loop-accesses -memcpyopt -loop-deletion -reassociate -strip-dead-prototypes -loops -basicaa -correlated-propagation -lcssa -domtree -always-inline -aa -block-freq -float2int -lower-expect -sroa -loop-unroll -alignment-from-assumptions -lazy-value-info -prune-eh -jump-threading -loop-rotate -indvars -bdce -scalar-evolution -tbaa -assumption-cache-tracker
    • clang adds : -momit-leaf-frame-pointer
    • clang drops : -mdisable-fp-elim -mrelax-all
  • -O2 is based on -O1

    • opt adds: -elim-avail-extern -mldst-motion -slp-vectorizer -gvn -inline -globaldce -constmerge
    • opt drops: -always-inline
    • clang adds: -vectorize-loops -vectorize-slp
  • -O3 is based on -O2

    • opt adds: -argpromotion
  • -Ofast is based on -O3, valid in clang but not in opt

    • clang adds: -fno-signed-zeros -freciprocal-math -ffp-contract=fast -menable-unsafe-fp-math -menable-no-nans -menable-no-infs
  • -Os is the same as -O2

  • -Oz is based on -Os

    • opt drops: -slp-vectorizer
    • clang drops: -vectorize-loops

----------

With version 3.7 the passes are as follows (parsed output of the command above):

  • default (-O0): -targetlibinfo -verify -tti

  • -O1 is based on -O0

    • adds: -sccp -loop-simplify -float2int -lazy-value-info -correlated-propagation -bdce -lcssa -deadargelim -loop-unroll -loop-vectorize -barrier -memcpyopt -loop-accesses -assumption-cache-tracker -reassociate -loop-deletion -branch-prob -jump-threading -domtree -dse -loop-rotate -ipsccp -instcombine -scoped-noalias -licm -prune-eh -loop-unswitch -alignment-from-assumptions -early-cse -inline-cost -simplifycfg -strip-dead-prototypes -tbaa -sroa -no-aa -adce -functionattrs -lower-expect -basiccg -loops -loop-idiom -tailcallelim -basicaa -indvars -globalopt -block-freq -scalar-evolution -memdep -always-inline
  • -O2 is based on -O1

    • adds: -elim-avail-extern -globaldce -inline -constmerge -mldst-motion -gvn -slp-vectorizer
    • removes: -always-inline
  • -O3 is based on -O2

    • adds: -argpromotion -verif
  • -Os is identical to -O2

  • -Oz is based on -Os

    • removes: -slp-vectorizer

----------

For version 3.6 the passes are as documented in GYUNGMIN KIM's post.


----------

With version 3.5 the passes are as follows (parsed output of the command above):

  • default (-O0): -targetlibinfo -verify -verify-di

  • -O1 is based on -O0

    • adds: -correlated-propagation -basiccg -simplifycfg -no-aa -jump-threading -sroa -loop-unswitch -ipsccp -instcombine -memdep -memcpyopt -barrier -block-freq -loop-simplify -loop-vectorize -inline-cost -branch-prob -early-cse -lazy-value-info -loop-rotate -strip-dead-prototypes -loop-deletion -tbaa -prune-eh -indvars -loop-unroll -reassociate -loops -sccp -always-inline -basicaa -dse -globalopt -tailcallelim -functionattrs -deadargelim -notti -scalar-evolution -lower-expect -licm -loop-idiom -adce -domtree -lcssa
  • -O2 is based on -O1

    • adds: -gvn -constmerge -globaldce -slp-vectorizer -mldst-motion -inline
    • removes: -always-inline
  • -O3 is based on -O2

    • adds: -argpromotion
  • -Os is identical to -O2

  • -Oz is based on -Os

    • removes: -slp-vectorizer

----------

With version 3.4 the passes are as follows (parsed output of the command above):

  • -O0: -targetlibinfo -preverify -domtree -verify

  • -O1 is based on -O0

    • adds: -adce -always-inline -basicaa -basiccg -correlated-propagation -deadargelim -dse -early-cse -functionattrs -globalopt -indvars -inline-cost -instcombine -ipsccp -jump-threading -lazy-value-info -lcssa -licm -loop-deletion -loop-idiom -loop-rotate -loop-simplify -loop-unroll -loop-unswitch -loops -lower-expect -memcpyopt -memdep -no-aa -notti -prune-eh -reassociate -scalar-evolution -sccp -simplifycfg -sroa -strip-dead-prototypes -tailcallelim -tbaa
  • -O2 is based on -O1

    • adds: -barrier -constmerge -domtree -globaldce -gvn -inline -loop-vectorize -preverify -slp-vectorizer -targetlibinfo -verify
    • removes: -always-inline
  • -O3 is based on -O2

    • adds: -argpromotion
  • -Os is identical to -O2

  • -Oz is based on -O2

    • removes: -barrier -loop-vectorize -slp-vectorizer

----------

With version 3.2 the passes are as follows (parsed output of the command above):

  • -O0: -targetlibinfo -preverify -domtree -verify

  • -O1 is based on -O0

    • adds: -sroa -early-cse -lower-expect -no-aa -tbaa -basicaa -globalopt -ipsccp -deadargelim -instcombine -simplifycfg -basiccg -prune-eh -always-inline -functionattrs -simplify-libcalls -lazy-value-info -jump-threading -correlated-propagation -tailcallelim -reassociate -loops -loop-simplify -lcssa -loop-rotate -licm -loop-unswitch -scalar-evolution -indvars -loop-idiom -loop-deletion -loop-unroll -memdep -memcpyopt -sccp -dse -adce -strip-dead-prototypes
  • -O2 is based on -O1

    • adds: -inline -globaldce -constmerge
    • removes: -always-inline
  • -O3 is based on -O2

    • adds: -argpromotion
  • -Os is identical to -O2

  • -Oz is identical to -Os


-------------

Edit [march 2014] removed duplicates from lists.

Edit [april 2014] added documentation link + options for 3.4

Edit [september 2014] added options for 3.5

Edit [december 2015] added options for 3.7 and mention existing answer for 3.6

Edit [may 2016] added options for 3.8, for both opt and clang and mention existing answer for clang (versus opt)

Edit [nov 2018] add options for 6.0

Killifish answered 21/3, 2013 at 12:55 Comment(11)
Is there a way of doing this with the version of clang that ships with Xcode 5? I've tried hunting around for the llvm-as command, but it doesn't exist anywhere on my machine that I can see. (Albrecht)
@Antoine, why are some flags, like -simplifycfg, repeated? (Bane)
@Paschalis: I'm not sure, but some optimization passes only work if other passes have been run first; simplifycfg, for example, is required by multiple passes. And debug-pass=Arguments probably prints its list before deduplication. I removed the duplicates in my answer, thanks for your feedback. (Killifish)
Some optimizations create stuff that can be further optimized (dead code etc.), so it might make sense to rerun some optimization passes. (Paramagnet)
@Killifish I'm treating this like a wiki and added the results for v6. LMK if you prefer it moved to another answer though. (Adeline)
@Adeline / @Killifish Why not (also?) LLVM 7 (or is that what you meant?) Also: 1. I'm not sure how long it's been there, but there's also -Og a la GCC now; 2. Are all the specifics for the older versions still necessary? 3. I think given the nice changes that have been made over the years, and the community status, I'm gonna cut my answer down to just mentioning stuff like clang -cc1 -mllvm -help-list-hidden (unless you'd prefer to integrate it). (Enoch)
@GeoffNixon v6 was what I had handy. Trimming the 3.x versions down seems reasonable. (Adeline)
@Killifish 2021: I cannot find the default optimization level in either man clang or the Clang command line argument reference. What is the default optimization level? -O0? P.S. Your post states that the default is -O0 for version 3.5 and for version 3.7. In 2021 the current version is 12.0.0. Is -O0 still the default for version 12.0.0? (Landloper)
@Landloper yeah, I kind of stopped updating this post with every llvm release. As far as I can tell the default is still -O0. Not sure if/where this is documented. I'm adding the (rather long) command I used to compare the default to -O0 as an edit in the post, so people can play with it. (Killifish)
@Killifish Thanks for the answer! It would be useful to update the answer to say that the default is still -O0. People might search for this info. (Landloper)
This seems to give a mixture of compilation and linkage arguments. Is there any way to get only the compilation ones, in such a way that they could be added to an external build system? (Compaction)

@Antoine's answer (and the other question linked) accurately describes the LLVM optimizations that are enabled, but there are a few other Clang-specific options (i.e., those that affect lowering to the AST) that are affected by the -O[0|1|2|3|fast] flags.

You can take a look at these with:

echo 'int;' | clang -xc -O0 - -o /dev/null -\#\#\#

echo 'int;' | clang -xc -O1 - -o /dev/null -\#\#\#

echo 'int;' | clang -xc -O2 - -o /dev/null -\#\#\#

echo 'int;' | clang -xc -O3 - -o /dev/null -\#\#\#

echo 'int;' | clang -xc -Ofast - -o /dev/null -\#\#\#

For example, -O0 enables -mrelax-all, -O1 enables -vectorize-loops and -vectorize-slp, and -Ofast enables -menable-no-infs, -menable-no-nans, -menable-unsafe-fp-math, -ffp-contract=fast and -ffast-math.


@Techogrebo:

Yes, you don't necessarily need the other LLVM tools. Try:

echo 'int;' | clang -xc - -o /dev/null -mllvm -print-all-options

Also, there are a lot more detailed options you can examine/modify with Clang alone... you just need to know how to get to them!

Try a few of:

clang -help

clang -cc1 -help

clang -cc1 -mllvm -help

clang -cc1 -mllvm -help-list-hidden

clang -cc1as -help

Enoch answered 20/12, 2014 at 4:9 Comment(0)

Starting with clang / LLVM 13.0.0, the legacy pass manager has been deprecated and the new pass manager is used by default. This means that the previous solution for printing the optimization passes used for the different optimization levels in opt will only work if the legacy pass manager is explicitly enabled with -enable-new-pm=0. So as long as the legacy pass manager is around (expected until LLVM 14), one can use the following command:

llvm-as < /dev/null | opt -O3 -disable-output -debug-pass=Arguments -enable-new-pm=0

Alternatively, the execution order of the optimization passes with the new pass manager can be extracted with --debug-pass-manager (instead of -debug-pass=Arguments). Unfortunately the output is very verbose and some processing needs to be done to reconstruct the behavior manually with -passes=. If only transformation passes are of interest, one can use the option -debug-pass-manager=quiet to skip information about analyses.
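As a starting point for that processing, here is a rough sketch of a filter that keeps only the pass names in execution order. list_npm_passes is a made-up helper name, and the line format "Running pass: <Name> on <unit>" is an assumption about what --debug-pass-manager prints, so check it against the output of your LLVM version.

```shell
# list_npm_passes (hypothetical helper): read --debug-pass-manager output on
# stdin and print the name of each pass that ran, in order.
# Assumes lines of the form "Running pass: <Name> on <unit>"; lines such as
# "Running analysis: ..." are skipped.
list_npm_passes() {
  sed -n 's/^Running pass: \(.*\) on .*$/\1/p'
}

# Possible usage (the log is expected on stderr, hence 2>&1):
#   llvm-as < /dev/null | opt -O3 -disable-output --debug-pass-manager 2>&1 | list_npm_passes
```

The names printed this way are the pass class names from the log, not necessarily the spellings that -passes= accepts, so the mapping back to a -passes= pipeline still has to be done by hand.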

There is a user guide on how to use the new pass manager with opt on the LLVM Website.

Catherinacatherine answered 30/3, 2022 at 2:17 Comment(0)

LLVM 3.6 -O1

Pass Arguments: -targetlibinfo -no-aa -tbaa -scoped-noalias -assumption-cache-tracker -basicaa -notti -verify-di -ipsccp -globalopt -deadargelim -domtree -instcombine -simplifycfg -basiccg -prune-eh -inline-cost -always-inline -functionattrs -sroa -domtree -early-cse -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -domtree -instcombine -tailcallelim -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa -loop-rotate -licm -loop-unswitch -instcombine -scalar-evolution -loop-simplify -lcssa -indvars -loop-idiom -loop-deletion -function_tti -loop-unroll -memdep -memcpyopt -sccp -domtree -instcombine -lazy-value-info -jump-threading -correlated-propagation -domtree -memdep -dse -adce -simplifycfg -domtree -instcombine -barrier -domtree -loops -loop-simplify -lcssa -branch-prob -block-freq -scalar-evolution -loop-vectorize -instcombine -simplifycfg -domtree -instcombine -loops -loop-simplify -lcssa -scalar-evolution -function_tti -loop-unroll -alignment-from-assumptions -strip-dead-prototypes -verify -verify-di

-O2 is based on -O1

adds: -inline -mldst-motion -domtree -memdep -gvn -memdep -scalar-evolution -slp-vectorizer -globaldce -constmerge

removes: -always-inline

-O3 based on -O2

adds: -argpromotion

Montes answered 7/8, 2015 at 7:14 Comment(0)
