Example:
notmain.c
int __attribute__ ((noinline)) notmain(int i) {
    return i + 1;
}
int notmain2(int i) {
    return i + 2;
}
main.c
int notmain(int);
int main(int argc, char **argv) {
    return notmain(argc);
}
I use noinline to ensure that what happens is not a secondary effect of whether notmain is inlined or not.
Compile notmain.c with -O1, link with -O3, and disassemble:
gcc -c -flto -O1 notmain.c
gcc -flto -O3 notmain.o main.c
objdump -d a.out
Outcome: notmain is present and notmain2 is not present:
0000000000001040 <main>:
1040: f3 0f 1e fa endbr64
1044: e9 f0 00 00 00 jmp 1139 <notmain>
1049: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
0000000000001139 <notmain>:
1139: 8d 47 01 lea 0x1(%rdi),%eax
113c: c3 ret
However, if I instead compile notmain.c with -O0:
gcc -c -flto -O0 notmain.c
gcc -flto -O3 notmain.o main.c
objdump -d a.out
then both are present:
0000000000001040 <main>:
1040: f3 0f 1e fa endbr64
1044: e9 f0 00 00 00 jmp 1139 <notmain>
1049: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
0000000000001139 <notmain>:
1139: f3 0f 1e fa endbr64
113d: 55 push %rbp
113e: 48 89 e5 mov %rsp,%rbp
1141: 89 7d fc mov %edi,-0x4(%rbp)
1144: 8b 45 fc mov -0x4(%rbp),%eax
1147: 83 c0 01 add $0x1,%eax
114a: 5d pop %rbp
114b: c3 ret
000000000000114c <notmain2>:
114c: f3 0f 1e fa endbr64
1150: 55 push %rbp
1151: 48 89 e5 mov %rsp,%rbp
1154: 89 7d fc mov %edi,-0x4(%rbp)
1157: 8b 45 fc mov -0x4(%rbp),%eax
115a: 83 c0 02 add $0x2,%eax
115d: 5d pop %rbp
115e: c3 ret
So my question is: what differs in the notmain.o object file between -O0 and -O1 such that notmain2 is removed only in the -O1 case?
Interestingly, I also tried to bisect which exact optimization from -O1 leads to this. man gcc lists all the flags that -O1 enables:
gcc -c -flto -fauto-inc-dec -fbranch-count-reg -fcombine-stack-adjustments -fcompare-elim -fcprop-registers -fdce -fdefer-pop -fdelayed-branch -fdse -fforward-propagate -fguess-branch-probability -fif-conversion -fif-conversion2 -finline-functions-called-once -fipa-modref -fipa-profile -fipa-pure-const -fipa-reference -fipa-reference-addressable -fmerge-constants -fmove-loop-invariants -fmove-loop-stores -fomit-frame-pointer -freorder-blocks -fshrink-wrap -fshrink-wrap-separate -fsplit-wide-types -fssa-backprop -fssa-phiopt -ftree-bit-ccp -ftree-ccp -ftree-ch -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-phiprop -ftree-pta -ftree-scev-cprop -ftree-sink -ftree-slsr -ftree-sra -ftree-ter -funit-at-a-time notmain.c
gcc -flto -O3 notmain.o main.c
objdump -d a.out
but notmain2 is still present.
I tried to inspect the LTO data with:
lto-dump -dump-body=notmain2 notmain.o
but I don't see anything that would clearly make a difference. With -O1:
Gimple Body of Function: notmain2
int notmain2 (int i)
{
int _2;
<bb 2> [local count: 1073741824]:
_2 = i_1(D) + 2;
return _2;
}
With -O0:
Gimple Body of Function: notmain2
int notmain2 (int i)
{
int D.4724;
int _2;
<bb 2> :
_2 = i_1(D) + 2;
<bb 3> :
<L0>:
return _2;
}
Tested on Ubuntu 23.04, GCC 12.2.0.
"I use noinline to ensure": the attribute is not visible when using the function, so it does not affect anything. Why do you care about that a function is not removed when compiling with -O0? – Protochordate
-O0 -- "disable all optimizations" -- does not optimize away dead code? – Muticous
-O0 is set on the object file, not at final link. Final link does get -O3 and could therefore do dead function elimination. I don't understand how it is that the produced notmain.o file is different between -O0 and -O1. – Nichrome
notmain.o objects differ specifically. There isn't a huge use case besides understanding better. – Nichrome