Example showing that LTO leads to dead code elimination
Test setup:
notmain.c
int notmain(int i) {
    return i + 1;
}
int notmain2(int i) {
    return i + 2;
}
main.c
int notmain(int);
int main(int argc, char **argv) {
    return notmain(argc);
}
Control experiment without LTO
Compile and disassemble without LTO:
gcc -O3 -c notmain.c
gcc -O3 notmain.o main.c
objdump -d a.out
The output contains:
0000000000001040 <main>:
1040: f3 0f 1e fa endbr64
1044: e9 f7 00 00 00 jmp 1140 <notmain>
1049: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
0000000000001140 <notmain>:
1140: f3 0f 1e fa endbr64
1144: 8d 47 01 lea 0x1(%rdi),%eax
1147: c3 ret
1148: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
114f: 00
0000000000001150 <notmain2>:
1150: f3 0f 1e fa endbr64
1154: 8d 47 02 lea 0x2(%rdi),%eax
1157: c3 ret
so the useless notmain2 was not removed.
We can also look at the object size:
size a.out
which outputs:
text data bss dec hex filename
1304 544 8 1856 740 a.out
Furthermore, as a bonus, we note that the call to notmain is not inlined; main tail-calls it with a jmp:
0000000000001040 <main>:
1040: f3 0f 1e fa endbr64
1044: e9 f7 00 00 00 jmp 1140 <notmain>
1049: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
Observe LTO doing DCE
gcc -c -flto -O3 notmain.c
gcc -flto -O3 notmain.o main.c
objdump -d a.out
The output contains neither the notmain nor the notmain2 symbol. Everything is fully inlined into main, which in a single instruction adds 1 to rdi (the first argument) and puts the result into the return register eax:
0000000000001040 <main>:
1040: f3 0f 1e fa endbr64
1044: 8d 47 01 lea 0x1(%rdi),%eax
1047: c3 ret
1048: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
Inlining is also mentioned at: Link-time optimization and inline
Beauty. Checking size:
size a.out
outputs:
text data bss dec hex filename
1217 544 8 1769 6e9 a.out
and we see that the text size is smaller, as desired, thanks to inlining and dead code elimination: 1217 bytes versus 1304 without LTO.
LTO does DCE even when inlining doesn't happen
In the above example, it is not clear whether function DCE happens only when inlining is involved. So let's test it out with:
int __attribute__ ((noinline)) notmain(int i) {
    return i + 1;
}
Compile and disassemble:
gcc -c -flto -O3 notmain.c
gcc -flto -O3 notmain.o main.c
objdump -d a.out
The output contains:
0000000000001040 <main>:
1040: f3 0f 1e fa endbr64
1044: e9 f7 00 00 00 jmp 1140 <notmain>
1049: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
0000000000001140 <notmain>:
1140: 8d 47 01 lea 0x1(%rdi),%eax
1143: c3 ret
and no notmain2. Therefore, the useless notmain2 was removed even though notmain wasn't inlined.
Function removal does not happen when notmain.c is compiled with -O0
I don't understand why exactly: Why GCC does not do function dead code elimination with LTO when compiling the object file with -O0?
Tested on Ubuntu 23.04 amd64, GCC 12.2.0.