-fdata-sections -ffunction-sections -Wl,--gc-sections
minimal example analysis
These options were mentioned at: https://mcmap.net/q/150689/-how-to-remove-unused-c-c-symbols-with-gcc-and-ld and I just wanted to confirm that they work and inspect a bit how they work with objdump.
The conclusions we draw are similar to what other posts mentioned:
- if any symbol of a section is used, then the entire section goes in, even if some other symbols aren't used at all
- inlining makes a symbol not be considered as used
- -flto leads to unused symbols being removed even if other symbols are used in the same compilation unit
Separate files, -O3 only
notmain.c
int i1 = 1;
int i2 = 2;
int f1(int i) {
return i + 1;
}
int f2(int i) {
return i + 2;
}
main.c
extern int i1;
int f1(int i);
int main(int argc, char **argv) {
return f1(argc) + i1;
}
Compile only with -O3:
gcc -c -O3 notmain.c
gcc -O3 notmain.o main.c
Disassemble notmain.o:
objdump -D notmain.o
The output contains:
Disassembly of section .text:
0000000000000000 <f1>:
0: f3 0f 1e fa endbr64
4: 8d 47 01 lea 0x1(%rdi),%eax
7: c3 ret
8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
f: 00
0000000000000010 <f2>:
10: f3 0f 1e fa endbr64
14: 8d 47 02 lea 0x2(%rdi),%eax
17: c3 ret
Disassembly of section .data:
0000000000000000 <i2>:
0: 02 00 add (%rax),%al
...
0000000000000004 <i1>:
4: 01 00 add %eax,(%rax)
...
Disassemble a.out:
objdump -D a.out
The output contains:
Disassembly of section .text:
0000000000001040 <main>:
1040: f3 0f 1e fa endbr64
1044: 48 83 ec 08 sub $0x8,%rsp
1048: e8 03 01 00 00 call 1150 <f1>
104d: 03 05 c1 2f 00 00 add 0x2fc1(%rip),%eax # 4014 <i1>
1053: 48 83 c4 08 add $0x8,%rsp
1057: c3 ret
1058: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
105f: 00
0000000000001150 <f1>:
1150: f3 0f 1e fa endbr64
1154: 8d 47 01 lea 0x1(%rdi),%eax
1157: c3 ret
1158: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
115f: 00
0000000000001160 <f2>:
1160: f3 0f 1e fa endbr64
1164: 8d 47 02 lea 0x2(%rdi),%eax
1167: c3 ret
Disassembly of section .data:
0000000000004010 <i2>:
4010: 02 00 add (%rax),%al
...
0000000000004014 <i1>:
4014: 01 00 add %eax,(%rax)
Conclusion: both i2 and f2 were present in the final output file even though they weren't used.
Even if we had added -Wl,--gc-sections to:
gcc -O3 -Wl,--gc-sections notmain.o main.c
to try and remove unused sections, that wouldn't have changed anything, because in the object file notmain.o, i2 appears in the same section as i1 (.data), and f2 appears in the same section as f1 (.text). Both of those sections were used, and therefore each was brought into the final file in its entirety.
-fdata-sections -ffunction-sections -Wl,--gc-sections
We modify the compilation commands to:
gcc -c -O3 -fdata-sections -ffunction-sections notmain.c
gcc -O3 -Wl,--gc-sections notmain.o main.c
Disassemble notmain.o:
objdump -D notmain.o
Output contains:
Disassembly of section .text.f1:
0000000000000000 <f1>:
0: f3 0f 1e fa endbr64
4: 8d 47 01 lea 0x1(%rdi),%eax
7: c3 ret
Disassembly of section .text.f2:
0000000000000000 <f2>:
0: f3 0f 1e fa endbr64
4: 8d 47 02 lea 0x2(%rdi),%eax
7: c3 ret
Disassembly of section .data.i2:
0000000000000000 <i2>:
0: 02 00 add (%rax),%al
...
Disassembly of section .data.i1:
0000000000000000 <i1>:
0: 01 00 add %eax,(%rax)
So we see that every symbol gets its own section, named after the symbol itself.
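For intuition, the per-symbol sections that -ffunction-sections and -fdata-sections create can be approximated by hand with GCC's section attribute. This is a minimal sketch assuming GCC or Clang targeting ELF (e.g. Linux), where the default GNU ld linker script folds .text.* and .data.* input sections into the .text and .data output sections; the explicit names just mirror the ones in the objdump output above and are not something you normally need to write yourself.

```c
/* Hand-written approximation of -ffunction-sections -fdata-sections:
 * each symbol is placed in its own ELF input section, named
 * .text.<name> or .data.<name>. Assumes GCC/Clang on an ELF target;
 * Mach-O (macOS) uses a different "segment,section" naming scheme. */

__attribute__((section(".data.i1"))) int i1 = 1;
__attribute__((section(".data.i2"))) int i2 = 2;

__attribute__((section(".text.f1"))) int f1(int i) {
    return i + 1;
}

__attribute__((section(".text.f2"))) int f2(int i) {
    return i + 2;
}
```

Since each symbol already sits in its own input section, a file like this should, in principle, let -Wl,--gc-sections drop the unreferenced f2 and i2 even when compiled without -ffunction-sections -fdata-sections; the flags simply do this automatically for every symbol.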
Disassemble a.out:
objdump -D a.out
The output contains:
Disassembly of section .text:
0000000000001040 <main>:
1040: f3 0f 1e fa endbr64
1044: 48 83 ec 08 sub $0x8,%rsp
1048: e8 03 01 00 00 call 1150 <f1>
104d: 03 05 b5 2f 00 00 add 0x2fb5(%rip),%eax # 4008 <i1>
1053: 48 83 c4 08 add $0x8,%rsp
1057: c3 ret
1058: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
105f: 00
0000000000001150 <f1>:
1150: f3 0f 1e fa endbr64
1154: 8d 47 01 lea 0x1(%rdi),%eax
1157: c3 ret
Disassembly of section .data:
0000000000004008 <i1>:
4008: 01 00 add %eax,(%rax)
and it does not contain i2 or f2. This is because this time every symbol was in its own section, and so -Wl,--gc-sections was able to remove every single unused symbol.
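The behaviour of the two experiments above can be summarized with a toy model of section-level garbage collection. This is a hypothetical sketch, not how GNU ld is actually implemented: the Section struct, the mark and survives helpers, and the two layout functions are made up for illustration, with section contents taken from the objdump outputs above (main's section is assumed to be .text.startup, as in the main2.o output). The key point it encodes is that the unit kept or discarded is the input section, reached through relocations starting from the entry point.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of ld --gc-sections (hypothetical, not ld's real code). */
#define MAX_SECTIONS 8
#define MAX_SYMS 4

typedef struct {
    const char *name;             /* input section, e.g. "notmain.o .text.f1" */
    const char *syms[MAX_SYMS];   /* symbols defined in this section */
    int nsyms;
    int refs[MAX_SECTIONS];       /* sections this one has relocations against */
    int nrefs;
    bool kept;
} Section;

/* Mark phase: depth-first walk over relocation edges. */
static void mark(Section *secs, int i) {
    assert(i >= 0 && i < MAX_SECTIONS);
    if (secs[i].kept)
        return;                   /* already visited */
    secs[i].kept = true;
    for (int r = 0; r < secs[i].nrefs; r++)
        mark(secs, secs[i].refs[r]);
}

/* True if `sym` survives GC when marking starts from section `entry`. */
static bool survives(Section *secs, int n, int entry, const char *sym) {
    for (int i = 0; i < n; i++)
        secs[i].kept = false;
    mark(secs, entry);
    for (int i = 0; i < n; i++) {
        if (!secs[i].kept)
            continue;             /* discarded section: all its symbols go */
        for (int s = 0; s < secs[i].nsyms; s++)
            if (strcmp(secs[i].syms[s], sym) == 0)
                return true;
    }
    return false;
}

/* Plain -O3: notmain.o has one .text {f1, f2} and one .data {i1, i2}. */
static int coarse_layout(Section *s) {
    s[0] = (Section){"main.o .text.startup", {"main"}, 1, {1, 2}, 2, false};
    s[1] = (Section){"notmain.o .text", {"f1", "f2"}, 2, {0}, 0, false};
    s[2] = (Section){"notmain.o .data", {"i1", "i2"}, 2, {0}, 0, false};
    return 3;
}

/* -ffunction-sections -fdata-sections: one section per symbol. */
static int fine_layout(Section *s) {
    s[0] = (Section){"main.o .text.startup", {"main"}, 1, {1, 3}, 2, false};
    s[1] = (Section){"notmain.o .text.f1", {"f1"}, 1, {0}, 0, false};
    s[2] = (Section){"notmain.o .text.f2", {"f2"}, 1, {0}, 0, false};
    s[3] = (Section){"notmain.o .data.i1", {"i1"}, 1, {0}, 0, false};
    s[4] = (Section){"notmain.o .data.i2", {"i2"}, 1, {0}, 0, false};
    return 5;
}
```

With section 0 (main's) as the root, survives(s, n, 0, "f2") is true for the coarse layout and false for the fine one. Inlining fits the same model: once f1 is inlined into main, main's section no longer carries a relocation against f1's section, so nothing keeps that section alive.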
Inlining makes a symbol not be considered as used
To test the effect of inlining, let's put our test symbols in the same file as main:
main2.c
int i1 = 1;
int i2 = 2;
int f1(int i) {
return i + 1;
}
int f2(int i) {
return i + 2;
}
int main(int argc, char **argv) {
return f1(argc) + i1;
}
And then:
gcc -c -O3 main2.c
gcc -O3 -Wl,--gc-sections -o main2.out main2.o
Disassemble main2.o:
objdump -D main2.o
The output contains:
Disassembly of section .text:
0000000000000000 <f1>:
0: f3 0f 1e fa endbr64
4: 8d 47 01 lea 0x1(%rdi),%eax
7: c3 ret
8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
f: 00
0000000000000010 <f2>:
10: f3 0f 1e fa endbr64
14: 8d 47 02 lea 0x2(%rdi),%eax
17: c3 ret
Disassembly of section .data:
0000000000000000 <i2>:
0: 02 00 add (%rax),%al
...
0000000000000004 <i1>:
4: 01 00 add %eax,(%rax)
...
Disassembly of section .text.startup:
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <main+0xa>
a: 8d 44 38 01 lea 0x1(%rax,%rdi,1),%eax
e: c3 ret
Interestingly, main is in a separate section, .text.startup, possibly to allow the rest of .text to be GC'ed.
We also see that f1 was fully inlined into lea 0x1(%rax,%rdi,1),%eax (which directly adds 1), while for reasons I don't understand i1 is still read at mov 0x0(%rip),%eax, pending relocation (see also: What do linkers do?). The relocation becomes clear after disassembling main2.out below.
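One plausible explanation for the remaining load of i1 (an assumption, not something verified here): i1 is a non-const global with external linkage, so when compiling main2.c alone GCC cannot rule out that code in some other compilation unit writes to it before main runs. The sketch below simulates such a writer with a GCC/Clang constructor function; bump_i1 and compute are hypothetical names, and in a real program bump_i1 would live in a separate .c file that the compiler never sees while compiling main2.c.

```c
int i1 = 1;

static int f1(int i) {
    return i + 1;
}

/* Hypothetical writer: imagine this lives in another .c file.
 * GCC/Clang run constructor functions before main, so i1 is no
 * longer 1 by the time it is read below. */
__attribute__((constructor)) static void bump_i1(void) {
    i1 = 41;
}

/* Same body as main in main2.c: folding i1 to the constant 1 here
 * would give the wrong answer once bump_i1 has run. */
int compute(int argc) {
    return f1(argc) + i1;
}
```

Under this reading, the LTO experiment further down is consistent: with the whole program visible, the compiler can see that nothing writes i1 and fold it to 1.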
Disassemble main2.out:
objdump -D main2.out
The output contains:
Disassembly of section .text:
0000000000001040 <main>:
1040: f3 0f 1e fa endbr64
1044: 8b 05 c2 2f 00 00 mov 0x2fc2(%rip),%eax # 400c <i1>
104a: 8d 44 38 01 lea 0x1(%rax,%rdi,1),%eax
104e: c3 ret
104f: 90 nop
Disassembly of section .data:
0000000000004008 <i2>:
4008: 02 00 add (%rax),%al
...
000000000000400c <i1>:
400c: 01 00 add %eax,(%rax)
and both f1 and f2 were entirely removed, because f1 was inlined and therefore no longer marked as used, so the entire .text section got removed.
If we forced f1 not to be inlined with:
int __attribute__ ((noinline)) f1(int i) {
return i + 1;
}
then both f1 and f2 would appear in main2.out.
Sections of different object files are separate even though they have the same name
Obviously, e.g.:
notmain2.c
int i3 = 3;
int i4 = 4;
int f3(int i) {
return i + 3;
}
int f4(int i) {
return i + 4;
}
and then:
gcc -c -O3 notmain.c
gcc -c -O3 notmain2.c
gcc -O3 -Wl,--gc-sections notmain.o notmain2.o main.c
objdump -D a.out
does not contain f3 and f4, even though f1 and f2 were included, and both pairs are in sections called .text.
Possible downside of -fdata-sections -ffunction-sections -Wl,--gc-sections: slower link speed
We should find a benchmark, but this seems likely, as more relocations would have to be done when one symbol refers to another symbol from the same compilation unit, since the two are no longer in the same section.
-flto leads to symbols being removed even if other symbols in the same compilation unit are used
This happens whether or not LTO leads to inlining. Consider:
notmain.c
int i1 = 1;
int i2 = 2;
int __attribute__ ((noinline)) f1(int i) {
return i + 1;
}
int f2(int i) {
return i + 2;
}
main.c
extern int i1;
int f1(int i);
int main(int argc, char **argv) {
return f1(argc) + i1;
}
Compile and disassemble:
gcc -c -O3 -flto notmain.c
gcc -O3 -flto notmain.o main.c
objdump -D a.out
The disassembly contains:
Disassembly of section .text:
0000000000001040 <main>:
1040: f3 0f 1e fa endbr64
1044: e8 f7 00 00 00 call 1140 <f1>
1049: 83 c0 01 add $0x1,%eax
104c: c3 ret
0000000000001140 <f1>:
1140: 8d 47 01 lea 0x1(%rdi),%eax
1143: c3 ret
and f2 is not present. So f2 was removed even though f1 is used.
We also note that i1 and i2 are gone. The compiler appears to recognize that i1 is never really modified and just "inlines" it as the constant 1 at: add $0x1,%eax.
Related question: Does GCC LTO perform cross-file dead code elimination? For some reason, dead code elimination does not happen if you compile the object file with -O0: Why GCC does not do function dead code elimination with LTO when compiling the object file with -O0?
Tested on Ubuntu 23.04 amd64, GCC 12.2.0.