How to remove unused C/C++ symbols with GCC and ld?

13

158

I need to optimize the size of my executable severely (ARM development) and I noticed that in my current build scheme (gcc + ld) unused symbols are not getting stripped.

Using arm-strip --strip-unneeded on the resulting executables / libraries doesn't change the output size of the executable (I have no idea why; maybe it simply can't).

What would be the way (if it exists) to modify my building pipeline, so that the unused symbols are stripped from the resulting file?


I wouldn't even think of this, but my current embedded environment isn't very "powerful" and saving even 500K out of 2M results in a very nice loading performance boost.

Update:

Unfortunately the current gcc version I use doesn't have the -dead-strip option and the -ffunction-sections... + --gc-sections for ld doesn't give any significant difference for the resulting output.

I'm shocked that this even became a problem, because I was sure that gcc + ld should automatically strip unused symbols (why do they even have to keep them?).

Bandog answered 14/7, 2011 at 1:51 Comment(9)
How do you know that symbols are not used?Allisonallissa
Not referenced anywhere => not being used in the final application. I assume that building a call graph while compiling / linking shouldn't be very hard.Bandog
Are you trying to reduce the size of the .o file by removing dead symbols, or you are trying reduce the size of the actual code footprint once loaded into executable memory? The fact that you say "embedded" hints at the latter; the question you ask seems focused on the former.Dapper
@Ira I'm trying to reduce the output executable size, because (as an example) if I attempt to port some existing applications that use Boost libraries, the resulting .exe file contains many unused object files and, due to the specifications of my current embedded runtime, starting a 10 MB application takes much longer than, for example, starting a 500K application.Bandog
@Yippie: You want to get rid of code to minimize load time; the code you want to get rid of are unused methods/etc. from libraries. Yes, you need to build a call graph to do this. It isn't that easy; it has to be a global call graph, it has to be conservative (can't remove something that might get used) and has to be accurate (so you have as close to an ideal call graph, so you really know what isn't used). The big problem is doing a global, accurate call graph. Don't know of many compilers that do this, let alone linkers.Dapper
Yes, but how do you know that they are not referenced anywhere?Allisonallissa
What version of gcc/ld are you using?Bonedry
If you upgrade your toolchain (should be pretty straight forward, fear not) possibly Nemos advice will start working?Stellate
Try a LTO build, that often reduces the size of the binary quite a bit.Peepul
178

For GCC, this is accomplished in two stages:

First, compile the code but tell the compiler to place each item in its own section within the translation unit. This is done for functions (including class member functions) and for variables by using the following two compiler flags:

-fdata-sections -ffunction-sections

Second, link the translation units together using the linker optimization flag (this causes the linker to discard unreferenced sections):

-Wl,--gc-sections

So if you had one file called test.cpp that had two functions declared in it, but one of them was unused, you could omit the unused one with the following command to gcc(g++):

gcc -Os -fdata-sections -ffunction-sections test.cpp -o test -Wl,--gc-sections

(Note that -Os is an additional compiler flag that tells GCC to optimize for size)
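
As a rough illustration of what these flags act on, here is a minimal C translation unit. All function names are made up for this sketch. Assuming an ELF target, -ffunction-sections would place each function in its own section (typically named after the function, e.g. .text.unused_helper), and --gc-sections could then drop that section if nothing references it.

```c
/* test.c: hypothetical example; the function names are invented
 * for illustration and do not come from the answer above. */

/* Referenced by entry() below, so its section stays reachable. */
int used_helper(int x)
{
    return x * 2;
}

/* Never referenced. With -ffunction-sections it lands in its own
 * section (typically .text.unused_helper on ELF), which
 * -Wl,--gc-sections is then free to discard. */
int unused_helper(int x)
{
    return x * 3;
}

/* Stand-in for whatever your main() would call. */
int entry(int x)
{
    return used_helper(x) + 1;
}
```

Running nm on the linked binary before and after adding the flags should show whether unused_helper is gone; GNU ld's --print-gc-sections option can also list the sections it removed.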

Leavenworth answered 21/7, 2011 at 1:11 Comment(2)
Please note this will slow down the executable as per GCC's option descriptions (I tested).Nuris
With mingw this does not work when statically linking libstdc++ and libgcc with the flag -static. The linker option -strip-all helps quite a bit, but the generated executable (or dll) is still about 4 times bigger than what Visual Studio would generate. The point is, I have no control over how libstdc++ was compiled. There should be an ld-only option.Continue
41

If this thread is to be believed, you need to supply -ffunction-sections and -fdata-sections to gcc, which will put each function and data object in its own section. Then you give --gc-sections to GNU ld to remove the unused sections.

Autobiographical answered 14/7, 2011 at 1:56 Comment(15)
Unfortunately this didn't work, I have no idea why, probably the arm-gcc compiler issue or something ARM-related. If there is something else I could try...Bandog
@Michael: True, but the documentation links I provided are current and seem to support the idea that it should work... Oh well.Autobiographical
It's still valid. I've never understood why it's not the default for GCC; I'd even question whether the current default (keep unused symbols) makes any sense.Pedigo
@MSalters: It's not the default, because it violates the C and C++ standards. Suddenly global initialization doesn't happen, which results in some very surprised programmers.Mongolia
@Ben Voigt: Obviously such a symbol is in use if its presence matters. So essentially you're saying that GCC can't properly detect which symbols are actually used?Pedigo
@MSalters: You're assuming that initialization has no side-effects. In C and C++, that is a bad assumption.Mongolia
@Ben Voigt: Nope, didn't assume that at all. I just observed that GCC assumes that. Did I understand your point correctly, that GCC will also eliminate symbols that are unused except for the side effects of their initialization?Pedigo
@MSalters: Only if you pass the non-standard behavior-breaking options, which you proposed to make the default behavior.Mongolia
@Ben Voigt: Obviously. So I assume that if I'd provide a patch that would properly detect unused symbols, then -gc-sections would become the default?Pedigo
@MSalters: If you can make a patch that runs static initializers if and only if the side effects are necessary to the correct operation of the program, that would be awesome. Unfortunately I think doing it perfectly often requires solving the halting problem, so you'll probably need to err on the side of including some extra symbols at times. Which basically is what Ira says in his comments to the question. (BTW: "not necessary to the correct operation of the program" is a different definition of "unused" than how that term is used in the standards)Mongolia
@BenVoigt in C, global initialization cannot have side-effects (initializers must be constant expressions)Boley
@Matt: But that's not true in C++... and they share the same linker.Mongolia
@BenVoigt I was rebutting your claim "You're assuming that initialization has no side-effects. In C and C++, that is a bad assumption" and the C part of "it violates the C and C++ standards". How does it allegedly violate the C standard?Boley
@Matt: After some research I think you're right. It's a bad assumption only in C++. I suppose C can get code into the initialization sequence using pragmas and attributes that control sections, but not via initialization.Mongolia
OK. A good point about C++ nonetheless. I use these switches in production in C, but had not considered how they might interact with static initialization if I were also to use them in C++.Boley
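
To make the initialization concern in these comments concrete: in GNU C, one way to get code into the startup sequence is the constructor attribute (a GCC/Clang extension, not standard C). The sketch below uses invented names. Nothing references register_driver by name, yet the program depends on its side effect. With the default GNU ld linker scripts the .init_array input sections are usually wrapped in KEEP, so --gc-sections should leave such constructors alone, but a hand-written embedded linker script may not do that, so it is worth checking.

```c
/* Hypothetical example; all names are invented for illustration. */
static int driver_registered = 0;

/* Runs before main() on GCC/Clang. No other code calls it by name,
 * so a naive "unreferenced means unused" analysis would drop it. */
__attribute__((constructor))
static void register_driver(void)
{
    driver_registered = 1;
}

/* Returns 1 if the constructor ran during startup. */
int driver_is_registered(void)
{
    return driver_registered;
}
```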
29

You'll want to check your docs for your version of gcc & ld:

However for me (OS X gcc 4.0.1) I find these for ld

-dead_strip

Remove functions and data that are unreachable by the entry point or exported symbols.

-dead_strip_dylibs

Remove dylibs that are unreachable by the entry point or exported symbols. That is, suppresses the generation of load command commands for dylibs which supplied no symbols during the link. This option should not be used when linking against a dylib which is required at runtime for some indirect reason such as the dylib has an important initializer.

And this helpful option

-why_live symbol_name

Logs a chain of references to symbol_name. Only applicable with -dead_strip. It can help debug why something that you think should be dead strip removed is not removed.

There's also a note in the gcc/g++ man that certain kinds of dead code elimination are only performed if optimization is enabled when compiling.

While these options/conditions may not hold for your compiler, I suggest you look for something similar in your docs.

Plumlee answered 14/7, 2011 at 2:12 Comment(4)
This seems to do nothing with mingw.Continue
-dead_strip is not a gcc option.Names
-dead_strip is passed by gcc to the linker, but is only supported on Darwin (macOS).Springhead
that's because macOS doesn't have gcc, it has clang. And clang has a compatibility layer you can call as gcc.Soulsearching
26

Programming habits could help too; e.g. add static to functions that are not accessed outside a specific file; use shorter names for symbols (can help a bit, likely not too much); use const char x[] where possible; ... this paper, though it talks about dynamic shared objects, can contain suggestions that, if followed, can help to make your final binary output size smaller (if your target is ELF).
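
A small sketch of the static and const char x[] habits, with invented names and nothing beyond standard C:

```c
/* Hypothetical example; all names are invented for illustration. */

/* static: internal linkage, so the compiler can see every caller and
 * may inline the function or drop it entirely if it ends up unused. */
static int scale(int x)
{
    return x * 10;
}

/* An array holds the characters directly. A pointer such as
 *     const char *greeting = "hello";
 * would add a separate pointer object (and, in position-independent
 * code, a relocation) on top of the string itself. */
static const char greeting[] = "hello";

/* The only symbol this file needs to export. */
int greeting_score(void)
{
    /* sizeof counts the terminating NUL, hence the - 1. */
    return scale((int)(sizeof greeting - 1));
}
```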

Eyeleen answered 16/7, 2011 at 14:26 Comment(3)
How does it help to choose shorter names for symbols?Warlock
if symbols are not stripped away, ça va sans dire—but it seems it needed to be said now.Eyeleen
@Warlock The paper is talking about dynamic shared objects (eg. .so on Linux), so the symbol names have to be retained so that APIs like Python's ctypes FFI module can use them to look up symbols by name at runtime.Gatekeeper
26

The answer is -flto. You have to pass it to both your compilation and link steps, otherwise it doesn't do anything.

It actually works very well - reduced the size of a microcontroller program I wrote to less than 50% of its previous size!

Unfortunately it did seem a bit buggy - I had instances of things not being built correctly. It may have been due to the build system I'm using (QBS; it's very new), but in any case I'd recommend you only enable it for your final build if possible, and test that build thoroughly.

Thurstan answered 18/7, 2014 at 10:29 Comment(3)
"-Wl,--gc-sections" doesn't work on MinGW-W64, "-flto" works for me. ThanksHutto
The output assembly is very weird with -flto I do not understand what it does behind the scene.Names
I believe with -flto it doesn't compile each file to assembly; it compiles them to an intermediate representation (LLVM IR with Clang; GCC uses its own GIMPLE representation), and then the final link compiles them as if they were all in one compilation unit. That means it can eliminate unused functions and inline non-static ones, and probably other things too. See llvm.org/docs/LinkTimeOptimization.htmlThurstan
13

While not strictly about symbols, if going for size - always compile with -Os and -s flags. -Os optimizes the resulting code for minimum executable size and -s removes the symbol table and relocation information from the executable.

Sometimes - if small size is desired - playing around with different optimization flags may - or may not - have significance. For example toggling -ffast-math and/or -fomit-frame-pointer may at times save you even dozens of bytes.

Flatboat answered 14/7, 2011 at 7:19 Comment(1)
Most optimization tweaks will still yield correct code as long as you comply with the language standard, but I've had -ffast-math wreak havoc in completely standards-compliant C++ code, so I would never recommend it.Debunk
13

It seems to me that the answer provided by Nemo is the correct one. If those instructions do not work, the issue may be related to the version of gcc/ld you're using. As an exercise, I compiled an example program using the instructions detailed here:

#include <stdio.h>
void deadcode() { printf("This is d dead codez\n"); }
int main(void) { printf("This is main\n"); return 0 ; }

Then I compiled the code using progressively more aggressive dead-code removal switches:

gcc -Os test.c -o test.elf
gcc -Os -fdata-sections -ffunction-sections test.c -o test.elf -Wl,--gc-sections
gcc -Os -fdata-sections -ffunction-sections test.c -o test.elf -Wl,--gc-sections -Wl,--strip-all

These compilation and linking parameters produced executables of size 8457, 8164 and 6160 bytes, respectively, the most substantial contribution coming from the --strip-all option. If you cannot produce similar reductions on your platform, then maybe your version of gcc does not support this functionality. I'm using gcc (4.5.2-8ubuntu4), ld (2.21.0.20110327) on Linux Mint 2.6.38-8-generic x86_64.

Trinary answered 18/7, 2011 at 13:44 Comment(0)
11

strip --strip-unneeded only operates on the symbol table of your executable. It doesn't actually remove any executable code.

The standard libraries achieve the result you're after by splitting all of their functions into separate object files, which are combined using ar. If you then link the resultant archive as a library (i.e. give the option -l your_library to ld), then ld will only include the object files, and therefore the symbols, that are actually used.

You may also find some of the responses to this similar question of use.

Causative answered 14/7, 2011 at 5:14 Comment(1)
The separate object files in the library is only relevant when doing a static link. With shared libraries, the whole library is loaded, but not included in the executable, of course.Alphorn
8

I don't know if this will help with your current predicament as this is a recent feature, but you can specify the visibility of symbols in a global manner. Passing -fvisibility=hidden -fvisibility-inlines-hidden at compilation can help the linker to later get rid of unneeded symbols. If you're producing an executable (as opposed to a shared library) there's nothing more to do.

More information (and a fine-grained approach for e.g. libraries) is available on the GCC wiki.
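
A minimal sketch of the fine-grained approach, assuming the file is compiled with -fvisibility=hidden. The macro name MYLIB_API and the function names are made up for this example; the visibility attribute itself is a GCC/Clang extension aimed at ELF-style targets.

```c
/* Hypothetical example; MYLIB_API and the function names are invented. */
#if defined(__GNUC__)
#  define MYLIB_API __attribute__((visibility("default")))
#else
#  define MYLIB_API /* assumption: other compilers need their own spelling */
#endif

/* Hidden under -fvisibility=hidden: not exported from a shared library,
 * which leaves the toolchain more freedom to discard or inline it. */
int clamp_to_byte(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return v;
}

/* Explicitly exported: remains visible to users of the library. */
MYLIB_API int mylib_brightness(int raw)
{
    return clamp_to_byte(raw * 2);
}
```

Only the functions tagged with the macro stay in the dynamic symbol table; everything else is treated as internal to the library.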

Fifty answered 17/7, 2011 at 16:1 Comment(1)
This requires marking the public symbols as public. See gcc.gnu.org/wiki/VisibilityBulwerlytton
8

From the GCC 4.2.1 manual, section -fwhole-program:

Assume that the current compilation unit represents whole program being compiled. All public functions and variables with the exception of main and those merged by attribute externally_visible become static functions and in a affect gets more aggressively optimized by interprocedural optimizers. While this option is equivalent to proper use of static keyword for programs consisting of single file, in combination with option --combine this flag can be used to compile most of smaller scale C programs since the functions and variables become local for the whole combined compilation unit, not for the single source file itself.

Largish answered 15/6, 2013 at 20:29 Comment(3)
Yeah but that presumably doesn't work with any kind of incremental compilation and is probably going to be a bit slow.Thurstan
@Timmmm: I suspect you're thinking of -flto.Mongolia
Yes! I subsequently found that (why is it not any of the answers?). Unfortunately it seemed a bit buggy, so I'd only recommend it for the final build and then test that build a lot!Thurstan
1

-fdata-sections -ffunction-sections -Wl,--gc-sections minimal example analysis

These options were mentioned at: https://mcmap.net/q/150689/-how-to-remove-unused-c-c-symbols-with-gcc-and-ld and I just wanted to confirm that they work and inspect a bit how with objdump.

The conclusions we draw are similar to what other posts mentioned:

  • if any symbol of a section is used, then the entire section goes in, even if some other symbols aren't used at all
  • inlining makes a symbol not be considered as used
  • -flto leads to unused symbols being removed even if other symbols are used in the same compilation unit

Separate files, -O3 only

notmain.c

int i1 = 1;
int i2 = 2;

int f1(int i) {
    return i + 1;
}

int f2(int i) {
    return i + 2;
}

main.c

extern int i1;
int f1(int i);

int main(int argc, char **argv) {
    return f1(argc) + i1;
}

Compile only with -O3:

gcc -c -O3 notmain.c
gcc -O3 notmain.o main.c

Disassemble notmain.o:

objdump -D notmain.o

The output contains:

Disassembly of section .text:

0000000000000000 <f1>:
   0:   f3 0f 1e fa             endbr64
   4:   8d 47 01                lea    0x1(%rdi),%eax
   7:   c3                      ret
   8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
   f:   00

0000000000000010 <f2>:
  10:   f3 0f 1e fa             endbr64
  14:   8d 47 02                lea    0x2(%rdi),%eax
  17:   c3                      ret

Disassembly of section .data:

0000000000000000 <i2>:
   0:   02 00                   add    (%rax),%al
        ...

0000000000000004 <i1>:
   4:   01 00                   add    %eax,(%rax)
        ...

Disassemble a.out:

objdump -D a.out

The output contains:

Disassembly of section .text:

0000000000001040 <main>:
    1040:       f3 0f 1e fa             endbr64
    1044:       48 83 ec 08             sub    $0x8,%rsp
    1048:       e8 03 01 00 00          call   1150 <f1>
    104d:       03 05 c1 2f 00 00       add    0x2fc1(%rip),%eax        # 4014 <i1>
    1053:       48 83 c4 08             add    $0x8,%rsp
    1057:       c3                      ret
    1058:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
    105f:       00

0000000000001150 <f1>:
    1150:       f3 0f 1e fa             endbr64
    1154:       8d 47 01                lea    0x1(%rdi),%eax
    1157:       c3                      ret
    1158:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
    115f:       00

0000000000001160 <f2>:
    1160:       f3 0f 1e fa             endbr64
    1164:       8d 47 02                lea    0x2(%rdi),%eax
    1167:       c3                      ret

Disassembly of section .data:

0000000000004010 <i2>:
    4010:       02 00                   add    (%rax),%al
        ...

0000000000004014 <i1>:
    4014:       01 00                   add    %eax,(%rax)

Conclusion: both i2 and f2 were present in the final output file even though they weren't used.

Even if we had added -Wl,--gc-sections, as in:

gcc -O3 -Wl,--gc-sections notmain.o main.c

to try and remove unused sections, that wouldn't have changed anything, because in the object file notmain.o i2 appears in the same section as i1 (.data), and f2 appears in the same section as f1 (.text), which were used and therefore bring their entire sections in the final file.

-fdata-sections -ffunction-sections -Wl,--gc-sections

We modify the compilation commands to:

gcc -c -O3 -fdata-sections -ffunction-sections notmain.c
gcc -O3 -Wl,--gc-sections notmain.o main.c

Disassemble notmain.o:

objdump -D notmain.o

Output contains:

Disassembly of section .text.f1:

0000000000000000 <f1>:
   0:   f3 0f 1e fa             endbr64
   4:   8d 47 01                lea    0x1(%rdi),%eax
   7:   c3                      ret

Disassembly of section .text.f2:

0000000000000000 <f2>:
   0:   f3 0f 1e fa             endbr64
   4:   8d 47 02                lea    0x2(%rdi),%eax
   7:   c3                      ret

Disassembly of section .data.i2:

0000000000000000 <i2>:
   0:   02 00                   add    (%rax),%al
        ...

Disassembly of section .data.i1:

0000000000000000 <i1>:
   0:   01 00                   add    %eax,(%rax)

So we see how everything gets its own section named based on the symbol name itself.

Disassemble a.out:

objdump -D a.out

The output contains:

Disassembly of section .text:

0000000000001040 <main>:
    1040:       f3 0f 1e fa             endbr64
    1044:       48 83 ec 08             sub    $0x8,%rsp
    1048:       e8 03 01 00 00          call   1150 <f1>
    104d:       03 05 b5 2f 00 00       add    0x2fb5(%rip),%eax        # 4008 <i1>
    1053:       48 83 c4 08             add    $0x8,%rsp
    1057:       c3                      ret
    1058:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
    105f:       00

0000000000001150 <f1>:
    1150:       f3 0f 1e fa             endbr64
    1154:       8d 47 01                lea    0x1(%rdi),%eax
    1157:       c3                      ret

Disassembly of section .data:

0000000000004008 <i1>:
    4008:       01 00                   add    %eax,(%rax)

and it does not contain i2 nor f2. This is because this time every symbol was in its own section, and so -Wl,--gc-sections was able to remove every single unused symbol.

Inlining makes a symbol not be considered as used

To test the effect of inlining, let's move our test symbols to the same file as main.c:

main2.c

int i1 = 1;
int i2 = 2;

int f1(int i) {
    return i + 1;
}

int f2(int i) {
    return i + 2;
}

int main(int argc, char **argv) {
    return f1(argc) + i1;
}

And then:

gcc -c -O3 main2.c
gcc -O3 -Wl,--gc-sections -o main2.out main2.o

Disassemble main2.o:

objdump -D main2.o

The output contains:

Disassembly of section .text:

0000000000000000 <f1>:
   0:   f3 0f 1e fa             endbr64
   4:   8d 47 01                lea    0x1(%rdi),%eax
   7:   c3                      ret
   8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
   f:   00

0000000000000010 <f2>:
  10:   f3 0f 1e fa             endbr64
  14:   8d 47 02                lea    0x2(%rdi),%eax
  17:   c3                      ret

Disassembly of section .data:

0000000000000000 <i2>:
   0:   02 00                   add    (%rax),%al
        ...

0000000000000004 <i1>:
   4:   01 00                   add    %eax,(%rax)
        ...

Disassembly of section .text.startup:

0000000000000000 <main>:
   0:   f3 0f 1e fa             endbr64
   4:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # a <main+0xa>
   a:   8d 44 38 01             lea    0x1(%rax,%rdi,1),%eax
   e:   c3                      ret

Interesting how main is on a separate section .text.startup, possibly to allow the rest of text to be GC'ed.

We also see that f1 was fully inlined on lea 0x1(%rax,%rdi,1),%eax (directly adds 1), while for reasons I don't understand i1 is still used at mov 0x0(%rip),%eax pending relocation, see also: What do linkers do? The relocation will be clear after disassembling main2.out below.

Disassemble main2.out:

objdump -D main2.out

The output contains:

Disassembly of section .text:

0000000000001040 <main>:
    1040:       f3 0f 1e fa             endbr64
    1044:       8b 05 c2 2f 00 00       mov    0x2fc2(%rip),%eax        # 400c <i1>
    104a:       8d 44 38 01             lea    0x1(%rax,%rdi,1),%eax
    104e:       c3                      ret
    104f:       90                      nop

Disassembly of section .data:

0000000000004008 <i2>:
    4008:       02 00                   add    (%rax),%al
        ...

000000000000400c <i1>:
    400c:       01 00                   add    %eax,(%rax)

and f1 and f2 were entirely removed, because f1 was inlined and therefore not marked as used anymore, so the entire .text section got removed.

If we forced f1 not to be inlined with:

int __attribute__ ((noinline)) f1(int i) {
    return i + 1;
}

then both f1 and f2 would appear on main2.out.

Sections of different object files are separate even though they have the same name

Obviously, e.g.:

notmain2.c

int i3 = 3;
int i4 = 4;

int f3(int i) {
    return i + 3;
}

int f4(int i) {
    return i + 4;
}

and then:

gcc -c -O3 notmain.c
gcc -c -O3 notmain2.c
gcc -O3 -Wl,--gc-sections notmain.o notmain2.o main.c
objdump -D a.out

does not contain f3 and f4, even though f1 and f2 were included, and both pairs are in sections called .text.

Possible downside of: -fdata-sections -ffunction-sections -Wl,--gc-sections: slower link speed

We should find some benchmark, but this is likely, as it would require more relocations to be done when one symbol refers to another symbol from the same compilation unit, since they are now placed in independent sections.

-flto leads to symbols being removed even if other symbols in the same compilation unit are used

Also, this happens whether or not LTO would lead to inlining. Consider:

notmain.c

int i1 = 1;
int i2 = 2;

int __attribute__ ((noinline)) f1(int i) {
    return i + 1;
}

int f2(int i) {
    return i + 2;
}

main.c

extern int i1;
int f1(int i);

int main(int argc, char **argv) {
    return f1(argc) + i1;
}

Compile and disassemble:

gcc -c -O3 -flto notmain.c
gcc -O3 -flto notmain.o main.c
objdump -D a.out

The disassembly contains:

Disassembly of section .text:

0000000000001040 <main>:
    1040:       f3 0f 1e fa             endbr64
    1044:       e8 f7 00 00 00          call   1140 <f1>
    1049:       83 c0 01                add    $0x1,%eax
    104c:       c3                      ret

0000000000001140 <f1>:
    1140:       8d 47 01                lea    0x1(%rdi),%eax
    1143:       c3                      ret

and f2 is not present. So f2 was removed even though f1 is used.

We also note that i1 and i2 are gone. The compiler appears to recognize that i1 is never really modified and just "inlines" it as the constant 1 at: add $0x1,%eax.

Related question: Does GCC LTO perform cross-file dead code elimination? For some reason code elimination does not happen if you compile the object file with -O0: Why GCC does not do function dead code elimination with LTO when compiling the object file with -O0?

Tested on Ubuntu 23.04 amd64, GCC 12.2.0.

Frontward answered 15/7, 2023 at 20:20 Comment(0)
0

The traditional recommendation was to build static libraries containing all the optional code, and to keep each compilation unit as small as possible, just enough to hold one tiny task (this is also recommended as a pattern in Unix design).

When you link the code and specify a static library (a .a archive), the linker only pulls in the compiled modules that are referenced, directly or indirectly, starting from the initial crt0.o code, and this works without compiling the code into separate sections.

We have done this in our code. The benefit is probably not optimal, but it lets us continue development with a good memory footprint and leaves out a lot of unused code, without ever running into the issues that come from making the compiler work that out. I always follow this rule: if a feature is not necessary, don't tie yourself to it.

Ashford answered 6/7, 2023 at 9:58 Comment(0)
-1

You can run the strip binary on an object file (e.g. an executable) to strip all symbols from it.

Note: it modifies the file in place and doesn't create a copy.

Elyssa answered 24/11, 2015 at 18:43 Comment(0)
