Remove dead code when linking static library into dynamic library

Suppose I have the following files:

libmy_static_lib.c:

#include <stdio.h>
void func1(void){
    printf("func1() called from a static library\n");
}
void unused_func1(void){
    printf("printing from the unused function1\n");
}
void unused_func2(void){
    printf("printing from unused function2\n");
}

libmy_static_lib.h:

void func(void);
void unused_func1(void);
void unused_func2(void);

my_prog.c:

#include "libmy_static_lib.h"
#include <stdio.h>
void func_in_my_prog()
{
    printf("in my prog\n");
    func1();

}

And here is how I link the library:

# build the static library libmy_static_lib.a
gcc -fPIC -fdata-sections -ffunction-sections -c libmy_static_lib.c -o libmy_static_lib.o
ar rcs libmy_static_lib.a libmy_static_lib.o

# build libmy_static_lib.a into a new shared library
gcc -fPIC -c ./my_prog.c -o ./my_prog.o
gcc -Wl,--gc-sections -shared -m64 -o libmy_shared_lib.so ./my_prog.o -L. -l:libmy_static_lib.a

There are 2 functions in libmy_static_lib.c that are not used, and from this post, I think

gcc -fdata-sections -ffunction-sections

should place each function and data item in its own section, and

gcc -Wl,--gc-sections

should remove the unused sections when linking.

However, when I run

nm libmy_shared_lib.so

it shows that these 2 unused functions are still linked into the shared library.

Any suggestions on how to have gcc remove the unused functions automatically?

Edit: The above gcc options do remove the unused functions when I link the static library directly into an executable, but not when I link the static library into a shared library.

Stalky answered 15/6, 2018 at 19:26 Comment(8)
The more portable way to do this is to put each function into a separate translation unit. - Benedix
Your .h is wrong: it should read void func1(void); not void func(void); - Cultivar
You might try using -flto if --gc-sections isn't supported on your platform. Check this answer here #6688130 - Jewelry
@Jean-FrançoisFabre No warnings when compiling, and I fixed the .h, but the unused functions are still there. - Stalky
Possible duplicate of How to remove unused C/C++ symbols with GCC and ld? - Jewelry
Actually I tested this and the functions were removed iff not compiling with -shared. - Benedix
@Jewelry Not a duplicate. J Lui ^ perhaps link that one and state that it doesn't help. - Benedix
Though the question is: which symbol is needed? Since you don't have main, is there any? :D - Benedix
A
7

You can use a version script to mark the entry points in combination with -ffunction-sections and --gc-sections.

For example, consider this C file (example.c):

int
foo (void)
{
  return 17;
}

int
bar (void)
{
  return 251;
}

And this version script, called version.script:

{
  global: foo;
  local: *;
};

Compile and link the sources like this:

gcc -Wl,--gc-sections -shared -ffunction-sections -Wl,--version-script=version.script example.c

If you look at the output of objdump -d --reloc a.out, you will notice that only foo is included in the shared object, but not bar.

When removing functions in this way, the linker will take indirect dependencies into account. For example, if you turn foo into this:

void *
foo (void)
{
  extern int bar (void);
  return bar;
}

the linker will put both foo and bar into the shared object because both are needed, even though only foo is exported.

(Obviously, this will not work on all platforms, but ELF supports this.)

Aether answered 15/6, 2018 at 20:33 Comment(1)
Do you have any idea whether the version script also works with link-time optimization (LTO) for a shared library? I wonder how to specify the entry functions when building a shared library with LTO support, so that all unused functions from the dependent .a files are removed from the final .so binary. - Duplex
A
4

You're creating a library, and your symbols aren't static, so it's normal that the linker doesn't remove any global symbols.

The --gc-sections option is designed for executables. The linker starts from the entry point (main), follows the function calls, marks the sections that are used, and discards the others.

A library doesn't have a single entry point; it has as many entry points as it has global symbols, which is why the linker cannot discard any of them. What if someone uses your .h file in their program and calls the "unused" functions?

To find out which functions aren't "used", I'd suggest that you convert void func_in_my_prog() to int main() (or copy the source into a modified one containing a main()), then build an executable from the sources, adding the -Wl,-Map=mapfile.txt option when linking to create a map file.

gcc -Wl,--gc-sections -Wl,-Map=mapfile.txt -fdata-sections -ffunction-sections libmy_static_lib.c my_prog.c

This map file lists the discarded sections:

Discarded input sections

 .drectve       0x00000000       0x54 c:/gnatpro/17.1/bin/../lib/gcc/i686-pc-mingw32/6.2.1/crt2.o
 .drectve       0x00000000       0x1c c:/gnatpro/17.1/bin/../lib/gcc/i686-pc-
 ...
 .text$unused_func1
                0x00000000       0x14 C:\Users\xx\AppData\Local\Temp\ccOOESqJ.o
 .text$unused_func2
                0x00000000       0x14 C:\Users\xx\AppData\Local\Temp\ccOOESqJ.o
 .rdata$zzz     0x00000000       0x38 C:\Users\xx\AppData\Local\Temp\ccOOESqJ.o
  ...

Now we see that the sections holding the unused functions have been discarded; they no longer appear in the final executable.

There are existing tools that do this (using the same technique but without requiring a main), for instance Callcatcher. It is also easy to write a tool that disassembles the library and checks for symbols that are defined but never called (I've written such tools in Python several times, and assembly is much easier to parse than high-level code).

To clean up, you can delete the unused functions manually from your sources. Be careful with object-oriented languages and dispatched calls when using existing or custom assembly-analysis tools. The linker, on the other hand, isn't going to remove a section that could be used, so that approach is safe.

You can also remove the relevant sections from the library file without changing the source code, for instance with objcopy:

$ objcopy --remove-section '.text$unused_func1' --remove-section '.text$unused_func2' libmy_static_lib.a stripped.a
$ nm stripped.a

libmy_static_lib.o:
00000000 b .bss
00000000 d .data
00000000 r .rdata
00000000 r .rdata$zzz
00000000 t .text
00000000 t .text$func1
00000000 T _func1
         U _puts
Argive answered 15/6, 2018 at 19:53 Comment(0)
