Hiding symbol names in library

I want to hide symbol names which are not relevant to the end user and make visible only APIs in my shared or static library. I have a simple code like:

int f_b1(){
    return 21;
}

int f_b3(){
    return f_b1();
}

I tried all the methods stated here, such as using __attribute__ ((visibility ("hidden"))) and static, but without success. My operating system is Ubuntu on an x86_64 processor. Do I need to use special options when compiling with gcc? I list the modules and functions of the libraries with the nm command. In my example above I only want to make the f_b3 function visible. When I use the hidden attribute the compiler does not give any error, but the function still appears in the list output by the nm command.

Medius answered 7/3, 2014 at 7:49 Comment(0)

The visibility("hidden") attribute does not remove a symbol from an object file and cannot stop nm from listing it. It only tells the linker, when it builds a shared library, to keep the symbol out of the library's dynamic symbol table, so that code outside the shared library cannot link to it.

Consider a source file file.c containing your example functions:

int f_b1(){
    return 21;
}

int f_b3(){
    return f_b1();
}

Compile the file:

gcc -c -o file.o file.c

Run nm file.o to list the symbols. Output:

0000000000000000 T f_b1
000000000000000b T f_b3

Now run objdump -t file.o for fuller information about the symbols. Output:

file.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 file.c
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .bss   0000000000000000 .bss
0000000000000000 l    d  .note.GNU-stack    0000000000000000 .note.GNU-stack
0000000000000000 l    d  .eh_frame  0000000000000000 .eh_frame
0000000000000000 l    d  .comment   0000000000000000 .comment
0000000000000000 g     F .text  000000000000000b f_b1
000000000000000b g     F .text  000000000000000b f_b3

Here we see that f_b1 and f_b3 are global (g) functions (F) in the .text section.

Now modify the file like this:

__attribute__((visibility ("hidden"))) int f_b1(void){
    return 21;
}

__attribute__((visibility ("hidden"))) int f_b3(void){
    return f_b1();
}

Run objdump again:

file.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 file.c
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .bss   0000000000000000 .bss
0000000000000000 l    d  .note.GNU-stack    0000000000000000 .note.GNU-stack
0000000000000000 l    d  .eh_frame  0000000000000000 .eh_frame
0000000000000000 l    d  .comment   0000000000000000 .comment
0000000000000000 g     F .text  000000000000000b .hidden f_b1
000000000000000b g     F .text  000000000000000b .hidden f_b3

The output is the same, except that the symbols f_b1 and f_b3 are now marked .hidden. They still have external (global) linkage and could be statically called, for example, from other modules within a library that contains them, but could not be dynamically called from outside that library.

So, if you want to conceal f_b1 and f_b3 from dynamic linkage in a shared library, you can use visibility ("hidden") as shown.
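For the shared-library case, the question's goal (export f_b3 only) can be sketched as below. This is a minimal sketch, not a definitive recipe: the macro names API and LOCAL and the library name libfile.so are hypothetical, chosen only for illustration.

```c
/* file.c: sketch of exporting only f_b3 from a shared library.
 * API and LOCAL are hypothetical macro names, not from the original post. */
#define API   __attribute__((visibility("default")))
#define LOCAL __attribute__((visibility("hidden")))

/* Callable from other object files inside the library, but not exported. */
LOCAL int f_b1(void)
{
    return 21;
}

/* The only function meant to appear in the dynamic symbol table. */
API int f_b3(void)
{
    return f_b1();
}

/*
 * Possible build and check (libfile.so is a placeholder name):
 *   gcc -fPIC -shared -o libfile.so file.c
 *   nm -D --defined-only libfile.so
 */
```

With this build, nm -D (which reads the dynamic symbol table) should list f_b3 but not f_b1, while plain nm on the unstripped library may still show both.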

If you want to conceal f_b1 and f_b3 from static linkage in a static library, you cannot use the visibility attribute to do that at all.

In the case of a static library, you can "hide" a symbol only by giving it internal instead of external linkage. The way to do that is by prefixing the standard static keyword to its definition. But internal linkage means that the symbol is visible only within its own compilation unit: it can't be referenced from other modules. It is not available to the linker at all.

Modify file.c again, like this:

static int f_b1(void){
    return 21;
}

static int f_b3(void){
    return f_b1();
}

And run objdump again:

file.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 file.c
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .bss   0000000000000000 .bss
0000000000000000 l     F .text  000000000000000b f_b1
000000000000000b l     F .text  000000000000000b f_b3
0000000000000000 l    d  .note.GNU-stack    0000000000000000 .note.GNU-stack
0000000000000000 l    d  .eh_frame  0000000000000000 .eh_frame
0000000000000000 l    d  .comment   0000000000000000 .comment

You see that f_b1 and f_b3 are still reported as functions in the .text section, but are now classified local (l), not global. That is internal linkage. Run nm file.o and the output is:

0000000000000000 t f_b1
000000000000000b t f_b3

That is the same as for the original file, except that instead of 'T' flags we now have 't' flags. Both flags mean that the symbol is in the .text section, but 'T' means it is global and 't' means it is local.

Apparently, what you would like nm to report for this file is no symbols at all. You should now understand that nm file.o will report a symbol if it exists in file.o, but its existence has got nothing to do with whether it is visible for static or dynamic linkage.

To make the function symbols disappear, compile file.c yet again (still with the static keyword), this time with optimisation enabled:

gcc -c -O1 -o file.o file.c

Now, objdump reports:

file.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 file.c
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .bss   0000000000000000 .bss
0000000000000000 l    d  .note.GNU-stack    0000000000000000 .note.GNU-stack
0000000000000000 l    d  .comment   0000000000000000 .comment

f_b1 and f_b3 are gone, and nm file.o reports nothing at all. Why? Because static tells the compiler that these symbols can only be called from within the file it is compiling, and the optimiser decides that there is no need to refer to them; so the compiler eliminates them from the object code. But if they weren't already invisible to the linker, without optimisation, then we couldn't optimise them away.

Bottom line: It doesn't matter whether nm can extract a symbol. If the symbol is local/internal, it can't be linked, either statically or dynamically. If the symbol is marked .hidden then it can't be dynamically linked. You can use visibility("hidden") to mark a symbol .hidden. Use the standard static keyword to make a symbol local/internal.
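Applied to the question's stated goal (only f_b3 usable from outside), the static approach might look like the sketch below. It assumes f_b1 is called only from within file.c; unlike the demonstration above, f_b3 keeps external linkage so that users of the library can still link against it.

```c
/* file.c: only f_b3 is meant to be linkable from other files. */

/* Internal linkage: invisible to the linker, whether static or dynamic. */
static int f_b1(void)
{
    return 21;
}

/* External linkage: the one API function. */
int f_b3(void)
{
    return f_b1();
}
```

Compiled with gcc -c file.c, nm file.o should report f_b1 with a lowercase t (local) and f_b3 with an uppercase T (global). With optimisation enabled the compiler will typically inline f_b1, and its symbol may disappear altogether.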

Farwell answered 7/3, 2014 at 15:33 Comment(11)
Is there a way to hide object file names also? My nm command outputs like this: b.o: 0000000000000000 T f_b2 0000000000000010 T f_b3 c.o: I would like to hide the b.o and c.o names as well. Thanks. - Medius
Yes: run strip -s file.o. See man strip. Note that when you remove all symbols you can't debug the object file. - Farwell
strip -s file.o removes all symbols; not only can I not debug, I also cannot use any functions, since their symbols no longer exist in the symbol table. I only need to remove the object file names from the library. - Medius
Another problem exists with class methods: if a class method's implementation is defined in the header file, the symbol name does not appear in the symbol table. However, if it is defined separately in a .cpp file, the symbols appear, whether they are static or not. - Medius
With strip --strip-debug libtest.a it is possible to remove the 0000000000000000 l df *ABS* 0000000000000000 file.c entry, which exposes the source file names, from the library or object file, but the object file names still appear in the nm output. I could not remove them yet. - Medius
Re. class member symbols. In C++ a member function defined within the class definition will be inlined if the compiler can reasonably do so; otherwise not, unless the inline keyword is applied. If a definition is inlined then it need not be called, so no symbol is needed by which to call it. - Farwell
Re. purging filenames. I was referring to -s|--strip-all, not -S|--strip-debug. Anyway, you can't remove the object filenames from a .a archive because it is an archive of the object files, and I can't see why you would think you need to. - Farwell
As I understand it, it is not possible to hide class member symbols without changing the structure of the classes. Using the inline keyword converts methods to static ones, which cannot use non-static member variables of the class. You again emphasize the -s|--strip-all option, but it removes all symbols and makes the library unusable, since the linker cannot find any symbols. Am I missing something here? I would only like to make the necessary APIs visible and hide all internal data. Is there a way to change symbol names in an object file or library? - Medius
@Mike Kinghan "It doesn't matter whether nm can extract a symbol." I disagree, considering the potential desire to obfuscate the library from those who might otherwise have an easier time reverse engineering your machine instructions. - Voltmer
Don't miss the --version-script option for GNU ld and compatible linkers. - Elbowroom
@Mike Kinghan objdump -t libfoo.so | grep -i hidden outputs nothing, whereas objdump -t foo.o | grep -i hidden outputs many symbols to the console. Why? - Flighty

I realize this is already an old thread. However, I'd like to share some facts about static linking, in the sense of making hidden symbols local and thereby preventing those symbols from taking part in (global) static linkage in an object file or static library. This does not mean making them invisible in the symbol table.

Mike Kinghan's answer is very useful but not complete with respect to the following detail:

"If you want to conceal f_b1 and f_b3 from static linkage in a static library, you cannot use the visibility attribute to do that at all."

Let me show that hidden symbols can certainly be made local, using the simple code in file.c as the example and applying ypsu's answer in "Symbol hiding in static libraries built with Xcode/gcc". As a first step, let's reproduce the objdump output with the hidden attribute visible on f_b1 and f_b3. This can be done with the following command, which gives all functions in file.c the hidden attribute:

gcc -fvisibility=hidden -c file.c

Output of objdump -t file.o gives

file.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 file.c
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .bss   0000000000000000 .bss
0000000000000000 l    d  .note.GNU-stack    0000000000000000 .note.GNU-stack
0000000000000000 l    d  .eh_frame  0000000000000000 .eh_frame
0000000000000000 l    d  .comment   0000000000000000 .comment
0000000000000000 g     F .text  000000000000000b .hidden f_b1
000000000000000b g     F .text  0000000000000010 .hidden f_b3

This is the same intermediate result as obtained by Mike Kinghan. Now let's make the symbols with the hidden attribute local. That is accomplished by using objcopy from binutils as follows:

objcopy --localize-hidden --strip-unneeded file.o

Using objdump, gives

file.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l     F .text  000000000000000b .hidden f_b1
000000000000000b l     F .text  0000000000000010 .hidden f_b3
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .bss   0000000000000000 .bss
0000000000000000 l    d  .comment   0000000000000000 .comment
0000000000000000 l    d  .note.GNU-stack    0000000000000000 .note.GNU-stack
0000000000000000 l    d  .eh_frame  0000000000000000 .eh_frame

Likewise, nm file.o gives

0000000000000000 t f_b1
000000000000000b t f_b3

Although f_b1 and f_b3 are still visible in the symbol table, they are local. Hence, functions f_b1 and f_b3 are concealed from static linking!

I'd also like to add a note on declaring functions static so that they can be removed from the symbol table completely. First, the removal can be done deterministically, without depending on compiler optimization, by using objcopy:

objcopy --strip-unneeded file.o

The static functions f_b1 and f_b3 are no longer in the symbol table of file.o.

Secondly, declaring functions static so that they disappear from the symbol table only works when every caller is in the same source file. As soon as a C project consists of several components, and hence several files, it can only be done by merging all C source and header files into one single source file and declaring all internal interfaces (functions) static, with the obvious exception of the global (top-level) interface. If that is not possible, one can fall back on the method originally described by ypsu (and probably many others; see for instance "Restricting symbols in a Linux static library").
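For a multi-file project, the fallback could be sketched roughly as below. The three files are shown as a single listing for brevity. The file names internal.h, b1.c, b3.c, combined.o and libfile.a are hypothetical, and the ld -r step (a relocatable link that merges the object files before localizing) is an assumption about how the approach is usually applied, not something spelled out in this answer.

```c
/* ---- internal.h (hypothetical): shared by the library's own files ---- */
__attribute__((visibility("hidden"))) int f_b1(void);

/* ---- b1.c: would #include "internal.h", so the definition is hidden ---- */
int f_b1(void)
{
    return 21;
}

/* ---- b3.c: would #include "internal.h"; f_b3 is the public API ---- */
int f_b3(void)
{
    return f_b1();
}

/*
 * Possible build steps, assuming the three files are split as marked:
 *   gcc -c b1.c b3.c
 *   ld -r -o combined.o b1.o b3.o            # merge into one object file
 *   objcopy --localize-hidden combined.o     # hidden symbols become local
 *   ar rcs libfile.a combined.o
 */
```

The merge has to come before objcopy --localize-hidden: once f_b1 is local it can no longer be resolved from another object file, so b3.o must already be combined with b1.o at that point.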

Sarcophagus answered 21/6, 2017 at 10:54 Comment(0)

Actually, an ELF file can contain two symbol tables: "symtab" and "dynsym". In my custom shared libraries I always strip all the symbols, because they are not needed for proper dynamic linking: the "symtab" table (which the "nm" utility prints by default) can be empty, because the dynamic linker actually uses the "dynsym" table. This typically reduces the library size by ~10-20%.

Functions with the "hidden" attribute are left out of the "dynsym" table, but until the library is stripped they can still be listed in "symtab".

You can verify this by using:

readelf --syms --dyn-syms <your dso here>

The "dynsym" table always contains all the entries needed by the dynamic linker, including e.g. the std:: functions, marked as "UND" (undefined, to be resolved by the dynamic linker).
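To compare the two tables programmatically, a small helper along the following lines could count the entries in each. This is only a sketch: count_symbols is a hypothetical name, and it assumes a 64-bit ELF file in the host's byte order and the Linux <elf.h> header.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Return the number of entries in the first section of the given type
 * (SHT_SYMTAB or SHT_DYNSYM), 0 if the file has no such section, or -1
 * if the file cannot be read as a 64-bit ELF file. */
long count_symbols(const char *path, unsigned int sh_type)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long result = -1;
    Elf64_Ehdr eh;

    /* Read and sanity-check the ELF header. */
    if (fread(&eh, sizeof eh, 1, fp) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr))
        goto done;

    result = 0;
    /* Walk the section header table looking for the requested type. */
    for (unsigned int i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        long off = (long)(eh.e_shoff + (Elf64_Off)i * sizeof sh);

        if (fseek(fp, off, SEEK_SET) != 0 ||
            fread(&sh, sizeof sh, 1, fp) != 1) {
            result = -1;
            break;
        }
        if (sh.sh_type == sh_type && sh.sh_entsize != 0) {
            /* Each entry is one symbol, including the reserved null entry. */
            result = (long)(sh.sh_size / sh.sh_entsize);
            break;
        }
    }

done:
    fclose(fp);
    return result;
}
```

Calling count_symbols(path, SHT_SYMTAB) and count_symbols(path, SHT_DYNSYM) on a shared library before and after strip should show the first count dropping to 0 while the second stays the same.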

Regards.

Disarray answered 19/1, 2020 at 21:8 Comment(0)

Note that for macOS/iOS the linker has some extra options to control symbol visibility:

  • -[un|re]exported_symbols_list
  • -[un]exported_symbol

For more information check e.g. the ld64 documentation or have a look here.

Shotputter answered 25/10, 2017 at 10:57 Comment(0)