Is it possible to uniquely identify dynamically imported functions by their name?

I used

readelf --dyn-syms my_elf_binary | grep FUNC | grep UND

to display the dynamically imported functions of my_elf_binary, from the dynamic symbol table in the .dynsym section to be precise. Example output would be:

 [...]
 3: 00000000     0 FUNC    GLOBAL DEFAULT  UND tcsetattr@GLIBC_2.0 (3)
 4: 00000000     0 FUNC    GLOBAL DEFAULT  UND fileno@GLIBC_2.0 (3)
 5: 00000000     0 FUNC    GLOBAL DEFAULT  UND isatty@GLIBC_2.0 (3)
 6: 00000000     0 FUNC    GLOBAL DEFAULT  UND access@GLIBC_2.0 (3)
 7: 00000000     0 FUNC    GLOBAL DEFAULT  UND open64@GLIBC_2.2 (4)
 [...]
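To do the same listing programmatically, here is a minimal C sketch of what the readelf | grep pipeline above does. It assumes a 64-bit ELF file whose byte order matches the host, and the <elf.h> header shipped with glibc. The function names list_undefined_funcs and scan_image are hypothetical. Unlike readelf, it prints bare names only: the @GLIBC_2.0-style suffixes come from the separate symbol-versioning sections, which this sketch does not read.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Scan an in-memory 64-bit ELF image and print each undefined FUNC symbol
 * from .dynsym to `out` (if non-NULL). Returns the count, or -1 if the image
 * is not a well-formed 64-bit ELF file. Assumes host byte order. */
static long scan_image(const unsigned char *buf, size_t size, FILE *out)
{
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)buf;
    if (size < sizeof *eh || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
        eh->e_ident[EI_CLASS] != ELFCLASS64 ||
        eh->e_shentsize != sizeof(Elf64_Shdr) || eh->e_shoff > size ||
        (size - eh->e_shoff) / sizeof(Elf64_Shdr) < eh->e_shnum)
        return -1;

    const Elf64_Shdr *sh = (const Elf64_Shdr *)(buf + eh->e_shoff);
    long count = 0;
    for (size_t i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_type != SHT_DYNSYM || sh[i].sh_link >= eh->e_shnum)
            continue;
        const Elf64_Shdr *str = &sh[sh[i].sh_link]; /* normally .dynstr */
        if (sh[i].sh_offset > size || sh[i].sh_size > size - sh[i].sh_offset ||
            str->sh_offset > size || str->sh_size > size - str->sh_offset)
            return -1;
        const Elf64_Sym *syms = (const Elf64_Sym *)(buf + sh[i].sh_offset);
        const char *names = (const char *)(buf + str->sh_offset);
        size_t nsyms = sh[i].sh_size / sizeof(Elf64_Sym);
        for (size_t j = 0; j < nsyms; j++) {
            /* readelf's "FUNC" and "UND" columns correspond to these tests. */
            if (ELF64_ST_TYPE(syms[j].st_info) != STT_FUNC ||
                syms[j].st_shndx != SHN_UNDEF || syms[j].st_name >= str->sh_size)
                continue;
            if (out)
                fprintf(out, "%s\n", names + syms[j].st_name);
            count++;
        }
    }
    return count;
}

/* Read the whole file into memory and scan it; -1 on I/O or format errors. */
long list_undefined_funcs(const char *path, FILE *out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    if (size <= 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }
    unsigned char *buf = malloc((size_t)size);
    long result = -1;
    if (buf && fread(buf, 1, (size_t)size, f) == (size_t)size)
        result = scan_image(buf, (size_t)size, out);
    free(buf);
    fclose(f);
    return result;
}
```

Calling list_undefined_funcs("my_elf_binary", stdout) should print names such as tcsetattr and access, provided the binary is a 64-bit build. The output above, with its 8-digit addresses, looks like it came from a 32-bit binary, which would need the Elf32_* types instead.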

Is it safe to assume that the names associated with these symbols, e.g. tcsetattr or access, are always unique? Or is it possible, or reasonable*), to have a dynamic symbol table (filtered for FUNC and UND) that contains two entries with the same associated string?

The reason I am asking is that I am looking for a unique identifier for dynamically imported functions ...

*) Wouldn't the dynamic linker resolve all "UND FUNC symbols" with the same name to the same function anyway?

Subsolar asked 15/5/2015 at 16:43
Tiphane: I don't know for sure, so I'm not going to "answer" your questions. I'm rooting for "yes, it is safe to assume that" and "yes, the dynamic linker would resolve all those symbols to the same function". But this is not an answer!

Yes, given a symbol name and the set of libraries an executable is linked against, you can uniquely identify the function. This behavior is required for linking and dynamic linking to work.


An illustrative example

Consider the following two files:

librarytest1.c:

#include <stdio.h>
int testfunction(void)
{
   printf("version 1");
   return 0;
}

and librarytest2.c:

#include <stdio.h>
int testfunction(void)
{
   printf("version 2");
   return 0;
}

Both compiled into shared libraries:

% gcc -fPIC -shared -Wl,-soname,liblibrarytest.so.1 -o liblibrarytest.so.1.0.0 librarytest1.c -lc 
% gcc -fPIC -shared -Wl,-soname,liblibrarytest.so.2 -o liblibrarytest.so.2.0.0 librarytest2.c -lc

Note that we cannot put two functions with the same name into a single shared library:

% gcc -fPIC -shared -Wl,-soname,liblibrarytest.so.0 -o liblibrarytest.so.0.0.0 librarytest1.c librarytest2.c -lc                                                                                                     
/tmp/cctbsBxm.o: In function `testfunction':
librarytest2.c:(.text+0x0): multiple definition of `testfunction'
/tmp/ccQoaDxD.o:librarytest1.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

This shows that symbol names must be unique within a shared library, but need not be unique across a set of shared libraries.

% readelf --dyn-syms liblibrarytest.so.1.0.0 | grep testfunction 
12: 00000000000006d0    28 FUNC    GLOBAL DEFAULT   10 testfunction
% readelf --dyn-syms liblibrarytest.so.2.0.0 | grep testfunction 
12: 00000000000006d0    28 FUNC    GLOBAL DEFAULT   10 testfunction

Now let's link our shared libraries with an executable. Consider linktest.c:

int testfunction(void);
int main()
{
  testfunction();
  return 0;
}

We can compile and link this against either shared library:

% gcc -o linktest1 liblibrarytest.so.1.0.0 linktest.c 
% gcc -o linktest2 liblibrarytest.so.2.0.0 linktest.c 

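And run each of them (note that I'm setting the library search path so the dynamic linker can find the libraries, which are not in a standard library path; because the dynamic linker looks them up by their soname, liblibrarytest.so.1 and liblibrarytest.so.2 must also exist in the current directory, e.g. as symlinks to the corresponding .so.N.0.0 files):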

% LD_LIBRARY_PATH=. ./linktest1                    
version 1%                                                                                                              
% LD_LIBRARY_PATH=. ./linktest2
version 2%

Now let's link our executable against both libraries. Each exports the same symbol, testfunction, and each library has a different implementation of that function.

% gcc -o linktest0-1 liblibrarytest.so.1.0.0 liblibrarytest.so.2.0.0 linktest.c
% gcc -o linktest0-2 liblibrarytest.so.2.0.0 liblibrarytest.so.1.0.0 linktest.c

The only difference is the order in which the libraries are passed to the compiler.

% LD_LIBRARY_PATH=. ./linktest0-1                                              
version 1%                                                                                                             
% LD_LIBRARY_PATH=. ./linktest0-2
version 2%    

Here is the corresponding ldd output:

% LD_LIBRARY_PATH=. ldd ./linktest0-1 
    linux-vdso.so.1 (0x00007ffe193de000)
    liblibrarytest.so.1 => ./liblibrarytest.so.1 (0x00002b8bc4b0c000)
    liblibrarytest.so.2 => ./liblibrarytest.so.2 (0x00002b8bc4d0e000)
    libc.so.6 => /lib64/libc.so.6 (0x00002b8bc4f10000)
    /lib64/ld-linux-x86-64.so.2 (0x00002b8bc48e8000)
% LD_LIBRARY_PATH=. ldd ./linktest0-2
    linux-vdso.so.1 (0x00007ffc65df0000)
    liblibrarytest.so.2 => ./liblibrarytest.so.2 (0x00002b46055c8000)
    liblibrarytest.so.1 => ./liblibrarytest.so.1 (0x00002b46057ca000)
    libc.so.6 => /lib64/libc.so.6 (0x00002b46059cc000)
    /lib64/ld-linux-x86-64.so.2 (0x00002b46053a4000)

Here we can see that while symbol names are not unique, the way the dynamic linker resolves them is well defined: it appears to always bind to the first definition it encounters, searching the libraries in the order ldd lists them. Note that this is somewhat of a pathological case, as you normally wouldn't do this. In cases where you would go in this direction, there are better ways of naming symbols so that they are unique when exported (symbol versioning, etc.).
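The "first definition wins" rule can be modeled in a few lines of C. This is a deliberately simplified, hypothetical model (struct loaded_object and resolve_first are made-up names): it assumes a plain, unversioned lookup through the loaded objects in order, and it ignores symbol versioning, LD_PRELOAD, libraries opened with dlopen(), and linker options such as -Bsymbolic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a loaded object: its soname plus the names it
 * defines (exports). `exports` is a NULL-terminated array. */
struct loaded_object {
    const char *soname;
    const char *const *exports;
};

/* Walk the objects in search order and return the soname of the first one
 * that defines `name`, or NULL if none does (the real dynamic linker would
 * then report an undefined symbol). */
const char *resolve_first(const struct loaded_object *order, size_t n,
                          const char *name)
{
    for (size_t i = 0; i < n; i++)
        for (const char *const *e = order[i].exports; *e != NULL; e++)
            if (strcmp(*e, name) == 0)
                return order[i].soname;
    return NULL;
}
```

Feeding it the two search orders of linktest0-1 and linktest0-2 yields liblibrarytest.so.1 and liblibrarytest.so.2 respectively for testfunction, matching the "version 1" and "version 2" output above.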


In summary, yes, you can uniquely identify the function given its name. If there happen to be multiple symbols with that name, you identify the proper one using the order in which the libraries are resolved (from ldd, objdump, etc.). Yes, in this case you need a bit more information than just its name, but it is possible if you have the executable to inspect.
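That order can also be observed from inside a running process instead of through ldd. On glibc, dl_iterate_phdr() walks the loaded objects in load order, which for libraries loaded at startup should generally match the order ldd prints. A minimal sketch, assuming glibc (where the main program is reported with an empty name); struct order_ctx and collect_load_order are hypothetical names:

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>
#include <string.h>

#define MAX_OBJS 64
#define NAME_LEN 256

/* Names of the loaded objects, in the order dl_iterate_phdr reports them. */
struct order_ctx {
    char names[MAX_OBJS][NAME_LEN];
    size_t count;
};

/* Callback invoked once per loaded object. */
static int collect(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    struct order_ctx *ctx = data;
    if (ctx->count < MAX_OBJS) {
        /* glibc reports the main program with an empty name. */
        const char *name = info->dlpi_name[0] ? info->dlpi_name : "(main program)";
        snprintf(ctx->names[ctx->count], NAME_LEN, "%s", name);
        ctx->count++;
    }
    return 0; /* 0 means "keep iterating" */
}

size_t collect_load_order(struct order_ctx *ctx)
{
    ctx->count = 0;
    dl_iterate_phdr(collect, ctx);
    return ctx->count;
}
```

Printing ctx.names[0..count-1] inside linktest0-1 and linktest0-2 should show liblibrarytest.so.1 and liblibrarytest.so.2 in opposite orders.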

Chafer answered 15/5/2015 at 18:57
Subsolar: Just checking. So if we link our executable to both libraries, the same symbol will appear twice in the dynamic symbol table of the executable, BUT the dynamic linker will resolve both occurrences to the same value (provided by the library that the dynamic linker looks at first). Correct?
Chafer: @stackoverflowwww No, the dynamic symbol table of the executable will have only one entry for its undefined function (it doesn't know or care that there is more than one implementation). In fact, the output of readelf --dyn-syms linktest0-2 | grep FUNC | grep UND is identical to that for linktest0-1. Each library has one exported function named testfunction, and it is the dynamic linker that maps the undefined symbol to one of the global definitions; it seems to always choose the first library that provides the global symbol.
Subsolar: This info, together with your first (even more) detailed post, gives an excellent answer! I am impressed. Thank you very much.
Ludewig: Even when I link with one shared object, I see the same symbol appearing twice, e.g. gzopen64 below (once with a single '@' and again with '@@'):

  180: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND gzopen64@ZLIB_1.2.3.3 (6)
 6054: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND gzopen64@@ZLIB_1.2.3.3

Shouldn't there be a single gzopen64 linking to the zlib library?

Note that in your case, the name of the first function import is not just tcsetattr but tcsetattr@GLIBC_2.0. The @ is how the readelf program displays a versioned symbol import.

GLIBC_2.0 is a version tag that glibc uses to stay binary compatible with old binaries in the (unusual but possible) case that the binary interface to one of its functions needs to change. The original .o file produced by the compiler just imports tcsetattr with no version information, but at link time the linker notices that the actual symbol exported by libc.so carries a GLIBC_2.0 tag, so it creates a binary that insists on importing the particular tcsetattr symbol that has version GLIBC_2.0.
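This suggests that a sturdier identifier for an imported function is the pair (name, version tag) rather than the bare name. As an illustration, here is a small C sketch that splits readelf-style names such as tcsetattr@GLIBC_2.0 into those two parts. The struct and function names are hypothetical; it assumes the trailing " (3)" version index that readelf prints has already been stripped, and it treats readelf's "@@" form (which readelf uses to mark a default version) as naming the same version as the single-"@" form.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* An imported function identified by name plus optional version tag. */
struct versioned_name {
    char name[128];
    char version[64]; /* empty string if the symbol is unversioned */
    bool is_default;  /* true for readelf's "name@@VERSION" form */
};

/* Split "tcsetattr@GLIBC_2.0" or "gzopen64@@ZLIB_1.2.3.3" into parts.
 * Returns false if a part does not fit in the fixed-size buffers. */
bool split_versioned(const char *sym, struct versioned_name *out)
{
    const char *at = strchr(sym, '@');
    size_t name_len = at ? (size_t)(at - sym) : strlen(sym);
    const char *ver = "";
    out->is_default = false;
    if (at) {
        ver = at + 1;
        if (*ver == '@') { /* "@@": default version */
            out->is_default = true;
            ver++;
        }
    }
    if (name_len >= sizeof out->name || strlen(ver) >= sizeof out->version)
        return false;
    memcpy(out->name, sym, name_len);
    out->name[name_len] = '\0';
    strcpy(out->version, ver);
    return true;
}

/* Two imports refer to the same function only if name AND version match. */
bool same_import(const struct versioned_name *a, const struct versioned_name *b)
{
    return strcmp(a->name, b->name) == 0 && strcmp(a->version, b->version) == 0;
}
```

With this key, tcsetattr@GLIBC_2.0 and a hypothetical tcsetattr@GLIBC_2.42 compare as different imports even though their bare names are equal.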

In the future there might be a libc.so that exports one tcsetattr@GLIBC_2.0 and a different tcsetattr@GLIBC_2.42, and the version tag will then be used to find which one a particular ELF object refers to.

It is possible that the same process may also use tcsetattr@GLIBC_2.42 at the same time, such as if it uses another dynamic library which was linked against a libc.so new enough to provide it. The version tags ensure that both the old binary and the new library get the function they expect from the C library.

Most libraries don't use this mechanism and instead just rename the entire library if they need to make breaking changes to their binary interfaces. For example, if you dump /usr/bin/pngtopnm you'll find that the symbols it imports from libnetpbm and libpng are not versioned. (Or at least that's what I see on my machine.)

The cost of this is that you can't have a binary that links against one version of libpng and also links against another library that itself links against a different libpng version; the exported names from the two libpngs would clash.

In most cases this is manageable through careful packaging, so maintaining the library source to produce useful version tags and stay backward compatible is not worth the trouble.

But in the particular case of the C library and a few other vital system libraries, changing the name of the library would be so extremely painful that it makes sense for the maintainers to jump through some hoops in order to ensure it will never need to happen again.

Helladic answered 15/5/2015 at 23:01
Subsolar: So it is the symbol-exporting dynamic library that determines whether a symbol in the dynamic symbol table of the importing executable/library has a version tag? In other words, if the exported symbol has a version tag, do ALL executables/libraries importing this symbol automatically add a (required) version tag during compilation?
Helladic: @stackoverflowwww Yes, that's my understanding (except it is "during linking" rather than "during compilation").
Helladic: There may be a way to convince the linker to produce a binary that imports a symbol with a particular version tag rather than let the library itself determine it, or even to produce a binary that imports the same symbol with two different version tags for different relocations [certainly, if everything else fails, use a hex editor on the binary afterwards], but that is definitely not an everyday use case.
Subsolar: @Henning Oh yes, it's linking, not compilation. Thanks for the correction.

Although in most cases every symbol is unique, there are a handful of exceptions. My favorite is the multiple identical symbol imports used by PAM (Pluggable Authentication Modules) and NSS (Name Service Switch). In both cases, every module written for either interface implements a standard interface with standard names (PAM modules all export the very same names, such as pam_sm_authenticate; NSS modules export per-service variants such as _nss_files_gethostbyname_r and _nss_dns_gethostbyname_r). A common and frequently used example is what happens when you call gethostbyname: the NSS library calls the same function in multiple libraries to get an answer. A common configuration calls the same function in three libraries! I have seen the same function called in five different libraries from one function call, and that was not the limit, just what was useful. Special calls to the dynamic linker (the dlopen()/dlsym() interface) are needed to do this, and I have not familiarised myself with the mechanics, but there is nothing special about how the module library that is loaded this way is linked.
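The mechanism alluded to above is, roughly, dlopen() plus dlsym(): each module is loaded on demand and the well-known name is looked up in that module's own handle, instead of being resolved as an ordinary import through the global scope. A minimal sketch, assuming a POSIX system (on glibc older than 2.34, link with -ldl); open_providers is a hypothetical helper, not PAM's or NSS's actual code.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Open each module privately and look up the same symbol name in each.
 * For every module that provides `symbol`, store its handle and address.
 * Returns the number of providers found; the caller must dlclose() the
 * stored handles when done. */
size_t open_providers(const char *const paths[], size_t n, const char *symbol,
                      void *handles[], void *addrs[])
{
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        /* RTLD_LOCAL keeps this module's symbols out of the global scope,
         * so identical names in different modules do not interfere. */
        void *h = dlopen(paths[i], RTLD_NOW | RTLD_LOCAL);
        if (h == NULL)
            continue; /* module missing or not loadable: skip it */
        /* Look the name up in this module (and its dependencies) only. */
        void *addr = dlsym(h, symbol);
        if (addr == NULL) {
            dlclose(h);
            continue;
        }
        handles[found] = h;
        addrs[found] = addr;
        found++;
    }
    return found;
}
```

In principle, passing the paths of several PAM modules together with the name pam_sm_authenticate would return one address per module that implements it, which the caller could then invoke in turn.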

Hylozoism answered 15/5/2015 at 23:56
