How to list on-the-fly all the functions/symbols available in C code on a Linux architecture?

5

28

Assume main.c uses symbols from shared libs and local functions declared in main.c.

Is there a nice and elegant way to print a list of all the available function names and symbols at run time?

It should be possible since the data is loaded to the .code segment.

Teethe answered 3/4, 2013 at 4:58 Comment(1)
Not for C library functions, but it is possible for the API listed in the import/export sections. C functions are possible where they are used as labels in the code section. - Providing
32

Since I had the same need to retrieve all loaded symbol names at runtime, I did some research based on R..'s answer. So here is a detailed solution for Linux shared libraries in ELF format, which works with my gcc 4.3.4 but hopefully also with newer versions.

I mostly used the following sources to develop this solution:

And here's my code. I used self-explanatory variable names and added detailed comments to make it understandable. If something is wrong or missing, please let me know... (Edit: I just realized that the question was about C and my code is C++. But if you leave out the vector and the string, it should work for C as well.)

#include <link.h>
#include <string>
#include <vector>

using namespace std;

/* Callback for dl_iterate_phdr.
 * Is called by dl_iterate_phdr for every loaded shared lib until something
 * else than 0 is returned by one call of this function.
 */
int retrieve_symbolnames(struct dl_phdr_info* info, size_t info_size, void* symbol_names_vector) 
{

    /* ElfW is a macro that creates proper typenames for the used system architecture
     * (e.g. on a 32 bit system, ElfW(Dyn*) becomes "Elf32_Dyn*") */
    ElfW(Dyn*) dyn;
    ElfW(Sym*) sym;
    ElfW(Word*) hash;

    char* strtab = 0;
    char* sym_name = 0;
    ElfW(Word) sym_cnt = 0;

    /* the void pointer (3rd argument) should be a pointer to a vector<string>
     * in this example -> cast it to make it usable */
    vector<string>* symbol_names = reinterpret_cast<vector<string>*>(symbol_names_vector);

    /* Iterate over all headers of the current shared lib
     * (first call is for the executable itself) */
    for (size_t header_index = 0; header_index < info->dlpi_phnum; header_index++)
    {

        /* Further processing is only needed if the dynamic section is reached */
        if (info->dlpi_phdr[header_index].p_type == PT_DYNAMIC)
        {

            /* Get a pointer to the first entry of the dynamic section.
             * Its address is the shared lib's load address + the virtual address */
            dyn = (ElfW(Dyn)*)(info->dlpi_addr +  info->dlpi_phdr[header_index].p_vaddr);

            /* Iterate over all entries of the dynamic section until its
             * end is reached. This is indicated by an entry with
             * d_tag == DT_NULL.
             *
             * Only the following entries need to be processed to find the
             * symbol names:
             *  - DT_HASH   -> second word of the hash table is the number
             *                 of symbols
             *  - DT_STRTAB -> pointer to the beginning of a string table that
             *                 contains the symbol names
             *  - DT_SYMTAB -> pointer to the beginning of the symbol table
             *
             * The entries may appear in any order, so they are only recorded
             * here; the symbol table is walked after the loop. */
            sym = 0;
            strtab = 0;
            sym_cnt = 0;

            while(dyn->d_tag != DT_NULL)
            {
                if (dyn->d_tag == DT_HASH)
                {
                    /* Get a pointer to the hash table */
                    hash = (ElfW(Word*))dyn->d_un.d_ptr;

                    /* The 2nd word is the number of symbols */
                    sym_cnt = hash[1];
                }
                else if (dyn->d_tag == DT_STRTAB)
                {
                    /* Get the pointer to the string table */
                    strtab = (char*)dyn->d_un.d_ptr;
                }
                else if (dyn->d_tag == DT_SYMTAB)
                {
                    /* Get the pointer to the first entry of the symbol table */
                    sym = (ElfW(Sym*))dyn->d_un.d_ptr;
                }

                /* move pointer to the next entry */
                dyn++;
            }

            /* Iterate over the symbol table. sym_cnt stays 0 (so nothing is
             * collected) if the module has no DT_HASH entry, e.g. when it
             * only provides DT_GNU_HASH. */
            if (sym != 0 && strtab != 0)
            {
                for (ElfW(Word) sym_index = 0; sym_index < sym_cnt; sym_index++)
                {
                    /* get the name of the i-th symbol.
                     * This is located at the address of st_name
                     * relative to the beginning of the string table. */
                    sym_name = &strtab[sym[sym_index].st_name];

                    symbol_names->push_back(string(sym_name));
                }
            }
        }
    }

    /* Returning something != 0 stops further iterations,
     * since only the first entry, which is the executable itself, is needed
     * 1 is returned after processing the first entry.
     *
     * If the symbols of all loaded dynamic libs shall be found,
     * the return value has to be changed to 0.
     */
    return 1;

}

int main()
{
    vector<string> symbolNames;
    dl_iterate_phdr(retrieve_symbolnames, &symbolNames);

    return 0;
}
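
The callback above collects every entry of the dynamic symbol table, including undefined imports and data objects. Since the question asks for function names, a filter along the following lines may help. This is only a sketch: `is_defined_function` is a hypothetical helper name that is not part of the code above, and it relies on the `st_info` and `st_shndx` fields as declared in `<elf.h>`.

```cpp
#include <elf.h>
#include <link.h>

/* Hypothetical helper (not part of the answer's code): returns true if the
 * symbol table entry describes a function that is defined in the module
 * itself, i.e. not merely imported from another library. */
bool is_defined_function(const ElfW(Sym)& symbol)
{
    /* The low 4 bits of st_info hold the symbol type. ELF32_ST_TYPE and
     * ELF64_ST_TYPE are defined identically in <elf.h>, so either works. */
    const bool is_function = ELF64_ST_TYPE(symbol.st_info) == STT_FUNC;

    /* Undefined symbols (imports) carry the section index SHN_UNDEF. */
    const bool is_defined = symbol.st_shndx != SHN_UNDEF;

    return is_function && is_defined;
}
```

Inside the symbol loop one could then guard the `push_back` with `if (is_defined_function(sym[sym_index]))`.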
Vasques answered 3/6, 2013 at 12:50 Comment(4)
Using DT_HASH to get the symbol count appears to be unreliable. Interestingly, when I run the above code there is never a DT_HASH. Also, symbol_count should be initialized to 0 or shenanigans ensue. - Plainclothesman
It turns out it's possible to get a DT_GNU_HASH instead of a DT_HASH. Does anyone know how to get the sym_cnt out of a GNU hash instead? - Plainclothesman
@justin.m.chase: DT_GNU_HASH provides no easy way to get the symbol count without simply walking all the hash buckets and counting. You can see my code to do it here: git.musl-libc.org/cgit/musl/tree/src/ldso/… - Coup
Thank you for this snippet. I found that calling it from a .so itself gives you a DT_GNU_HASH that is relevant, and you need some code more like this: github.com/axlecrusher/hgengine3/blob/C++/Mercury3/… which itself does not solve all my issues. - Ency
13

On dynamically linked ELF-based systems, you may have the function dl_iterate_phdr available. If so, it can be used to gather information on each loaded shared library file, and the information you get is sufficient to examine the symbol tables. The process is basically:

  1. Get the address of the program headers from the dl_phdr_info structure passed back to you.
  2. Use the PT_DYNAMIC program header to find the _DYNAMIC table for the module.
  3. Use the DT_SYMTAB, DT_STRTAB, and DT_HASH entries of _DYNAMIC to find the list of symbols. DT_HASH is only needed to get the length of the symbol table, since it doesn't seem to be stored anywhere else.

The types you need should all be in <elf.h> and <link.h>.
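
Steps 1 and 2 can be sketched as follows. This is a minimal illustration rather than a full symbol dump: `ModuleInfo`, `collect_module` and `list_loaded_modules` are names made up for this example, and the code only records where each module's _DYNAMIC table lives.

```cpp
#include <link.h>
#include <string>
#include <vector>

/* Illustrative record, one per loaded module. */
struct ModuleInfo
{
    std::string name;          /* path of the module; may be empty for the main program */
    const ElfW(Dyn)* dynamic;  /* start of the _DYNAMIC table, or null if absent */
};

/* Callback for dl_iterate_phdr: called once per loaded module. */
static int collect_module(struct dl_phdr_info* info, size_t, void* data)
{
    std::vector<ModuleInfo>* modules = static_cast<std::vector<ModuleInfo>*>(data);

    ModuleInfo module;
    module.name = info->dlpi_name ? info->dlpi_name : "";
    module.dynamic = 0;

    /* Step 1: walk the program headers. Step 2: pick out PT_DYNAMIC. */
    for (size_t i = 0; i < info->dlpi_phnum; i++)
    {
        if (info->dlpi_phdr[i].p_type == PT_DYNAMIC)
        {
            module.dynamic = reinterpret_cast<const ElfW(Dyn)*>(
                info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
        }
    }

    modules->push_back(module);
    return 0; /* 0 means: continue with the next module */
}

std::vector<ModuleInfo> list_loaded_modules()
{
    std::vector<ModuleInfo> modules;
    dl_iterate_phdr(collect_module, &modules);
    return modules;
}
```

Step 3 would then scan each `dynamic` array for the DT_SYMTAB, DT_STRTAB and DT_HASH tags until it reaches DT_NULL.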

Coup answered 3/4, 2013 at 5:10 Comment(2)
What about symbols that are not dynamically linked? Or, is something like libc also a shared library? - Wriest
Yes, assuming you're using dynamic linking, "libc" is a shared library and you can get its symbol table this way too. - Coup
6

This is not really a C-specific question, but one that depends on the operating system, the binary format and (for debugging symbols and unmangled C++ symbol names) even the compiler. There is no generic way, and also no truly elegant way.

The most portable and future-proof way is probably to run an external program such as nm, which is specified by POSIX. The GNU version found on Linux systems probably has a bunch of extensions, which you should avoid if you aim for portability and future-proofing.

Its output should stay stable, and even if binary formats change, it will be updated and keep working. Just run it with the right switches, capture its output (probably by running it through popen to avoid a temp file) and parse that.
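
A minimal sketch of that approach, using only the POSIX options `-P` (portable output format, one `name type [value [size]]` line per symbol) and `-g` (external symbols only). The function names `parse_nm_symbol_name` and `list_symbols_with_nm` are invented for this example, and the path is assumed to contain no single quotes, since it is pasted into a shell command.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

/* Extracts the symbol name, i.e. the first whitespace-delimited field,
 * from one line of `nm -P` output. Returns an empty string for blank lines. */
std::string parse_nm_symbol_name(const std::string& line)
{
    const std::string::size_type end = line.find_first_of(" \t\n");
    return line.substr(0, end);
}

/* Runs `nm -P -g` on the given file and returns the symbol names.
 * Assumption: `path` contains no single quote characters. */
std::vector<std::string> list_symbols_with_nm(const std::string& path)
{
    std::vector<std::string> names;
    const std::string command = "nm -P -g '" + path + "' 2>/dev/null";

    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe)
    {
        return names; /* nm could not be started */
    }

    /* POSIX getline grows the buffer as needed, so long names are not cut. */
    char* line = 0;
    size_t capacity = 0;
    while (getline(&line, &capacity, pipe) != -1)
    {
        const std::string name = parse_nm_symbol_name(line);
        if (!name.empty())
        {
            names.push_back(name);
        }
    }
    free(line);
    pclose(pipe);
    return names;
}
```

Two caveats: passing `/proc/self/exe` directly would not work, because inside the child process that link refers to nm itself, so the program's own path has to be resolved first (for example with readlink). Also, nm reads the regular symbol table, so a stripped binary is likely to yield an empty list.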

Picofarad answered 3/4, 2013 at 5:50 Comment(0)
5

I updated the code from Kanalpiroge's answer so that it also works when DT_HASH is missing (for example, on RHEL). It is written for 64-bit, but it is relatively easy to modify it to support 32-bit as well. The inspiration came from here: https://chromium-review.googlesource.com/c/crashpad/crashpad/+/876879/18/snapshot/elf/elf_image_reader.cc#b512.

#include <link.h>
#include <string>
#include <vector>

using namespace std;

static uint32_t GetNumberOfSymbolsFromGnuHash(Elf64_Addr gnuHashAddress)
{
    // See https://flapenguin.me/2017/05/10/elf-lookup-dt-gnu-hash/ and
    // https://sourceware.org/ml/binutils/2006-10/msg00377.html
    typedef struct
    {
        uint32_t nbuckets;
        uint32_t symoffset;
        uint32_t bloom_size;
        uint32_t bloom_shift;
    } Header;

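    const Header* header = (const Header*)gnuHashAddress;

    // The bloom filter (bloom_size 64-bit words on ELF64) follows the header;
    // the buckets come after it. Byte pointers are used because arithmetic
    // on void* is not standard C++.
    const uint8_t* bucketsAddress = (const uint8_t*)gnuHashAddress + sizeof(Header) + (sizeof(uint64_t) * header->bloom_size);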

    // Locate the chain that handles the largest index bucket.
    uint32_t lastSymbol = 0;
    uint32_t* bucketAddress = (uint32_t*)bucketsAddress;
    for (uint32_t i = 0; i < header->nbuckets; ++i)
    {
        uint32_t bucket = *bucketAddress;
        if (lastSymbol < bucket)
        {
            lastSymbol = bucket;
        }
        bucketAddress++;
    }

    if (lastSymbol < header->symoffset)
    {
        return header->symoffset;
    }

    // Walk the bucket's chain to add the chain length to the total.
    const uint8_t* chainBaseAddress = (const uint8_t*)bucketsAddress + (sizeof(uint32_t) * header->nbuckets);
    for (;;)
    {
        const uint32_t* chainEntry = (const uint32_t*)(chainBaseAddress + (lastSymbol - header->symoffset) * sizeof(uint32_t));
        lastSymbol++;

        // If the low bit is set, this entry is the end of the chain.
        if (*chainEntry & 1)
        {
            break;
        }
    }

    return lastSymbol;
}

/* Callback for dl_iterate_phdr.
 * Is called by dl_iterate_phdr for every loaded shared lib until something
 * else than 0 is returned by one call of this function.
 */
int retrieve_symbolnames(struct dl_phdr_info* info, size_t info_size, void* symbol_names_vector) 
{

    /* ElfW is a macro that creates proper typenames for the used system architecture
     * (e.g. on a 32 bit system, ElfW(Dyn*) becomes "Elf32_Dyn*") */
    ElfW(Dyn*) dyn;
    ElfW(Sym*) sym;
    ElfW(Word*) hash;

    char* strtab = 0;
    char* sym_name = 0;
    ElfW(Word) sym_cnt = 0;

    /* the void pointer (3rd argument) should be a pointer to a vector<string>
     * in this example -> cast it to make it usable */
    vector<string>* symbol_names = reinterpret_cast<vector<string>*>(symbol_names_vector);

    /* Iterate over all headers of the current shared lib
     * (first call is for the executable itself) */
    for (size_t header_index = 0; header_index < info->dlpi_phnum; header_index++)
    {

        /* Further processing is only needed if the dynamic section is reached */
        if (info->dlpi_phdr[header_index].p_type == PT_DYNAMIC)
        {

            /* Get a pointer to the first entry of the dynamic section.
             * Its address is the shared lib's load address + the virtual address */
            dyn = (ElfW(Dyn)*)(info->dlpi_addr +  info->dlpi_phdr[header_index].p_vaddr);

            /* Iterate over all entries of the dynamic section until its
             * end is reached. This is indicated by an entry with
             * d_tag == DT_NULL.
             *
             * Only the following entries need to be processed to find the
             * symbol names:
             *  - DT_HASH     -> second word of the hash table is the number
             *                   of symbols
             *  - DT_GNU_HASH -> used to count the symbols if DT_HASH is
             *                   missing
             *  - DT_STRTAB   -> pointer to the beginning of a string table
             *                   that contains the symbol names
             *  - DT_SYMTAB   -> pointer to the beginning of the symbol table
             *
             * The entries may appear in any order, so they are only recorded
             * here; the symbol table is walked after the loop. */
            sym = 0;
            strtab = 0;
            sym_cnt = 0;

            while(dyn->d_tag != DT_NULL)
            {
                if (dyn->d_tag == DT_HASH)
                {
                    /* Get a pointer to the hash table */
                    hash = (ElfW(Word*))dyn->d_un.d_ptr;

                    /* The 2nd word is the number of symbols */
                    sym_cnt = hash[1];
                }
                else if (dyn->d_tag == DT_GNU_HASH && sym_cnt == 0)
                {
                    sym_cnt = GetNumberOfSymbolsFromGnuHash(dyn->d_un.d_ptr);
                }
                else if (dyn->d_tag == DT_STRTAB)
                {
                    /* Get the pointer to the string table */
                    strtab = (char*)dyn->d_un.d_ptr;
                }
                else if (dyn->d_tag == DT_SYMTAB)
                {
                    /* Get the pointer to the first entry of the symbol table */
                    sym = (ElfW(Sym*))dyn->d_un.d_ptr;
                }

                /* move pointer to the next entry */
                dyn++;
            }

            /* Iterate over the symbol table once all entries are known */
            if (sym != 0 && strtab != 0)
            {
                for (ElfW(Word) sym_index = 0; sym_index < sym_cnt; sym_index++)
                {
                    /* get the name of the i-th symbol.
                     * This is located at the address of st_name
                     * relative to the beginning of the string table. */
                    sym_name = &strtab[sym[sym_index].st_name];

                    symbol_names->push_back(string(sym_name));
                }
            }
        }
    }

    /* Returning something != 0 stops further iterations,
     * since only the first entry, which is the executable itself, is needed
     * 1 is returned after processing the first entry.
     *
     * If the symbols of all loaded dynamic libs shall be found,
     * the return value has to be changed to 0.
     */
    return 1;

}

int main()
{
    vector<string> symbolNames;
    dl_iterate_phdr(retrieve_symbolnames, &symbolNames);

    return 0;
}
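
Regarding 32-bit support: the part of the GNU hash layout that depends on the ELF class is the bloom filter, whose words are 32 bits wide for ELFCLASS32 and 64 bits wide for ELFCLASS64. A sketch of a class-independent offset computation could look like this; `gnu_hash_buckets_offset` is a hypothetical helper, not part of the code above, and the `Elf64_Addr` parameter would also have to become `ElfW(Addr)`.

```cpp
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset from the start of the DT_GNU_HASH table
 * to its bucket array, for the ELF class the program was built for. */
size_t gnu_hash_buckets_offset(uint32_t bloom_size)
{
    /* nbuckets, symoffset, bloom_size, bloom_shift: four 32-bit words */
    const size_t header_size = 4 * sizeof(uint32_t);

    /* Bloom filter words have the native address width. */
    const size_t bloom_word_size = sizeof(ElfW(Addr));

    return header_size + bloom_word_size * bloom_size;
}
```

With this, `bucketsAddress` could be computed as the table address plus `gnu_hash_buckets_offset(header->bloom_size)` instead of hard-coding `sizeof(uint64_t)`.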
Lighting answered 18/7, 2019 at 16:50 Comment(0)
1

It should be dl_iterate_phdr(retrieve_symbolnames, &symbolNames);

Cy answered 18/1, 2014 at 4:21 Comment(0)
