How can I find the dynamic libraries required by an ELF Binary in C++?
W

4

11

How can I get a list of all the dynamic libraries that are required by an ELF binary on Linux using C++?

Once I've managed to extract the information (filename?) from the binary I can find the actual file by searching through the PATH, but I haven't been able to find any information regarding extracting unmangled information from the ELF binary.

Thoughts?

Windowpane answered 24/3, 2014 at 14:50 Comment(8)
If you're on a RHEL-based distro (with rpm in the basement of package management), you can try this script in order to resolve the dependencies of your binary not only to *.so files but to the packages that provide them too.Fearfully
Why do you ask? Do you care about indirect dependencies (ie executable foo dynamically linking libbar.so which itself is dynamically linking libgee.so, so ldd foo will tell about both libbar.so and libgee.so)?Argenteuil
I ask because I'm using a static analysis tool and I need to extract the CFG from the targeted binary as well as any dynamic libraries that it depends on.Windowpane
Direct with readelf -d #6243261 , indirect with ldd: unix.stackexchange.com/questions/120015/…Merovingian
#1173149Merovingian
On linux at least, you can use dl_iterate_phdr to iterate through the dynamically loaded program headers of the calling program, which includes the binary itself and the loaded dynamic libraries.Squamation
Possible duplicate of Show all libraries used by executables on linuxMerovingian
@JohnSmith I know it's an old question but take a look at the code I left in my answer.Shankle
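
The dl_iterate_phdr suggestion from the comments can be sketched as follows. Note that it inspects the calling process itself, not an arbitrary file on disk, so it only helps when the code runs inside the program being examined. The names print_object and count_loaded_objects are made up for this sketch; glibc is assumed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stdio.h>

/* Callback invoked once per loaded object (the main program and each library). */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    assert(data != NULL);
    int *count = data;
    /* The main program is typically reported with an empty name. */
    const char *name = (info->dlpi_name && info->dlpi_name[0]) ? info->dlpi_name
                                                               : "[main program]";
    printf("%s (%d program headers)\n", name, (int)info->dlpi_phnum);
    (*count)++;
    return 0;   /* a non-zero return value would stop the iteration */
}

/* Prints every object loaded into the calling process and returns how many there are. */
int count_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(print_object, &count);
    return count;
}
```

Calling count_loaded_objects() from main lists the executable plus whatever the dynamic loader has mapped so far, including indirect dependencies.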
E
13

You can run the readelf -d program and parse its output:

readelf -d /usr/bin/readelf | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
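
Doing the same from a program could look roughly like the sketch below, which runs readelf through popen and pulls the bracketed name out of each (NEEDED) line. The function names are invented for illustration, readelf is assumed to be on the PATH, and the file path is pasted into a shell command, so it should only be used with trusted paths.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the name between [ and ] from one "(NEEDED)" line of `readelf -d` output.
   Returns 1 and fills out on success, 0 if the line is not a usable NEEDED entry. */
int parse_needed_line(const char *line, char *out, size_t outsz)
{
    assert(line != NULL && out != NULL && outsz > 0);
    if (strstr(line, "(NEEDED)") == NULL) return 0;
    const char *lb = strchr(line, '[');
    const char *rb = lb ? strchr(lb, ']') : NULL;
    if (rb == NULL) return 0;
    size_t len = (size_t)(rb - lb - 1);
    if (len >= outsz) return 0;            /* name does not fit into out */
    memcpy(out, lb + 1, len);
    out[len] = '\0';
    return 1;
}

/* Run `readelf -d path` and print each needed library.
   Returns the number of libraries printed, or -1 on failure.
   Assumption: path contains no single quote (no further shell quoting is done). */
int list_needed_with_readelf(const char *path)
{
    char cmd[4096], line[1024], name[512];
    int count = 0;
    if (snprintf(cmd, sizeof cmd, "readelf -d '%s'", path) >= (int)sizeof cmd) return -1;
    FILE *p = popen(cmd, "r");
    if (p == NULL) return -1;
    while (fgets(line, sizeof line, p) != NULL) {
        if (parse_needed_line(line, name, sizeof name)) {
            puts(name);
            count++;
        }
    }
    return pclose(p) == 0 ? count : -1;
}
```

Matching only on "(NEEDED)" and the brackets keeps the parser independent of the surrounding column layout.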
Epicycle answered 24/3, 2014 at 15:10 Comment(0)
K
12

The list of required shared objects is stored in the so-called dynamic section of the executable. The rough algorithm of getting the necessary info would be something like this:

  1. Parse the ELF header, check that the file is an executable or a shared object (ET_EXEC or ET_DYN).
  2. Get the offset and count of the program headers (e_phoff/e_phnum/e_phentsize), check that they're non-zero and valid.
  3. Parse the program headers, looking for the PT_DYNAMIC one. Also remember virtual address -> file offset mappings for the PT_LOAD segments.
  4. Once found, parse the dynamic section. Look for the DT_NEEDED and DT_STRTAB entries.

The d_val field of each DT_NEEDED entry is an offset into the string table that DT_STRTAB points to; the string at that offset is the name (normally the SONAME) of a required library. Note that since the DT_STRTAB entry holds the run-time address of the string table and not its file offset, you'll need to map it back to a file offset using the PT_LOAD mappings remembered in step 3.
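
The steps above can be sketched in C as follows. This is only an illustration under stated assumptions: it handles 64-bit ELF files in the host's byte order, reads the whole file into memory for simplicity, and the names vaddr_to_offset, slurp and print_needed_via_phdrs are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Step 3 helper: map a run-time virtual address to a file offset via PT_LOAD. */
int vaddr_to_offset(const Elf64_Phdr *ph, size_t n, Elf64_Addr vaddr, Elf64_Off *off)
{
    assert(ph != NULL && off != NULL);
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type == PT_LOAD && vaddr >= ph[i].p_vaddr &&
            vaddr - ph[i].p_vaddr < ph[i].p_filesz) {
            *off = ph[i].p_offset + (vaddr - ph[i].p_vaddr);
            return 0;
        }
    }
    return -1;   /* no loadable segment covers this address */
}

/* Read a whole file into a malloc'd buffer; returns NULL on failure. */
static unsigned char *slurp(const char *path, size_t *size)
{
    FILE *f = fopen(path, "rb");
    unsigned char *buf = NULL;
    long len;
    if (f == NULL) return NULL;
    if (fseek(f, 0, SEEK_END) == 0 && (len = ftell(f)) > 0 && fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)len);
        if (buf != NULL && fread(buf, 1, (size_t)len, f) != (size_t)len) {
            free(buf);
            buf = NULL;
        }
        *size = (size_t)len;
    }
    fclose(f);
    return buf;
}

/* Print the DT_NEEDED names of the file; returns their count, or -1 on error. */
int print_needed_via_phdrs(const char *path)
{
    size_t sz = 0, nph, ndyn;
    int count = -1, have_strtab = 0;
    Elf64_Ehdr eh;
    Elf64_Phdr *ph = NULL;
    const Elf64_Phdr *dynph = NULL;
    Elf64_Dyn dyn;
    Elf64_Off stroff = 0;
    unsigned char *buf = slurp(path, &sz);
    if (buf == NULL) return -1;

    /* Step 1: ELF header. */
    if (sz < sizeof eh) goto done;
    memcpy(&eh, buf, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS64) goto done;
    if (eh.e_type != ET_EXEC && eh.e_type != ET_DYN) goto done;

    /* Step 2: the program header table must exist and lie inside the file. */
    nph = eh.e_phnum;
    if (eh.e_phoff == 0 || nph == 0 || eh.e_phentsize != sizeof *ph) goto done;
    if (eh.e_phoff > sz || nph * sizeof *ph > sz - eh.e_phoff) goto done;
    ph = malloc(nph * sizeof *ph);
    if (ph == NULL) goto done;
    memcpy(ph, buf + eh.e_phoff, nph * sizeof *ph);

    /* Step 3: find PT_DYNAMIC; the PT_LOAD mappings stay available in ph. */
    for (size_t i = 0; i < nph; i++)
        if (ph[i].p_type == PT_DYNAMIC) dynph = &ph[i];
    count = 0;
    if (dynph == NULL) goto done;            /* no dynamic segment: nothing is needed */
    if (dynph->p_offset > sz || dynph->p_filesz > sz - dynph->p_offset) { count = -1; goto done; }
    ndyn = dynph->p_filesz / sizeof dyn;

    /* Step 4a: DT_STRTAB holds the run-time address of the string table. */
    for (size_t i = 0; i < ndyn; i++) {
        memcpy(&dyn, buf + dynph->p_offset + i * sizeof dyn, sizeof dyn);
        if (dyn.d_tag == DT_NULL) break;
        if (dyn.d_tag == DT_STRTAB)
            have_strtab = vaddr_to_offset(ph, nph, dyn.d_un.d_ptr, &stroff) == 0;
    }
    if (!have_strtab || stroff >= sz) { count = -1; goto done; }

    /* Step 4b: each DT_NEEDED d_val is an offset into that string table. */
    for (size_t i = 0; i < ndyn; i++) {
        memcpy(&dyn, buf + dynph->p_offset + i * sizeof dyn, sizeof dyn);
        if (dyn.d_tag == DT_NULL) break;
        if (dyn.d_tag != DT_NEEDED || dyn.d_un.d_val >= sz - stroff) continue;
        const char *name = (const char *)buf + stroff + dyn.d_un.d_val;
        if (memchr(name, '\0', sz - stroff - dyn.d_un.d_val) == NULL) continue;
        puts(name);
        count++;
    }
done:
    free(ph);
    free(buf);
    return count;
}
```

Because only program headers are used, this approach should still work on binaries whose section headers have been stripped.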

Kant answered 24/3, 2014 at 15:26 Comment(2)
Thank you, I thought this would be a fairly common problem, accessing information from a binary that is. Is there any library, open source project or similar that can provide this feature without implementing, and maintaining it, on my own?Windowpane
AFAIK most people don't need it in their programs so they rely on readelf, objdump or ldd for scripting. For programmatic access here's libelf but it does not offer a ready-to-use API for this specific task - you'd still have to parse the dynamic section manually.Kant
S
0

You can use libelf to do this. Note that libelf has a C API.

From their tutorial here, look at the example in section 4.2 (or here) on how to get the Program Header Table. Find the PT_DYNAMIC segment (or, via the section headers, the SHT_DYNAMIC section) and read the dependencies from the string table, as in the example in section 5.4 (or here).

-- EDIT --

I actually had the chance to write the code. Here is what I've done:

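#include <fcntl.h>
#include <gelf.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Report a failed call and exit; unlike assert, this stays active with NDEBUG. */
static void die(const char *what) {
  fprintf(stderr, "%s failed\n", what);
  exit(EXIT_FAILURE);
}

void print_dt_needed(const char *elf_path) {
  if (elf_version(EV_CURRENT) == EV_NONE) die("elf_version");

  /* Read-only access is enough and also works for files we cannot write to. */
  int fd = open(elf_path, O_RDONLY);
  if (fd < 0) die("open");

  Elf *elf = elf_begin(fd, ELF_C_READ, NULL);
  if (elf == NULL) die("elf_begin");
  if (elf_kind(elf) != ELF_K_ELF) die("elf_kind");

  Elf_Scn *scn = NULL;
  while ((scn = elf_nextscn(elf, scn)) != NULL) {
    GElf_Shdr shdr;
    if (gelf_getshdr(scn, &shdr) != &shdr) die("gelf_getshdr");
    if (shdr.sh_type != SHT_DYNAMIC) continue;

    Elf_Data *data = elf_getdata(scn, NULL);
    if (data == NULL) die("elf_getdata");

    size_t entsize = gelf_fsize(elf, ELF_T_DYN, 1, EV_CURRENT);
    if (entsize == 0) die("gelf_fsize");

    for (size_t i = 0; i < shdr.sh_size / entsize; i++) {
      GElf_Dyn dyn;
      if (gelf_getdyn(data, (int)i, &dyn) != &dyn) die("gelf_getdyn");
      if (dyn.d_tag == DT_NULL) break; /* end of the dynamic array */

      if (dyn.d_tag == DT_NEEDED) {
        /* sh_link of the dynamic section is the index of its string table. */
        const char *name = elf_strptr(elf, shdr.sh_link, dyn.d_un.d_val);
        printf("DT_NEEDED detected: %s\n", name ? name : "(bad string offset)");
      }
    }
  }
  elf_end(elf);
  close(fd);
}

int main(int argc, char const *argv[]) {
  if (argc < 2) {
    fprintf(stderr, "usage: %s <elf-file>\n", argv[0]);
    return 1;
  }
  print_dt_needed(argv[1]);
  return 0;
}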
Shankle answered 23/6, 2020 at 1:33 Comment(0)
C
0

Here is code that prints the dependencies without libelf, using only libc:

#include <stdio.h>
#include <string.h>
#include <elf.h>

/* Read the string stored at file offset pos into buf. fgets may read past the
   terminating NUL byte, but the C string in buf still ends at the first NUL. */
static int read_str(FILE *f, long pos, char *buf, int size)
{
    if(fseek(f, pos, SEEK_SET) != 0) { return 1; }
    return fgets(buf, size, f) == NULL;
}

/* Print every DT_NEEDED entry of the dynamic array that starts at file offset
   pos and has at most count entries; strtab is the file offset of .dynstr. */
static void print_needed(FILE *f, long pos, long count, long strtab, const char *label)
{
    Elf64_Dyn dyn;
    char sname[1000];
    for(long i = 0; i < count; i++) {
        if(fseek(f, pos + i * (long)sizeof(dyn), SEEK_SET) != 0) { break; }
        if(fread(&dyn, sizeof(dyn), 1, f) != 1 || dyn.d_tag == DT_NULL) { break; }
        if(dyn.d_tag == DT_NEEDED && read_str(f, strtab + (long)dyn.d_un.d_val, sname, sizeof(sname)) == 0) {
            printf("%s, dyn.d_tag is DT_NEEDED: %s\n", label, sname);
        }
    }
}

int print_dependencies(const char *file_name)
{
    Elf64_Ehdr ehdr;
    Elf64_Shdr shdr, shdr_shstrtab, shdr_dynstr;
    Elf64_Phdr phdr;
    char sname[16];
    int found_dynstr = 0;
    FILE *f = fopen(file_name, "rb");
    if(!f) { return 1; }
    if(fread(&ehdr, sizeof(ehdr), 1, f) != 1) { fclose(f); return 1; }
    if(memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0) { fclose(f); return 1; }
    /* The Elf64_* structures used below only match 64-bit ELF files. */
    if(ehdr.e_ident[EI_CLASS] != ELFCLASS64) { fclose(f); return 1; }
    /* Section header of .shstrtab, which stores the section names. */
    if(fseek(f, (long)(ehdr.e_shoff + (Elf64_Off)ehdr.e_shentsize * ehdr.e_shstrndx), SEEK_SET) != 0) { fclose(f); return 1; }
    if(fread(&shdr_shstrtab, sizeof(shdr_shstrtab), 1, f) != 1) { fclose(f); return 1; }
    /* Find .dynstr, the string table that the DT_NEEDED offsets refer to. */
    for(int i = 0; i < ehdr.e_shnum && !found_dynstr; i++) {
        if(fseek(f, (long)(ehdr.e_shoff + (Elf64_Off)ehdr.e_shentsize * i), SEEK_SET) != 0) { break; }
        if(fread(&shdr_dynstr, sizeof(shdr_dynstr), 1, f) != 1) { break; }
        if(shdr_dynstr.sh_type != SHT_STRTAB) { continue; }
        if(read_str(f, (long)(shdr_shstrtab.sh_offset + shdr_dynstr.sh_name), sname, sizeof(sname)) != 0) { break; }
        found_dynstr = (strcmp(sname, ".dynstr") == 0);
    }
    /* Without .dynstr (e.g. a statically linked binary) there is nothing to print. */
    if(!found_dynstr) { fclose(f); return 0; }
    /* First pass: the dynamic array located through the section headers. */
    for(int i = 0; i < ehdr.e_shnum; i++) {
        if(fseek(f, (long)(ehdr.e_shoff + (Elf64_Off)ehdr.e_shentsize * i), SEEK_SET) != 0) { break; }
        if(fread(&shdr, sizeof(shdr), 1, f) != 1) { break; }
        if(shdr.sh_type == SHT_DYNAMIC && shdr.sh_entsize != 0) {
            print_needed(f, (long)shdr.sh_offset, (long)(shdr.sh_size / shdr.sh_entsize),
                         (long)shdr_dynstr.sh_offset, "shdr.sh_type is SHT_DYNAMIC");
        }
    }
    /* Second pass: the same array located through the program headers, so each
       dependency is normally printed twice, once per label. */
    for(int i = 0; i < ehdr.e_phnum; i++) {
        if(fseek(f, (long)(ehdr.e_phoff + (Elf64_Off)ehdr.e_phentsize * i), SEEK_SET) != 0) { break; }
        if(fread(&phdr, sizeof(phdr), 1, f) != 1) { break; }
        if(phdr.p_type == PT_DYNAMIC) {
            print_needed(f, (long)phdr.p_offset, (long)(phdr.p_filesz / sizeof(Elf64_Dyn)),
                         (long)shdr_dynstr.sh_offset, "phdr.p_type is PT_DYNAMIC");
        }
    }
    fclose(f);
    return 0;
}

int main(int argc, char* argv[])
{
    if(argc > 1) {
        print_dependencies(argv[1]);
    }
    return 0;
}

This is C code. I have tried to make it handle all errors and avoid memory leaks. fgets could be replaced by something that reads byte by byte and allocates only the required number of bytes, to conserve some memory. You can use malloc- and realloc-based arrays to store sname so that the calling function can use it. C++ can be used too if desired; then a std::vector is a convenient way to store the names.

This code uses fseek extensively. If you don't want to seek and read through the file many times, you can cache the file in memory and process it from there, or cache just the relevant parts of the file. Note that if you need to know all the files required to run an app, as ldd does, you need to call print_dependencies recursively (after resolving each name to an actual file), getting the dependencies of the dependencies, and so on.
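
A minimal sketch of the malloc/realloc-based storage mentioned above might look like this. The struct name_list type and its functions are hypothetical helpers, not part of the code in this answer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A growable list of heap-allocated strings (hypothetical helper type). */
struct name_list {
    char **items;
    size_t count;
    size_t capacity;
};

/* Append a copy of name; returns 0 on success, -1 if out of memory. */
int name_list_add(struct name_list *list, const char *name)
{
    assert(list != NULL && name != NULL);
    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? list->capacity * 2 : 8;
        char **grown = realloc(list->items, new_cap * sizeof *grown);
        if (grown == NULL) return -1;      /* the old array is still valid */
        list->items = grown;
        list->capacity = new_cap;
    }
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    if (copy == NULL) return -1;
    memcpy(copy, name, len);
    list->items[list->count++] = copy;
    return 0;
}

/* Returns 1 if name is already stored; handy to avoid revisiting a library
   when walking dependencies of dependencies. */
int name_list_contains(const struct name_list *list, const char *name)
{
    for (size_t i = 0; i < list->count; i++)
        if (strcmp(list->items[i], name) == 0) return 1;
    return 0;
}

/* Release every stored string and reset the list to its empty state. */
void name_list_free(struct name_list *list)
{
    for (size_t i = 0; i < list->count; i++) free(list->items[i]);
    free(list->items);
    list->items = NULL;
    list->count = list->capacity = 0;
}
```

To use it, print_dependencies could take a struct name_list * parameter and call name_list_add with sname where it currently calls printf; the caller starts with a zero-initialized list and calls name_list_free when done.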

Cameroncameroon answered 18/3, 2024 at 11:49 Comment(0)
