Symbol addresses during load-time linking vs run-time linking in Linux

I am trying to understand the difference in the mechanisms underlying load-time linking (using gcc -l) versus run-time linking (using dlopen(), dlsym()) of dynamic libraries in Linux, and how these mechanisms affect the state of the library and the addresses of its symbols.

The experiment

I have three simple files:

libhello.c:

int var;
int func() {
    return 7;
}

libhello.h:

extern int var;
int func();

main.c:

#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>
#include <dlfcn.h>
#include "libhello.h"

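int main() {
    void* h = dlopen("libhello.so", RTLD_NOW);
    if (h == NULL) {
        /* on glibc RTLD_DEFAULT is NULL, so dlsym(NULL, ...) would silently search the global scope */
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    printf("Address  Load-time linking    Run-time linking\n");
    printf("-------  -----------------    ----------------\n");
    printf("&var     0x%016" PRIxPTR "   0x%016" PRIxPTR "\n", (uintptr_t)&var , (uintptr_t)dlsym(h, "var" ));
    printf("&func    0x%016" PRIxPTR "   0x%016" PRIxPTR "\n", (uintptr_t)&func, (uintptr_t)dlsym(h, "func"));
    dlclose(h);
    return 0;
}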

I compile libhello.c with the command gcc -shared -o libhello.so -fPIC libhello.c

I compile main.c with the command gcc main.c -L. -lhello -ldl

The observation

Running the main.c executable prints something like this:

Address  Load-time linking    Run-time linking
-------  -----------------    ----------------
&var     0x0000000000601060   0x00007fdb4acb1034
&func    0x0000000000400700   0x00007fdb4aab0695

The load-time linking addresses remain the same, but the run-time linking addresses change every run.

The questions

  1. Why do the run-time addresses change every run? Do they change due to address space layout randomization (ASLR)?
  2. If this is the case, why don't addresses change for load-time linking? Isn't load-time linking vulnerable to the same attacks that address randomization aims to protect against?
  3. In the above program, the same library is loaded twice - once at load-time and then at run-time using dlopen(). The second load does not copy the state of the first load. I.e. if the value of var is changed before dlopen(), this value isn't reflected in the version of var loaded via dlsym(). Is there any way to retain this state during the second load?
Asked by Tyrannize, 8/6, 2017 at 6:48. Comments:
Related, see Configuring ASLR with randomize_va_space and Jump Over ASLR: Attacking Branch Predictors to Bypass ASLR. If you want to randomize data segments, then I believe you need to set randomize_va_space = 2. – Dupondius
On Linux/amd64 I got similar results. Note: I used this command to link the executable: gcc -o main main.c -L. -lhello -ldl -Wl,-rpath $(pwd -L) – Monte
I also tried on AIX/PowerPC: both pairs of addresses (func and var) were equal. The linking command was: gcc -o main main.c -L. -lhello -ldl -Wl,-brtl,-blibpath:$(pwd -L):/usr/lib – Monte
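The randomize_va_space setting mentioned in the first comment can also be checked from C. A minimal, Linux-only sketch (the helper names aslr_mode and aslr_mode_name are hypothetical): it reads /proc/sys/kernel/randomize_va_space and maps the value to the meaning given in the kernel's sysctl documentation, where 0 disables randomization, 1 randomizes the stack, the mmap base (where shared libraries such as libhello.so are mapped) and the vDSO, and 2 additionally randomizes the heap.

```c
#include <stdio.h>

/* Read /proc/sys/kernel/randomize_va_space (Linux only).
   Returns the mode (0, 1 or 2), or -1 if the file cannot be read. */
int aslr_mode(void)
{
    FILE *f = fopen("/proc/sys/kernel/randomize_va_space", "r");
    if (f == NULL)
        return -1;
    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}

/* Short description of each mode, following the kernel sysctl docs. */
const char *aslr_mode_name(int mode)
{
    switch (mode) {
    case 0:  return "disabled";
    case 1:  return "stack, mmap base and vDSO randomized";
    case 2:  return "as 1, plus heap (brk) randomized";
    default: return "unknown";
    }
}
```

For example, printf("ASLR: %s\n", aslr_mode_name(aslr_mode())); at the top of the question's main() would show which mode the run happened under.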
  1. Yes, it's ASLR.

  2. The load-time addresses you print are inside the executable itself, so they only move if the executable is randomized too. That usually isn't done because PIE (position-independent executables) is quite expensive in performance. So many systems make a tradeoff: they randomize libraries, because those have to be position independent anyway, but don't randomize executables, because that costs too much performance. Yes, it is more vulnerable this way, but most security is a tradeoff.

  3. Yes: don't search symbols through the handle; use RTLD_DEFAULT instead (with glibc, define _GNU_SOURCE before the first #include to get it). It's generally a bad idea to have two instances of the same dynamic library loaded like this. Some systems simply skip loading a library in dlopen() if they know the same library is already loaded, and what the dynamic linker considers "the same library" can change depending on your library path. You're very much in the territory of badly or weakly defined behavior here, which has evolved over the years more to deal with bugs and problems than through deliberate design.

Note that RTLD_DEFAULT returns the address of the symbol in the main executable, or in the first library loaded at load time that defines it; the library loaded with dlopen() is ignored.
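A minimal sketch of the two lookups side by side, assuming glibc (lookup_pair, lookup_both and print_lookup are hypothetical names). It resolves one name through a specific handle, which searches only that object and its dependencies, and through RTLD_DEFAULT, which searches the global scope:

```c
#define _GNU_SOURCE /* glibc declares RTLD_DEFAULT only with _GNU_SOURCE */
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper type: the two results of resolving one name. */
typedef struct {
    void *via_handle;  /* searched in the handle's object and its dependencies */
    void *via_default; /* searched in the global scope (executable first) */
} lookup_pair;

/* Resolve `name` through a specific dlopen() handle and through
   RTLD_DEFAULT. Either field is NULL if that lookup finds nothing. */
lookup_pair lookup_both(void *handle, const char *name)
{
    lookup_pair p;
    p.via_handle = dlsym(handle, name);
    p.via_default = dlsym(RTLD_DEFAULT, name);
    return p;
}

/* Print both addresses and whether they refer to the same definition. */
void print_lookup(const char *name, lookup_pair p)
{
    printf("%-6s handle=%p  RTLD_DEFAULT=%p  (%s)\n", name,
           p.via_handle, p.via_default,
           p.via_handle == p.via_default ? "same" : "different");
}
```

In the question's setup, print_lookup("var", lookup_both(h, "var")) would make it visible whether the handle and the global scope agree on var.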

Another thing worth keeping in mind: if libhello itself references var, that reference always resolves to the load-time version of var, even in the dlopen()ed version of the library. I modified func to return var and added this code to your example code:

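/* RTLD_DEFAULT requires #define _GNU_SOURCE before the first #include (glibc). */
int (*fn)(void) = dlsym(h, "func");
int *vp = dlsym(h, "var");  /* initialized so the first printf doesn't read an indeterminate pointer */

var = 17;
printf("%d %d %d %p\n", var, func(), fn(), (void *)vp);

vp = dlsym(h, "var");
*vp = 4711;
printf("%d %d %d %p\n", var, func(), fn(), (void *)vp);

vp = dlsym(RTLD_DEFAULT, "var");
*vp = 42;
printf("%d %d %d %p\n", var, func(), fn(), (void *)vp);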

and get this output:

$ gcc main.c -L. -lhello -ldl && LD_LIBRARY_PATH=. ./a.out
17 17 17 0x7f2e11bec02c
17 17 17 0x7f2e11bec02c
42 42 42 0x601054
Address  Load-time linking    Run-time linking
-------  -----------------    ----------------
&var     0x0000000000601054   0x0000000000601054
&func    0x0000000000400700   0x0000000000400700
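The answer mentions modifying func to return var. A minimal sketch of that modified libhello.c, assuming the rest of the file is unchanged and it is still built with gcc -shared -o libhello.so -fPIC libhello.c. Because libhello.so is position-independent code and var has default visibility, the read of var inside func goes through the GOT and is bound by the dynamic linker to the first definition of var in the global scope; in the non-PIE builds shown here that is the copy in the executable, which is consistent with all three numbers agreeing on each line of the output above.

```c
/* libhello.c, modified as described in the answer: func() returns var. */
int var;

int func(void)
{
    /* In a -fPIC shared library this read goes through the GOT, so it sees
       whichever `var` definition the dynamic linker bound the reference to. */
    return var;
}
```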
Bacciform answered 8/6, 2017 at 7:38. Comments:
There is more subtlety to this than I was expecting. Thanks for the clarification. RTLD_DEFAULT is not documented extensively, and the fact that functions in the dlopen()ed library resolve to the load-time variables is also surprising. This answer helps with both. – Tyrannize

What you see depends on many variables. On 64-bit Debian, my first try gave:

Address  Load-time linking    Run-time linking
-------  -----------------    ----------------
&var     0x0000000000600d58   0x0000000000600d58
&func    0x00000000004006d0   0x00000000004006d0

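This means that dlopen() used the already linked library, which your system seems not to do. To take advantage of ASLR, you need to compile main.c as position-independent code: gcc -fPIC main.c ./libhello.so -ldl. With position-independent code, main.c reaches var and func through the GOT, so both columns now show addresses inside libhello.so: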

Address  Load-time linking    Run-time linking
-------  -----------------    ----------------
&var     0x00007f4e6cec6944   0x00007f4e6cec6944
&func    0x00007f4e6ccc6670   0x00007f4e6ccc6670
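To check from inside the program whether the executable itself is a PIE (and can therefore be loaded at a randomized base), one option is glibc's dl_iterate_phdr(), which visits the main program first and reports its load bias in dlpi_addr. A minimal sketch, assuming glibc; executable_load_bias and main_is_pie are hypothetical names. Compiling main.c with -fPIC changes how it reaches var and func, but whether the linked program is a PIE is decided at link time (-pie, the default on many current distributions, or -no-pie).

```c
#define _GNU_SOURCE /* dl_iterate_phdr() and struct dl_phdr_info */
#include <link.h>
#include <stddef.h>

/* Callback: dl_iterate_phdr() visits the main program first, so record
   its load bias and return nonzero to stop the iteration there. */
static int record_first(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    *(ElfW(Addr) *)data = info->dlpi_addr;
    return 1;
}

/* Load bias of the running executable: 0 for a non-PIE (ET_EXEC) binary,
   which always sits at its link-time address; nonzero for a PIE, whose
   base is chosen at exec time (randomized when ASLR is on). */
ElfW(Addr) executable_load_bias(void)
{
    ElfW(Addr) bias = 0;
    dl_iterate_phdr(record_first, &bias);
    return bias;
}

int main_is_pie(void)
{
    return executable_load_bias() != 0;
}
```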
Wallachia answered 8/6, 2017 at 7:15

I hope this hint can help you.

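  1. The main program is an ELF file and needs relocation, which happens at load time. So the addresses of var and func in the main program have already been relocated before you call dlsym(). (In a non-PIE build, &var points to a copy of var that the linker placed in the executable's own data segment, and &func points to the executable's PLT stub for func; both lie inside the executable's mapping.)

  2. dlsym() returns the symbol's address at run time, without that relocation; this address is in the region where the .so is mapped.

You can use the mapping info to see the difference: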

wutiejun@linux-00343520:~/Temp/sotest> LD_LIBRARY_PATH=./ ./test
Address  Load-time linking    Run-time linking
-------  -----------------    ----------------
&var     0x000000000804a028   0x00000000f77a9014
&func    0x0000000008048568   0x00000000f77a744c


wutiejun@linux-00343520:~> cat /proc/7137/maps
08048000-08049000 r-xp 00000000 08:02 46924194                           /home/wutiejun/Temp/sotest/test
08049000-0804a000 r--p 00000000 08:02 46924194                           /home/wutiejun/Temp/sotest/test
0804a000-0804b000 rw-p 00001000 08:02 46924194                           /home/wutiejun/Temp/sotest/test
0804b000-0806c000 rw-p 00000000 00:00 0                                  [heap]
f75d3000-f7736000 r-xp 00000000 08:02 68395411                           /lib/libc-2.11.3.so
f7736000-f7738000 r--p 00162000 08:02 68395411                           /lib/libc-2.11.3.so
f7738000-f7739000 rw-p 00164000 08:02 68395411                           /lib/libc-2.11.3.so
f7739000-f773c000 rw-p 00000000 00:00 0
f773c000-f7740000 r-xp 00000000 08:02 68395554                           /lib/libachk.so
f7740000-f7741000 r--p 00003000 08:02 68395554                           /lib/libachk.so
f7741000-f7742000 rw-p 00004000 08:02 68395554                           /lib/libachk.so
f777a000-f777c000 rw-p 00000000 00:00 0
f777c000-f7784000 r-xp 00000000 08:02 68395441                           /lib/librt-2.11.3.so
f7784000-f7785000 r--p 00007000 08:02 68395441                           /lib/librt-2.11.3.so
f7785000-f7786000 rw-p 00008000 08:02 68395441                           /lib/librt-2.11.3.so
f7786000-f779d000 r-xp 00000000 08:02 68395437                           /lib/libpthread-2.11.3.so
f779d000-f779e000 r--p 00016000 08:02 68395437                           /lib/libpthread-2.11.3.so
f779e000-f779f000 rw-p 00017000 08:02 68395437                           /lib/libpthread-2.11.3.so
f779f000-f77a2000 rw-p 00000000 00:00 0
f77a2000-f77a5000 r-xp 00000000 08:02 68395417                           /lib/libdl-2.11.3.so
f77a5000-f77a6000 r--p 00002000 08:02 68395417                           /lib/libdl-2.11.3.so
f77a6000-f77a7000 rw-p 00003000 08:02 68395417                           /lib/libdl-2.11.3.so
f77a7000-f77a8000 r-xp 00000000 08:02 46924193                           /home/wutiejun/Temp/sotest/libhello.so
f77a8000-f77a9000 r--p 00000000 08:02 46924193                           /home/wutiejun/Temp/sotest/libhello.so
f77a9000-f77aa000 rw-p 00001000 08:02 46924193                           /home/wutiejun/Temp/sotest/libhello.so
f77aa000-f77ab000 rw-p 00000000 00:00 0
f77ab000-f77ca000 r-xp 00000000 08:02 68395404                           /lib/ld-2.11.3.so
f77ca000-f77cb000 r--p 0001e000 08:02 68395404                           /lib/ld-2.11.3.so
f77cb000-f77cc000 rw-p 0001f000 08:02 68395404                           /lib/ld-2.11.3.so
ffd99000-ffdba000 rw-p 00000000 00:00 0                                  [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0                                  [vdso]
wutiejun@linux-00343520:~>
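The same lookup can be done from inside the program instead of reading /proc/PID/maps from another shell. A minimal, Linux-only sketch (find_mapping is a hypothetical helper name) that returns the /proc/self/maps line whose address range contains a given address:

```c
#include <stdint.h>
#include <stdio.h>

/* Find the /proc/self/maps line whose address range contains addr and copy
   it into out (at most outsz bytes). Returns 1 if found, 0 if no mapping
   contains addr, -1 if /proc/self/maps cannot be opened. Linux-specific. */
int find_mapping(const void *addr, char *out, size_t outsz)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    uintptr_t a = (uintptr_t)addr;
    char line[4096];
    int found = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long lo, hi;
        /* each line starts with "start-end" in hexadecimal; end is exclusive */
        if (sscanf(line, "%lx-%lx", &lo, &hi) == 2 && a >= lo && a < hi) {
            snprintf(out, outsz, "%s", line);
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

In the question's main(), calling find_mapping((const void *)&var, buf, sizeof buf) and then find_mapping(dlsym(h, "var"), buf, sizeof buf) would fill buf with the mapping that contains each address; in a build like the one above one would expect the first to name the executable and the second libhello.so, but that is an expectation rather than a measured result.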
Halifax answered 8/6, 2017 at 7:37

In my opinion:

  • When you compile the library directly into the executable (static linking), think of it as if the functions were injected directly into the source code. If you examine the executable file, you'll see that each section (code, data, ...) has a fixed virtual memory address. A non-PIE Linux executable is linked at a default base address (0x400000 on x86-64, 0x08048000 on 32-bit x86), so each statically linked function has a fixed address of (base + fixed offset) that never changes. Each time the executable is loaded, each function is loaded at that precise address in virtual memory; the OS decides which physical address is used, but you won't see that. In your example, the var variable will always have the virtual address 0x0000000000601060, but you'll never know where in physical memory it resides.

  • When you load a dynamic library at run time, the OS has already loaded the executable into memory, so there is no fixed virtual address. Instead, the OS reserves a range of virtual addresses in the process's address space, starting at 0x00007fxxxxxxxxxx, where it loads and maps the newly loaded symbols and functions. Depending on what has already been loaded and on the memory randomization algorithms, those addresses can be different in each run.

Given this brief explanation, it's reasonable to conclude that the two values you're comparing in your point 3 are completely different variables (each one loaded at a different memory location), so they have different values and don't interact.
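Instead of reasoning from address ranges, dladdr() (a GNU extension) can report which loaded object an address belongs to. A minimal sketch assuming glibc; object_of and report are hypothetical helper names:

```c
#define _GNU_SOURCE /* glibc declares dladdr() and Dl_info only with _GNU_SOURCE */
#include <dlfcn.h>
#include <stdio.h>

/* Return the file name of the loaded object (executable or shared library)
   whose mapping contains addr, or NULL if dladdr() finds no such object. */
const char *object_of(const void *addr)
{
    Dl_info info;
    if (dladdr(addr, &info) == 0)
        return NULL;
    return info.dli_fname;
}

/* Print an address together with the object it belongs to. */
void report(const char *label, const void *addr)
{
    const char *obj = object_of(addr);
    printf("%-6s %p  %s\n", label, addr,
           obj != NULL ? obj : "(not inside any loaded object)");
}
```

In the question's main(), report("&var", &var) and report("dlsym", dlsym(h, "var")) would show whether the two addresses belong to the same object. With the non-PIE build from the question one would expect &var to be attributed to the executable (the name reported for the main program can vary) and the dlsym() result to libhello.so, but this is an expectation, not a measured result.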

Sewan answered 8/6, 2017 at 7:09. Comments:
Note that the OP does not perform any static linking here; he links with a shared library in two different ways. – Committal
Yes, you're right, of course. I'm leaving my answer here because it may be helpful for others. – Sewan
