Memory location of variables defined in a shared library

TL;DR Why do variables defined in a shared library seem to reside in segments defined in the main program and not the shared library?

I am trying to understand ELF file dynamic linking. I wrote a dummy shared library

// varlib.so

int x = 42;

void set_x() {
    x = 16;
}

and a program that uses it

// main.out

#include <stdlib.h>
#include <stdio.h>

extern int x;
void set_x();

int f() {
    return x;
}

int main(int argc, char** argv) { 
    set_x();
    printf("%d\n", f());
    return 0;
}

Before I looked at the assembly, I assumed that the section that holds x would come from varlib.so (probably its .data section) and that main.out would use its GOT (and a relocation to fix up the GOT entry) to access x. However, on inspection I find the following.

In main.out

The function f is defined as

0000000000400637 <f>:
  400637:   55                      push   rbp
  400638:   48 89 e5                mov    rbp,rsp
  40063b:   8b 05 f7 09 20 00       mov    eax,DWORD PTR [rip+0x2009f7]        # 601038 <x>
  400641:   5d                      pop    rbp
  400642:   c3                      ret    

with relocation

Relocation section '.rela.dyn' at offset 0x490 contains 3 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000601038  0000000600000005 R_X86_64_COPY          0000000000601038 x + 0

where 0x601038 is in the .bss section of main.out.

In libvar.so

set_x is defined as

00000000000005aa <set_x>:
 5aa:   55                      push   rbp
 5ab:   48 89 e5                mov    rbp,rsp
 5ae:   48 8b 05 23 0a 20 00    mov    rax,QWORD PTR [rip+0x200a23]        # 200fd8 <x-0x48>
 5b5:   c7 00 10 00 00 00       mov    DWORD PTR [rax],0x10
 5bb:   90                      nop
 5bc:   5d                      pop    rbp
 5bd:   c3                      ret    

with relocation

Relocation section '.rela.dyn' at offset 0x3d0 contains 8 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000200fd8  0000000500000006 R_X86_64_GLOB_DAT      0000000000201020 x + 0

where 0x200fd8 is in the .got section of varlib.so.

So it would seem that x is actually located in a segment of main.out (specifically the .bss segment) and libvar.so has to use its .got table to access it, i.e. the exact opposite of what I thought! This seems odd as x is defined as extern in main.out and given a value in varlib.so. I think I understand most of the technical details (although am still a bit confused about the exact meanings of the R_X86_64_COPY and R_X86_64_GLOB_DAT relocation types; if anyone has a good guide on relocation types that would be much appreciated).

So my main question is: why do it this way, and not the way I originally thought it was done, with x 'living' in a libvar.so segment and main.out accessing it through the GOT (or some other relocation mechanism)?

Sputum answered 3/6, 2019 at 16:1 Comment(5)
this seems to be a quite good explanation - Runlet
@IngoLeonhardt nice explanation that has links to additional references such as this one, How to Write Shared Libraries. - Patent
I expect that this way of accessing the variables was done for efficiency and run time as well as to make it easier to have separate processes using shared code but with process specific variables and data space. - Patent
@RichardChambers Mainly, the idea was to make it so that all program parts that are not shared libraries need not be changed. - Intransigent
developers.redhat.com/blog/2019/05/06/… - Oncoming

So it would seem that x is actually located in a segment of main.out (specifically the .bss segment) and libvar.so has to use its .got table to access it, i.e. the exact opposite of what I thought!

Yes and no. Setting aside for a moment the question of which ELF object actually provides x, we know that that variable is defined with a nonzero initializer. If we see such a variable assigned to an ELF object's .bss section then we know that something strange is going on, because that section is for zero-initialized data. It occupies no space in the dynamic object because the all-bits-zero initial values are not actually stored. More on this in a moment.

[...] I think I understand most of the technical details (although am still a bit confused about the exact meanings of the R_X86_64_COPY and R_X86_64_GLOB_DAT relocation types;

And those relocation types are key. R_X86_64_COPY is a relocation type for initialized external variables defined in a different ELF object, and R_X86_64_GLOB_DAT is the relocation type that fills a GOT entry with the run-time address of a globally-visible symbol, here one whose initial value is stored in the current ELF object.

Recall that each program that uses the library must have its own copies of all the modifiable objects, whereas a large part of the point of a shared library is that it is resident in memory only once. It makes sense, therefore, for the variables to be provided by the program instead of by the library. They must nevertheless appear in each ELF object's relocation table, because the functions of the library need to access the right copy of the variable.

On the other hand, the initial values of such variables need to be recorded in the library, because there's no place else to get them when its clients are built. In principle, the initial values could be copied into executables when they are built, but then they would unnecessarily increase the sizes of the executable objects (since they need to be in the library object no matter what), and the executables would need to be rebuilt if the library were modified to initialize the variable differently.
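The trade-off described above can be modelled in plain C. This is only an illustration under stated assumptions: `fake_library`, `fake_executable`, `bake_at_build_time` and `copy_at_load_time` are hypothetical names, the replacement initializer 7 is made up, and no real linker is involved.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a shared library that defines `int x`. */
struct fake_library {
    int x_initial;   /* initial value stored in the library's .data */
};

/* Hypothetical model of an executable that reserves storage for x. */
struct fake_executable {
    int x;           /* the executable's own slot for x */
};

/* Rejected design: the value is copied into the executable once,
   when the executable is built. */
void bake_at_build_time(struct fake_executable *exe,
                        const struct fake_library *lib_at_build)
{
    exe->x = lib_at_build->x_initial;
}

/* Copy-relocation design: the value is copied from whichever library
   is present when the program is loaded. */
void copy_at_load_time(struct fake_executable *exe,
                       const struct fake_library *lib_at_load)
{
    memcpy(&exe->x, &lib_at_load->x_initial, sizeof exe->x);
}

/* Returns 1 if the load-time copy follows a changed initializer while
   the build-time value goes stale. */
int load_time_copy_tracks_library_update(void)
{
    struct fake_library v1 = { 42 };  /* library as it was at build time */
    struct fake_library v2 = { 7 };   /* library rebuilt with a new initializer */
    struct fake_executable baked, copied;

    bake_at_build_time(&baked, &v1);  /* executable built against v1 */
    copy_at_load_time(&copied, &v2);  /* executable run against v2, no relink */

    assert(baked.x == 42);            /* stale: still the old initializer */
    assert(copied.x == 7);            /* current: picked up at load time */
    return baked.x != v2.x_initial && copied.x == v2.x_initial;
}
```

Calling `load_time_copy_tracks_library_update()` from a `main` of your own exercises both paths; in this model only the load-time copy sees the library's new initializer.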

If anyone has a good guide on relocation types that would be much appreciated).

Requests for off-site resources are off-topic for SO, but I'm sure Google can provide several. In a nutshell, however, what they will tell you is this:

  • R_X86_64_COPY identifies an object whose storage is provided by the current ELF object, but whose initial value needs to be copied from another object, and

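  • R_X86_64_GLOB_DAT identifies a GOT entry that the dynamic linker fills with the symbol's resolved address; here the storage turns out to be provided by a different ELF object (the executable), while the initial value is provided by this one.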

The dynamic linker uses these together to copy the initial value from the library to the executable's copy of the variable, and to process relocations of the variable in the library eagerly, at load time, so that the library's GOT entry points at the executable's copy.
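What the dynamic linker does with these two relocations can be mimicked in ordinary C. The following is only a sketch of the idea, not how any real loader is written: `lib_data_x`, `exe_bss_x`, `lib_got_x`, `apply_copy_reloc`, `apply_glob_dat_reloc`, `fake_dynamic_link` and `run_demo` are all hypothetical names standing in for the library's .data, the executable's .bss slot, the library's GOT entry and the loader's work.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the library's .data: holds the initializer of x. */
int lib_data_x = 42;

/* Stand-in for the slot the static linker reserved for x in the
   executable's .bss (zero until the copy relocation is applied). */
int exe_bss_x;

/* Stand-in for the library's GOT entry for x, filled in at load time. */
int *lib_got_x;

/* Models R_X86_64_COPY: copy the symbol's initial bytes from the
   library's definition into the executable's slot. */
void apply_copy_reloc(void *dest, const void *src, size_t size)
{
    memcpy(dest, src, size);
}

/* Models R_X86_64_GLOB_DAT: store the resolved address of the symbol
   in the GOT entry. */
void apply_glob_dat_reloc(int **got_entry, int *resolved_address)
{
    *got_entry = resolved_address;
}

/* Models load time: x resolves to the executable's copy everywhere. */
void fake_dynamic_link(void)
{
    apply_copy_reloc(&exe_bss_x, &lib_data_x, sizeof exe_bss_x);
    apply_glob_dat_reloc(&lib_got_x, &exe_bss_x);
}

/* Library code: reaches x indirectly, through its GOT entry. */
void set_x(void)
{
    *lib_got_x = 16;
}

/* Executable code: reads x directly at a fixed location. */
int f(void)
{
    return exe_bss_x;
}

/* Runs the whole sequence and returns what main.out would print. */
int run_demo(void)
{
    fake_dynamic_link();
    assert(f() == 42);          /* initial value arrived via the copy */
    set_x();
    assert(lib_data_x == 42);   /* the library's own copy is left unused */
    return f();                 /* 16 */
}
```

In this model the executable's direct read and the library's write through the GOT reach the same storage, which is why `f()` observes the value written by `set_x()`.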

This seems odd as x is defined as extern in main.out and given a value in varlib.so.

It seems odd only because you suppose that the logical properties of C translation units should map directly and correspondingly onto the physical properties of corresponding ELF objects. That's not crazy -- they do map to a significant extent -- but they cannot map perfectly because ELF semantics do not perfectly mirror C semantics.

Authorized answered 3/6, 2019 at 17:5 Comment(5)
Thanks. This clarifies a number of things. However you say "Recall that each program that uses the library must have its own copies of all the modifiable objects, whereas a large part of the point of a shared library is that it is resident in memory only once. It makes sense, therefore, for the variables to be provided by the program instead of by the library." But the shared library has a .data section that may be different for every copy of the library. Why not just put x in there? Is there a good reason not to do this or is it just a design choice / convention? - Sputum
I'm inclined to think that there is a functional reason, @Michael, probably related to the fact that in ELF, the same dynamic symbol can resolve differently in different ELF objects (another difference from C semantics). But this is a deeper magic than I can explain in any detail. - Authorized
okay, @John. It's still just kinda weird in my head as there is the extra time & memory overhead of the copying but I can't see any gains! (although I am sure there are) - Sputum
@JohnBollinger: according to Florian Weimer at developers.redhat.com/blog/2019/05/06/…: "At the same time, for executables, there was a desire not to reduce performance by changing the compilation model for executables. This means that when compiling source code for a main program (i.e., without -fPIC, and in this particular case, without -fPIE as well), the array_get function is compiled to the exact same instruction sequence, before the introduction of dynamic shared objects." - Oncoming
So, while shared objects have to be PIC, it was apparently decided to avoid that for the main executable, and that's the reason for using a copy relocation. - Oncoming
