What does SEGMENT_START("text-segment", 0x400000) represent?

I'm learning about the layout of executable binaries. My end goal is to analyze a specific executable for things that could be refactored (in its source) to reduce the compiled output size.

I've been using https://www.embeddedrelated.com/showarticle/900.php and https://www.geeksforgeeks.org/memory-layout-of-c-program/ as references for this initial learning.

From what I've learned, a linker script specifies the addresses where sections of compiled binaries are placed. E.g.

> ld --verbose | grep text
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
      *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*)

I think this means that the text segment of a compiled binary starts at memory address 0x400000. Is that true?

What does that value, 0x400000, represent? I'm probably not understanding something properly, but surely that 0x400000 does not represent a physical memory location, does it? E.g. if I were to run two instances of my compiled a.out executable in parallel, they couldn't both simultaneously occupy the space at 0x400000, right?

Nab answered 20/3, 2019 at 21:21

0x400000 is not a physical address as your memory chips see it. It is a virtual address, as seen from the CPU's point of view.

The loader will map a few pages of physical memory at VA 0x400000 and fill them with the contents of the text segment. And yes, another instance of your program can use the same virtual addresses, and even share the same physical pages, for the text segment, because text (code) is readable and executable but not writable. Other segments (data, bss, stack, heap) may have identical VAs, but in each process they are mapped to private, protected physical memory.
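A rough way to see the "identical VA, private physical memory" point from userland is sketched below. It is only an approximation of the question's scenario: it uses os.fork() (Unix only) as a stand-in for launching two instances, and the function name same_address_private_memory is made up for this illustration. Both processes report the same virtual address for the variable, yet a write in the child is not visible in the parent.

```python
import ctypes
import os


def same_address_private_memory():
    """Fork, modify a variable in the child, and compare what each process sees."""
    counter = ctypes.c_int(1)             # ordinary writable memory in this process
    address = ctypes.addressof(counter)   # a virtual address, not a physical one

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: same virtual address space layout, but writes go to private pages.
        counter.value = 2
        report = f"{ctypes.addressof(counter)} {counter.value}"
        os.write(write_fd, report.encode())
        os._exit(0)

    # Parent: collect the child's report, then look at our own copy.
    os.close(write_fd)
    child_report = os.read(read_fd, 64).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)

    child_address, child_value = (int(part) for part in child_report.split())
    return address, counter.value, child_address, child_value


parent_address, parent_value, child_address, child_value = same_address_private_memory()
print(hex(parent_address), parent_value)
print(hex(child_address), child_value)
```

The two printed addresses match while the values differ, which would be impossible if the address named a single physical location.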

Leyba answered 21/3, 2019 at 20:2

What is 0x400000

I think this means that the text segment of a compiled binary starts at memory address 0x400000. Is that true?

No, this is well explained in the official documentation at: https://sourceware.org/binutils/docs/ld/Builtin-Functions.html

SEGMENT_START(segment, default)

Return the base address of the named segment. If an explicit value has already been given for this segment (with a command-line ‘-T’ option) then that value will be returned otherwise the value will be default. At present, the ‘-T’ command-line option can only be used to set the base address for the “text”, “data”, and “bss” sections, but you can use SEGMENT_START with any segment name.

Therefore, SEGMENT_START does not set the address; it returns it. In your case 0x400000 is just the default, used when the value has not been set explicitly by one of the CLI mechanisms mentioned in the documentation (e.g. -Ttext=0x200, as mentioned in man ld).
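The "return the explicit value, otherwise the default" behaviour can be modelled in a few lines. This is a toy model in Python, not ld's implementation: explicit_bases stands for whatever base addresses were given on the command line, and sizeof_headers is a hypothetical stand-in for the linker's SIZEOF_HEADERS.

```python
def segment_start(name, default, explicit_bases):
    """Toy model of SEGMENT_START(name, default): explicit value if given, else default."""
    return explicit_bases.get(name, default)


def text_start(explicit_bases, sizeof_headers):
    """Mirror the two assignments from the linker script line in the question."""
    # PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000));
    executable_start = segment_start("text-segment", 0x400000, explicit_bases)
    # . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
    location_counter = executable_start + sizeof_headers
    return executable_start, location_counter


# No explicit base given: the default 0x400000 is returned.
print([hex(v) for v in text_start({}, sizeof_headers=0x100)])
# An explicit base was given: the default is ignored.
print([hex(v) for v in text_start({"text-segment": 0x200000}, sizeof_headers=0x100)])
```

In both cases the function only reports a base address; the placement itself comes from the assignment to the location counter.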

Physical vs virtual

As you've said, working with physical addresses directly is very uncommon in userland, and would at the very least require root privileges (e.g. via sudo), since it would break process separation. For an example of userland code that does work with physical addresses, see: How to access physical addresses from user space in Linux?

Therefore, when the kernel loads an ELF binary with the exec syscalls, all addresses are interpreted as virtual addresses.

Note however that this is just a matter of convention. For example, when I give my Linux kernel ELF binary for QEMU to load into memory to start simulation, or when a bootloader does that in a real system, the ELF addresses would then be treated as physical addresses since there is no page table available at that point.
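To see which addresses an ELF file actually carries, one can read its program headers. The sketch below handles only 64-bit little-endian ELF and assumes the standard Elf64_Ehdr and Elf64_Phdr layouts; the function name load_segments is invented for this example. Each loadable segment stores both a p_vaddr and a p_paddr field, and it is up to whoever loads the file to decide which one to honour and how to interpret it.

```python
import struct

ELF_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian, 64 bytes
PROGRAM_HEADER = struct.Struct("<IIQQQQQQ")      # Elf64_Phdr, little-endian, 56 bytes
PT_LOAD = 1                                      # segment type for loadable segments


def load_segments(image):
    """Return (p_vaddr, p_paddr, p_memsz) for every PT_LOAD program header."""
    fields = ELF_HEADER.unpack_from(image, 0)
    ident = fields[0]
    # ident[4] == 2 means ELFCLASS64, ident[5] == 1 means little-endian data.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]

    segments = []
    for index in range(e_phnum):
        header = PROGRAM_HEADER.unpack_from(image, e_phoff + index * e_phentsize)
        p_type, _flags, _offset, p_vaddr, p_paddr, _filesz, p_memsz, _align = header
        if p_type == PT_LOAD:
            segments.append((p_vaddr, p_paddr, p_memsz))
    return segments
```

A possible use is to read a binary with open("a.out", "rb") and print each returned address with hex(); readelf -l shows the same fields in its VirtAddr and PhysAddr columns.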

Trixi answered 1/4, 2020 at 15:24
