Ubuntu 20.04 glibc 2.31 RTFS + GDB
glibc does some setup before main so that some of its functionalities will work. Let's try to track down the source code for that.
hello.c
#include <stdio.h>
int main() {
puts("hello");
return 0;
}
Compile and debug:
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o hello.out hello.c
gdb hello.out
Now in GDB:
b main
r
bt -past-main
gives:
#0 main () at hello.c:3
#1 0x00007ffff7dc60b3 in __libc_start_main (main=0x555555555149 <main()>, argc=1, argv=0x7fffffffbfb8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffbfa8) at ../csu/libc-start.c:308
#2 0x000055555555508e in _start ()
This already contains the line of the caller of main: https://github.com/cirosantilli/glibc/blob/glibc-2.31/csu/libc-start.c#L308.
The function has a billion ifdefs as can be expected from the level of legacy/generality of glibc, but some key parts which seem to take effect for us should simplify to:
# define LIBC_START_MAIN __libc_start_main
STATIC int
LIBC_START_MAIN (int (*main) (int, char **, char **),
int argc, char **argv,
{
/* Initialize some stuff. */
result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
exit (result);
}
Before __libc_start_main
are are already at _start
, which by adding gcc -Wl,--verbose
we know is the entry point because the linker script contains:
ENTRY(_start)
and is therefore is the actual very first instruction executed after the dynamic loader finishes.
To confirm that in GDB, we an get rid of the dynamic loader by compiling with -static
:
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o hello.out hello.c
gdb hello.out
and then make GDB stop at the very first instruction executed with starti
and print the first instructions:
starti
display/12i $pc
which gives:
=> 0x401c10 <_start>: endbr64
0x401c14 <_start+4>: xor %ebp,%ebp
0x401c16 <_start+6>: mov %rdx,%r9
0x401c19 <_start+9>: pop %rsi
0x401c1a <_start+10>: mov %rsp,%rdx
0x401c1d <_start+13>: and $0xfffffffffffffff0,%rsp
0x401c21 <_start+17>: push %rax
0x401c22 <_start+18>: push %rsp
0x401c23 <_start+19>: mov $0x402dd0,%r8
0x401c2a <_start+26>: mov $0x402d30,%rcx
0x401c31 <_start+33>: mov $0x401d35,%rdi
0x401c38 <_start+40>: addr32 callq 0x4020d0 <__libc_start_main>
By grepping the source for _start
and focusing on x86_64 hits we see that this seems to correspond to sysdeps/x86_64/start.S:58
:
ENTRY (_start)
/* Clearing frame pointer is insufficient, use CFI. */
cfi_undefined (rip)
/* Clear the frame pointer. The ABI suggests this be done, to mark
the outermost frame obviously. */
xorl %ebp, %ebp
/* Extract the arguments as encoded on the stack and set up
the arguments for __libc_start_main (int (*main) (int, char **, char **),
int argc, char *argv,
void (*init) (void), void (*fini) (void),
void (*rtld_fini) (void), void *stack_end).
The arguments are passed via registers and on the stack:
main: %rdi
argc: %rsi
argv: %rdx
init: %rcx
fini: %r8
rtld_fini: %r9
stack_end: stack. */
mov %RDX_LP, %R9_LP /* Address of the shared library termination
function. */
#ifdef __ILP32__
mov (%rsp), %esi /* Simulate popping 4-byte argument count. */
add $4, %esp
#else
popq %rsi /* Pop the argument count. */
#endif
/* argv starts just at the current stack top. */
mov %RSP_LP, %RDX_LP
/* Align the stack to a 16 byte boundary to follow the ABI. */
and $~15, %RSP_LP
/* Push garbage because we push 8 more bytes. */
pushq %rax
/* Provide the highest stack address to the user code (for stacks
which grow downwards). */
pushq %rsp
#ifdef PIC
/* Pass address of our own entry points to .fini and .init. */
mov __libc_csu_fini@GOTPCREL(%rip), %R8_LP
mov __libc_csu_init@GOTPCREL(%rip), %RCX_LP
mov main@GOTPCREL(%rip), %RDI_LP
#else
/* Pass address of our own entry points to .fini and .init. */
mov $__libc_csu_fini, %R8_LP
mov $__libc_csu_init, %RCX_LP
mov $main, %RDI_LP
#endif
/* Call the user's main function, and exit with its value.
But let the libc call main. Since __libc_start_main in
libc.so is called very early, lazy binding isn't relevant
here. Use indirect branch via GOT to avoid extra branch
to PLT slot. In case of static executable, ld in binutils
2.26 or above can convert indirect branch into direct
branch. */
call *__libc_start_main@GOTPCREL(%rip)
which ends up calling __libc_start_main
as expected.
Unfortunately -static
makes the bt
from main
not show as much info:
#0 main () at hello.c:3
#1 0x0000000000402560 in __libc_start_main ()
#2 0x0000000000401c3e in _start ()
If we remove -static
and start from starti
, we get instead:
=> 0x7ffff7fd0100 <_start>: mov %rsp,%rdi
0x7ffff7fd0103 <_start+3>: callq 0x7ffff7fd0df0 <_dl_start>
0x7ffff7fd0108 <_dl_start_user>: mov %rax,%r12
0x7ffff7fd010b <_dl_start_user+3>: mov 0x2c4e7(%rip),%eax # 0x7ffff7ffc5f8 <_dl_skip_args>
0x7ffff7fd0111 <_dl_start_user+9>: pop %rdx
By grepping the source for _dl_start_user
this seems to come from sysdeps/x86_64/dl-machine.h:L147
/* Initial entry point code for the dynamic linker.
The C function `_dl_start' is the real entry point;
its return value is the user program's entry point. */
#define RTLD_START asm ("\n\
.text\n\
.align 16\n\
.globl _start\n\
.globl _dl_start_user\n\
_start:\n\
movq %rsp, %rdi\n\
call _dl_start\n\
_dl_start_user:\n\
# Save the user entry point address in %r12.\n\
movq %rax, %r12\n\
# See if we were run as a command with the executable file\n\
# name as an extra leading argument.\n\
movl _dl_skip_args(%rip), %eax\n\
# Pop the original argument count.\n\
popq %rdx\n\
and this is presumably the dynamic loader entry point.
If we break at _start
and continue, this seems to end up in the same location as when we used -static
, which then calls __libc_start_main
.
When I try a C++ program instead:
hello.cpp
#include <iostream>
int main() {
std::cout << "hello" << std::endl;
}
with:
g++ -ggdb3 -O0 -std=c++11 -Wall -Wextra -pedantic -o hello.out hello.cpp
the results are basically the same, e.g. the backtrace at main
is the exact same.
I think the C++ compiler is just calling into hooks to achieve any C++ specific functionality, and things are pretty well factored across C/C++.
TODO:
main()
as "start of the program" – Bifacial