In an ELF executable, this is referred to as the "ELF interpreter". On Linux, for example, this is /lib64/ld-linux-x86-64.so.2. It is not part of the kernel and generally ships with glibc et al.
When the kernel executes an ELF executable, it must map the executable into userspace memory. It then looks inside for a special program header of type PT_INTERP (backed by the .interp section), which contains a string that is the full path of the interpreter.
The kernel then maps the interpreter into userspace memory and transfers control to it. Then, the interpreter does the necessary linking/loading and starts the program.
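To make the lookup concrete, here is a minimal Python sketch of finding the interpreter path in an ELF image. It is only an illustration of the file layout, not what the kernel actually runs: the function name read_interp is made up for this example, and it assumes a 64-bit little-endian ELF file.

```python
import struct

PT_INTERP = 3  # program header type that holds the interpreter path


def read_interp(data: bytes):
    """Return the PT_INTERP path from a 64-bit little-endian ELF image, or None.

    Hypothetical helper for illustration; other ELF classes are rejected.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("sketch only handles 64-bit little-endian ELF")

    # ELF64 header: e_phoff at offset 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)

    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        # ELF64 program header: p_type at 0, p_offset at 8, p_filesz at 32.
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", data, off)
        (p_filesz,) = struct.unpack_from("<Q", data, off + 32)
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            # The path is stored as a NUL-terminated string.
            return raw.split(b"\0", 1)[0].decode()
    return None  # e.g. a statically linked executable
```

You could feed it the bytes of any dynamically linked binary, e.g. read_interp(open("/usr/bin/vlc", "rb").read()), and compare the result with what readelf -l reports.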
ELF stands for "Executable and Linkable Format", and the format is extensible: it allows many different sections within the ELF file.
Rather than burdening the kernel with having to know about the myriad extensions, the ELF interpreter that is paired with the file handles them.
Although usually only one format is used on a given system, there can be several different variants of ELF files on a system, each with its own ELF interpreter.
This would allow [say] a BSD ELF file to be run on a Linux system [with other adjustments/support] because the ELF file would point to the BSD ELF interpreter rather than the Linux one.
UPDATE:
every process(vlc player, chrome) had the shared library ld.so as part of their address space.
Yes. I assume you're looking at /proc/<pid>/maps. These are mappings (e.g. like using mmap) to the files. That is somewhat different than "loading", which can imply [symbol] linking.
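If you want to inspect those mappings programmatically, a small Python sketch like the following can pull the file-backed entries out of the maps text. The helper names parse_maps_line and mapped_files are invented for this example, and it assumes the usual "address perms offset dev inode pathname" line layout.

```python
def parse_maps_line(line: str) -> dict:
    """Split one /proc/<pid>/maps line into its main fields."""
    # At most 6 fields; the pathname (if any) is the remainder of the line.
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "perms": fields[1],
        "offset": int(fields[2], 16),
        "path": fields[5].strip() if len(fields) > 5 else "",
    }


def mapped_files(maps_text: str) -> list:
    """Distinct file paths backing mappings, in first-seen order."""
    seen = []
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        path = parse_maps_line(line)["path"]
        # Skip anonymous mappings and pseudo-entries such as [stack].
        if path.startswith("/") and path not in seen:
            seen.append(path)
    return seen
```

On Linux, mapped_files(open("/proc/self/maps").read()) lists the files mapped into the Python process itself; the dynamic linker should appear among them.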
So primarily loader after loading the executable(code & data) onto memory , It loads& maps dynamic linker (.so) to its address space
The best way to understand this is to rephrase what you just said:
So primarily, after mapping the executable (code & data) into memory, the kernel maps the dynamic linker (.so) into the program's address space
That is essentially correct. The kernel also maps other things, such as the bss segment and the stack. It then "pushes" argc, argv, and envp [the space for environment variables] onto the stack.
Then, having determined the start address of ld.so [by reading the entry-point field (e_entry) of its ELF header], it sets that as the resume address and starts the thread.
Up until now, it has been the kernel doing things. The kernel does little to no symbol linking.
Now, ld.so takes over ...
which further Loads shared Libraries , map & resolve references to libraries. It then calls entry function (_start)
Because the original executable (e.g. vlc) has been mapped into memory, ld.so can examine it for the list of shared libraries that it needs. It maps these into memory, but does not necessarily link the symbols right away.
Mapping is easy and quick: just an mmap call.
The start address of the executable [not to be confused with the start address of ld.so] is taken from the entry-point field (e_entry) of the ELF header. Although the symbol associated with this start address has traditionally been called _start, it could actually be named anything (e.g. __my_start), because it is the address recorded in the header that determines where execution starts, not the address of a symbol named _start.
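As a rough illustration, the recorded start address can be read straight out of the file without consulting any symbol. This is a minimal Python sketch; entry_point is a name made up for the example, and it assumes a 64-bit little-endian ELF file.

```python
import struct


def entry_point(data: bytes) -> int:
    """Return e_entry from a 64-bit little-endian ELF image.

    Hypothetical helper for illustration; no symbol table is consulted,
    so the name of the entry symbol plays no part in the result.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("sketch only handles 64-bit little-endian ELF")
    # In the ELF64 header, e_entry is the 8-byte field at offset 24.
    (e_entry,) = struct.unpack_from("<Q", data, 24)
    return e_entry
```

For example, hex(entry_point(open("/usr/bin/vlc", "rb").read())) should match the "Entry point address" line printed by readelf -h.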
Linking symbol references to symbol definitions is a time consuming process. So, this is deferred until the symbol is actually used. That is, if a program has references to printf, the linker doesn't actually try to link in printf until the first time the program actually calls printf.
This is sometimes called "link-on-demand" or "on-demand-linking" (it is also known as lazy binding). See my answer here: Which segments are affected by a copy-on-write? for a more detailed explanation of that and what actually happens when an executable is mapped into userspace.
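The idea can be sketched with a toy model in Python. This is only an analogy for the resolve-on-first-call behavior, not how ld.so is implemented; the LazyTable class and its resolver argument are invented for the example.

```python
class LazyTable:
    """Toy stand-in for on-demand linking: a slot is resolved on first call."""

    def __init__(self, resolver):
        self._resolver = resolver  # plays the role of the linker's symbol lookup
        self._slots = {}           # name -> already-resolved callable
        self.lookups = 0           # how many (expensive) lookups were performed

    def call(self, name, *args):
        target = self._slots.get(name)
        if target is None:
            # First use: do the expensive lookup once and cache the result.
            self.lookups += 1
            target = self._resolver(name)
            self._slots[name] = target
        # Later calls go straight to the cached target.
        return target(*args)
```

With a dictionary standing in for a shared library, table = LazyTable(library.__getitem__) performs no lookups at construction time, and table.call("add", 2, 3) resolves "add" only on that first call.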
If you're interested, you could run ldd /usr/bin/vlc to get a list of the shared libraries it uses. If you look at the output of readelf -a /usr/bin/vlc, you'll see these same shared libraries. You'd also get the full path of the ELF interpreter, and could run readelf -a <full_path_to_interpreter> and note some of the differences. You could repeat the process for any .so files that vlc wanted.
Combining all that with /proc/<pid>/maps et al. might help with your understanding.
ld.so is not a part of the kernel, but is loaded by it. – Terpineol