More info on memory layout of an executable program (process)

I attended an interview at Samsung. They asked a lot of questions about the memory layout of a program. I barely know anything about this.

I googled "Memory layout of an executable program" and "Memory layout of process".

I'm surprised to see that there isn't much info on these topics. Most of the results are forum queries. I wonder why.

These are the few links I found:

  1. Run-Time Storage Organization
  2. Run-Time Memory Organization
  3. Memory layout of C process (PDF)

I want to learn this from a proper book instead of some web links. (Randy Hyde's is also a book, but I'd like some other book.) In which book can I find clear and more detailed information on this subject?

I also wonder why operating systems books don't cover this. I read Stallings, 6th edition. It just discusses the Process Control Block.

Creating this entire layout is the linker's task, right? Where can I read more about this process? I want COMPLETE info, from a program on the disk to its execution on the processor.

EDIT:

Initially, things were not clear to me even after reading the answers given below. Recently, I came across these articles, and after reading them I understood things clearly.

Resources that helped me in understanding:

  1. www.tenouk.com/Bufferoverflowc/Bufferoverflow1b.html
  2. 5 part PE file format tutorial: http://win32assembly.online.fr/tutorials.html
  3. Excellent article : http://www.linuxforums.org/articles/understanding-elf-using-readelf-and-objdump_125.html
  4. PE Explorer: http://www.heaventools.com/

Yes, "layout of an executable program (PE/ELF)" != "memory layout of a process". Find out for yourself in the 3rd link. :)

Now that my concepts are clear, my questions make me look so stupid. :)

Ashe answered 27/12, 2009 at 20:15 Comment(4)
This is going to be somewhat different between DOS, Windows and Unix, and probably different between flavors of Unix as well. You should be more specific. - Maletta
I don't care about that "somewhat". Learning for one OS should give an overview for all other OSes, right? In any case, I would go with WINDOWS. - Ashe
Unable to access the linuxforums database. Please try again later. - Peery
Links 2 and 3 are broken. - Kirven

How things are loaded depends very strongly on the OS and on the binary format used, and the details can get nasty. There are standards for how binary files are laid out, but it's really up to the OS how a process's memory is laid out. This is probably why the documentation is hard to find.

To answer your questions:

  1. Books:
    • If you're interested in how processes lay out their memory, look at Understanding the Linux Kernel. Chapter 3 talks about process descriptors, creating processes, and destroying processes.
    • The only book I know of that covers linking and loading in any detail is Linkers and Loaders by John Levine. There's an online and a print version, so check that out.

  2. Executable code is created by the compiler and the linker, but it's the linker that puts things in the binary format the OS needs. On Linux, this format is typically ELF; on Windows it's PE, which is derived from COFF; on older Unixes it's COFF; and on Mac OS X it's Mach-O. This isn't a fixed list, though. Some OSes can and do support multiple binary formats. Linkers need to know the output format to create executable files.

  3. The process's memory layout is pretty similar to the binary format, because a lot of binary formats are designed to be mmap'd so that the loader's task is easier.

    It's not quite that simple though. Some parts of the process image (like uninitialized static data, the .bss section) are not stored directly in the binary file. Instead, the binary just contains the size of these sections. When the process is loaded into memory, the loader knows to allocate the right amount of memory, but the binary file doesn't need to contain large empty sections.

    Also, the process's memory layout includes some space for the stack and the heap, where a process's call frames and dynamically allocated memory go. These generally live at opposite ends of a large address space.
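The point about sizes being recorded without the bytes themselves can be seen in ELF program headers, where each loadable segment carries both a size in the file (`p_filesz`) and a size in memory (`p_memsz`). What follows is only a minimal sketch, assuming a little-endian ELF64 image; the function names `load_segments` and `zero_fill_bytes` are made up for illustration, and other formats (PE, Mach-O) record the same idea differently.

```python
import struct

ELF_MAGIC = b"\x7fELF"
PT_LOAD = 1  # program header type of a loadable segment


def load_segments(image: bytes):
    """Return (vaddr, filesz, memsz) for every PT_LOAD segment.

    Assumption: the image is a little-endian ELF64 file; anything else
    is rejected rather than guessed at.
    """
    if image[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    if image[4] != 2 or image[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError("this sketch only handles little-endian ELF64")

    # Fields of the ELF64 header that follow the 16-byte e_ident array.
    (e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
     e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
     e_shstrndx) = struct.unpack_from("<HHIQQQIHHHHHH", image, 16)

    segments = []
    for i in range(e_phnum):
        offset = e_phoff + i * e_phentsize
        (p_type, p_flags, p_offset, p_vaddr, p_paddr,
         p_filesz, p_memsz, p_align) = struct.unpack_from("<IIQQQQQQ", image, offset)
        if p_type == PT_LOAD:
            segments.append((p_vaddr, p_filesz, p_memsz))
    return segments


def zero_fill_bytes(image: bytes) -> int:
    """Bytes the loader must provide as zeros because the file omits them."""
    return sum(memsz - filesz for _, filesz, memsz in load_segments(image))
```

Passing the bytes of an ELF64 executable (for example `open(path, "rb").read()`, with `path` being whatever binary you want to inspect) to `zero_fill_bytes` gives the amount of memory that exists only at run time, which is mostly the .bss section.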

This really just scratches the surface of how binaries get loaded, and it doesn't cover anything about dynamic libraries. For a really detailed treatment of how dynamic linking and loading work, read How to Write Shared Libraries by Ulrich Drepper.
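As a small illustration of point 2, the formats named there can usually be told apart by the first few bytes of the file. This is a rough sketch rather than a complete detector: the function name `detect_format` is hypothetical, only the common signatures are checked, and fat (multi-architecture) Mach-O files are deliberately left out.

```python
import struct

# Mach-O magic numbers as they appear on disk, in both byte orders
# (32-bit: 0xfeedface, 64-bit: 0xfeedfacf).
MACHO_MAGICS = {
    bytes.fromhex("feedface"), bytes.fromhex("cefaedfe"),
    bytes.fromhex("feedfacf"), bytes.fromhex("cffaedfe"),
}


def detect_format(head: bytes) -> str:
    """Guess the executable format from the leading bytes of a file."""
    if head[:4] == b"\x7fELF":
        return "ELF"
    if head[:2] == b"MZ":
        # A PE image starts with a DOS header; the 4-byte field at offset
        # 0x3C holds the file offset of the "PE\0\0" signature.
        if len(head) >= 0x40:
            (pe_offset,) = struct.unpack_from("<I", head, 0x3C)
            if head[pe_offset:pe_offset + 4] == b"PE\0\0":
                return "PE"
        return "MZ"  # DOS header only, or PE signature outside the bytes given
    if head[:4] in MACHO_MAGICS:
        return "Mach-O"
    return "unknown"
```

Reading the first kilobyte or so of a file and passing it in is normally enough, although the PE signature can sit further into the file than that.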

Franek answered 27/12, 2009 at 20:34 Comment(5)
So, CONSTANTS are stored in -- section, GLOBAL variables are stored in -- section, STATIC variables are stored in -- section, etc. are just OS dependent? Isn't there any standard about which sections are available? What are they? Which content goes into which section? If there aren't any standards, why do people ask such questions (esp. in interviews)? :( - Ashe
This is standard for the binary formats, but not standard as far as process layout. Look at ELF or some other binary spec if you're interested in seeing what sections things go in. - Franek
Alright. 1. If they are standard, where can I find this info? 2. If they are standard as far as the binary format goes, then the 2 things that differ are a) how this binary is converted into an executable (where file formats come into play) and b) how this format file is loaded into memory (the loader's task). But the final layout in memory should be fixed. In other words, it contains those same sections (which are standard) and they contain standard content, right? - Ashe
1. In the binary specs. 2a. The binary is the executable (or the library, which is almost like an executable). 2b. Final layout in memory depends on the OS but is mostly standard. The loader has to fix up references to symbols when it puts a process or a library into memory. Again, look at the Linux kernel book for a general overview of process layout. Look at How to Write Shared Libraries for a really detailed description of how the runtime linker/loader resolves symbol references to where those symbols actually live in memory. - Franek
Well, I'm completely confused. I think it's better for me if I get back to this thread after reading the stuff you suggested. - Ashe

Here is one way a program can be executed from a file (*nix).

  • The process is created (e.g. fork()). This gives the new process its own memory map. This includes a stack in some area of memory (usually high up in memory somewhere).
  • The new process calls exec() to replace the current executable (often a shell) with the new executable. Often, the new executable's .text (executable code and constants) and .data (r/w initialized variables) are set up for demand page mapping, that is, they are mapped into the process memory space as needed. Often, the .text section comes first, followed by .data. The .bss section (uninitialized variables) is often allocated after the .data section. Often it is mapped so that a page of zeros is returned when the page containing a bss variable is first accessed. The heap often starts at the next page boundary after the .bss section. The heap then grows up in memory while the stack grows down (remember I said usually, there are exceptions!).

If the heap and stack collide, that often causes an out of memory situation, which is why the stack is often placed in high memory.

In a system without a memory management unit, demand paging is usually unavailable but the same memory layout is often used.
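The ordering described above can be put into numbers with a toy model. This is only a sketch of the "usual" case from this answer, not how any particular loader works: the 4096-byte page size, the base address, the stack top, the page alignment of .data, and the names `sketch_layout` and `free_gap` are all assumptions chosen for illustration.

```python
PAGE = 4096  # assumed page size


def page_align_up(addr: int, page: int = PAGE) -> int:
    """Round addr up to the next page boundary (no change if already aligned)."""
    return (addr + page - 1) // page * page


def sketch_layout(text_size, data_size, bss_size,
                  base=0x400000, stack_top=0x7FFFFFFFF000, page=PAGE):
    """Place .text, .data, .bss, the heap start and the stack top in order."""
    text_start = base
    # Assumption: .data starts on its own page because it needs different
    # permissions (read/write) than .text (read/execute).
    data_start = page_align_up(text_start + text_size, page)
    bss_start = data_start + data_size           # .bss directly after .data
    heap_start = page_align_up(bss_start + bss_size, page)
    return {
        ".text": (text_start, text_start + text_size),
        ".data": (data_start, data_start + data_size),
        ".bss": (bss_start, bss_start + bss_size),
        "heap_start": heap_start,                # heap grows up from here
        "stack_top": stack_top,                  # stack grows down from here
    }


def free_gap(layout, heap_used: int, stack_used: int) -> int:
    """Bytes left between the heap end and the stack bottom.

    A negative result means the two regions have collided.
    """
    heap_end = layout["heap_start"] + heap_used
    stack_bottom = layout["stack_top"] - stack_used
    return stack_bottom - heap_end
```

Placing the stack at the far end of the address space is what makes the gap as large as possible, so the two regions can each grow for a long time before `free_gap` goes negative.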

Gaylord answered 27/12, 2009 at 21:18 Comment(1)
Very nicely explained. - Treadway

Art of assembly programming http://homepage.mac.com/randyhyde/webster.cs.ucr.edu/www.artofasm.com/Windows/PDFs/MemoryAccessandOrg.pdf

Formate answered 2/7, 2010 at 3:26 Comment(1)
HTTP/1.1 Service Unavailable - Peery