I've been studying these topics and have read many articles and books, but they all leave out some complementary information and have confused me even more. So here I’d like to explain what I know while asking my questions. Hopefully this will be useful for others like me. I'd also like to know whether my understanding is correct, and to get corrections where necessary.
Virtual Memory
Some articles say “Virtual Memory is some space of Hard Disk which emulates Physical Memory so that we can have more memory than we actually have.” Other articles say “Virtual Memory is the combination of Physical Memory (RAM), a section of hard disk which acts like Physical Memory and Page Tables.” However, these are different things, and I don’t understand why the explanations differ like that.
Let’s go with the second explanation, since that is also how Wikipedia describes Virtual Memory. With that definition, the term Virtual Address makes sense, since we use addresses in Virtual Memory instead of addressing physical memory directly.
By the way, my Mac says I have 8GB of physical memory and 8GB of virtual memory. In this case, does VM include Physical Memory, or is it the amount of space on the HD used as memory? Do I have 16GB of memory available for my programs?
Question 1:
The Intel i5 has a 36-bit address bus, which means it can address 64GB of memory. Let’s say I installed 4GB of RAM in my computer. However, my programs may not be aware of the size of the installed memory, as they will be run on many different systems with different amounts of memory. This is where Virtual Memory comes in handy: it abstracts away the actual size of the installed memory.
However, what happens when my program wants to access memory address 0xFFFFFFFFF? I only have 4GB installed, and perhaps some memory space on the HD.
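To make the numbers above concrete, here is a small sketch of the arithmetic. The 36-bit width and the address 0xFFFFFFFFF come from the text; the 4 KiB page size and the name `split_address` are assumptions made only for illustration.

```python
ADDRESS_BITS = 36   # address width mentioned above
PAGE_SIZE = 4096    # assumed 4 KiB pages (not stated in the question)
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 bits select a byte within a page

# 2**36 bytes of addressable memory, expressed in GiB
addressable_gib = (1 << ADDRESS_BITS) // (1 << 30)

def split_address(vaddr):
    """Split an address into (page number, offset within the page)."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)

page_number, offset = split_address(0xFFFFFFFFF)
print(addressable_gib)                # 64
print(hex(page_number), hex(offset))  # 0xffffff 0xfff
```

Under that page-size assumption, 0xFFFFFFFFF is the last byte of the last page of a 36-bit address space.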
I have two theories for this question:
1. Since page tables are maintained by the OS, the OS decodes that address, finds out which page it belongs to, and checks that page's entry in the page table to see whether it has a physical address associated with it (valid/invalid flag). If it does, the OS goes to the physical address the page entry points at, adds the offset from the virtual address, and reads that value. Otherwise a page fault happens: the OS looks for that page in secondary storage, fetches it, puts it in memory, and updates the page table.
2. It throws an OutOfMemory type of exception, saying there is no memory that the given address can refer to.
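The two theories can be put side by side in a toy sketch. Everything here is hypothetical: `ToyMemory`, `PageTableEntry`, the dict standing in for secondary storage, and the 4 KiB page size are made up for illustration, and a real OS and MMU split this work between hardware and kernel code. Page replacement is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 4096  # assumed page size


@dataclass
class PageTableEntry:
    frame: Optional[int] = None  # physical frame number, if resident
    valid: bool = False          # the valid/invalid flag from theory 1


class ToyMemory:
    def __init__(self, num_frames, backing_store):
        self.free_frames = list(range(num_frames))
        self.backing_store = backing_store  # page number -> page contents
        self.frames = {}                    # frame number -> page contents
        self.page_table = {}                # page number -> PageTableEntry
        self.page_faults = 0

    def translate(self, vaddr):
        """Return the physical address for vaddr, faulting the page in if needed."""
        page, offset = divmod(vaddr, PAGE_SIZE)
        entry = self.page_table.setdefault(page, PageTableEntry())
        if not entry.valid:
            self._handle_page_fault(page, entry)
        return entry.frame * PAGE_SIZE + offset

    def _handle_page_fault(self, page, entry):
        self.page_faults += 1
        if page not in self.backing_store:
            # Stand-in for theory 2: nothing backs this address at all.
            raise MemoryError(f"page {page} is not backed by anything")
        if not self.free_frames:
            raise MemoryError("no free frame (eviction is not modelled here)")
        frame = self.free_frames.pop(0)
        self.frames[frame] = self.backing_store[page]  # fetch from "disk"
        entry.frame, entry.valid = frame, True         # update the page table
```

In this sketch the first access to a page that exists only in the backing store takes the page-fault path of theory 1, a second access to the same page does not fault, and an access to a page that exists nowhere raises the error of theory 2.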
The problem with the first theory is: what happens when a program wants to use 64GB of memory? Then we would need 60GB of memory space on the HD, since we only have 4GB of RAM. However, in the screenshot below, my Mac tells me that there is only 8GB of Virtual Memory.
Question 2:
How are processes placed in Virtual Memory? I mean, does each process have its own 0x0 - 0xFFFFFFFFF virtual address space available, or is there only one Virtual Memory address space in which all the processes are placed?
If each process assumes that it has all the memory available to itself, then memory looks like the following:
If there is only one Virtual Memory address space, then it would look like this:
Page Table
So a page table is a data structure which sits between virtual addresses and physical addresses. It is an associative array (like a dictionary) in which each page (key) has an associated physical address (value).
The OS uses the MMU (Memory Management Unit) to perform this translation from virtual address to physical address.
Question 3:
Is there one giant page table which includes all the pages of every process, or does each process have its own page table?
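In code terms, the two possibilities in this question could be sketched as follows. This only illustrates the question and is not a claim about how any particular OS does it; the process names, frame numbers, and 4 KiB page size are invented for the example.

```python
PAGE_SIZE = 4096  # assumed page size

# Possibility A: every process has its own page table, so the same
# virtual page number can map to a different frame in each process.
per_process_tables = {
    "process_a": {0: 7, 1: 3},
    "process_b": {0: 2, 1: 9},
}

def translate_per_process(pid, vaddr):
    page, offset = divmod(vaddr, PAGE_SIZE)
    return per_process_tables[pid][page] * PAGE_SIZE + offset

# Possibility B: one giant table for all processes, which then has to
# include the owning process in the key.
global_table = {
    (pid, page): frame
    for pid, table in per_process_tables.items()
    for page, frame in table.items()
}

def translate_global(pid, vaddr):
    page, offset = divmod(vaddr, PAGE_SIZE)
    return global_table[(pid, page)] * PAGE_SIZE + offset

# The same virtual address resolves to different physical addresses
# depending on which process uses it.
print(translate_per_process("process_a", 0x10))  # 28688
print(translate_per_process("process_b", 0x10))  # 8208
```

Either way, a lookup needs to know which process is asking; the difference is whether that is implied by which table is in use or stored explicitly in each key.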
Paging
Paging is a memory management method. Virtual Memory and Physical Memory are divided into pages (fixed-size blocks of the same size) by the Memory Management Unit. Having equally sized blocks makes it straightforward to swap pages between memory and secondary storage. For instance, your program requests data located at some address. However, the address your program is using is a virtual address, and the MMU translates it using the page table. During this translation, the MMU checks whether the requested page is present in the page table; if it is not, the OS gets it from secondary storage and updates the page table.
Question 4:
Let’s say a process requests data from an address which is translated to a physical address that already holds some data. How is it known that this data does not belong to the requesting process and should be replaced with the data that is in secondary storage?
There is the dirty bit, for example, which is used to decide whether to write a page back to the hard disk, but I don’t think that is what determines the owner process.
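The role of the dirty bit as described above can be sketched like this. It is a toy model with made-up names (`Frame`, `write`, `evict`) and a dict standing in for the hard disk; note that nothing in it records which process owns the page, which is exactly the gap this question is about.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    page: int            # which page currently occupies this frame
    data: bytearray      # in-memory copy of the page
    dirty: bool = False  # set once the in-memory copy differs from disk


def write(frame, offset, value):
    """Modify the in-memory copy and mark the page dirty."""
    frame.data[offset] = value
    frame.dirty = True


def evict(frame, disk):
    """Write the page back only if it was modified; return True if it was."""
    if frame.dirty:
        disk[frame.page] = bytes(frame.data)
        frame.dirty = False
        return True
    return False  # clean page: the copy on disk is still current


disk = {3: bytes(4)}                     # hypothetical stand-in for the hard disk
frame = Frame(page=3, data=bytearray(disk[3]))
print(evict(frame, disk))                # False, nothing was modified
write(frame, 0, 255)
print(evict(frame, disk))                # True, the change is written back
```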