What causes page faults?
S

8

42

According to Wikipedia:

A page fault is a trap to the software raised by the hardware when a program accesses a page that is mapped in the virtual address space, but not loaded in physical memory. (emphasis mine)

Okay, that makes sense.

But if that's the case, why is it that whenever the process information in Process Hacker is refreshed, I see about 15 page faults?

Screenshot

Or in other words, why is any memory getting paged out? (I have no idea if it's user or kernel memory.) I have no page file, and the RAM usage is about 1.2 GB out of 4 GB, which is after a clean reboot. There's no shortage of any resource; why would anything get paged out?

Surmise answered 16/4, 2011 at 3:59 Comment(0)
H
68

(I'm the author of Process Hacker.)

Firstly:

A page fault is a trap to the software raised by the hardware when a program accesses a page that is mapped in the virtual address space, but not loaded in physical memory.

That's not entirely correct, as explained later in the same article (Minor page fault). There are soft page faults, where all the kernel needs to do is add a page to the working set of the process. Here's a table from the Windows Internals book (I've excluded the ones that result in an access violation):

| Reason for Fault | Result |
| --- | --- |
| Accessing a page that isn’t resident in memory but is on disk in a page file or a mapped file | Allocate a physical page, and read the desired page from disk and into the relevant working set |
| Accessing a page that is on the standby or modified list | Transition the page to the relevant process, session, or system working set |
| Accessing a demand-zero page | Add a zero-filled page to the relevant working set |
| Writing to a copy-on-write page | Make process-private (or session-private) copy of page, and replace original in process or system working set |

Page faults can occur for a variety of reasons, as you can see above. Only one of them has to do with reading from the disk. If you try to allocate a block from the heap and the heap manager allocates new pages, then accesses those pages, you'll get a demand-zero page fault. If you try to hook a function in kernel32 by writing to kernel32's pages, you'll get a copy-on-write fault because those pages are silently being copied so your changes don't affect other processes.
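The copy-on-write case can be observed from user mode without touching kernel32. The following is a minimal sketch in Python, assuming a temporary file stands in for the shared image; `demo_copy_on_write` is a hypothetical helper name, not part of Process Hacker or any Windows API. `mmap.ACCESS_COPY` requests a private copy-on-write view, so the first write to a page is served from a private copy and the underlying file is left unchanged.

```python
import mmap
import os
import tempfile


def demo_copy_on_write(size=mmap.PAGESIZE):
    """Write through a copy-on-write mapping; return (mapped_byte, file_byte)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"A" * size)           # the "original" page contents
        # ACCESS_COPY requests a private, copy-on-write view of the file.
        with mmap.mmap(fd, size, access=mmap.ACCESS_COPY) as view:
            view[0:1] = b"B"                # first write goes to a private copy
            mapped_byte = view[0:1]
        os.lseek(fd, 0, os.SEEK_SET)
        file_byte = os.read(fd, 1)          # the file itself was never modified
        return mapped_byte, file_byte
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(demo_copy_on_write())             # (b'B', b'A')
```

The mapping sees the modified byte while the file still holds the original, which is the same isolation that keeps a patched kernel32 page from affecting other processes.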

Now to answer your question more specifically: Process Hacker only seems to have page faults when updating its service information - that is, when it calls EnumServicesStatusEx, which RPCs to the SCM (services.exe). My guess is that in the process, a lot of memory is being allocated, leading to demand-zero page faults (the service information requires several pages to store, IIRC).
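To see demand-zero faults from freshly allocated memory, here is a rough sketch in Python. It assumes a POSIX system, because `resource.getrusage` is not available on Windows (there, the comparable counter is `PageFaultCount` from `GetProcessMemoryInfo`). The helper names `minor_faults` and `touch_fresh_pages` are made up for illustration, and the exact counts vary with the OS, huge-page settings, and whatever the interpreter itself allocates in between.

```python
import mmap
import resource


def minor_faults():
    """Soft (minor) page faults charged to this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def touch_fresh_pages(n_pages=256):
    """Map n_pages of anonymous memory and touch each page twice.

    Returns (faults_first_pass, faults_second_pass, all_zero).
    """
    page = mmap.PAGESIZE
    buf = mmap.mmap(-1, n_pages * page)      # anonymous, zero-filled on demand
    try:
        before = minor_faults()
        # First touch: each page has to be materialised by the kernel.
        all_zero = all(buf[i * page] == 0 for i in range(n_pages))
        middle = minor_faults()
        # Second touch: the pages are already in the working set.
        for i in range(n_pages):
            _ = buf[i * page]
        after = minor_faults()
        return middle - before, after - middle, all_zero
    finally:
        buf.close()


if __name__ == "__main__":
    first, second, all_zero = touch_fresh_pages()
    print("first pass:", first, "second pass:", second, "zero-filled:", all_zero)
```

On a typical system the first pass reports many more faults than the second, even though nothing was ever paged out.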

Harday answered 17/4, 2011 at 0:34 Comment(5)
Ah, apparently father knows best, haha. :) (Yup I knew you were the author, I'm also the guy who suggested moving to MSVCRT. :P) Hm... so this is being caused by RPCs, huh. Interesting... so if you take out the call, you won't get those page faults?Surmise
Apparently you still get the faults if you remove that call... any ideas?Surmise
@Mehrdad: My point was that page faults occur for many reasons, and they aren't really something to worry about if you're writing a program.Harday
Yeah okay... I was just wondering what the exact reasons might be in a case like this, and the RPC thing was definitely enlightening, thanks.Surmise
For completeness, in operating system terminology, there's a 3rd kind of page-fault besides minor(soft) / major(hard): An invalid page fault is when the kernel's page-fault handler decides that the process doesn't even logically have that virtual address mapped. e.g. a NULL pointer deref - virtual address 0 isn't mapped in the HW page tables (thus a #PF x86 hardware exception), and when the kernel checks how to correct the situation and retry the faulting instruction, it finds there's no fix. (On Linux, the kernel delivers a SIGSEGV signal (segfault) or kills the process if no handler.)Pham
T
8

A slow but steady source of page faults is the OS probing for infrequently accessed pages. In this case, the operating system marks some pages not present, but leaves them in memory as-is. If an application accesses the page, then the #PF trap occurs and the OS simply marks the page present again without further ado. If a "long time" passes and a page never trips a fault, then the OS knows the page is a good candidate for swapping should the need arise. This mechanism can run proactively even in times of no resource pressure.
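The sampling idea can be illustrated with a toy model. This is only a sketch of the mechanism as described in this answer, not how any particular kernel or hypervisor implements it; the class `SampledPages` and its methods are hypothetical names.

```python
class SampledPages:
    """Toy model: pages stay in RAM, but the OS may clear their present bit."""

    def __init__(self, n_pages):
        self.present = [True] * n_pages    # present bit per page
        self.idle_rounds = [0] * n_pages   # sampling rounds without an access
        self.soft_faults = 0

    def sample(self):
        """OS side: age pages nobody touched since the last round, then re-arm."""
        for p in range(len(self.present)):
            if not self.present[p]:
                self.idle_rounds[p] += 1   # still not-present: it was never used
            self.present[p] = False        # mark not-present; contents stay in RAM

    def access(self, p):
        """Program side: a not-present page traps and is simply marked present."""
        if not self.present[p]:
            self.soft_faults += 1
            self.present[p] = True
            self.idle_rounds[p] = 0

    def swap_candidates(self, min_idle):
        """Pages that went at least min_idle rounds without tripping a fault."""
        return [p for p, idle in enumerate(self.idle_rounds) if idle >= min_idle]


if __name__ == "__main__":
    pages = SampledPages(4)
    pages.sample()
    pages.access(0)
    pages.access(1)
    pages.sample()
    pages.access(0)
    pages.sample()
    print(pages.soft_faults, pages.swap_candidates(2))  # 3 [2, 3]
```

In this model every fault is soft: no data moves, and the only cost is the trap itself, which is why such a scheme could run even without memory pressure.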

Typesetting answered 16/4, 2011 at 18:27 Comment(4)
That sounds like a plausible reason, though it's not too convincing. Still a good potential reason, though. +1Surmise
@vladr - I recall this was a Xen tactic at least, but some brief Googling failed to turn up any proof.Typesetting
Wouldn't OSes just clear the "Accessed" bit (wiki.osdev.org/Paging#Page_Directory) in the page-table entry, instead of actually making the PTE invalid? So if the page is needed, there's no fault, just a microcode assist to atomically update the PTE. (This is how x86 works, other ISAs may not have an Accessed bit?) I could see Xen doing this if the nested page tables don't support an accessed bit, or on HW that doesn't support nested page tables so the Accessed bit is in use by the guest OS.Pham
(I think mis-speculated accesses wouldn't set the accessed bit, since it takes a microcode assist to do that in current CPUs. That's basically like a fault handler, probably not triggering until it's known to be non-speculative, i.e. not until the load or store reaches retirement. But if I'm wrong about that, that could justify unmapping pages, since mis-speculation won't actually take a page fault.)Pham
N
4

"page that is mapped in the virtual address space, but not loaded in physical memory" does not imply that it previously was in physical memory. Suppose you map a file? It's still on disk, not in memory yet.

Suppose you map a log file and keep appending to it. Every time you exceed the end of committed memory, a page fault occurs, the OS will provide you with a new empty page and adjust the file length.


It could also be access violations which are caught and handled by the program.


It could also be that the program uses more memory segments than fit in the TLB (which is a cache for the page tables). When pages are contiguous, they can all be handled by a single page table entry. But if memory is fragmented in physical address space, many page table entries are needed, and they may not fit in the TLB. When a TLB miss occurs, the OS page fault handler is invoked and looks up the mapping in the process's page table.

In some ways, this is a variation on Dean's answer: the pages are already in physical RAM, and the OS does need to load those mappings into the TLB, but not because of IPC.

Brian pointed out that x86 (and therefore all Win32 systems) handles this without a page fault.


Yet another cause of page faults is triggering guard pages used for stack growth and copy-on-write, but usually those would not occur without bound. I'm not 100% sure if those would show up as access violations or not, because they will be marked as an access violation on entry to the MMU trap, but are probably handled by the OS page fault handler and not transformed into the user mode (SEH) access violation.

Negligence answered 16/4, 2011 at 4:13 Comment(10)
@Ben: That's a great point, but AFAIK there's no logging going on here. It should be reading the same values each time.Surmise
@Ben: Interesting point about access violations, lemme check in a couple minutes...Surmise
@Ben: Turns out there's no access violations happening.Surmise
@Ben: Interesting that you mention the translation lookaside buffer, I didn't think about that. But I don't think that's the case because almost all of the other processes stay at completely constant page faults, even though they're doing some work. (e.g. I move my mouse around in Chrome, which obviously receives WM_MOUSEMOVE messages, but it's Process Hacker's page fault that goes up, not Chrome's.)Surmise
@Mehrdad: The page table is per-process, and gets completely reloaded during context switch. That reloading of the TLB doesn't count against the process. Only if while the process is executing, it touches a page whose page table entry didn't fit into the TLB, does a page fault get charged.Negligence
@Ben: Huh... I didn't know it gets reloaded in a context switch. That seems like a plausible explanation then... any [easy] way to test it? :-)Surmise
@Mehrdad: I googled for that and didn't find it, but I did find some experts discussing the same issueNegligence
Just FYI on x86, TLB misses are handled directly by hardware without trapping into the OS. Only if the page table does not contain the appropriate entry is the OS invoked.Rosalindarosalinde
@Brian: Ok, thanks, didn't know that. I guess one of the privileged registers (is it CR3?) holds a pointer to the full page table?Negligence
@Brian, @Ben: Ah interesting, so it's not the TLB. (Yeah, CR3 holds a pointer to the page table.)Surmise
P
2

Any time a mmap'd section is read, a page fault is generated, which includes whenever you load a DLL. So, loading a DLL doesn't actually read all of the DLL into memory, it only causes it to be faulted in as the code is executed.
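A small Python sketch of this lazy loading, assuming a POSIX system (`resource.getrusage` does not exist on Windows) and a temporary file in place of a DLL; `faults_reading_mapped_file` is a hypothetical helper. Because the file was just written, its data is normally still in the OS file cache, so the faults counted here are soft ones (`ru_minflt`); a read that had to go to disk would show up in `ru_majflt` instead. The counts are not exact, since kernels may map several neighbouring pages per fault.

```python
import mmap
import os
import resource
import tempfile


def faults_reading_mapped_file(n_pages=64):
    """Map a temp file and read one byte per page, twice.

    Returns (faults_first_read, faults_second_read, bytes_seen).
    """
    page = mmap.PAGESIZE
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * (n_pages * page))
        with mmap.mmap(fd, n_pages * page, access=mmap.ACCESS_READ) as view:

            def read_all():
                start = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
                data = bytes(view[i * page] for i in range(n_pages))
                end = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
                return end - start, data

            first, data = read_all()     # pages are mapped in on first touch
            second, _ = read_all()       # already mapped: typically few or none
        return first, second, data
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    first, second, _ = faults_reading_mapped_file()
    print("first read:", first, "second read:", second)
```

Creating the mapping itself reads nothing; the faults appear only when the pages are first touched, and re-reading the same pages is normally fault-free.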

Plasterboard answered 16/4, 2011 at 4:13 Comment(9)
"Any time a mmap'd section is read, a page fault is generated"? Are you sure about that? That would be pretty darn slow... I thought it only faults if the page is absent from memory?Surmise
@Mehrdad: That's correct, it will fault once per page, and after that the page is present in RAM. But he's on the right track, explaining how page faults happen without anything being paged out.Negligence
@Ben: I'm confused. If it's paged in, and never paged out, why would there be any page fault on subsequent operations?Surmise
@Mehrdad: Because new and different pages are being accessed? Or because no page is loaded as a result of the fault, an access violation is raised, and the program handles it and continues (but later accesses the same invalid address again, causing another fault).Negligence
@Ben: No access violations (I checked). And exactly why would a repetitive action like this cause a new page to be accessed every single time? It just doesn't sound reasonable.Surmise
@Mehrdad: With an access violation, instead of a new and different page being accessed each time, it could be a new same page being accessed (the page is still "new" because nothing was committed). But since you checked and found no access violations, I'm out of explanations, at least for the moment. Can you hook up a kernel debugger?Negligence
@Ben: Not really... I've never done kernel debugging and this is on my only laptop, no other computer available with the tools right now. :-) Is it possible that the access violations are not being recorded? I used Visual Studio to attach to the program, and it only logged two access violations at startup, but none thereafter. Could any of them not be shown (and if so, why)?Surmise
@Mehrdad: No, I would think that any access violations which get counted as page faults would be caught as first-chance exceptions in the debugger. But I have another possible explanation: page table too big to fit in the TLB -- see my answer.Negligence
to fault in -- delightful :-)Reef
Q
1

You'll see soft page faults when memory is being shared between processes. Basically, if you have a memory-mapped file shared between two processes, when the second process loads the memory-mapped file, soft page faults are generated - the memory is already in physical RAM, but the operating system needs to fix up the memory manager's tables so that the virtual memory address in your process points to the correct physical page.

Particularly for something like Process Hacker, which is likely injecting code into every running process (in order to collect information) it's likely making quite heavy use of shared memory for doing IPC.

Quibbling answered 16/4, 2011 at 4:11 Comment(2)
That should still only happen when new pages get committed into the mapping. Not on every context switch.Negligence
Interesting information in the first paragraph, sounds reasonable. But second paragraph, though: I don't think Process Hacker injects any code without me telling it to; any reads it does should be from kernel memory, which is shared across processes.Surmise
M
1

Operating systems use paging to group items that should be placed in physical memory and to move them between physical memory and disk. Most of the time, data items that are placed in a single page are related to each other. When the data items in a page have not been used for a long time, the operating system moves the page to virtual memory to free some space in physical memory. Then, when a page that is in virtual memory (on the hard disk) is required, the operating system moves it back to physical memory. This is a page fault!

And remember, operating systems differ in their paging algorithms.

Basics of Page Faults

May answered 16/4, 2011 at 4:11 Comment(9)
How long is "a long time"? Would ~50 ms be a long time?Surmise
@Mehrdad: The OS is only going to map stuff out if all memory is used (and even then, only if the OS feels that the disk cache is too small).Negligence
@Ben: But like I specifically mentioned, neither RAM nor handles nor disk space (nor even screen real estate, haha) is something that's on a shortage here... so there shouldn't be anything mapped out.Surmise
@Mehrdad: Right... I'm disagreeing with Farzin's answer (or at least its relevance to your situation). I think the reason you have page faults is not because the pages got evicted from RAM, but because they haven't been paged in yet.Negligence
@Ben: Ah. But I don't understand: Why would anything not be paged in after the thousands of refreshes the process goes through? What exactly might it be paging in every time that wasn't paged in at the previous refresh?Surmise
@Mehrdad: That's why I'm promulgating the access violation explanation. If the cache manager can't find data to swap in, because the address isn't associated with any file mapping, or because the source is unavailable (I/O error, mapped a network file that's unavailable, etc) then the root cause of the fault isn't addressed, and the next access to that region will cause another trap.Negligence
@Ben: I checked, there are no access violations happening.Surmise
@Mehrdad: Your reason, "because they haven't been paged in yet", is one of the other causes of page faults!May
@Farzin: That doesn't make sense. Everything should already be paged in, because there's nothing new being loaded hundreds of times per second..Surmise
S
0

Resource allocation is a delicate balance between keeping primary storage available for use and avoiding trips to secondary storage as much as possible. If a process tries to allocate memory and can't, that's usually an exception, and sometimes a fatal one.

Essentially, you can't keep everything in RAM with no free resources available, because a program that starts or asks for more memory would then crash.

Surcingle answered 16/4, 2011 at 4:13 Comment(2)
Of course not. That's because the OS is managing each process correctly. Remember, every process, that's every process, gets 4 GB of address space (more on 64-bit systems). Your screenshot alone has 10 processes listed. Have you got 40 GB of RAM available? The takeaway is that page faults are normal and expected behavior.Surcingle
You do know how virtual memory works, right? Even if everything that's loaded was put into RAM at once due to poor memory management, it still wouldn't reach 4 GB... let alone 40 GB.Surmise
L
0

Let us start with the basics of how memory works. Memory has physical and logical addresses. Physical memory is divided into blocks called frames, and the logical address space is divided into blocks called pages. The CPU generates a logical address, which is split into two parts: a page number and an offset. A page table maps each page number to its frame number. Adding the offset to the start address of that frame gives the address in physical memory.
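As a worked example of this translation, here is a minimal sketch in Python. The 4096-byte page size, the dictionary used as a page table, and the names `translate` and `PageFault` are assumptions for illustration; real hardware walks multi-level tables and stores extra permission bits.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class PageFault(Exception):
    """Raised when the page table has no valid entry for a page."""


def translate(virtual_address, page_table, page_size=PAGE_SIZE):
    """Split the address into (page number, offset) and map it to a frame."""
    page_number, offset = divmod(virtual_address, page_size)
    frame_number = page_table.get(page_number)
    if frame_number is None:
        raise PageFault(f"no frame for page {page_number}")
    return frame_number * page_size + offset


if __name__ == "__main__":
    table = {0: 5, 1: 2}            # page number -> frame number
    print(translate(4100, table))   # page 1, offset 4 -> 2 * 4096 + 4 = 8196
    try:
        translate(8192, table)      # page 2 has no entry
    except PageFault as fault:
        print("page fault:", fault)
```

The missing entry for page 2 is the situation in which the hardware hands control to the operating system.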

A page fault arises when the page table has no valid entry for the required page number. The usual cause is that the piece of memory is not present in physical (main) memory at the requested location. In other words, the memory is part of the program's virtual address space, but the system cannot find it in physical memory. The fault is an exception generated by the computer's hardware, which tells the operating system about the missing mapping.

The memory management unit (MMU) performs the translation and raises the fault. There are two types of page faults: hard page faults and soft page faults.

Hard page fault: It occurs when the required page is not in main physical memory, so the system must fetch it from disk (a page file or a mapped file). Each page table entry has a validity bit. If the validity bit is zero for an entry, the page has not been assigned a frame, so a page fault occurs.

Soft page fault: It occurs when the page is found somewhere else in memory. The place where the system finds the page can be a cache, such as the standby or modified list.

Now let us look at the different causes of page faults.

A page fault can occur when you access a page that is not resident in main memory but is on disk, in a page file or a mapped file, from which it must be fetched.

A page fault can occur when you access a page that is on the standby or modified list. To resolve it, the page is transitioned to the relevant process, session, or system working set.

A page fault can occur when you access a demand-zero page. This happens when you allocate a block from the heap and the heap manager allocates new pages: the first access to those pages causes a demand-zero page fault.

When you try to hook a function in kernel32 by writing to its pages, a copy-on-write page fault occurs, because those pages are silently copied so that your changes do not affect other processes.

So a page fault can occur for any of the reasons discussed above.

Linstock answered 13/6, 2023 at 4:8 Comment(0)
