What may cause page fault at C++ level

P

5

I'm a C++ developer and I'm wondering what may cause a page fault at the C++ level.

I've read some articles about page faults, and I think fork() and malloc/new can cause them.

Are there other things that can cause a page fault?

Is a very large executable more likely to cause page faults?

Is an executable with a very complex logical structure more likely to cause page faults?

Priming answered 28/11, 2019 at 12:5 Comment(5)

Anything that accesses memory can cause a page fault. If the user runs Photoshop for a while, all of your process's memory may be swapped to disk. When your process wakes up, the first few memory accesses will page fault, signaling the OS to reload that memory from disk. – Leanneleanor 28/11, 2019 at 12:7

As for your other questions: the larger your executable, the smaller the chance every page will be used equally often, so the larger the chance that parts of it are swapped out. But again, the OS knows where to reload those pages from, so a page fault is more of a nuisance. – Leanneleanor 28/11, 2019 at 12:9

@Leanneleanor I ask because I thought it might be necessary to consider how to write C++ that causes fewer page faults. So in your opinion, I shouldn't worry about it while coding C++? – Priming 28/11, 2019 at 12:9

You should not worry about it in nearly all C++ programs. Unless you have hard real-time constraints, but then your OS will typically not do much dynamic memory allocation or swapping either :) – Leanneleanor 28/11, 2019 at 12:10

By contrast, it often IS worthwhile to optimize your code for fewer cache misses by improving data locality and cache usage. – Leanneleanor 28/11, 2019 at 12:11

S

1

Actually, malloc by itself typically doesn't cause page faults. The memory is only allocated virtually, so until you use it, it takes up space neither in RAM nor on disk. If you really want to cause page faults rapidly, you have to actually access the buffers in question, either for reading or for writing.
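One way to check this on a POSIX system is to read the process's fault counters around each step. The following is a minimal sketch, assuming `getrusage` reports `ru_minflt`/`ru_majflt` on your platform (some sandboxes report zeros, and transparent huge pages can reduce the counts). The names `page_faults_so_far`, `FaultReport` and `measure_faults` are made up for this illustration.

```cpp
#include <sys/resource.h>  // getrusage (POSIX)
#include <unistd.h>        // sysconf
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: page faults (minor + major) taken by this process so far.
long page_faults_so_far() {
    struct rusage usage {};
    getrusage(RUSAGE_SELF, &usage);
    return usage.ru_minflt + usage.ru_majflt;
}

struct FaultReport {
    long during_malloc;         // faults counted while malloc itself ran
    long during_touch;          // faults counted while writing one byte per page
    std::size_t pages_touched;  // number of pages written to
};

// Allocates `bytes`, writes one byte per page, and reports the faults per phase.
FaultReport measure_faults(std::size_t bytes) {
    assert(bytes > 0);
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    FaultReport report{0, 0, 0};

    const long before_malloc = page_faults_so_far();
    char* data = static_cast<char*>(std::malloc(bytes));
    const long after_malloc = page_faults_so_far();
    if (data == nullptr) return report;  // allocation failed, nothing to touch

    // volatile keeps the compiler from optimising the stores away
    volatile char* p = data;
    for (std::size_t i = 0; i < bytes; i += page) {
        p[i] = 1;
        ++report.pages_touched;
    }
    const long after_touch = page_faults_so_far();

    std::free(data);
    report.during_malloc = after_malloc - before_malloc;
    report.during_touch = after_touch - after_malloc;
    return report;
}
```

Calling `measure_faults(64 * 1024 * 1024)` and printing the three fields should, on a typical Linux desktop, show few or no faults for the malloc phase and many for the touch phase, though the exact numbers depend on the allocator and kernel settings.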

It all boils down to memory usage. If the application keeps accessing the same 2-3 GB of data, it may be able to run with almost no page faults occurring (assuming that no other application is currently hogging your RAM). Only when your application needs to access a lot of memory, or memory that has gone "cold" for lack of use, will you get page faults.

Additionally, the OS loads entire pages from the disk even if you only need to access a single byte from that page. This means that if your data is spread across a large area in memory, you may experience more page faults than if all of your data were concentrated in the same vicinity.
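The arithmetic behind this can be sketched without touching real memory: count how many distinct pages a set of byte offsets falls into. This is only an illustration, assuming a 4096-byte page; `kPageSize`, `distinct_pages` and `strided_offsets` are names invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Assumed page size; 4096 bytes is common but not universal.
constexpr std::size_t kPageSize = 4096;

// Number of distinct pages that contain at least one of the given byte offsets.
std::size_t distinct_pages(const std::vector<std::size_t>& offsets) {
    std::set<std::size_t> pages;
    for (std::size_t off : offsets) pages.insert(off / kPageSize);
    return pages.size();
}

// Offsets of `count` one-byte accesses, `stride` bytes apart, starting at 0.
std::vector<std::size_t> strided_offsets(std::size_t count, std::size_t stride) {
    assert(stride > 0);
    std::vector<std::size_t> offsets;
    offsets.reserve(count);
    for (std::size_t i = 0; i < count; ++i) offsets.push_back(i * stride);
    return offsets;
}
```

With these helpers, 1000 adjacent bytes (`strided_offsets(1000, 1)`) land in a single page, while 1000 bytes spaced one page apart (`strided_offsets(1000, 4096)`) land in 1000 different pages, each of which may need its own page fault.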

A good test application to understand this mechanism would be to allocate huge buffers, more than your RAM can hold, and then modify a single character at 4K intervals (the usual size of a single page in both Linux and Windows). The idea is to dirty as many pages as possible with minimal effort, like ruining a perfectly good pack of white paper by putting a single black dot on every sheet, until your RAM cannot hold that many dirty pages and the OS has to swap them to disk in order to load other pages for you to dirty.

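// Needs <cstdlib>. HUGE_NUMBER is a placeholder for a size larger than your RAM.
// The buffers are deliberately never freed so that dirty pages keep piling up.
while (true) {
    char *data = static_cast<char *>(std::malloc(HUGE_NUMBER));
    if (data == nullptr) break; // the allocation itself failed
    for (std::size_t i = 0; i < HUGE_NUMBER; i += 4096)
        data[i] = static_cast<char>(std::rand()); // dirty one byte per 4K page
}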

So a good approach to minimizing page faults is to keep your data access patterns highly local (use arrays, which are sequential in memory, rather than lists or maps, whose nodes may be spread all over), and to avoid writing applications that require more RAM than the target server has to offer.
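To see the difference in spread between an array-like container and a node-based one, you can count the pages that their elements actually occupy. This is a rough sketch, assuming 4096-byte pages; `kAssumedPageSize`, `pages_spanned`, `Spread` and `compare_containers` are hypothetical names, and the result for `std::list` depends on your allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <set>
#include <vector>

// Assumed page size; query the OS if you need the real value.
constexpr std::uintptr_t kAssumedPageSize = 4096;

// Counts the distinct pages holding the first byte of each element of a container.
template <typename Container>
std::size_t pages_spanned(const Container& c) {
    std::set<std::uintptr_t> pages;
    for (const auto& element : c)
        pages.insert(reinterpret_cast<std::uintptr_t>(&element) / kAssumedPageSize);
    return pages.size();
}

struct Spread {
    std::size_t vector_pages;  // pages occupied by the std::vector elements
    std::size_t list_pages;    // pages occupied by the std::list nodes
};

// Builds a vector and a list of `n` ints and reports how many pages each one spans.
Spread compare_containers(std::size_t n) {
    assert(n > 0);
    std::vector<int> v(n, 0);
    std::list<int> l(n, 0);
    return {pages_spanned(v), pages_spanned(l)};
}
```

The vector's elements are packed back to back, so `compare_containers(10000)` reports only a handful of pages for it. Each list node carries extra pointers and is allocated separately, so the list covers noticeably more pages even when the allocator happens to place the nodes close together.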

Regarding the executable size, it also depends on how much of the code is actually in use. If your application spends 90% of its time running 10% of the code, then the probability of page faults due to the size of the executable is low, and vice versa.

Selby answered 28/11, 2019 at 12:15 Comment(0)

C

5

Any and every single instruction may cause a page fault. It may be the page with the instruction itself that is not currently loaded.

Note that the instruction does not have to be at the beginning of a page, because the program might have been sleeping, and it might sleep at any point as it may be preempted.

Any and every instruction that has a memory operand may also cause page fault accessing that operand.

Note that these days many systems don't have swap, so anonymous (allocated with malloc) pages have nowhere to be unloaded, but file-backed pages including all the executable code can always be unloaded, so the first case is actually more probable.

As correctly explained by @eerorika, page faults are handled by the kernel and are completely transparent to C++, except that they make timing non-deterministic; you need a real-time OS to get deterministic timing.

Chlorosis answered 28/11, 2019 at 12:21 Comment(0)

D

0

The C++ language itself is completely agnostic to page faults. In fact, a (freestanding) language implementation could be made for a system that doesn't use virtual memory and thus does not have page faults.

A CPU triggers a page fault whenever a memory page is accessed that is not mapped by the memory management unit into the address space of the process. It is the responsibility of the operating system to handle the interrupt.

As a rule of thumb, the more memory the program accesses, the more page faults there will be.

Digest answered 28/11, 2019 at 12:19 Comment(0)

P

0

Page faults are an OS/MMU/CPU-level concept

Page faults are not a concept at the C++ language level. They happen behind the scenes at the OS/MMU/CPU level, basically to allow RAM to be extended onto disk.

For example, an application that randomly accesses huge amounts of memory (like a video editing program) is much more prone to page faults.

That being said, it is possible to lock pages at the OS level so they won't be swapped out. However, that is seldom done (even by experts), as it is extremely hard to be clever about.
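On POSIX systems, page locking is exposed through `mlock` and `munlock`. The following is a minimal sketch of an RAII wrapper around them; `LockedBuffer` is a name invented for this example, and it assumes a platform such as Linux, where `mlock` accepts an address that is not page-aligned. The call can fail, for example when the `RLIMIT_MEMLOCK` limit is too low, so the wrapper only records whether locking succeeded.

```cpp
#include <sys/mman.h>  // mlock, munlock (POSIX)
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical RAII wrapper: owns a buffer and tries to keep it resident in RAM.
class LockedBuffer {
public:
    explicit LockedBuffer(std::size_t bytes) : storage_(bytes) {
        assert(bytes > 0);
        // mlock may fail; the buffer stays usable either way, just not pinned.
        locked_ = (mlock(storage_.data(), storage_.size()) == 0);
    }

    ~LockedBuffer() {
        if (locked_) munlock(storage_.data(), storage_.size());
    }

    // Copying would duplicate the storage without locking it, so forbid it.
    LockedBuffer(const LockedBuffer&) = delete;
    LockedBuffer& operator=(const LockedBuffer&) = delete;

    char* data() { return storage_.data(); }
    std::size_t size() const { return storage_.size(); }
    bool locked() const { return locked_; }

private:
    std::vector<char> storage_;
    bool locked_ = false;
};
```

A caller would create something like `LockedBuffer buf(4096);` and check `buf.locked()` before relying on the pages staying in memory. Locked pages are taken away from everything else on the machine, which is part of why this is rarely worth doing.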

Pintle answered 28/11, 2019 at 12:29 Comment(0)

K

0

A page fault occurs when a process accesses memory that is not in a currently loaded page. A few possible reasons:
1. One large process with a lot of I/O activity.
2. Two memory-intensive processes.
3. Lots of small processes running at the same time.
4. Very deep recursion, whose growing stack pushes other functions or variables out of the loaded pages.

There might be a few more, but the point is that when many things compete for memory, heavy swapping in and out means a page may not hold the needed data at the moment of access, resulting in a page fault.

Karli answered 28/11, 2019 at 12:38 Comment(0)
