stack prefaulting in linux - single or multiple faults needed

D

2

9

In Linux, when process asks for some (virtual) memory from system, it just registered in vma (descriptor of process's virtual memory) but physical page for every virtual is not reserved at time of call. Later, when process will access this page, it will be faulted (access will generate Page Fault interrupt), and PF# handler will allocate physical page and update process page table.

There are two cases: fault when reading may turn into link to zero page (special global pre-zeroed page) which is write-protected; and fault on writing (both on zero-page and on just-required-yet-not-physically mapped page) will result in actual private physical page allocation.

For mmaps (and brk/sbrk, which is internally mmap too) this method is per-page; all mmaped regions are registered as whole in vma (they have begin and end addresses). But stack is handled in other way, because it has only start address (higher one on typical platform; grow to lower addresses).

The question is:

When I access new unallocated memory near stack, it will get PF# and grow. How this growing handled, if I access not the page next to stack, but the page which is 10 or 100 pages away from stack?

E.g.

int main() {
  int *a = alloca(100); /* some useful data */
  int *b = alloca(50*4096); /* skip 49 pages */
  int *c = alloca(100);

  a[0]=1;
 /* no accesses to b - this is untouched hole of 49 pages */
  c[0]=1;

}

Will this program get 2 or 50 private physical pages allocated for stack?

I think it can be profitable to ask kernel to allocate tens physical pages in single pagefault then do tens pagefaults allocating page by page (1 interrupt + 1 context-switch + simple, cache-friendly loop over N page-allocation requests versus N interrupts + N context-switches + N page allocations, when mm code may be evicted from Icache).

Danieldaniela answered 19/12, 2012 at 7:26 Comment(0)

N

3

The automatic growing of the stack can be thought of as automatic calls to mremap to resize the virtual address region that counts as "stack". Once that's handled, page faults to the stack area or to a vanilla mmap region are handled the same, i.e., one page at a time.

Thus you should end up with ~2 pages allocated, not ~51. @perreal's empirical answer validates this ...

To the last part of the question, the cost of contiguous page faults is one of the factors that lead to the development of "huge pages". I don't think there are other ways in Linux to "batch" page fault handling. Maybe madvise might do something but I suspect its mostly optimizing the really expensive part of page faults which is looking up the backing pages on storage). Stack page faults which map to zero pages are relatively lightweight by comparison.

Novobiocin answered 22/12, 2012 at 17:14 Comment(2)

And mlock(2) will prefault too. – Danieldaniela 23/12, 2012 at 0:30

Using MAP_POPULATE in the mmap call will batch up all the page faulting handling since it will happen before the mmap returns. This doesn't help if you aren't calling mmap directly however (e.g., if you got your memory from malloc). – Randolph 6/5, 2018 at 21:33

W

4

With this code:

int main() {
  int *a = alloca(100); /* some useful data */
  int *b = alloca(50*4096); /* skip 49 pages */
  int *c = alloca(100);
  int i;
#if TOUCH > 0
  a[0] = 1;               // [1]
#endif
#if TOUCH > 1
  c[0] = 1;               // [2]
#endif
#if TOUCH > 2
  for (i=0; i<25; i++)    // [3]
    b[i*1024] = 1;
#endif
#if TOUCH > 3
  for (i=25; i<50; i++)   // [4]
    b[i*1024] = 1;
#endif
  return 0;
}

And this script:

for i in 1 2 3 4; do
  gcc d.c -DTOUCH=$i
  echo "Upto [$i]" $(perf stat ./a.out 2>&1 | grep page-faults)
done

The output:

Upto [1] 105 page-faults # 0.410 M/sec
Upto [2] 106 page-faults # 0.246 M/sec
Upto [3] 130 page-faults # 0.279 M/sec
Upto [4] 154 page-faults # 0.290 M/sec

Wager answered 21/12, 2012 at 2:12 Comment(0)

N

3