Why does a MAP_GROWSDOWN mapping not grow?

I tried to create a MAP_GROWSDOWN mapping, expecting it to grow automatically. The manual page says:

MAP_GROWSDOWN

This flag is used for stacks. It indicates to the kernel virtual memory system that the mapping should extend downward in memory. The return address is one page lower than the memory area that is actually created in the process's virtual address space. Touching an address in the "guard" page below the mapping will cause the mapping to grow by a page. This growth can be repeated until the mapping grows to within a page of the high end of the next lower mapping, at which point touching the "guard" page will result in a SIGSEGV signal.

So I wrote the following example to test whether the mapping grows:

#ifndef _GNU_SOURCE
    #define _GNU_SOURCE
#endif
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
#include <errno.h>
#include <sys/mman.h>
#include <stdio.h>

int main(void){
    char *mapped_ptr = mmap(NULL, 4096,
                            PROT_READ | PROT_WRITE,
                            MAP_ANONYMOUS | MAP_PRIVATE | MAP_STACK | MAP_GROWSDOWN,
                            -1, 0);
    if(mapped_ptr == MAP_FAILED){
        int error_code = errno;
        fprintf(stderr, "Cannot create MAP_GROWSDOWN mapping. "
                        "Error code = %d, details = %s\n", error_code, strerror(error_code));
        exit(EXIT_FAILURE);
    }
    volatile char *c_ptr_1 = mapped_ptr; //address returned by mmap
    *c_ptr_1 = 'a'; //fine

    volatile char *c_ptr_2 = mapped_ptr - 4095; //1 page below the guard
    *c_ptr_2 = 'b'; //crashes with SEGV
}

So I got a SEGV instead of the mapping growing. What does "growing" mean here?

Abstain answered 4/7, 2019 at 13:11 Comment(10)
The mapped_ptr is already in the guard page. Touch it first before touching a page below it. - Mako
@ThomasJager Yeah. I tried to touch mapped_ptr. It was fine. But if I touch mapped_ptr - 4095 after this, it segfaults anyway (I expected it to grow further). - Abstain
Please add the includes! - Direct
Hmm, I ran the code and couldn't get it working. In fact, as far as I could read, MAP_GROWSDOWN is not used by anything anymore, not even for stacks; it doesn't really work and should be removed anyway. It does not have sufficient protection. - Direct
Please consider this: lkml.iu.edu/hypermail/linux/kernel/0808.1/2846.html. Rather allocate a mapping big enough for the stack and protect the bottom of it from writes. - Direct
What's your kernel version? - Direct
@AnttiHaapala I use 4.18.0-24-generic on Ubuntu 18.04. - Abstain
@AnttiHaapala Could you please elaborate a bit on the patch you referred to? So allocating a guard page can overwrite the mapping of another mmap call (I suspect the guard page mapping is done via mmap with the MAP_FIXED flag and PROT_NONE protection). If so, then it is clear what the patch author was talking about. - Abstain
@Abstain I am just linking that discussion because Drepper says that MAP_GROWSDOWN is fundamentally broken anyway. It isn't used on my machine for any processes either. - Direct
In any case, on my machine there is only a <100KiB gap before the next mapping, so it probably wouldn't work as you expected anyway! - Direct

Replace:

volatile char *c_ptr_1 = mapped_ptr - 4096; //1 page below

With

volatile char *c_ptr_1 = mapped_ptr;

Because:

The return address is one page lower than the memory area that is actually created in the process's virtual address space. Touching an address in the "guard" page below the mapping will cause the mapping to grow by a page.

Note that I tested the solution and it works as expected on kernel 4.15.0-45-generic.

Kannada answered 4/7, 2019 at 13:32 Comment(16)
Did you test if this works? I couldn't get it to grow anyway; even if I successfully read c_ptr_1[0] (it returns 0) and I can set it, reads and writes to [-1] will SIGSEGV. - Direct
OK, then the kernel version might matter! 4.18.0-24-generic x86_64 Ubuntu here. - Direct
But did you try to map more? - Direct
@AnttiHaapala Added a kernel version for you. - Kannada
I had it fail on 4.15.0-50-generic x86_64 Ubuntu 18.04. - Mako
@ThomasJager I cannot confirm or deny your observations. - Kannada
Can you explain why you removed volatile? - Abstain
@Abstain Added back. I compiled your code with no optimizations, so volatile wasn't necessary for me. But it works with volatile just as well. - Kannada
@Abstain Please edit the question so that there are 2 accesses, where the first access works and the second does not, and then Maxim can verify it. / Maxim: did you try accessing another page too? - Direct
@AnttiHaapala Added the example where accessing 1 page below the guard fails. - Abstain
@Abstain mapped_ptr - 4095 is not supposed to work at all. This is what my answer is trying to say. - Kannada
@MaximEgorushkin So there is only 1 guard page created by mmap with MAP_GROWSDOWN. I thought that as soon as we touch the page right below the guard, it should be reserved. There is the "This growth can be repeated until the mapping grows to within a page of the high end of the next lower mapping" part about repeating the growth... - Abstain
@AnttiHaapala and Maxim: Yeah, the OP's updated code doesn't work for me. And the returned pointer = the bottom of the mapping according to /proc/PID/maps, not one page below like the man page claims. It's probably obsolete; MAP_GROWSDOWN might be broken or maybe finally removed in my 5.0.1 (Arch Linux) kernel. lwn.net/Articles/294001 (from 2008) says it's not usable and should be deprecated. The behaviour I see on Linux 5.0.1 is consistent with MAP_GROWSDOWN being silently ignored. The man page saying "used for stacks" is a total joke; it isn't. - Coxcomb
@AnttiHaapala: Update: I'm sure it's not fully removed from the kernel. /proc/PID/smaps does show a gd flag. But it may only work for mappings that start larger than 1 page? The code in the question works for me after changing 4096 to 100*4096. And the maps entry does actually change to a lower start address but the same end address, so it is literally growing downward when I start with a 400k mapping. bugs.centos.org/view.php?id=4767 may be related. - Coxcomb
But contrary to the man page, the mmap return value was the start address of the mapping in maps. - Coxcomb
@PeterCordes Hmm, that could be an explanation. - Direct

First of all, you don't want MAP_GROWSDOWN, and it's not how the main thread stack works (see "Analyzing memory mapping of a process with pmap. [stack]"). Nothing uses it, and pretty much nothing should use it. The stuff in the man page saying it's "used for stacks" is wrong and should be fixed.

I suspect it might be buggy (because nothing uses it so usually nobody cares or even notices if it breaks.)


Your code works for me if I change the mmap call to map more than 1 page. Specifically, I tried 4096 * 100. I'm running Linux 5.0.1 (Arch Linux) on bare metal (Skylake).

/proc/PID/smaps does show a gd flag.

And then (when single-stepping the asm) the maps entry does actually change to a lower start address but the same end address, so it is literally growing downward when I start with a 400 KiB mapping. This gives a 400 KiB initial allocation above the return address, which grows to 404 KiB when the program runs. (The size for a _GROWSDOWN mapping is not the growth limit or anything like that.)

https://bugs.centos.org/view.php?id=4767 may be related; something changed between kernel versions in CentOS 5.3 and 5.5. And/or it had something to do with working in a VM (5.3) vs. not growing and faulting on bare metal (5.5).


I simplified the C to use ptr[-4095] etc:

#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void){
    volatile char *ptr = mmap(NULL, 4096*100,
                            PROT_READ | PROT_WRITE,
                            MAP_ANONYMOUS | MAP_PRIVATE | MAP_STACK | MAP_GROWSDOWN,
                            -1, 0);
    if(ptr == MAP_FAILED){
        int error_code = errno;
        fprintf(stderr, "Cannot create MAP_GROWSDOWN mapping. "
                        "Error code = %d, details = %s\n", error_code, strerror(error_code));
        exit(EXIT_FAILURE);
    }

    ptr[0] = 'a';      //address returned by mmap
    ptr[-4095] = 'b';  // grow by 1 page
}

Compiling with gcc -Og gives asm that's nice-ish to single-step.


BTW, various rumours about the flag having been removed from glibc are obviously wrong. This source does compile, and it's clear that it's also supported by the kernel, not silently ignored. (Although the behaviour I see with size 4096 instead of 400kiB is exactly consistent with the flag being silently ignored. However the gd VmFlag is still there in smaps, so it's not ignored at that stage.)

I checked and there was room for it to grow without coming close to another mapping. So IDK why it didn't grow when the GD mapping was only 1 page. I tried a couple times and it segfaulted each time. With the larger initial mapping it never faulted.

Both times were with a store to the mmap return value (the first page of the mapping proper), then a store 4095 bytes below that.

Coxcomb answered 7/7, 2019 at 10:5 Comment(0)

I know the OP has already accepted one of the answers, but unfortunately it does not explain why MAP_GROWSDOWN seems to work sometimes. Since this Stack Overflow question is one of the first hits in search engines, let me add my answer for others.

The documentation of MAP_GROWSDOWN needs updating. In particular:

This growth can be repeated until the mapping grows to within a page of the high end of the next lower mapping, at which point touching the "guard" page will result in a SIGSEGV signal.

In reality, the kernel does not allow a MAP_GROWSDOWN mapping to grow closer than stack_guard_gap pages away from the preceding mapping. The default value is 256, but it can be overridden on the kernel command line. Since your code does not specify any desired address for the mapping, the kernel chooses one automatically, but is quite likely to end up within 256 pages from the end of an existing mapping.

EDIT:

Additionally, kernels before v5.0 deny access to an address which is more than 64k+256 bytes below the stack pointer. See this kernel commit for details.

This program works on x86 even with pre-5.0 kernels:

#include <sys/mman.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE   4096UL
#define GAP     (512 * PAGE_SIZE)   /* parenthesized so the macro is safe inside larger expressions */

static void print_maps(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f) {
        char buf[1024];
        size_t sz;
        while ( (sz = fread(buf, 1, sizeof buf, f)) > 0)
            fwrite(buf, 1, sz, stdout);
        fclose(f);
    }
}

int main()
{
    char *p;
    void *stack_ptr;

    /* Choose an address well below the default process stack. */
    asm volatile ("mov  %%rsp,%[sp]"
        : [sp] "=g" (stack_ptr));
    stack_ptr -= (intptr_t)stack_ptr & (PAGE_SIZE - 1);
    stack_ptr -= GAP;
    printf("Ask for a page at %p\n", stack_ptr);
    p = mmap(stack_ptr, PAGE_SIZE, PROT_READ | PROT_WRITE,
         MAP_PRIVATE | MAP_STACK | MAP_ANONYMOUS | MAP_GROWSDOWN,
         -1, 0);
    printf("Mapped at %p\n", p);
    print_maps();
    getchar();

    /* One page is already mapped: stack pointer does not matter. */
    *p = 'A';
    printf("Set content of that page to \"%s\"\n", p);
    print_maps();
    getchar();

    /* Expand down by one page. */
    asm volatile (
        "mov  %%rsp,%[sp]"  "\n\t"
        "mov  %[ptr],%%rsp" "\n\t"
        "movb $'B',-1(%%rsp)"   "\n\t"
        "mov  %[sp],%%rsp"
        : [sp] "+&g" (stack_ptr)
        : [ptr] "g" (p)
        : "memory");
    printf("Set end of guard page to \"%s\"\n", p - 1);
    print_maps();
    getchar();

    return 0;
}
Chervonets answered 2/7, 2020 at 18:2 Comment(0)