Unlimited stack cannot grow beyond the initial 132KiB if MAP_FIXED is involved?

I'm running some experiments with the stack and the following got me stuck.

Linux creates an initial [stack] mapping that is 132KiB in size. With ulimit -s unlimited, the stack can be expanded further as long as rsp is adjusted accordingly. So I set ulimit -s unlimited and ran the following program:

PAGE_SIZE     equ 0x1000

;mmap stuff
PROT_READ     equ 0x01
PROT_WRITE    equ 0x02
MAP_ANONYMOUS equ 0x20
MAP_PRIVATE   equ 0x02
MAP_FIXED     equ 0x10

;syscall numbers
SYS_mmap      equ 0x09
SYS_exit      equ 0x3c

section .text

global _start

_start:
    ; page alignment
    and rsp, -0x1000

    ; call mmap 0x101 pages below the rsp with fixed mapping
    mov rax, SYS_mmap
    lea rdi, [rsp - 0x101 * PAGE_SIZE]
    mov rsi, PAGE_SIZE
    mov rdx, PROT_READ | PROT_WRITE
    mov r10, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED
    mov r8, -1
    mov r9, 0
    syscall

    sub rsp, 0x80 * PAGE_SIZE
    mov qword [rsp], -1 ; SEGV

    mov rax, SYS_exit
    mov rdi, 0
    syscall

Despite adjusting rsp, it segfaults anyway, and I don't understand why. I manually created a fixed mapping at the address rsp - 0x101 * PAGE_SIZE, that is, 0x101 pages below rsp.

My expectation was that it would not interfere with expanding the stack (down to rsp - 0x80 * PAGE_SIZE in my case) until we hit the fixed mapping at rsp - 0x101 * PAGE_SIZE.

Btw, if I remove MAP_FIXED from the mapping, the address hint is not honored and no segfault occurs (as expected). Here is the strace output:

mmap(0x7ffe4e0fe000, 4096, PROT_READ|PROT_WRITE, 
     MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x1526e3f3a000

But MAP_FIXED does the job:

mmap(0x7ffd8979c000, 4096, PROT_READ|PROT_WRITE, 
     MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ffd8979c000

UPD: The segfault is not triggered if lea rdi, [rsp - 0x101 * PAGE_SIZE] is replaced with lea rdi, [rsp - 0x200 * PAGE_SIZE].

Ophthalmoscope answered 7/7, 2019 at 17:56 Comment(11)
The red zone has nothing to do with this. The red zone is simply the 128 bytes below RSP and has nothing to do with growing the stack. The red zone exists as a space that programs can use without worrying about it being clobbered by a signal handler etc. You need to get away from thinking the red zone plays a part in this. - Gyneco
Linux uses the concept of guard pages on the stack. This may be of some interest: lkml.org/lkml/2017/6/22/345 - Gyneco
@MichaelPetch I thought that it is only safe to touch memory within the red zone. Otherwise we have to adjust rsp, as described here: https://mcmap.net/q/23281/-why-does-this-code-crash-with-address-randomization-on - Ophthalmoscope
No, the red zone is simply a place where you can store temporary data without it being clobbered. What the red zone buys you is that you don't have to use instructions like sub rsp, ## to reserve space for local variables (as long as they take up no more than 128 bytes below RSP and your function doesn't call other functions). If you need more space, you adjust RSP to account for it. You can access memory beyond that, but you have to concern yourself with Linux guard pages. - Gyneco
@MichaelPetch That's interesting. So the initial 132KiB mapping is sort of guard pages for the [stack]? - Ophthalmoscope
Presumably the OS tries to grow the stack in some fixed increment which simply does not fit between your 0x80 and 0x101 pages but does fit into the gap if you use 0x200. - Fenelia
@Fenelia I tried sub rsp, 0x1000 * PAGE_SIZE, which puts rsp below the fixed mapping, but the segfault also occurred. No manually created mappings exist below that. - Ophthalmoscope
You can't grow the stack through the fixed mapping for obvious reasons. - Fenelia
The interesting thing to try would be to see if the kernel treats that space as reserved, using MAP_FIXED_NOREPLACE or just a normal non-NULL hint address without either FIXED flag. (mmap won't randomly pick it, and it might be reserved with a mechanism that stops it from being allocated without MAP_FIXED) - Pittel
@PeterCordes I did not find the MAP_FIXED_NOREPLACE macro declaration in sys/mman.h, nor in man mmap. The only hint was specified here. I use kernel 4.18.0. Trying to manually add the flag to the mmap arguments also resulted in a segfault. - Ophthalmoscope
@St.Antario: then your user-space glibc is too old, but your kernel is new enough. You can just define it yourself because the mmap function in glibc is just a thin wrapper for the system call; it doesn't need to understand the flags. - Pittel
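
The last comment's suggestion (defining the flag yourself when the glibc headers lack it) could be sketched in C roughly as follows. This is only an illustration: the value 0x100000 is what the Linux uapi headers use on x86-64, the 4 KiB page size is assumed, and map_page_noreplace is a hypothetical helper name that does not appear in the thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>

/* Older glibc headers lack this flag. 0x100000 is the value in the Linux
 * uapi headers on x86-64 (assumption: other architectures may differ). */
#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000
#endif

#define PAGE_SIZE_BYTES 4096UL /* assumes 4 KiB pages, as in the question */

/* Hypothetical helper: request one anonymous page at exactly `addr`
 * without replacing anything already mapped there.
 * Returns the mmap() result unchanged; MAP_FAILED signals an error. */
static void *map_page_noreplace(void *addr)
{
    assert(((uintptr_t)addr % PAGE_SIZE_BYTES) == 0); /* must be page aligned */
    return mmap(addr, PAGE_SIZE_BYTES, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
}
```

Kernels that do not know the flag treat the address as a plain hint, so a caller should compare the returned pointer with the requested address rather than rely on MAP_FAILED alone.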

The Linux kernel enforces a gap between the stack and other mappings. If that gap cannot be maintained, the stack will not grow.

Relevant source code in mm/mmap.c, from line 2498:

/* enforced gap between the expanding stack and other mappings. */
unsigned long stack_guard_gap = 256UL<<PAGE_SHIFT;

static int __init cmdline_parse_stack_guard_gap(char *p)
{
    unsigned long val;
    char *endptr;

    val = simple_strtoul(p, &endptr, 10);
    if (!*endptr)
        stack_guard_gap = val << PAGE_SHIFT;

    return 0;
}
__setup("stack_guard_gap=", cmdline_parse_stack_guard_gap);

and line 2424:

int expand_downwards(struct vm_area_struct *vma,
                   unsigned long address)
{
    struct mm_struct *mm = vma->vm_mm;
    struct vm_area_struct *prev;
    int error = 0;

    address &= PAGE_MASK;
    if (address < mmap_min_addr)
        return -EPERM;

    /* Enforce stack_guard_gap */
    prev = vma->vm_prev;
    /* Check that both stack segments have the same anon_vma? */
    if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
            (prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
        if (address - prev->vm_end < stack_guard_gap)
            return -ENOMEM;
    }

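You can see it's adjustable via the stack_guard_gap= kernel parameter, but the default is 256 pages. In your program the one-page fixed mapping ends at rsp - 0x100 pages and the faulting address is rsp - 0x80 pages, so the distance between them is only 0x80 (128) pages and the gap does not fit. If you map at rsp - 0x200 pages instead, the distance is 0x17f (383) pages, so the gap fits and the stack can grow.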
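
The arithmetic can be checked with a small user-space model of the comparison in the excerpt above. This is a sketch, not kernel code: it assumes 4 KiB pages and the default gap of 256 pages, and the names stack_may_grow and question_case are hypothetical helpers made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumes 4 KiB pages, as in the question */

/* Models the test in the quoted expand_downwards() excerpt: growth is
 * refused when the distance from the end of the mapping below the stack
 * (prev_end) to the faulting page is smaller than the guard gap. */
static bool stack_may_grow(uint64_t fault_addr, uint64_t prev_end,
                           uint64_t gap_pages)
{
    uint64_t page_mask = ~((UINT64_C(1) << PAGE_SHIFT) - 1);

    fault_addr &= page_mask;        /* same as address &= PAGE_MASK */
    assert(fault_addr >= prev_end); /* the model only covers faults above prev */
    return fault_addr - prev_end >= (gap_pages << PAGE_SHIFT);
}

/* Hypothetical helper reproducing the question's layout: a one-page fixed
 * mapping map_pages_below pages under rsp, then a write 0x80 pages under rsp. */
static bool question_case(uint64_t rsp, uint64_t map_pages_below)
{
    uint64_t page = UINT64_C(1) << PAGE_SHIFT;
    uint64_t map_start = rsp - map_pages_below * page;
    uint64_t prev_end = map_start + page;    /* end of the one-page mapping */
    uint64_t fault_addr = rsp - 0x80 * page; /* address written after sub rsp */

    return stack_may_grow(fault_addr, prev_end, 256); /* default gap: 256 pages */
}
```

With a page-aligned rsp, question_case(rsp, 0x101) reports that growth is refused and question_case(rsp, 0x200) reports that it is allowed, matching the behaviour described in the question.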

Fenelia answered 7/7, 2019 at 18:29 Comment(0)
