Why does this code crash with address randomization on?

I am learning amd64 assembly and trying to implement a simple Unix filter. For a reason I cannot identify, even the bare-minimum version (code below) crashes at random.

I tried to debug this program in the GNU Debugger (gdb). With gdb's default configuration the program runs fine, but if I enable address randomization (set disable-randomization off), it starts crashing with SIGSEGV. The faulting instruction is marked in the listing:

format ELF64 executable

sys_read                        =       0
sys_write                       =       1
sys_exit                        =       60

entry $
foo:
    label .inbuf   at rbp - 65536
    label .outbuf  at .inbuf - 65536
    label .endvars at .outbuf
    mov rbp, rsp

    mov rax, sys_read
    mov rdi, 0
    lea rsi, [.inbuf]
    mov rdx, 65536
    syscall

    xor ebx, ebx
    cmp eax, ebx
    jl .read_error
    jz .exit

    mov r8, rax  ; r8  - count of valid bytes in input buffer
    xor r9, r9   ; r9  - index of byte in input buffer, that is being processed.
    xor r10, r10 ; r10 - index of next free position in output buffer.

.next_byte:
    cmp r9, r8
    jg .exit
    mov al, [.inbuf + r9]
    mov [.outbuf + r10], al ;; SIGSEGV here in GDB
    inc r10
    inc r9
    jmp .next_byte

.read_error:
    mov rax, sys_exit
    mov rdi, 1
    syscall
.exit:
    mov rax, sys_write
    mov rdi, 1
    lea rsi, [.outbuf]
    mov rdx, r10
    syscall

    mov rax, sys_exit
    xor rdi, rdi
    syscall


This program is meant to read at most 64 KiB from stdin into a buffer on the stack, copy the data byte by byte into an output buffer, and write the output buffer to standard output. Essentially, it should behave as a limited version of cat.

On my computer, it either works as intended, or crashes with SIGSEGV, with an approximate rate of 1 successful run to 4 crashes.

Orchidectomy answered 22/6, 2019 at 18:35 Comment(0)
C
4

The red zone in amd64 is only 128 bytes long, but you're using 131072 bytes below rsp. Move the stack pointer down to encompass the buffers that you want to store on the stack.

Chaldea answered 22/6, 2019 at 18:41 Comment(1)
That doesn't fully explain why it faults in practice, especially not some of the time. - Tailpiece
T
6

Use sub rsp, <size> to reserve stack space before touching it if you're using more than 128 bytes below RSP.
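
As a minimal sketch of that fix (in fasm syntax as in the question, and not the only way to lay this out), the prologue below moves RSP below both buffers before anything touches them. BUF_SIZE and the start label are names introduced here for illustration; the .inbuf/.outbuf labels follow the question.

```asm
format ELF64 executable

sys_read  = 0
sys_exit  = 60
BUF_SIZE  = 65536               ; illustrative name for the question's buffer size

entry start
start:
    label .inbuf  at rbp - BUF_SIZE
    label .outbuf at .inbuf - BUF_SIZE
    mov rbp, rsp
    sub rsp, 2*BUF_SIZE         ; move RSP below both buffers before touching them

    mov eax, sys_read
    xor edi, edi                ; fd 0 = stdin
    lea rsi, [.inbuf]
    mov edx, BUF_SIZE
    syscall

    ; ... process .inbuf / .outbuf as in the question ...

    mov eax, sys_exit
    xor edi, edi                ; exit status 0
    syscall
```

With RSP at the lowest address in use, every buffer access is at or above the stack pointer, which is the normal, supported case for stack growth.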


When it crashes, look at your process memory map. You might be using memory so far below RSP that the kernel doesn't grow the stack mapping and thus it's just an ordinary access to an unmapped page = invalid page fault => kernel delivers SIGSEGV.

(The ABI only defines a 128-byte red-zone, but in practice the only thing that can clobber that memory is a signal handler (which you didn't install) or GDB running print some_func() using your program's stack to call a function in your program.)

Normally Linux is pretty willing to grow the stack mapping without touching intervening pages, but apparently does check the value of RSP. Normally you move RSP instead of just using memory far below the stack pointer (because there's no guarantee it's safe). See How is Stack memory allocated when using 'push' or 'sub' x86 instructions?

Another duplicate: Which exception can be generated when subtracting ESP or RSP register? (stack growing) where using sub rsp, 5555555 before touching new stack memory was sufficient.

Stack ASLR might start RSP in different places relative to a page boundary, so you might be just barely getting away with it sometimes. Linux initially maps 132 KiB of stack space, and that includes space for the environment and args on the stack on entry to _start. Your two buffers total 128 KiB (2 × 65536 bytes), which leaves only about 4 KiB for everything else, so it's totally plausible that it randomly works sometimes.


And BTW, there's zero reason to actually copy memory in user-space, especially not 1 byte at a time. Just pass the same address to write.
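
A rough sketch of that idea, assuming the same fasm setup as the question: read into one buffer and hand the same address straight to write. BUF_SIZE and the label names are illustrative, and a short write is not retried here.

```asm
format ELF64 executable

sys_read  = 0
sys_write = 1
sys_exit  = 60
BUF_SIZE  = 65536               ; illustrative name

entry start
start:
    sub rsp, BUF_SIZE           ; reserve one buffer; RSP now points at it

    mov eax, sys_read
    xor edi, edi                ; fd 0 = stdin
    mov rsi, rsp                ; buffer address
    mov edx, BUF_SIZE
    syscall

    test rax, rax
    js  .read_error             ; negative return value means -errno
    jz  .done                   ; 0 means EOF, nothing to write

    mov edx, eax                ; byte count from read (at most BUF_SIZE)
    mov eax, sys_write
    mov edi, 1                  ; fd 1 = stdout
    mov rsi, rsp                ; same buffer, no user-space copy
    syscall

.done:
    mov eax, sys_exit
    xor edi, edi
    syscall

.read_error:
    mov eax, sys_exit
    mov edi, 1
    syscall
```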

Or at least filter in-place if possible, so your cache footprint is smaller.

Also, the normal way to load a byte is movzx eax, byte [mem]. Only use mov al, [mem] if you specifically want to merge with the old value of RAX. On some CPUs, mov to al has a false dependency on the old value which you can break by writing the full register.


And BTW, if your program always uses this space, you might as well statically allocate it in the BSS. That makes more efficient indexed addressing possible if you choose to assemble a position-dependent (non-PIE) executable.
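
For illustration, here is one way the static-allocation variant might look in fasm, assuming a non-PIE `format ELF64 executable` output so that `[inbuf + rcx]` can use an absolute displacement. The buffers are reserved with rb in a writable segment; the names BUF_SIZE, inbuf, outbuf and start are chosen for this sketch, and error handling is left out.

```asm
format ELF64 executable

sys_read  = 0
sys_write = 1
sys_exit  = 60
BUF_SIZE  = 65536

segment readable executable
entry start
start:
    mov eax, sys_read
    xor edi, edi                    ; fd 0 = stdin
    mov rsi, inbuf                  ; static buffer address
    mov edx, BUF_SIZE
    syscall
    test rax, rax
    jle .exit                       ; EOF or error (not distinguished in this sketch)

    mov r8, rax                     ; r8 = number of valid bytes
    xor ecx, ecx                    ; rcx = index into both buffers
.copy:
    movzx eax, byte [inbuf + rcx]   ; static base + index, no base register needed
    mov [outbuf + rcx], al
    inc rcx
    cmp rcx, r8
    jb .copy

    mov eax, sys_write
    mov edi, 1                      ; fd 1 = stdout
    mov rsi, outbuf
    mov rdx, r8
    syscall
.exit:
    mov eax, sys_exit
    xor edi, edi
    syscall

segment readable writeable
inbuf  rb BUF_SIZE                  ; uninitialized, reserved space
outbuf rb BUF_SIZE
```

Because nothing lives far below RSP any more, the stack-growth question does not arise at all in this layout.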

Tailpiece answered 22/6, 2019 at 18:44 Comment(2)
I know that copying 1 byte at a time in userspace is weird, but I had to simplify my code. Originally, there was some processing. Thank you for the info about movzx. - Orchidectomy
@DmitryBogatov: your loop is still pretty overcomplicated :P Two separate indexes, and you're still using indexed addressing modes instead of just pointer increments. And you don't have the conditional branch at the bottom. See Why are loops always compiled into "do...while" style (tail jump)? - Tailpiece
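
One possible shape for such a loop, sketched in fasm under the question's stack layout (with RSP moved down first): plain pointer increments, a movzx load, and the conditional branch at the bottom. BUF_SIZE and start are illustrative names, and the per-byte processing is left as a placeholder.

```asm
format ELF64 executable

sys_read  = 0
sys_write = 1
sys_exit  = 60
BUF_SIZE  = 65536

entry start
start:
    label .inbuf  at rbp - BUF_SIZE
    label .outbuf at .inbuf - BUF_SIZE
    mov rbp, rsp
    sub rsp, 2*BUF_SIZE         ; reserve both buffers

    mov eax, sys_read
    xor edi, edi                ; fd 0 = stdin
    lea rsi, [.inbuf]
    mov edx, BUF_SIZE
    syscall
    test rax, rax
    jle .exit                   ; EOF or error: skip the loop entirely

    lea rsi, [.inbuf]           ; source pointer
    lea rdi, [.outbuf]          ; destination pointer
    lea rdx, [rsi + rax]        ; one past the last valid input byte
.next_byte:
    movzx eax, byte [rsi]       ; load without merging into the old RAX
    ; (per-byte processing would go here)
    mov [rdi], al
    inc rsi
    inc rdi
    cmp rsi, rdx
    jb .next_byte               ; loop branch at the bottom (do...while style)

    lea rsi, [.outbuf]
    mov rdx, rdi
    sub rdx, rsi                ; number of bytes produced
    mov eax, sys_write
    mov edi, 1                  ; fd 1 = stdout
    syscall
.exit:
    mov eax, sys_exit
    xor edi, edi
    syscall
```

The check before the loop guarantees at least one byte, so the body can run first and test afterwards, which saves the extra jmp per iteration.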
