How to write self-modifying code in x86 assembly

Asked 27/1, 2011 at 4:53 Answered 7/7 at 1:50

Solved assembly x86 jit vm-implementation self-modifying

I'm looking at writing a JIT compiler for a hobby virtual machine I've been working on recently. I know a bit of assembly (I'm mainly a C programmer: I can read most assembly, using a reference for opcodes I don't understand, and write some simple programs), but I'm having a hard time understanding the few examples of self-modifying code I've found online.

This is one such example: http://asm.sourceforge.net/articles/smc.html

The example program makes about four different modifications when run, none of which is clearly explained. Linux kernel interrupts are used several times without any explanation. (The author moves data into several registers before each interrupt. I assume he is passing arguments, but these arguments aren't explained at all, leaving the reader to guess.)

What I'm looking for is the simplest, most straightforward example in code of a self-modifying program. Something that I can look at, and use to understand how self-modifying code in x86 assembly has to be written, and how it works. Are there any resources you can point me to, or any examples you can give that would adequately demonstrate this?

I'm using NASM as my assembler.

EDIT: I'm also running this code on Linux.

Rwanda answered 27/1, 2011 at 4:53 Comment(2)

linux.die.net/man/2/mprotect should explain the arguments for mprotect. The system call number is passed in EAX, and the arguments follow in EBX, ECX, and EDX. – Leibowitz 27/1, 2011 at 5:11
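For comparison, here is a minimal sketch of the same idea on 64-bit Linux, where the `syscall` instruction takes the call number in RAX and the first three arguments in RDI, RSI, and RDX (a different register set from the 32-bit `int 0x80` convention in the comment above). It assumes GCC or Clang on x86-64; `raw_syscall3` is a hypothetical helper name, and in real code the libc wrappers (`mprotect()`, `write()`, `syscall()`) are the usual choice.

```c
#include <sys/syscall.h>  /* SYS_write, SYS_mprotect, ... */

/* Assumption: x86-64 Linux. Call number in RAX, arguments in RDI, RSI, RDX.
   The kernel returns the result (or -errno) in RAX and clobbers RCX and R11. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;
}
```

For example, `raw_syscall3(SYS_write, 1, (long)"hi\n", 3)` writes three bytes to stdout and returns 3.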

Related: How to get c code to execute hex machine code? shows copying machine-code bytes into a page with write+exec permission and calling a function in them, including the necessary GNU C __builtin___clear_cache on the range (which actually just syncs the I-cache however the target ISA requires). – Ku 10/1, 2022 at 1:30

Wow, this turned out to be a lot more painful than I expected. 100% of the pain was Linux protecting the program from being overwritten and/or from executing data.

Two solutions are shown below. A lot of googling was involved: the fairly simple part (put some instruction bytes in memory and execute them) was mine, while the mprotect call and the page-size alignment were culled from Google searches; I had to learn those for this example.

The self-modifying code itself is straightforward. If you take the program, or at least the two simple functions, compile it, and then disassemble it, you will get the opcodes for those instructions (or use NASM to assemble blocks of code, etc.). That is how I determined the opcodes to load an immediate into eax and then return.

Ideally you would simply put those bytes in some RAM and execute that RAM. To get Linux to allow that, you have to change the protection, which means passing mprotect a pointer aligned on a page boundary. So allocate more than you need, find the page-aligned address within that allocation, call mprotect starting at that address, and then use that memory to hold your opcodes and execute them.
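The alignment step is plain integer arithmetic on the address. A minimal sketch, assuming the page size is a power of two (true on Linux); `page_round_down`, `page_round_up`, and `os_page_size` are hypothetical helper names. Using `uintptr_t` rather than `unsigned int` keeps the mask as wide as a pointer, which matters on 64-bit systems.

```c
#include <stdint.h>
#include <unistd.h>

/* Round an address down to the start of its page.
   pagesize must be a power of two, so ~(pagesize - 1) clears the low bits. */
static uintptr_t page_round_down(uintptr_t addr, uintptr_t pagesize)
{
    return addr & ~(pagesize - 1);
}

/* Round an address up to the next page boundary (unchanged if already aligned). */
static uintptr_t page_round_up(uintptr_t addr, uintptr_t pagesize)
{
    return (addr + pagesize - 1) & ~(pagesize - 1);
}

/* Page size reported by the OS; sysconf is the POSIX spelling of getpagesize(). */
static uintptr_t os_page_size(void)
{
    return (uintptr_t)sysconf(_SC_PAGESIZE);
}
```

With a 4096-byte page, 0x12345 rounds down to 0x12000 and up to 0x13000.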

The second example takes an existing function compiled into the program. Again, because of the protection mechanism, you cannot simply point at it and change bytes; you have to unprotect it for writes. So you back up to the preceding page boundary and call mprotect with that address and enough bytes to cover the code to be modified. Then you can change the bytes/opcodes of that function in any way you want (as long as you don't spill over into any function you want to keep using) and execute it. In this case you can see that fun() works; then I change it to simply return a value, call it again, and now it has been modified.
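Because a function can start near the end of a page, the range handed to mprotect may need to span more than one page. A sketch of that calculation, assuming a power-of-two page size; `page_span` is a hypothetical helper, not a libc function.

```c
#include <stddef.h>
#include <stdint.h>

/* Compute the page-aligned start and the length mprotect() needs so that
   every byte in [addr, addr + len) is covered. pagesize must be a power of two. */
static void page_span(uintptr_t addr, size_t len, uintptr_t pagesize,
                      uintptr_t *start, size_t *span)
{
    uintptr_t first = addr & ~(pagesize - 1);                       /* round down */
    uintptr_t end = (addr + len + pagesize - 1) & ~(pagesize - 1);  /* round up   */
    *start = first;
    *span = (size_t)(end - first);
}
```

Patching 8 bytes at 0x401ffc with 4096-byte pages needs two pages (start 0x401000, length 0x2000), while 6 bytes at 0x401010 fit in one.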

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>  // getpagesize()

// Build without optimization (e.g. gcc -O0): otherwise the calls to fun()
// below may be inlined or constant-folded and the patched bytes never run.

unsigned char * testfun;

unsigned int fun(unsigned int a) {
    return (a + 13);
}

unsigned int fun2(void) {
    return (13);
}

int main(void) {
    unsigned int ra;
    uintptr_t pagesize; // pointer-sized, so ~(pagesize - 1) keeps the upper bits of 64-bit addresses
    unsigned char * ptr;
    uintptr_t offset;

    pagesize = (uintptr_t) getpagesize();
    testfun = malloc(1023 + pagesize + 1);
    if (testfun == NULL) return (1);
    //need to align the address on a page boundary: round up inside the allocation
    printf("%p\n", (void * ) testfun);
    testfun = (unsigned char * )(((uintptr_t) testfun + pagesize - 1) & ~(pagesize - 1));
    printf("%p\n", (void * ) testfun);

    if (mprotect(testfun, 1024, PROT_READ | PROT_EXEC | PROT_WRITE)) {
        printf("mprotect failed\n");
        return (1);
    }

    //400687: b8 0d 00 00 00          mov    $0xd,%eax
    //40068d: c3                      retq

    testfun[0] = 0xb8; // mov eax, imm32
    testfun[1] = 0x0d; // imm32 = 0x0000000d, little-endian
    testfun[2] = 0x00;
    testfun[3] = 0x00;
    testfun[4] = 0x00;
    testfun[5] = 0xc3; // ret

    ra = ((unsigned int( * )(void)) testfun)();
    printf("0x%02X\n", ra);

    // same instructions with a new immediate: the function now returns 0x20
    testfun[0] = 0xb8;
    testfun[1] = 0x20;
    testfun[2] = 0x00;
    testfun[3] = 0x00;
    testfun[4] = 0x00;
    testfun[5] = 0xc3;

    ra = ((unsigned int( * )(void)) testfun)();
    printf("0x%02X\n", ra);

    // second example: patch fun() in place, starting from the page that holds it
    printf("%p\n", (void * ) fun);
    offset = ((uintptr_t) fun) & (pagesize - 1);
    ptr = (unsigned char * )((uintptr_t) fun & ~(pagesize - 1));

    printf("%p 0x%lX\n", (void * ) ptr, (unsigned long) offset);

    // one page is enough only if the patched bytes don't cross into the next page
    if (mprotect(ptr, pagesize, PROT_READ | PROT_EXEC | PROT_WRITE)) {
        printf("mprotect failed\n");
        return (1);
    }

    //for(ra=0;ra<20;ra++) printf("0x%02X,",ptr[offset+ra]); printf("\n");

    ra = 4;
    ra = fun(ra);
    printf("0x%02X\n", ra);

    // overwrite the start of fun() with: mov eax, 0x22 ; ret
    ptr[offset + 0] = 0xb8;
    ptr[offset + 1] = 0x22;
    ptr[offset + 2] = 0x00;
    ptr[offset + 3] = 0x00;
    ptr[offset + 4] = 0x00;
    ptr[offset + 5] = 0xc3;

    ra = 4;
    ra = fun(ra);
    printf("0x%02X\n", ra);

    return (0);
}

Zion answered 27/1, 2011 at 16:38 Comment(7)

Not only Linux: most modern OSes also prevent writable memory from being executed. – Bobseine 28/11, 2013 at 11:35

Can this be done in Windows, i.e. unprotecting a page of RAM, or would we be stuck with blue screens of death? I want to use this method to create a self-modifying encryption system. – Embryologist 15/12, 2013 at 9:27

The code worked fine on 32-bit Arch Linux, but failed on 64-bit RHEL (with both a 64-bit and a 32-bit ELF). I don't know whether this is due to additional memory protection on RHEL or something else. The output was: 0x9a00008 0x9a01000 mprotect failed – Blower 13/3, 2014 at 14:10

This isn't self-modifying code, it's just normal JIT into a buffer. Those mov-immediate instructions don't modify their own instruction bytes. – Ku 31/8, 2018 at 5:7

The technical problem here was not "SELF"-modifying code; the technical problem was protection. Once you overcome the protection, you can self-modify to your heart's content. It wasn't meant to be a "SELF"-modifying code answer. – Zion 31/8, 2018 at 12:30

It is self-modifying in the sense that this program modified its own memory space at run time. Call this JIT if you want; one could easily argue your statement is JIT as well. Again, the problem is protection, not modification. – Zion 31/8, 2018 at 12:32

@Blower mprotect fails because the page-alignment calculation is broken on 64-bit: pagesize needs to be a long. Otherwise the addresses are 64-bit values but the bitmasks are only 32 bits wide. – Inky 10/6, 2019 at 11:56

Since you're writing a JIT compiler, you probably don't want self-modifying code, you want to generate executable code at runtime. These are two different things. Self-modifying code is code that is modified after it has already started running. Self-modifying code has a large performance penalty on modern processors, and therefore would be undesirable for a JIT compiler.

Generating executable code at runtime should be a simple matter of mmap()ing some memory with PROT_EXEC and PROT_WRITE permissions. You could also call mprotect() on some memory you allocated yourself, as dwelch did above.

Educt answered 30/1, 2011 at 7:48 Comment(3)

Self-modifying code doesn't always have performance penalties on modern processors. You have to be careful about what you change, and make sure the CPU cache is in sync and branch prediction isn't altered. Changing those will tank your performance. – Larimer 5/12, 2014 at 18:41

if the self-modification happens relatively infrequently and/or on parts of the code that are not currently being executed, is the temporary performance hit negligible? – Eumenides 14/12, 2016 at 12:30

@ErikAllik: yeah, it's a one-time hit that costs a pipeline flush, maybe similar in cost to a memory-order mis-speculation. Maybe hundreds of cycles, so pretty easy to amortize over repeated use of the updated code. – Ku 31/8, 2018 at 5:10

I'm working on a self-modifying game to teach x86 assembly, and had to solve this exact problem. I used the following three libraries:

AsmJit + AsmTk for assembling: https://github.com/asmjit/asmjit + https://github.com/asmjit/asmtk
UDIS86 for disassembling: https://github.com/vmt/udis86

Instructions are read with Udis86, the user can edit them as a string, and then AsmJit/AsmTk is used to assemble the new bytes. These can be written back to memory, and as other users have pointed out, the write-back requires using VirtualProtect on Windows or mprotect on Unix to fix the memory page permissions.

The code samples are just a little long for StackOverflow, so I'll refer you to an article I wrote with code samples:

https://medium.com/squallygame/how-we-wrote-a-self-hacking-game-in-c-d8b9f97bfa99

A functioning repo is here (very light-weight):

https://github.com/Squalr/SelfHackingApp

Ihram answered 31/8, 2018 at 4:57 Comment(0)

A slightly simpler example, based on the one above. Thanks to dwelch, whose answer helped a lot.

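#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/mman.h>

// Assumed build: gcc -m32 -O0 -fno-pie -no-pie file.c
// hola() is copied to another address, so its machine code must not contain
// instruction-relative references; a 32-bit non-PIE build addresses the
// globals below absolutely.

char buffer [0x2000];
void* bufferp;

char* hola_mundo = "Hola mundo!";
void (*_printf)(const char*,...);

void hola()
{
    _printf(hola_mundo);
}

int main ( void )
{
    //Compute the start of the first whole page inside buffer
    //(uintptr_t keeps the upper address bits on 64-bit systems)
    bufferp = (void*)( ((uintptr_t)buffer + 0x1000) & ~(uintptr_t)0xfff );
    if(mprotect(bufferp, 1024, PROT_READ|PROT_EXEC|PROT_WRITE))
    {
        printf("mprotect failed\n");
        return(1);
    }
    //printf has to be called through its exact (absolute) address: a direct
    //"call printf" uses a relative offset that breaks once the code is moved
    _printf = printf;

    //Copy the function hola into buffer
    memcpy(bufferp,(void*)hola,60); //Arbitrary size: must cover all of hola()


    ((void (*)(void))bufferp)();

    return(0);
}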
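The reason the copy only works when hola() has no relative references is that a direct `call rel32` (opcode E8) stores its target as an offset from the end of the 5-byte instruction. Copying the bytes keeps the offset, so the copy calls the wrong place. A sketch of the fix-up a relocating copier would need, assuming the displacement still fits in 32 bits after the move; `relocate_rel32` is a hypothetical helper.

```c
#include <stdint.h>

/* x86 "call rel32": target = address of the call + 5 + rel32.
   Given where the call used to be and where its copy now lives, return the
   rel32 that makes the copy reach the same target (assumes it fits in 32 bits). */
static int32_t relocate_rel32(uintptr_t old_insn, uintptr_t new_insn, int32_t rel32)
{
    uintptr_t target = old_insn + 5 + (uintptr_t)(intptr_t)rel32;
    return (int32_t)(target - (new_insn + 5));
}
```

For a call at 0x1000 with rel32 = 0x100 (target 0x1105), a copy placed at 0x2000 needs rel32 = -0xF00.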

Exploiter answered 22/4, 2012 at 8:58 Comment(1)

If you don't generate position independent code for hola(), this could fail drastically. – Damper 10/7, 2015 at 8:45

This is written in AT&T syntax. As you can see from running the program, the output changes because of self-modifying code.

Compilation: gcc -m32 modify.s modify.c

The -m32 option is needed because the example is 32-bit code. Note that f4 is placed in the .data section, so it only runs if data pages are executable; it may crash on systems where they are not.

Assembly:

.globl f4
.data     

f4:
    pushl %ebp       #standard function start
    movl %esp,%ebp

f:
    movl $1,%eax # load 1 into %eax (encoded as b8 01 00 00 00)
    movl $0,f+1  # overwrite the immediate operand of that mov:
                 # f+1 is the address of its 4-byte immediate,
                 # so every later call loads 0 instead of 1.

    popl %ebp    # standard end
    ret

C test-program:

#include <stdio.h>

// assembly function f4
extern int f4();
int main(void) {
    int i;
    for(i=0;i<6;++i) {
        printf("%d\n",f4());
    }
    return 0;
}

Output:

1
0
0
0
0
0

Signal answered 21/5, 2016 at 18:27 Comment(1)

You may be able to ditch the pushl %ebp, movl %esp,%ebp and popl %ebp – Dielu 13/3, 2018 at 6:28

You can also look at projects like GNU lightning. You give it code for a simplified RISC-style machine, and it generates correct machine code dynamically.
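To give a feel for what such a library does underneath, here is a toy emitter that appends x86 encodings to a byte buffer. It is an illustrative sketch only, not GNU lightning's API; the struct and function names are hypothetical, and only three 32-bit instructions are covered.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy code buffer: each emit_* call appends one encoded instruction. */
struct emitter {
    uint8_t buf[64];
    size_t len;
};

static void emit_u8(struct emitter *e, uint8_t b)
{
    if (e->len < sizeof e->buf)   /* sketch only: bytes past the end are dropped */
        e->buf[e->len++] = b;
}

static void emit_imm32(struct emitter *e, uint32_t v)
{
    for (int i = 0; i < 4; i++)   /* x86 immediates are little-endian */
        emit_u8(e, (uint8_t)(v >> (8 * i)));
}

/* mov eax, imm32  ->  B8 id */
static void emit_mov_eax_imm32(struct emitter *e, uint32_t v) { emit_u8(e, 0xB8); emit_imm32(e, v); }

/* add eax, imm32  ->  05 id */
static void emit_add_eax_imm32(struct emitter *e, uint32_t v) { emit_u8(e, 0x05); emit_imm32(e, v); }

/* ret  ->  C3 */
static void emit_ret(struct emitter *e) { emit_u8(e, 0xC3); }
```

Emitting `mov eax, 13; add eax, 0x20; ret` produces 11 bytes that could then be copied into an executable page as in the answers above.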

A very real problem you should think about is interfacing with foreign libraries. You will probably need to support at least some system-level calls/operations for your VM to be useful. Kitsune's advice (@KitsuneYMG) is a good start for thinking about system-level calls. You would probably use mprotect to ensure that the memory you have modified becomes legally executable.

Some FFI allowing calls to dynamic libraries written in C should be sufficient to hide a lot of the OS specific details. All these issues can impact your design quite a bit, so it is best to start thinking about them early.

Gerontology answered 27/1, 2011 at 7:18 Comment(0)

This question is tagged with 'assembly' and 'x86' but not with 'C'. While the person who asked the question mentions they work mostly with C, this question is likely to be encountered by people looking for a pure assembly solution (including me in the past). Hence, this is my attempt at the simplest possible demonstration of a JIT program, heavily inspired by old_timer's answer but rewritten in pure assembly.

.bss
.align 4096 # page size on my machine. You could instead round the address
            # to libc's getpagesize() at runtime to make this more portable,
            # but hey, this is a minimum viable product!
            # Absolute addresses such as $exec need a non-PIE build:
            # gcc -no-pie file.s
exec: 
    .skip 10000




.text
mprotectoutput: .asciz "mprotect output value %d\n"

.global main
main:
    # prologue
    pushq %rbp
    movq %rsp, %rbp

    # body
    movq $exec, %rdi
    movq $10000, %rsi
    movq $7, %rdx
    call mprotect

    # print output from the mprotect function. If other than 0, the code will
    # segfault on `jmp *%rax`.
    movq $mprotectoutput, %rdi
    movq %rax, %rsi
    xor %rax, %rax
    call printf

    # the generated subroutine moves 0x45 into %rax, then returns by jumping
    # to the address in %r11 (caller-saved, so main may clobber it freely;
    # the callee-saved %r15 would have to be preserved for main's caller)

    # set the return address
    movq $back, %r11

    # rdi counts how many program bytes have been written
    xor %rdi, %rdi
    # 48 c7 c0 45 00 00 00  mov    $0x45,%rax
    movq $0x0000000045c0c748, %rax
    movq %rax, exec(%rdi)
    addq $7, %rdi
    # 41 ff e3              jmp    *%r11
    movl $0x00e3ff41, %eax
    movl %eax, exec(%rdi)
    addq $3, %rdi

    movq $exec, %rax
    jmp *%rax

back:
    # epilogue
    movq %rbp, %rsp
    popq %rbp
    ret
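The `movq $0x0000000045c0c748, %rax` / `movq %rax, exec(%rdi)` trick works because x86 stores integers little-endian, so the lowest byte of the immediate lands first in memory. A sketch of how such constants can be computed from the instruction bytes; `pack_le` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack up to 8 instruction bytes into the integer that, when stored
   little-endian (as x86 does), reproduces them in memory in the same order. */
static uint64_t pack_le(const uint8_t *bytes, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n && i < 8; i++)
        v |= (uint64_t)bytes[i] << (8 * i);
    return v;
}
```

Feeding it `48 c7 c0 45 00 00 00` (`mov $0x45,%rax`) gives 0x0000000045c0c748, the constant used above.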

Wira answered 18/10, 2023 at 9:30 Comment(2)

Note: if the program is stored on the stack, you can skip the paging/mprotecting stage altogether. – Wira 18/10, 2023 at 16:54

The stack shouldn't normally be executable, but toolchains default to making it executable unless every object file includes a .note.GNU-stack section indicating it's compatible with a non-executable stack. Hand-written asm needs to include that explicitly. Why data and stack segments are executable? / Unexpected exec permission from mmap when assembly files included in the project – Ku 18/10, 2023 at 18:6

A much simpler solution than the ones given can be written thanks to the ELF extensions to NASM's section directive (https://nasm.us/doc/nasmdoc8.html#section-8.9.2). These let you define custom sections, in particular one that is both writable and executable. Based on that insight, I wrote this (tested on Linux amd64):

    ; Here is a trivial example of self-modifying code.
    ; The instruction at to_modify would load 'A' to be printed, but
    ; because of the instruction at label `modifier`,
    ; the 'A' is replaced by a 'B'. While this doesn't change the opcode itself,
    ; it does modify a hard-coded operand (within a code section),
    ; so I would say it counts.
    ;
    ; Many of the examples online segfaulted when I tried to run them, or
    ; just wouldn't compile. This example uses NASM's
    ; section directives found at https://nasm.us/doc/nasmdoc8.html#section-8.9.2,
    ; which allow us to create a writable AND executable section, .textmodify.
    ;
    ; Be warned: this code may mess up your computer, as it has not been tested
    ; on computers other than mine.
    ;
    ; To build and run:
    ; $ nasm -f elf self_modify.asm && ld -m elf_i386 -o self_modify self_modify.o && ./self_modify
    ; I am using NASM version 2.14.02.
    ;
    ; The expected output is
    ; Original code
    ; Modified code
    ; B
    ;
    ; Whereas if you comment out the line at modifier:
    ; Original code
    ; Modified code
    ; A
    ;
    ; Try improving this program by limiting the .textmodify section to the code
    ; that will change (I haven't tried this yet).
    ;
    
    
    section .textmodify   progbits    alloc   exec    write   align=1
    global _start
    
    _start:
        ; Print "Original code"
        mov eax, 4
        mov ebx, 1
        mov ecx, msg1
        mov edx, len1
        int 0x80
    
        ; Modify the code
    modifier:
        mov dword [to_modify+1], 0x42  ; overwrite the imm32 of the mov at to_modify ('A' -> 'B')
    
        ; Print "Modified code"
        mov eax, 4
        mov ebx, 1
        mov ecx, msg2
        mov edx, len2
        int 0x80
    
    modified_code:
        ; This instruction will be modified
        to_modify:
        ; This instruction is 
        ; b8 41 00 00 00 
        ; in binary. The first byte is the opcode for mov, the second is 
        ; the character code for 'A' in hex. Thus we replace [to_modify+1] with 0x42. 
        mov eax, 'A'
        nop
        
        ; Print the modified character
        push eax
        mov eax, 4
        mov ebx, 1
        mov ecx, esp
        mov edx, 1
        int 0x80
        pop eax
    
        ; Exit
        mov eax, 1
        xor ebx, ebx
        int 0x80
    
    section .data
        msg1 db "Original code", 10
        len1 equ $ - msg1
        msg2 db "Modified code", 10
        len2 equ $ - msg2
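The `[to_modify+1]` offset generalizes: `mov r32, imm32` is one opcode byte (0xB8 plus the register number) followed by the 32-bit immediate in little-endian order. A sketch of patching that immediate in a byte buffer, assuming no prefix bytes (a REX or operand-size prefix would shift the offset); `set_mov_imm32` is a hypothetical helper.

```c
#include <stdint.h>

/* Replace the immediate of a "mov r32, imm32" (opcodes B8..BF, no prefixes).
   Returns 0 on success, -1 if insn does not start with that opcode. */
static int set_mov_imm32(uint8_t *insn, uint32_t value)
{
    if (insn[0] < 0xB8 || insn[0] > 0xBF)
        return -1;
    for (int i = 0; i < 4; i++)   /* immediate starts at insn + 1, little-endian */
        insn[1 + i] = (uint8_t)(value >> (8 * i));
    return 0;
}
```

Applied to the bytes `b8 41 00 00 00` with 0x42, this performs the same 'A' to 'B' change as the `modifier` instruction, just on an ordinary buffer.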

Remember that all the usual caveats about self-modifying code apply (it's dangerous, insecure, could burn down your house...).

EDIT: The previous version of this answer said that we needed 1 byte alignment in order to access any part of an instruction. This was incorrect; the code seems to work with an align value of both 1 and 16.

Yttriferous answered 7/7 at 1:50 Comment(2)

"We want one byte alignment so that we can modify any part of an instruction." I don't think alignment of a section does what you think it does. Try it with align=16, it will still work. – Dishearten 7/7 at 11:25

You're correct. The code does work with align=16. For some reason I thought the base multiplier for the offset would be 16, much like changing the type of a pointer in C. But that's wrong. I still don't fully understand it, but I guess it just aligns the start of a section to a multiple of the power of 2 you specify. – Yttriferous 7/7 at 13:54

I've never written self-modifying code, although I have a basic understanding of how it works. Basically, you write the instructions you want to execute into memory and then jump there. The processor interprets the bytes you've written as instructions and (tries to) execute them. For example, viruses and anti-copy programs may use this technique.
Regarding the system calls, you were right: arguments are passed via registers. For a reference of Linux system calls and their arguments, just check here.

Opulent answered 27/1, 2011 at 13:18 Comment(0)
