Generating functions at runtime in C
C

3

3

I would like to generate a function at runtime in C. And by this I mean I would essentially like to allocate some memory, point at it and execute it via function pointer. I realize this is a very complex topic and my question is naïve. I also realize there are some very robust libraries out there that do this (e.g. nanojit).

But I would like to learn the technique, starting with the basics. Could someone knowledgeable give me a very simple example in C?

EDIT: The answer below is great but here is the same example for Windows:

#include <Windows.h>

#define MEMSIZE (100*1024*1024)
typedef void (*func_t)(void);

int main() {

    HANDLE proc = GetCurrentProcess();
    LPVOID p = VirtualAlloc(
        NULL,
        MEMSIZE,
        MEM_RESERVE|MEM_COMMIT,
        PAGE_EXECUTE_READWRITE);

    if (p == NULL)
        return 1; // allocation failed

    func_t func = (func_t)p;
    unsigned char *code = (unsigned char *)p;
    code[0] = 0xC3; // x86 'ret' (a single byte)

    if(FlushInstructionCache(
        proc,
        NULL,
        0))
    {
        func();
    }

    // GetCurrentProcess() returns a pseudo-handle; it does not need to be closed.
    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}
Cascade answered 12/2, 2013 at 5:26 Comment(4)
There is a general problem with this, which is that the C89/90 standard explicitly says converting an arbitrary pointer to a function pointer is undefined behaviour (G.2: A pointer to a function is converted to a pointer to an object or a pointer to an object is converted to a pointer to a function.) I'd be surprised (but interested!) if you really could do this in a portable, standards-compliant way. – Karwan
Be aware that there are major security downsides to generating code at runtime. Attackers often have some level of control of the data flowing through your program. If you have a route for attacker-controlled data to become attacker-controlled code, there's a route for an attacker to install malware on your / your customer's machines. Mix data and code at your peril! – Chilly
Thanks Matt, I am aware that it's risky and complex. It's my understanding though that this is generally the technique used by JIT tools and, essentially, compilers. Please correct me if I'm wrong. – Cascade
As of C11 (haven't checked C99), casting a function pointer to an object pointer is no longer undefined behaviour: it is now considered an implementation-provided extension (J.5.7), i.e. it's allowed. – Mailman
B
5

As said previously by other posters, you'll need to know your platform pretty well.

Ignoring the issue of casting an object pointer to a function pointer being, technically, UB, here's an example that works for x86/x64 OS X (and possibly Linux too). All the generated code does is return to the caller.

#include <unistd.h>
#include <sys/mman.h>

typedef void (*func_t)(void);

int main() {
    /*
     * Get a RWX bit of memory.
     * We can't just use malloc because the memory it returns might not
     * be executable.
     */
    unsigned char *code = mmap(NULL, getpagesize(),
            PROT_READ|PROT_EXEC|PROT_WRITE,
            MAP_PRIVATE|MAP_ANON, -1, 0); /* fd is -1 for anonymous maps */
    if (code == MAP_FAILED)
        return 1;

    /* Technically undefined behaviour */
    func_t func = (func_t) code;

    code[0] = 0xC3; /* x86 'ret' instruction */

    func();

    return 0;
}

Obviously, this will differ across platforms, but it outlines the basics needed: get an executable region of memory, write instructions, execute instructions.

Bauhaus answered 12/2, 2013 at 6:13 Comment(1)
For the record it seems the corresponding function in Windows is VirtualAlloc with the PAGE_EXECUTE_READWRITE flag. – Cascade
K
3

This requires you to know your platform. For instance, what is the C calling convention on your platform? Where are parameters stored? What register holds the return value? What registers must be saved and restored? Once you know that, you can essentially write some C code that assembles code into a block of memory, then cast that memory to a function pointer. (This cast is technically forbidden in ANSI C, and the call will fail if your platform marks that memory as non-executable, e.g. via the NX bit.)

The simple way to go about this is to write some code, compile it, then disassemble it and look at which bytes correspond to which instructions. You can then write some C code that fills allocated memory with those bytes, casts it to a function pointer of the appropriate type, and calls it.

It's probably best to start by reading the calling conventions for your architecture and compiler. Then learn to write assembly that can be called from C (i.e., follows the calling convention).

Kaon answered 12/2, 2013 at 5:37 Comment(0)
N
2

If you have tools, they can help you get some things right more easily. For example, instead of trying to design the right function prologue/epilogue by hand, I can just code this in C:

int  foo(void* Data)
    {
    return (Data != 0);
    }

Then (Microsoft C under Windows) I feed it to "cl /Fa /c foo.c" and look at "foo.asm":

_Data$ = 8
; Line 2
        push    ebp
        mov     ebp, esp
; Line 3
        xor     eax, eax
        cmp     DWORD PTR _Data$[ebp], 0
        setne   al
; Line 4
        pop     ebp
        ret     0

I could also use "dumpbin /all foo.obj" to see that the exact bytes of the function were:

00000000: 55 8B EC 33 C0 83 7D 08 00 0F 95 C0 5D C3

Just saves me some time getting the bytes exactly right...
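If you would rather paste such a dump than retype it as a C array, a small parser can turn the hex text into bytes. This is only a sketch: `parse_hex_bytes` is a hypothetical helper, and it expects just the byte columns (strip the leading `00000000:` offset first). Note that the bytes above are 32-bit x86 code, so they are only worth executing in a 32-bit process.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: parse whitespace-separated two-digit hex bytes
 * ("55 8B EC ...") into out[0..cap-1]. Returns the number of bytes
 * stored, or -1 on malformed input or if out is too small. */
long parse_hex_bytes(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;

    while (*text != '\0') {
        if (isspace((unsigned char)*text)) {
            text++;
            continue;
        }
        /* Only look at text[1] once text[0] is known to be a digit,
         * so the terminating NUL is never skipped over. */
        int hi = hex_value((unsigned char)text[0]);
        int lo = (hi < 0) ? -1 : hex_value((unsigned char)text[1]);
        if (hi < 0 || lo < 0 || n == cap)
            return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        text += 2;
    }
    return (long)n;
}
```

The resulting buffer can then be copied into executable memory the same way as a hard-coded byte array.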

Nichy answered 12/2, 2013 at 7:2 Comment(0)
