How to call machine code stored in char array?

I'm trying to call native machine-language code. Here's what I have so far (it gets a bus error):

char prog[] = {'\xc3'}; // x86 ret instruction

int main()
{
    typedef double (*dfunc)();

    dfunc d = (dfunc)(&prog[0]);
    (*d)();
    return 0;
}

It does correctly call the function and it gets to the ret instruction. But when it tries to execute the ret instruction, it gets a SIGBUS error. Is it because I'm executing code on a page that is not marked executable, or something like that?

So what am I doing wrong here?

Hoskins answered 5/10, 2016 at 7:2 Comment(9)
This probably depends on OS and compiler, so you should add that info. You may need compiler-specific stuff to tell that prog should be executable, just as you suspect. (Redintegrate)
Also, make the function return void to avoid any problems related to that. (Redintegrate)
You need to allocate a page of memory and make it executable. (Pyxie)
In case it helps anyone: I've often found that SIGBUS is indicative of bad alignment. (Mallory)
@user Please post your solution as an answer instead of editing it into your question. (Hanna)
I reverted the changes to the code sample, so the question makes sense again. Please do what @Hanna already suggested and accept an answer instead of updating your question, changing its meaning in the process. (Distraught)
Would it be more practical to use the asm() function? (Mooncalf)
Please, please, please, please, please, please, please use asm() (per @Stavr00's comment and Graham's answer) rather than any of the other approaches, especially if your code has any possibility of ever seeing the light of day on a piece of silicon that is in any way connected to the internet or has any possibility of interacting with anyone beyond yourself. (Nipa)
I took the liberty of replacing "native code" with "machine code" in the title - that should avoid misunderstandings. (Calico)

The first problem is probably that the memory where prog is stored is not executable.

On Linux at least, the compiler and linker place initialized global variables such as prog in the data section (.data), which is mapped without execute permission in most normal cases.

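The second problem might be that the code you are invoking is invalid in some way. There's a certain procedure to calling a function in C, called the calling convention (you might be using the "cdecl" one, for example). It might not be enough for the called function to just "ret": depending on the convention, it may also need to leave its return value in the expected register or clean up the stack, otherwise the program will behave unexpectedly. In particular, a function declared to return double must leave its result in xmm0 on x86-64 (or in st(0) on 32-bit x86 with cdecl), which a bare ret does not do. This might prove an issue once you get past the first problem.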

Freudberg answered 5/10, 2016 at 7:16 Comment(1)
This article goes into remarkably complete detail on how to embed and call machine code in C. It starts with the premise of turning main() into a char array. (Adagietto)

You need to call mprotect in order to make the page where prog lives executable. The following code makes this call, and can then execute the machine code in prog.

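#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* x86-64 machine code for a function that returns a double constant.
   The constant is stored right after the code, so the RIP-relative
   load stays inside the array. */
unsigned char prog[] = {
   0x55,             // push   %rbp
   0x48, 0x89, 0xe5, // mov    %rsp,%rbp
   0xf2, 0x0f, 0x10, 0x05, 0x02, 0x00, 0x00, 0x00,
       // movsd  0x2(%rip),%xmm0   (skips pop and retq to reach the constant)
   0x5d,             // pop    %rbp
   0xc3,             // retq
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x40,
       // 2.5 as a little-endian IEEE 754 double
};

int main()
{
    long pagesize = sysconf(_SC_PAGE_SIZE);
    /* Round prog's address down to its page and cover every page prog touches. */
    uintptr_t start = (uintptr_t)prog & ~(uintptr_t)(pagesize - 1);
    size_t len = (uintptr_t)prog + sizeof(prog) - start;
    /* Keep PROT_WRITE: prog shares its page(s) with other writable globals.
       Systems that enforce W^X may refuse this call. */
    int res = mprotect((void*)start, len, PROT_EXEC|PROT_READ|PROT_WRITE);
    if(res != 0)
    {
        perror("mprotect");   /* mprotect returns -1 and sets errno on failure */
        return 1;
    }
    typedef double (*dfunc)(void);

    dfunc d = (dfunc)(void*)prog;
    double x = (*d)();
    printf("x=%f\n", x);   /* expected: x=2.500000 */
    fflush(stdout);
    return 0;
}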
Seneschal answered 5/10, 2016 at 8:1 Comment(6)
In this case, you can also declare the array const to let it be stored in the executable part of the process memory: https://mcmap.net/q/331470/-segmentation-fault-when-byte-coding-a-function-duplicate/1025391 (Microbe)
@Microbe You can't generally assume that the memory section for readonly data is executable, even if some shitty linkers do it like that. (Naught)
@Naught You can't generally assume that you can execute machine code stored in an array, yet the OP asked for it. (Microbe)
@moooeeeep: sometime in the last year or so, GNU ld started linking .rodata into its own ELF segment so it can be read-only without exec permission. Not part of the text segment like it used to do. So that simple trick no longer works. You could use an __attribute__((section(".text"))) on a const array, though, in GNU C. (Chlor)
Your shellcode reads past the end of its array with movsd 0x0(%rip),%xmm0. That's an 8-byte load starting at the 0x5d byte (the byte after the movsd instruction because RIP+0). x86 is little-endian so the exponent field of the double will be from whatever garbage comes next. It looks like you naively copied objdump output for compiler-generated debug-mode code for a function that returns a double. It will of course load that constant from .rodata because x86 doesn't have FP immediate operands. But you didn't put the referenced double into the shellcode. (Chlor)
Some FP constants can be generated on the fly in a few instructions, like in "What are the best instruction sequences to generate vector constants on the fly?". Or just convert your example to returning an int, because mov $123, %eax / ret is self-contained. (Chlor)

As everyone already said, you must ensure prog[] is executable. However, unless you're writing a JIT compiler, the proper way to do it is to put the symbol in an executable area, either by using a linker script or by specifying the section in the C code if the compiler allows it, e.g.:

const char prog[] __attribute__((section(".text"))) = {...}
Stuff answered 5/10, 2016 at 10:57 Comment(0)

Most C compilers (though not MSVC when targeting x64) let you do this by embedding regular assembly language in your code. Of course it's a non-standard extension to C, but compiler writers recognise that it's often necessary. As a non-standard extension, you'll have to read your compiler manual and check how to do it, but the GCC "asm" extension is a fairly standard approach.

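 #include <assert.h>
 #include <stdint.h>

 // x86 only (GCC extended asm): bsf stores the index of the lowest set bit.
 void DoCheck(uint32_t dwSomeValue)
 {
    uint32_t dwRes;

    // Assumes dwSomeValue is not zero; bsf leaves the destination undefined for 0.
    asm ("bsfl %1,%0"
      : "=r" (dwRes)
      : "r" (dwSomeValue)
      : "cc");

    assert(dwRes > 3);
 }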

Since it's easy to trash registers (or the stack) in assembler, compilers often also let you declare which registers your assembler reads, writes and clobbers; the "cc" clobber above, for example, tells GCC that the condition flags are modified. The compiler can then keep the rest of that function from relying on those registers.

If you're writing the assembler code yourself, there is no good reason to set up that assembler as an array of bytes. It's not just a code smell - I'd say it is a genuine error which could only happen by being unaware of the "asm" extension which is the right way to embed assembler in your C.

Pachisi answered 5/10, 2016 at 17:17 Comment(9)
Good lord, how did five separate users answer this question without even mentioning asm? Bleeeeeeaaaaaaaarrrrrrrgggggghhhhh. (Nipa)
@KyleStrand Maybe everyone else makes the distinction between machine language (what the user wants) and assembler. asm is for example less useful if you want to generate the code on-the-fly. (Koss)
Also you can't use true assembly in every C compiler. E.g. MSVC will treat its __asm codes as code in yet another high-level language: it will try to optimize it, and won't let you emit raw bytes (like the db directive in MASM). (Aver)
@Koss Except that he's setting up constant arrays with instruction byte codes. If he knows what instructions he wants, all he's doing is a complicated version of embedding an "asm" block. (Pachisi)
@KyleStrand Very much so! Scary stuff... :) (Pachisi)
@Pachisi No, he is not, it's a plain char array, nothing constant about that. He's initializing it with a constant. (Koss)
@Koss If you're going to be pedantic, then note that Graham said "constant", not "const". (Nipa)
@KyleStrand I never mentioned const. I'm pretty sure that someone who's trying to execute machine language from C is aware of the asm() construct, and I'm glad that he reduced the example code snippet to the bare minimum that demonstrates the problem. (Koss)
@Koss If that's definitely what he wants to do, then fine. But based on what he originally asked for, and the fact that his byte array is set by a constant initialiser and is not subsequently changed, it'd be negligent not to point out the "asm" solution as the best (and only good) solution for running a known chunk of assembler. (Pachisi)

Essentially this has been clamped down on because it was an open invitation to virus writers. But you can allocate a buffer and set it up with native machine code in straight C; that's no problem. The issue is calling it. Whilst you can try setting up a function pointer with the address of the buffer and calling it, that's highly unlikely to work, and highly likely to break on the next version of the compiler if somehow you do manage to coax it into doing what you want. So the best bet is to simply resort to a bit of inline assembly, to set up the return and jump to the automatically generated code. But if the system protects against this, you'll have to find methods of circumventing the protection, as Rudi described in his answer (but very specific to one particular system).

Michiko answered 5/10, 2016 at 10:42 Comment(0)

One obvious error is that \xc3 is not returning the double that you claim it's returning.

Biome answered 5/10, 2016 at 7:11 Comment(1)
True! (this is just padding) (Smithsonite)