Executing machine code in memory
S

9

33

I'm trying to figure out how to execute machine code stored in memory.

I have the following code:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
    FILE* f = fopen(argv[1], "rb");

    fseek(f, 0, SEEK_END);
    unsigned int len = ftell(f);
    fseek(f, 0, SEEK_SET);

    char* bin = (char*)malloc(len);
    fread(bin, 1, len, f);

    fclose(f);

    return ((int (*)(int, char *)) bin)(argc-1, argv[1]);
}

The code above compiles fine in GCC, but when I try and execute the program from the command line like this:

./my_prog /bin/echo hello

The program segfaults. I've figured out the problem is on the last line, as commenting it out stops the segfault.

I don't think I'm doing it quite right, as I'm still getting my head around function pointers.

Is the problem a faulty cast, or something else?

Sorn answered 7/1, 2010 at 11:35 Comment(4)
Charlie: If you ever make sense of all these answers, rather than calling through a cast function pointer as you have there, you may be better off writing a basic thunk that manages the stack arguments dynamically. If using gcc, declare a function like "void function() __attribute__((naked));" and see gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html for more examples. That way, you call the same function, which decides whether the dynamically loaded code needs to be supplied with N arguments, a particular calling convention, etc. Either way, you should probably look into FFI and such.Autoroute
I'm pretty sure the OP is just misunderstanding the fundamentals of how executable files work. Use a dynamic link library for executing your own dynamic code, and exec for executing other apps.Ultramodern
@Ultramodern - You're completely right. I wanted to see if I could do this, so I thought "where can I find machine code?", and decided to just grab an executable file without thinking harder about it :/Sorn
You may have some luck compiling to web assembly.Maurinemaurise
M
11

It seems to me you're loading an ELF image and then trying to jump straight into the ELF header? http://en.wikipedia.org/wiki/Executable_and_Linkable_Format

If you're trying to execute another binary, why don't you use the process creation functions for whichever platform you're using?
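
A loader can detect this case before jumping anywhere. The following is a minimal sketch, assuming the file has already been read into a buffer as in the question; `is_elf_image` is a hypothetical helper name, not a library function:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf starts with the ELF magic bytes 0x7f 'E' 'L' 'F',
   otherwise 0. is_elf_image is a hypothetical name used for illustration. */
int is_elf_image(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}
```

If this returns 1 for the loaded buffer, byte 0 is file metadata rather than an instruction stream, so calling it as a function cannot work.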

Messuage answered 7/1, 2010 at 11:38 Comment(3)
I think it's because he's trying to exec an app in memory that he has allocated. I do not believe any process creation function operates like that. Thread creation functions may, but he's loading a disk file into memory and then trying to exec that memory.Autoroute
If the memory is not flagged as executable he will not be able to execute it, but he is also loading an ELF file into memory and then trying to call the ELF header, the first four bytes of which are 0x7f 'E' 'L' 'F'.Messuage
Fun Fact: 0x7F is the primary opcode for JNLE. So maybe the first thing the code is trying to do is a jump to a garbage address? Either way: executing an ELF header is not going to work.Messuage
H
31

You need a page with write and execute permissions. See mmap(2) and mprotect(2) if you are on Unix. You shouldn't do it using malloc.

Also, read what the others said, you can only run raw machine code using your loader. If you try to run an ELF header it will probably segfault all the same.

Regarding the content of replies and downmods:

1- OP said he was trying to run machine code, so I replied on that rather than executing an executable file.

2- See why you don't mix malloc and mman functions:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/mman.h>

int main()
{
    char *a=malloc(10);
    char *b=malloc(10);
    char *c=malloc(10);
    memset (a,'a',4095);
    memset (b,'b',4095);
    memset (c,'c',4095);
    puts (a);
    memset (c,0xc3,10); /* return */

    /* c is not aligned to a page boundary, so this is a no-op.
     Many implementations put a header before malloc'ed data, so it's always a no-op. */
    mprotect(c,10,PROT_READ|PROT_EXEC);
    b[0]='H'; /* oops, it is still writable. If you provided an aligned
    address it would segfault */
    char *d=mmap(0,4096,PROT_READ|PROT_WRITE|PROT_EXEC,MAP_PRIVATE|MAP_ANON,-1,0);
    memset (d,0xc3,4096);
    ((void(*)(void))d)();
    ((void(*)(void))c)(); /* oops it isn't executable */
    return 0;
}

It displays exactly this behavior on Linux x86_64; other ugly behavior is sure to arise on other implementations.
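
If heap memory has to be used anyway, the alignment problem shown above can be side-stepped by asking for whole pages. A minimal sketch, assuming POSIX `posix_memalign` and `sysconf` are available; `page_floor` and `alloc_whole_pages` are hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Rounds addr down to the start of the page that contains it.
   page_size must be a power of two. */
uintptr_t page_floor(uintptr_t addr, size_t page_size)
{
    return addr & ~((uintptr_t)page_size - 1);
}

/* Allocates at least len bytes that start on a page boundary and span
   whole pages, so a later mprotect on the block does not change the
   protection of neighbouring allocations. Release with free(), after
   restoring PROT_READ | PROT_WRITE, because free() may write to the block.
   Returns NULL on failure. */
void *alloc_whole_pages(size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || len == 0)
        return NULL;

    size_t page_size = (size_t)ps;
    size_t rounded = (len + page_size - 1) / page_size * page_size;
    void *p = NULL;

    if (posix_memalign(&p, page_size, rounded) != 0)
        return NULL;
    return p;
}
```

`page_floor` is the calculation to apply when only part of a larger malloc'ed block should be handed to mprotect: the page-aligned subset starts at the first page boundary inside the block.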

Houlberg answered 7/1, 2010 at 11:37 Comment(5)
I'll look into that. I had a feeling it might have been something to do with that.Sorn
That's not actually correct, you can do it with malloc, you just need to use mprotect.Autoroute
OK, if you READ his CODE, you see him LOADING a FILE, to EXECUTE. The FACT that it's a COMPILED BINARY means its text area is PAGE SIZE ALIGNED ALREADY. If he mprotects the HEAP, then the ONLY POSSIBLE ISSUE is that the file he's LOADED to EXECUTE will have some of the .data possibly MARKED EXEC if he's not adjusted that himself. But there is NO PROBLEM with making the HEAP +x, JAVA and MONO do this all the time.Autoroute
Don't get too excited: mmap, mprotect, etc. only protect/unprotect in pages, not bytes. malloc implementations put malloc'ed data in preallocated chunks, so if you change the protections on your chunk, it is likely to be adjacent to other malloc'ed data sharing the same page(s). If you are using mprotect, the protections are going to be either (r|)w|x or r|x; in either case the r|w data in the page(s) isn't going to like it, i.e. you either segfault or leave that data available for introducing executable code.Houlberg
Yeah, don't worry, I calmed down and all, and even decided that your post is helpful after your code example. However, in any case, if you look at my code, malloc works just fine with +rwx; even if you add frees for all 3 of the heap allocations that my example makes, there is no problem or any stability issue. The only thing is that you may unintentionally mark some extra memory on the heap as +x, but it's really not a big deal.Autoroute
A
13

Using malloc works fine.

OK, this is my final answer. Please note I used the original poster's code. I'm loading the compiled version of this code from disk into a heap-allocated area "bin", just as the original code did. The file name is fixed rather than taken from argv, and the value 0x674 comes from:

objdump -F -D foo|grep -i hoho
08048674 <hohoho> (File Offset: 0x674):

This can be looked up at run time with BFD (the Binary File Descriptor library) or something else. You can call other binaries (not just yourself) so long as they are statically linked to the same set of libs.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

unsigned char *charp;
unsigned char *bin;

void hohoho()
{
   printf("merry mas\n");
   fflush(stdout);
}

int main(int argc, char **argv)
{
   int what;

   charp = malloc(10101);
   memset(charp, 0xc3, 10101);
   mprotect(charp, 10101, PROT_EXEC | PROT_READ | PROT_WRITE);

   __asm__("leal charp, %eax");
   __asm__("call (%eax)" );

   printf("am I alive?\n");

   char *more = strdup("more heap operations");
   printf("%s\n", more);

   FILE* f = fopen("foo", "rb");

   fseek(f, 0, SEEK_END);
   unsigned int len = ftell(f);
   fseek(f, 0, SEEK_SET);

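   bin = malloc(len);
   printf("read in %zu\n", fread(bin, 1, len, f)); /* fread returns size_t */
   printf("%p\n", (void *)bin);

   fclose(f);
   /* NOTE: &bin is the address of the pointer variable, not of the loaded
      code, and the return value is not checked. On Linux, mprotect also
      expects a page-aligned address, so this call may fail with no effect. */
   mprotect(&bin, 10101, PROT_EXEC | PROT_READ | PROT_WRITE);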

   asm volatile ("movl %0, %%eax"::"g"(bin));
   __asm__("addl $0x674, %eax");
   __asm__("call %eax" );
   fflush(stdout);

   return 0;
}

running...

co tmp # ./foo
am I alive?
more heap operations
read in 30180
0x804d910
merry mas

You can use UPX to manage the load/modify/exec of a file.

P.S. sorry for the previous broken link :|

Autoroute answered 7/1, 2010 at 11:51 Comment(9)
Note this IS cross-platform and totally abstracts the details of file format specifications or any sort of requirement to play around with page protections and such.Autoroute
Pffft, I love getting downvoted with no reasoning, get real. UPX is THE way to do this; using anything else is naive. You can easily use it either to load exes for you or through its lower-level APIs, which emit dynamic assembly stubs that can load/run arbitrary memory blocks, compressed or otherwise.Autoroute
Well, we don't know how he's going to get the machine code into memory. What if he's writing a bytecode interpreter and the code will be generated in memory? Loading "echo" (as incorrect as the code was) could have been a proof-of-concept that code could be generated and executed on the fly.Messuage
malloc doesn't ensure page alignment, so your code may or may not work. You could use a page-aligned subset of the malloc'd block, which would be safe, or possibly use posix_memalign if you have it.Kinch
Hope you don't mind my edit, your UPX link was pointing somewhere scummy.Kinch
Thanks for fixing the link, and yes, it's fairly sloppily put together right now :) But it does EXACTLY what Charlie asked for. Man, I hope I get an accepted answer for this one, haha.Autoroute
Now how can we do this for dynamically linked ELFs?Grave
I was googling for how JIT compilation is performed and found your answer - JIBL just in time binary loading LOL. +1 since it gives some idea.Breastbone
I don't understand - how can it be cross platform with intel assembly in there? You can run C code on a Raspberry Pi, a PS 2, a PDP-11 or even an AS400.Erbes
A
3

A typical executable file has:

  • a header
  • entry code that is called before main(int, char **)

The first means that you can't generally expect byte 0 of the file to be executable; instead, the information in the header describes how to load the rest of the file in memory and where to start executing it.

The second means that when you have found the entry point, you can't expect to treat it like a C function taking arguments (int, char **). It may, perhaps, be usable as a function taking no parameters (and hence requiring nothing to be pushed prior to calling it). But you do need to populate the environment that will in turn be used by the entry code to construct the command line strings passed to main.

Doing this by hand under a given OS would go into some depth which is beyond me; but I'm sure there is a much nicer way of doing what you're trying to do. Are you trying to execute an external file as a one-off operation, or load an external binary and treat its functions as part of your program? Both are catered for by the C libraries in Unix.
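
To make the point about the header concrete, here is a minimal sketch that reads the entry address a 64-bit ELF header declares. It assumes a Linux system that provides `<elf.h>` and a file whose byte order matches the host; `elf64_entry_point` is a hypothetical helper name:

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads the entry-point address from a 64-bit ELF header held in buf.
   Returns 0 and stores the address in *entry on success, or -1 if buf is
   too short, is not an ELF image, or is not a 64-bit (ELFCLASS64) file.
   Assumes the file's byte order matches the host's. */
int elf64_entry_point(const unsigned char *buf, size_t len, uint64_t *entry)
{
    Elf64_Ehdr hdr;

    if (len < sizeof hdr)
        return -1;
    memcpy(&hdr, buf, sizeof hdr);          /* avoids unaligned access */

    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    if (hdr.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;

    *entry = hdr.e_entry;
    return 0;
}
```

Note that `e_entry` is a virtual address in the process image the loader is expected to build, not an offset into the file or into a malloc'ed copy of it, which is one more reason the buffer cannot simply be called.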

Algometer answered 7/1, 2010 at 11:45 Comment(0)
F
3

It is more likely that the code jumped to through the function pointer is causing the segfault, rather than the call itself. There is no way to determine from the code you have posted whether the code loaded into bin is valid. Your best bet is to use a debugger, switch to assembler view, break on the return statement and step into the function call to check that the code you expect to run is indeed running, and that it is valid.

Note also that in order to run at all the code will need to be position independent and fully resolved.

Moreover, if your processor/OS enables data execution prevention, then the attempt is probably doomed. It is at best ill-advised in any case; loading code is what the OS is for.

Flavius answered 7/1, 2010 at 12:4 Comment(1)
Yeah, good point on position independence. Charlie can use -fPIC if using gcc, but unfortunately on Windows there is no easy way to get compiled PIC C applications.Autoroute
Q
2

What you are trying to do is akin to what interpreters do, except that an interpreter reads a program written in an interpreted language like Python, compiles that code on the fly, puts executable code in memory and then executes it.

You may want to read more about just-in-time compilation too:

Just in time compilation
Java HotSpot JIT runtime

There are libraries available for JIT code generation, such as GNU lightning and libJIT, if you are interested. You'd have to do a lot more than just reading from a file and trying to execute code, though. An example usage scenario would be:

  1. Read a program written in a scripting-language (maybe your own).
  2. Parse and compile the source into an intermediate language understood by the JIT library.
  3. Use the JIT library to generate code for this intermediate representation, for your target platform's CPU.
  4. Execute the JIT generated code.

And for executing the code you'd have to use techniques such as using mmap() to map the executable code into the process's address space, marking that page executable and jumping to that piece of memory. It's more complicated than this, but it's a good start for understanding what's going on beneath all those interpreters of scripting languages such as Python, Ruby etc.

The online version of the book "Linkers and Loaders" will give you more information about object file formats, what goes on behind the scenes when you execute a program, the roles of the linkers and loaders and so on. It's a very good read.

Quadrivial answered 7/1, 2010 at 14:17 Comment(0)
P
1

Use the operating system for loading and executing programs.

On unix, the exec calls can do this.

Your snippet in the question could be rewritten:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

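int main(int argc, char* argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
        return EXIT_FAILURE;
    }
    /* argv+1, so that the new program sees its own path as argv[0] */
    execv(argv[1], argv+1);
    perror("execv"); /* only reached if execv fails */
    return EXIT_FAILURE;
}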
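
execv replaces the calling process, so nothing after a successful call runs. When the caller needs to keep going and see the result, the usual Unix pattern is fork, exec in the child, and wait in the parent. A minimal sketch of that pattern; `run_and_wait` is a hypothetical helper name:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the program at argv[0] with the NULL-terminated argument vector
   argv in a child process and waits for it to finish.
   Returns the child's exit status (127 if execv itself failed), or -1 if
   fork/waitpid failed or the child was terminated by a signal. */
int run_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);              /* only reached if execv failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the question's example this would be called with an argument vector such as { "/bin/echo", "hello", NULL }.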
Pes answered 7/1, 2010 at 11:41 Comment(5)
exec does not do this; he's trying to load the app into memory manually. exec expects a file path argument, not a memory address.Autoroute
He opens the binary using fopen and then tries to jump into it. If he'd just passed that path to exec instead... Thx for the downmod.Pes
If you clarify to me how you think exec actually does what he asked, which is "execute machine code in memory", I'll take away any downvote on you in an instant. However, it's totally not what he asked from what I can tell. Thanks for the associated downvote.Autoroute
I haven't downvoted UPX. I have added a cut-paste-change of the code in the original question.Pes
As Bruce Lee once said "My style? It's like the art of fighting without fighting." nice one.Autoroute
P
1

You can dlopen() a file, look up the symbol "main" and call it with 0, 1, 2 or 3 arguments (all of type char*) via a cast to a pointer to a function returning int and taking 0, 1, 2 or 3 char* arguments.
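
The lookup step might look like the sketch below. It assumes the file is a shared library (whether an ordinary executable can be opened this way depends on the platform) and that the program is linked against the dynamic loader (-ldl on older glibc versions); `lookup_symbol` is a hypothetical helper name:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Looks up a symbol by name in the shared object at path.
   Passing path == NULL searches the running program and the shared
   libraries it has already loaded. Returns NULL if the object cannot be
   opened or the symbol is not found. The handle is intentionally left
   open so that the returned address stays valid. */
void *lookup_symbol(const char *path, const char *name)
{
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL)
        return NULL;
    return dlsym(handle, name);
}
```

The caller then casts the result to the function type it expects, for example `int (*fn)(const char *) = (int (*)(const char *))lookup_symbol(NULL, "atoi");`. The cast has to match the real signature, since nothing checks it.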

Pretended answered 7/1, 2010 at 11:50 Comment(1)
Using a method like this, you'll probably want to look up __libc_start_main.Autoroute
U
0

Executable files contain much more than just code. Header, code, data, more data: this stuff is separated and loaded into different areas of memory by the OS and its libraries. You can't load a program file into a single chunk of memory and expect to jump to its first byte.

If you are trying to execute your own arbitrary code, you need to look into dynamic libraries because that is exactly what they're for.
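
To see that separation in an actual file, one can walk the program header table, where each PT_LOAD entry describes one region the OS maps separately. A minimal sketch, assuming a Linux system with `<elf.h>`, a 64-bit ELF image in host byte order already read into a buffer, and a hypothetical helper name `elf64_count_load_segments`:

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Counts the PT_LOAD program headers of a 64-bit ELF image held in buf.
   Returns the count, or -1 if the image is not 64-bit ELF or the program
   header table does not fit inside the buffer. */
int elf64_count_load_segments(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;

    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr) ||
        eh.e_phoff > len)
        return -1;

    int count = 0;
    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        size_t off = (size_t)eh.e_phoff + i * sizeof ph;

        if (off > len || len - off < sizeof ph)
            return -1;                   /* truncated table */
        memcpy(&ph, buf + off, sizeof ph);
        if (ph.p_type == PT_LOAD)
            count++;                     /* one separately mapped region */
    }
    return count;
}
```

Each PT_LOAD entry also carries its own target address (`p_vaddr`) and permission flags (`p_flags`), which is the information a single malloc'ed copy of the file throws away.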

Ultramodern answered 7/1, 2010 at 15:44 Comment(1)
Not MSDOS .COM files - they are just a binary image of the machine code - too bad they were limited to 64K...Erbes
