Is it possible to load a function into some allocated memory and run it from there?

Asked 25/8, 2010 at 20:17 Answered 25/8, 2010 at 20:52

I'm messing around with some interprocess communication stuff and I am curious if it's possible to copy a function into some shared memory and run it from there from either process.

Something like:

memcpy(shared_memory_address, &func, sizeof(func));

I realize you can't take the size of the function but that was what popped into my head.

Piracy answered 25/8, 2010 at 20:17 Comment(10)

You can probably hack your way into doing this on some systems (although any solution would be very non-portable), but the idea for using this for IPC is terrifying. – Knighten 25/8, 2010 at 20:21

I suspect (and hope) not. There are all kinds of checks to prevent you from executing arbitrary blocks of memory. – Feldt 25/8, 2010 at 20:26

You can't get the size of a function, but you can get the distance between the start of the function and the start of the next function. Not that I'd advocate doing that. – Stat 25/8, 2010 at 20:34

@Mark: How does one get the start of the next function? The next function may be in another translation unit. – Boast 25/8, 2010 at 20:42

What are you really trying to do here? Even if it were possible to do this on particular architectures, there's an extremely good chance that there's a much easier way to solve your underlying problem. If the question is purely theoretical, please consider stating that in the question preamble. – Sulfapyrazine 25/8, 2010 at 20:48

@Thomas, if you really need to know you can place a dummy function at the end of the function you need to copy. I never said it wouldn't be ugly... – Stat 25/8, 2010 at 20:48

Why not create a DLL? The function will be shared, and you can control whether the data is local or shared through #pragmas in VS. – Overbalance 25/8, 2010 at 21:2

@meagar: There is protection against accidentally interpreting data as executable code. But once you explicitly tell the OS that you intend this to be code, there's no problem. – Barroom 25/8, 2010 at 23:37

@Mark Ransom: "you can place a dummy function at the end of the function" and why do you think that dummy function won't be removed during optimization and that all functions will be placed in exe in same order as in source file? – Quagga 26/8, 2010 at 0:0

@SigTerm, if you take the address of the function then the compiler can't very well optimize it away, can it? Of course it's free to reorder the functions however it wants, but the simplest result is the most likely. Trial and error can verify this easily for a single compiler. Again, I'd like to make the disclaimer that this is not a very good way to code, only that it's not impossible. – Stat 26/8, 2010 at 2:39

Theoretically, since a function is just a sequence of machine code somewhere in memory, you could copy that block of memory and call (jump into) it. C++ abstracts that possibility away, though: as you noticed, we cannot actually know the size of a function (although we can get a pointer to it).

Still, there are libraries. For example, you could tell a remote executable to load a specific function from a dynamic library and execute it. Check the Wikipedia article for references.

Thaddeusthaddus answered 25/8, 2010 at 20:31 Comment(0)

That was fun.
But it seems like you can. Though I would NEVER do this:

Compiled on a Lenovo T61p running Windows 7, using g++ 4.3.4.

I would note that some types of hardware will prevent this, as you can only execute code from a specific memory area (the program area), which is marked as read-only in the hardware memory map (to prevent self-modifying code).

Note also that the type of function is very limited:

In this example func() does very little and therefore works.
But if you do any of the following, it will not be portable to another process:

Call a function or method.
Pass a pointer (or reference)
- No object that contains a pointer or a reference will work either.
Use globals.
You could pass a method pointer:
- But the object it is used on must be passed by value.

None of the above work because the address space of one process bears no resemblance to the address space of another process (as each is mapped at the hardware level to physical memory).

Silly Example

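#include <cstring>
#include <iostream>
#include <vector>

int func(int x)
{
    return x+1;
}

typedef int (*FUNC)(int);


int main()
{
    // 5000 is an arbitrary guess: the real size of func() is unknown, so this
    // copies func() plus whatever happens to follow it in memory.
    std::vector<char>   buffer(5000);

    // Converting a function pointer to an object pointer is not portable C++.
    std::memcpy(&buffer[0], reinterpret_cast<char*>(&func), 5000);

    // Treat the start of the buffer as a function and call it. This crashes
    // on systems that do not allow code to execute from heap memory.
    FUNC copied = reinterpret_cast<FUNC>(&buffer[0]);

    int result  = (*copied)(5);

    std::cout << result << std::endl;

}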

Esteresterase answered 25/8, 2010 at 20:52 Comment(5)

You're not sharing between processes though. – Hegemony 25/8, 2010 at 20:54

It is fun, but also note that code inside the buffer needs to be relinked to functions outside. Particularly with parallel-process jump tables, I think the fun would wear off. – Oho 25/8, 2010 at 21:9

What happens if the function is over 5000? How did you determine this number? – Boast 25/8, 2010 at 22:5

@Paul Nathan: @ Potatoswatter: @Thomas Matthews: Yes it is fun but only that. It is not practical to use in a real application. And in real life the potential of passing it to another processes and having it work are negligible (any function that does anything slightly more complex than expression manipulation is going to break). – Esteresterase 25/8, 2010 at 23:34

I like this portable example because you can try it on different operating systems and hardware to see what sort of execution prevention mechanisms are enabled. Linux x86-64, for example, segfaults when invoking the function because the heap is non-executable. – Davedaveda 27/8, 2010 at 12:44

Last time I tried this, I ran into a roadblock: determining the number of bytes in the function. The task would be to take the address of the function and copy its bytes into memory (provided the code is compiled as position-independent code, PIC).

A more platform-independent method is to review your compiler documentation to see if there is a #pragma, compiler option, or keyword that lets you specify the address or segment at which the function is placed at load time.

Also, search the Embedded Systems groups, as this is a popular technique: Load code that programs a Flash Memory into RAM, execute the function in RAM, then reset the system.

Hope that helps.

Edit:
A suggestion: create a data or code segment using either an assembly language file or instructions to the linker (in the build script). Put your function into a separate code file. Tell the compiler and linker to compile this function into the new code segment. There may be compiler specific statements to get the starting address and size of a segment. Also, the OS may be able to load a segment at a given address for you.

Also look into DLLs or Shared Libraries which can be loaded during run-time, with the help of the OS.

Boast answered 25/8, 2010 at 20:36 Comment(0)

If you attempt such a thing, you may run into problems running code from memory which isn't supposed to contain executable code. See this Wikipedia article for more information: http://en.wikipedia.org/wiki/Executable_space_protection

Stat answered 25/8, 2010 at 20:36 Comment(0)

Yes. A similar technique is used by Just-In-Time code generators such as the Java VM. In fact you could almost say that the operating system's runtime loader and linker is doing this for you as it loads dynamic libraries into your process.

You do have to request executable memory from the operating system, though. And the code you are jumping into has to be written in a way that allows it to be located anywhere in memory (position independent).

Davedaveda answered 25/8, 2010 at 20:38 Comment(0)

If you generate code bytes and inject them into the process, that's called run-time code generation (RTCG). You can look up some examples.

Modern kernels would prevent this from working at a non-privileged level, so you have to enter the correct mode or ring first. In order to find the code size, you have (of course) to count the bytes of the function in its code segment up to the last return instruction.

AFAIK, graphics drivers sometimes used RTCG to create code for raster ops on the fly (problem dependent).

Malcah answered 25/8, 2010 at 20:45 Comment(0)

-1

You can reasonably assume that this is flatly impossible on Linux, Windows, or the more sophisticated embedded operating systems.

But if you are not operating with such pesky restrictions, you can patch in some guard bytes in your assembly that denote begin/end of functions and use those to help you copy stuff out to your shared memory (using assembly of course), then publish a list of procedure addresses to any interested process (also accessing/running using assembly).

Of course, there is a well-defined mechanism for providing libraries of code to multiple processes: the dynamic library system that Linux and Windows provide. Probably not as flexible as you'd like though. :-)

Hegemony answered 25/8, 2010 at 20:40 Comment(1)

Difficult, but not impossible. OSes which have no-execute enforcement on data pages also provide APIs to re-mark pages as executable, e.g. VirtualProtect on Windows. – Barroom 25/8, 2010 at 23:36
