C++ and FULLY dynamic functions

I have a problem with detours. A detour, as you all know, only has 5 bytes of space to work with (i.e. a 'jmp' opcode and a 4-byte relative offset). Because of this it is impossible to make the 'hook' function a class member (a method): you cannot supply the 'this' pointer because there is simply not enough space (here's the problem more thoroughly explained). So I've been brainstorming all day for a solution, and now I want your thoughts on the subject so I don't begin a 3-5 day project without knowing whether it is possible or not.
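For reference, here is a minimal sketch of how those 5 bytes are computed, assuming a 32-bit x86 target. The helper name encodeDetourJmp is hypothetical. The 4 bytes after 0xE9 are a displacement measured from the end of the jmp, not an absolute address. The whole patch therefore fits in 5 bytes and leaves no room for anything else, such as a 'this' pointer.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: encodes the 5-byte x86 "jmp rel32" used by a classic
// detour. Opcode 0xE9 is followed by a little-endian 32-bit displacement that
// is measured from the END of the jmp instruction (source + 5).
std::array<std::uint8_t, 5> encodeDetourJmp(std::uint32_t source, std::uint32_t target) {
    // Unsigned wraparound yields the correct two's-complement offset
    // for backward jumps as well.
    const std::uint32_t rel = target - (source + 5u);
    return {
        0xE9,
        static_cast<std::uint8_t>(rel & 0xFF),
        static_cast<std::uint8_t>((rel >> 8) & 0xFF),
        static_cast<std::uint8_t>((rel >> 16) & 0xFF),
        static_cast<std::uint8_t>((rel >> 24) & 0xFF),
    };
}
```

For example, patching address 0x1000 to jump to 0x2000 gives E9 FB 0F 00 00, because 0x2000 - 0x1005 = 0xFFB.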

I had three goals initially: I wanted the 'hook' functions to be class methods, I wanted the whole approach to be object-oriented (no static functions or global objects), and, the worst/hardest part, I wanted it to be completely dynamic. This is my (theoretical) solution: with assembly one can modify functions at runtime (any detouring method is a perfect example). So since I can modify functions dynamically, shouldn't I also be able to create them dynamically? For example, I allocate memory for, let's say, ~30 bytes (through malloc/new). Wouldn't it be possible to just fill those bytes with the numeric values of different assembly instructions (like 0xE9 for 'jmp') and then call the address directly, since it would contain a function?
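The "fill a buffer with opcode values" half of this idea can be sketched as follows, assuming a 32-bit x86 target. CodeBuffer and its methods are hypothetical names, not an existing library. The emitted bytes are the machine code for a function equivalent to int f() { return 42; }. Memory from malloc/new is usually not executable, so such a buffer still has to be copied somewhere executable before its address can be called.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: accumulates raw x86 machine code in a byte buffer.
class CodeBuffer {
public:
    void emit8(std::uint8_t b) { bytes_.push_back(b); }

    // x86 stores multi-byte immediates little-endian.
    void emit32(std::uint32_t v) {
        for (int i = 0; i < 4; ++i) emit8(static_cast<std::uint8_t>(v >> (8 * i)));
    }

    // mov eax, imm32  (opcode B8h + register code 0 for EAX)
    void movEaxImm32(std::uint32_t v) { emit8(0xB8); emit32(v); }

    // ret  (near return, opcode C3h)
    void ret() { emit8(0xC3); }

    const std::vector<std::uint8_t>& bytes() const { return bytes_; }

private:
    std::vector<std::uint8_t> bytes_;
};
```

Calling cb.movEaxImm32(42) followed by cb.ret() produces the 6 bytes B8 2A 00 00 00 C3.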

NOTE: I know beforehand the return type and all the arguments of every function I want to detour, and since I'm using GCC, the thiscall convention is practically identical to the cdecl one (the 'this' pointer is simply passed as a hidden first argument).
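To make the constraint concrete, here is a minimal sketch of the non-dynamic setup the question is trying to avoid. It assumes, as the question states, 32-bit x86 with GCC, where a free function taking the object pointer as its first parameter is call-compatible with the member function. RandomClass, Hook, g_hook and IntCheckNum_detour are hypothetical names. Because the 5-byte jmp can only point at one fixed function, that function has to find its hook object through a global.

```cpp
#include <cassert>

// Hypothetical stand-in for the closed API's class.
struct RandomClass {
    int value = 7;
    int* IntCheckNum(short arg1) { value += arg1; return &value; }
};

// Hypothetical hook class: the method we would like the detour to reach.
struct Hook {
    int calls = 0;
    int* onIntCheckNum(RandomClass* obj, short arg1) {
        ++calls;  // "pre" work
        // A real detour would call a trampoline to the original code here,
        // not the patched function (that would recurse into the detour).
        return obj->IntCheckNum(arg1);
    }
};

// The global the question wants to get rid of.
Hook* g_hook = nullptr;

// The fixed function the 5-byte jmp would point at. The object pointer that a
// member call passes implicitly is an explicit first parameter here.
int* IntCheckNum_detour(RandomClass* self, short arg1) {
    return g_hook->onIntCheckNum(self, arg1);
}
```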

So this is my thought/soon-to-be implementation: I create a 'Function' class whose constructor takes a variadic number of arguments (the first argument describes the return value of the target function).

Each remaining argument describes one argument the hook will receive (its size, and whether it is a pointer or not). So let's say I want to create a Function object for int* RandomClass::IntCheckNum(short arg1);. Then I would just have to write: Function func(Type(4, true), Type(4, true), Type(2, false));, where 'Type' is defined as Type(uint size, bool pointer). Then, through assembly, I could dynamically create the function (note: this would all use the cdecl calling convention), since I can calculate the number of arguments and their total size.

EDIT: In the example, the first Type(4, true) is the return value (int*), the second Type(4, true) is the RandomClass 'this' pointer, and Type(2, false) describes the first argument (short arg1).
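Here is a minimal sketch of the descriptor part, assuming 32-bit cdecl where every stack argument is rounded up to a 4-byte slot. Type and Function follow the names used above, but their members (argumentCount, stackBytes, returnType) are hypothetical. For the IntCheckNum example it reports 2 arguments occupying 8 bytes of stack, because the 2-byte short still takes a full slot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical descriptor mirroring Type(uint size, bool pointer).
struct Type {
    unsigned size;
    bool pointer;
    Type(unsigned s, bool p) : size(s), pointer(p) {}
};

// Hypothetical Function descriptor: the first Type is the return value,
// the rest are the parameters (including 'this' for member functions).
class Function {
public:
    template <typename... Params>
    explicit Function(Type ret, Params... params) : ret_(ret), params_{params...} {}

    std::size_t argumentCount() const { return params_.size(); }

    // Under 32-bit cdecl every stack argument occupies a multiple of 4 bytes,
    // so a 2-byte short still takes a full 4-byte slot.
    unsigned stackBytes() const {
        unsigned total = 0;
        for (const Type& t : params_) total += (t.size + 3u) & ~3u;
        return total;
    }

    const Type& returnType() const { return ret_; }

private:
    Type ret_;
    std::vector<Type> params_;
};
```

Under cdecl the caller pops those bytes after the call, so a generated hook only needs stackBytes() to locate its arguments, not to clean them up.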

With this implementation I could easily have class methods as callbacks, but it would require an extensive amount of assembly code (which I'm not especially experienced with). In the end, the only non-dynamic part would be the methods in my callback class (which would also require pre and post callbacks).

So I wanted to know: is this possible? How much work would it require, and am I in way over my head here?

EDIT: I'm sorry if I presented everything a bit fuzzily; if there is something you want explained more thoroughly, do ask!

EDIT2: I'd also like to know whether I can find the hex values for all assembly instructions somewhere. A list would help a ton! And/or whether it is possible to somehow 'save' the code in an asm(""); block at a memory address (which I highly doubt).

Pantomime asked 4/5, 2012 at 15:17
Comments:
Why use detours at all? Can’t you use a pure C++ solution such as std::function, or am I missing something? (Siamang)
Not that I could help you, but just to clarify things: you want rewritable functions in a class (I mean, functions you can change at runtime)? If so, I think it (when done) could open up giant opportunities for AI programming in C++. +1 (Fulani)
@Fulani This is known as genetic programming and doesn’t actually need rewritable functions at all. (Siamang)
The problem is, if you've read my link, that I'm stuck with a closed API. To get access to the API I need to use reverse-engineering, and this is where detours enter the picture. But as mentioned in my post, detours by nature can't have member methods as callbacks, because there is not enough space. So I would have to use static callback functions/global objects, UNLESS I were able to create dynamic functions, just as I explained in my post, which would enable fully dynamic and object-oriented hooks! (Pantomime)
@KonradRudolph Perhaps I could, but I would have to use the interface through assembly (since that would be the only way to control a dynamic function), and I don't even want to think about the amount of assembly code that would require. (Pantomime)
@Dennis Perhaps. I'm not quite sure, since I haven't even figured it all out myself (that's why I'm here :) ). No matter what, I would have to create a JMP to a boost::function of my choice (the hook), but I do not see how this would be possible without a gigantic amount of work (from my point of view, even this approach would require a bit of ASM). I just don't know if it would be possible to have a fully dynamic solution with this approach. Though I could just be plain wrong, because this is all a completely new subject for... (Pantomime)
You can have a look at the implementation of closures in libffi. (Selfregard)
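On the std::function suggestion, here is a minimal sketch using hypothetical RandomClass and Hook types. It shows that std::function can bind a hook object and its method without any globals. The result, however, is an object rather than a code address, so a jmp cannot target it. That gap is what dynamically generated code (or libffi's closures) has to fill.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

struct RandomClass { int value = 0; };  // hypothetical

struct Hook {                           // hypothetical
    int seen = 0;
    int* onIntCheckNum(RandomClass* obj, short arg1) {
        seen += arg1;
        return &obj->value;
    }
};

using HookFn = std::function<int*(RandomClass*, short)>;
using RawFn = int* (*)(RandomClass*, short);

// Binds a specific Hook object to its method: no globals, no static functions.
HookFn makeHook(Hook& h) {
    return [&h](RandomClass* obj, short arg1) { return h.onIntCheckNum(obj, arg1); };
}

// But a std::function is an object, not a code address: it cannot be turned
// into the plain function pointer that a jmp-based detour has to target.
static_assert(!std::is_convertible<HookFn, RawFn>::value,
              "std::function does not convert to a raw function pointer");
```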

What you describe is usually called "thunking", and it is quite commonly implemented. Historically, the most common purpose has been mapping between 16-bit and 32-bit code (by autogenerating a new 32-bit function that calls an existing 16-bit one, or vice versa). I believe some C++ compilers also generate similar functions to adjust base-class pointers to subclass pointers under multiple inheritance.
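As an illustration of such a thunk, here is a minimal sketch for 32-bit x86 cdecl. buildThunk is a hypothetical helper, and the layout is one possible design, not necessarily the one the linked article uses. The 14-byte thunk pushes a hook-object pointer, calls the real hook, then drops the pointer and returns to the original caller. Because the caller's return address stays on the stack between the pushed pointer and the original arguments, the hook must declare a dummy parameter for it. The sketch does not handle struct returns, and it does not try to preserve any stack alignment the compiler may assume.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 32-bit x86 thunk that prepends a hook-object pointer to a call.
// The target it calls must be declared as
//   R target(Hook* self, void* callerReturn, <original args...>)
// because the original return address sits between the pushed object pointer
// and the original arguments.
std::array<std::uint8_t, 14> buildThunk(std::uint32_t thunkAddr,
                                        std::uint32_t hookObject,
                                        std::uint32_t target) {
    std::array<std::uint8_t, 14> code{};
    auto put32 = [&code](std::size_t at, std::uint32_t v) {
        for (int i = 0; i < 4; ++i) code[at + i] = static_cast<std::uint8_t>(v >> (8 * i));
    };
    code[0] = 0x68;                       // push imm32   (hook object pointer)
    put32(1, hookObject);
    code[5] = 0xE8;                       // call rel32   (relative to the next instruction)
    put32(6, target - (thunkAddr + 10u));
    code[10] = 0x83;                      // add esp, 4   (drop the pushed pointer;
    code[11] = 0xC4;                      //               EAX keeps the return value)
    code[12] = 0x04;
    code[13] = 0xC3;                      // ret          (back to the original caller)
    return code;
}
```

The hook it calls would then look like int* hook(Hook* self, void* callerReturn, RandomClass* obj, short arg1), and using a different Hook object just means generating another 14-byte thunk in executable memory.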

It certainly seems like a viable solution to your problem, and I don't foresee any huge issues. Just make sure you allocate the memory with any flags needed in your operating system to make sure the memory is executable (most modern OSs give out non-executable memory by default).
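Here is a minimal sketch of the allocation step on a POSIX system (Linux, macOS, BSD), assuming an x86 or x86-64 build. runGeneratedReturn is a hypothetical name. It maps a page read/write, copies in mov eax, imm32; ret, switches the page to read/execute with mprotect, and calls it. On Windows the analogous calls would be VirtualAlloc and VirtualProtect with PAGE_EXECUTE_READ. The function returns std::nullopt instead of crashing when the build is not x86 or the OS refuses executable memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <sys/mman.h>
#include <unistd.h>

// Copies machine code into a fresh page, flips the page from writable to
// executable (never both at once), calls it as int(*)(), and unmaps it.
std::optional<int> runGeneratedReturn(std::uint32_t value) {
#if defined(__i386__) || defined(__x86_64__)
    // mov eax, imm32 ; ret   ->  int f() { return value; }
    const std::uint8_t code[] = {
        0xB8,
        static_cast<std::uint8_t>(value), static_cast<std::uint8_t>(value >> 8),
        static_cast<std::uint8_t>(value >> 16), static_cast<std::uint8_t>(value >> 24),
        0xC3};
    const std::size_t pageSize = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void* mem = mmap(nullptr, pageSize, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return std::nullopt;
    std::memcpy(mem, code, sizeof code);
    // Plain malloc/new memory is typically not executable; request it explicitly.
    if (mprotect(mem, pageSize, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, pageSize);
        return std::nullopt;
    }
    auto fn = reinterpret_cast<int (*)()>(mem);
    const int result = fn();
    munmap(mem, pageSize);
    return result;
#else
    (void)value;  // not an x86 build: these opcode bytes would be meaningless
    return std::nullopt;
#endif
}
```

Switching from writable to executable, rather than mapping the page with both permissions at once, also keeps this working on systems that forbid writable-and-executable pages.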

You may find this link helpful, particularly if working in Win32: http://www.codeproject.com/Articles/16785/Thunking-in-Win32-Simplifying-Callbacks-to-Non-sta

Regarding finding the hex values of assembly operations, the best reference I know of is the Appendix to the manual of the NASM assembler (and I don't just say that because I helped write it). There's a copy available here: http://www.posix.nl/linuxassembly/nasmdochtml/nasmdoca.html

Vogel answered 4/5, 2012 at 15:36
Comments:
Wow, great links! It was really interesting reading about the thunking process (too bad it was Win32, though). Now excuse me if I sound dumb, but as mentioned earlier, I'm not especially experienced with assembly (I only know a bit of AT&T syntax), so I have to ask about the NASM reference you linked to. I have two questions: do all ASM instructions use only 1 byte? And secondly, since there are so many different values specified for each instruction, which one am I interested in? I guess it depends on the size of my variables, but for 'push' there are 13 different values; how do I know which one I want? (Pantomime)
These are different variants for different types of push instruction (kinds of register, immediate values, indirect memory references). The top of the guide has a description of all the different modes, so use that to work out which one you want, then just look down the list to find the instruction format you need. Say you want to push EBX: that's a reg32, so you want the second variant, which is "o32 50+r". o32 is an operand-size prefix, which is ignored if you're running in 32-bit code; 50+r is 50 hex plus the code for the register (3; they're listed at the top), so 53h is your code. (Vogel)
In answer to your first question: no, there are instructions that are more than one byte long, and some instructions vary in size depending on context (see the PUSH example above: the 'o32' prefix doesn't generate any code in 32-bit mode, but if you're producing 16-bit code, it becomes an additional 66h byte at the start of the instruction). However, all of the most common instructions have single-byte opcodes. (Vogel)
