How could I generate and execute machine code at runtime?

The closest I have gotten to assembly is building my own Java class library that loads class files and lets you create, compile, and decompile classes. While working on that project, I wondered how the Java Virtual Machine actually generates native machine code at runtime during JIT optimization.

It got me thinking: how could one generate machine code and execute it at runtime with assembly, and as a bonus, without a JIT compiler library, or "manually"?

Trant answered 6/4, 2017 at 12:19 Comment(8)
If it's emulating functions, a quick search shows this very nice lecture, couldn't explain it better myself: here.Torsibility
Put the instruction into executable memory, append a RET and call it using a function pointer. And hope it won't blow up in your face. PS: x86 instructions are variable length, up to 15 bytes.Simulator
If your "dynamic environment" must be x86 machine code based, then you can check dosbox sources to see how they emulate the x86 environment, they have different modes, the basic one runs all instruction in emulator-way (no executing them natively, but emulating them), but they also have some dynamic mode which is compiling small parts of the emulated code into native code, and executing that natively, if I understood it correctly... Mind you this is far from trivial. Unless you have very good reasons and enough budget, don't go there, create interpreter of your lang, avoid "based-on-x86".Anticipatory
You may however consider using some JIT compilation library (such as libgccjit or asmjit or others) or write your own. You should edit your question to motivate it much more. It smells badly as some XY problemJulide
Inline assembler support in the language only helps you if you have the asm instructions at compile time. Unless you're using an interpreted language, in which case inline asm support is unlikely. With Java JNI, you call a whole function which has to exist as machine code, not text assembly language.Hinkle
I don't understand the downvotes, that's a good question actually.Scrannel
@BasileStarynkevitch to answer an old question, most of my questions are purely philosophical, or are questions about an approach I hope to take in the future for some future project or idea that I have. As I stated in the question, "while endeavoring this project, I wondered how the JVM actually generated native machine code at runtime during JIT optimizations." Instead of targeting how the JVM's JIT compiler optimizes bytecode, I made the question more generalized to, "How can one dynamically create machine code and execute it at runtime [with assembly]?" This isn't a JVM-only question.Trant
Related: #4912493Totem

Your question changed substantially (in july 2017). The initial variant referred to the EX (execute) instruction of IBM mainframes.

how could one generate machine code and execute it at runtime with assembly...?

In practice, you would use some JIT compilation library, and there are many of them. Or you would use some dynamic loader. At the lowest level, they all write some byte sequences representing valid machine code—a sequence of many machine instructions—in a memory segment (of your virtual address space) which has to be made executable (read about the NX bit), and then some of your code would jump indirectly to that address or more often call it indirectly—that is call through a function pointer. Most JVM implementations use JIT compilation techniques.

...and as a bonus, without a JIT compiler library, or "manually"?

Supposing you have some valid machine code for the processor architecture that your program is currently executing on, for example, you could get a memory segment (e.g. mmap(2) on Linux), and then make it executable (e.g. mprotect(2)). Most other operating systems provide similar system calls.


If you use a JIT compilation library like asmjit or libjit or libgccjit or LLVM or many others, you first construct in memory a representation (similar to some abstract syntax tree) of the code to be generated, then ask the JIT library to emit machine code for it. You could even write your own JIT compilation code, but it is a lot of work (you need to understand all the details of your instruction set, e.g. x86 for PCs). By the way, generating fast-running machine code is really difficult, because you need to optimize like compilers do (and to care about details like instruction scheduling, register allocation, etc... see also this), and that is why using an existing JIT compilation library (like libgccjit or LLVM) is preferable (a contrario, simpler JIT libraries like asmjit or libjit or GNU lightning don't optimize much and generate poor machine code).

If you use a dynamic loader (e.g. dlopen(3) on POSIX) you would use some external compiler to produce a shared library (that is a plugin) and then you ask the dynamic linker to load it in your process (and handle appropriate relocations) and get by name (using dlsym(3)) some function addresses from it.

Some language implementations (notably SBCL for Common Lisp) are able to emit good machine code on the fly at every REPL interaction. In essence, their runtime embeds a full compiler (including a JIT compilation part).

A trick I often use on Linux is to emit some C (or C++) code at runtime into a temporary file (that is, compiling some domain-specific language to C or C++), fork a compilation of it as a plugin, and dynamically load it. On current computers (laptops, desktops, servers) this is fast enough to remain usable in an interactive loop.

Read also about eval (in particular the famous SICP book), metaprogramming, multistage programming, self-modifying code, continuations, compilers (the Dragon Book), Scott's Programming Language Pragmatics, and J.Pitrat's blog.

Julide answered 10/7, 2017 at 4:30 Comment(5)
Why is machine code optimization so terribly necessary? Why aren't computers made with a very basic instruction set allowing me to do everything I want, without all of these different variants of the same instruction? Theoretically, I could make a machine using just one instruction: mov. Make it address-oriented (rather than stack-oriented); define special addresses for ALU and FP registers, and voilà!Trant
That is another question, and you need to read several books (about compilers and about computer architectures) to understand why.Julide
well, I realize that CPUs are basically just fixed, multi-stage micro-machines (with micro-code) and that the instructions are just interpreted to this fixed sequence. They have an instruction pipeline and (unnecessarily) have branch prediction for conditional jumps. The real question is why everything is so unnecessarily complex when it could be simpler? "Make it as simple as possible, but not simpler"— EinsteinTrant
There are both technical and economical reasons for that. Remember that a desktop processor requires billions of dollars of investment (both R&D to design it, and chip plants to manufacture it).Julide
Certainly, I am aware of that, but things could be different. It would seem the means of optimization lies in optimizing memory storage. A cache miss is quite expensive. Grabbing data from the L2 cache is slower than L1, and RAM is slower than L2. Why not just make equidistant cache pockets that surround the CPU in 3D space? Then there's the nuances of one instruction over another, plus architecture-specific instructions that have ever so slightly less overhead than the next of kin instruction. This market seems very dirty and tedious.Trant

In the comments I gave you a link to a file explaining things thoroughly.

Most assembly languages implement subroutines (the assembly term for functions, as far as your googling is concerned) with two instructions, call and ret - or something similar.

The implementation is nearly the same as a jump, except that call pushes the address of the next instruction onto the stack, and ret pops it - that's why it's very important to keep the stack balanced inside the subroutine. Since you don't want to clobber registers, which may hold important values and are limited in number, the stack is where you keep all your local variables, and hence balancing is an issue. You could of course do the same thing yourself with jump plus some pushing and popping.

As far as "arguments" are concerned, a simple method is using registers. This becomes a problem when you need to pass more arguments than there are registers. A more robust method is pushing the arguments onto the stack before the call, which is what many real 32-bit calling conventions do. An example from the link I provided, a subroutine adding 3 numbers:

# Save old EBP
pushl %ebp
# Change EBP
movl %esp, %ebp
# Save callee-save registers if necessary
pushl %ebx
pushl %esi
pushl %edi
# Allocate space for local variable
subl $4, %esp
# Perform the addition
movl 8(%ebp), %eax
addl 12(%ebp), %eax
addl 16(%ebp), %eax
movl %eax, -16(%ebp)
# Copy the return value to EAX
movl -16(%ebp), %eax
# Restore callee-save registers if necessary
movl -12(%ebp), %edi
movl -8(%ebp), %esi
movl -4(%ebp), %ebx
# Restore ESP
movl %ebp, %esp
# Restore EBP
popl %ebp
# Return to calling
ret

Calling the subroutine:

# Save caller-save registers if necessary
pushl %eax
pushl %ecx
pushl %edx
# Push parameters
pushl $5
pushl $4
pushl $3
# Call add3
call add3
# Pop parameters
addl $12, %esp
# Save return value
movl %eax, wherever
# Restore caller-save registers if necessary
popl %edx
popl %ecx
popl %eax
# Proceed!

As you can see, you need to do more work here than in high-level languages. The PDF contains a detailed explanation, including how the stack works, but note that:

  1. You need to define how register usage is handled. In this example both the caller and the subroutine save registers, just in case - you can of course simplify.
  2. Arguments and local variables are addressed relative to the base pointer (EBP): arguments at positive offsets, locals at negative offsets.
  3. If this is a small thing you're making for yourself, you can skip all this stack playing and just set aside registers for passing arguments and return values - maybe to practice before you move on to more advanced stuff.
Torsibility answered 6/4, 2017 at 13:25 Comment(3)
This is a problem if you're nesting functions. Not really. Spilling some of your input args to the stack before calling another function is no different from spilling any other variables you have live in registers. The Windows and Linux x86-64 ABIs pass the first few args in registers. Windows even does that for 32-bit x86 in some calling conventions.Hinkle
@PeterCordes not sure I follow, or maybe my post wasn't well written - if you have more than 8 or whatever parameters, the registers are not enough. For people just learning to work with assembly this is an important point to understand - you have to properly micromanage the memory, and the stack offers a methodological way to do that - the example you give is the big leagues, not hello world. Does that makes sense? Assuming you want a single consistent methodology, pure stack is easiest to learn (IMO).Torsibility
Yes, limited number of registers and having to fall back to passing on the stack is a problem, but not because of nesting. It's extra complexity for leaf functions, too, which have to look in different places depending on which arg it is. Totally agree that register arg passing is more complex for a beginner when you consider having more args than regs.Hinkle

To execute a piece of x86 machine code, use the jmp instruction to jump to its beginning. Note that the CPU doesn't know where the code ends, so you have to make manual arrangements. A better way is to use call to invoke that machine code and then return with a ret instruction placed somewhere inside it.

There is no direct way to execute just a single instruction as that is usually pretty pointless. I'm not sure what you are trying to achieve.

Lamplighter answered 6/4, 2017 at 12:41 Comment(4)
In order to call multiple instructions in succession to create a dynamic runtime environment, I have to know how to call one instruction to call many instructions.Trant
Note that your environment and the code you execute use the same register file, so unless you do some complicated tricks, the code that executes your single instruction destroys most or all of its effects.Lamplighter
I recommend you to design a byte code and write an interpreter for that, that's probably much simpler to get right.Lamplighter
The EX instruction is sort of self modifying code in that it uses a source register to modify the target instruction as it is being executed. One of the examples is similar to REP MOVS as the byte count is in a register, but the others are more like having a ModRM-byte in a source register. Big sledge-hammer kind of tool!Wooden

© 2022 - 2024 — McMap. All rights reserved.