How does JIT compilation actually execute the machine code at runtime?

Asked 23/12, 2014 at 20:3 Answered 12/10, 2018 at 7:39

Solved compiler-construction compilation llvm jit machine-code

I understand the gist of how JIT compilation works (after reading such resources as this SO question). However, I am still wondering how it actually executes the machine code at runtime.

I don't have a deep background in operating systems or compiler optimizations, and haven't done anything with machine code directly, but am starting to explore it. I have started playing around with assembly, and can see how something like NASM takes your assembly code and assembles it into machine code (the executable), which you can then "invoke" from the command line like ./my-executable.

But how is a JIT compiler actually doing that at runtime? Is it like streaming machine code into stdin or something, or how does it work? If you could provide an example or some pseudocode of how some assembly (or something along those lines, not as high level as C though) might look to demonstrate the basic flow, that would be amazing too.

Pregnancy answered 23/12, 2014 at 20:3 Comment(1)

The jitter only generates machine code, it doesn't execute it. That's the job of the processor. – Nipissing 23/12, 2014 at 20:8

You mentioned that you've played around with assembly, so you have some idea of how that works. Good. Imagine that you write code that allocates a buffer (e.g. at address 0x75612d39). Your code then writes into that buffer the instructions to pop a number from the stack, the instructions to call a print function to print that number, and the instruction to "return". Then you push the number 3 onto the stack and call/jump to address 0x75612d39. The processor will obey the instructions to print your number, then return to your code and continue. At the assembly level it's actually pretty straightforward.

I don't know any "real" assembly languages, but here's a "sample" cobbled together from a bytecode I know. This machine has 2-byte pointers, the string "%d" is located at address 6a, and the function printf is located at address 1388.

void myfunc(int a) {
    printf("%d", a);
}

The assembly for this function would look like this:

OP Params OpName     Description
13 82 6a  PushString 82 means string, 6a is the address of "%d".
                     So this instruction pushes a pointer to "%d" on the stack.
13 83 00  PushInt    83 means integer, 00 means the one on the top of the stack.
                     So this instruction gets the integer at the top of the stack
                     and pushes it on the stack again.
17 13 88  Call       1388 is printf, so this calls the printf function.
03 02     Pop        This pops the two things we pushed back off the stack.
02        Return     This returns to the calling code.

So when your JITTER reads in the void myfunc(int a) {printf("%d", a);}, it allocates memory for this function (e.g. at address 0x75612d39) and stores these bytes in that memory: 13 82 6a 13 83 00 17 13 88 03 02 02. Then, to call that function, it simply jumps to (or calls) address 0x75612d39.
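
To make the "store these bytes in that memory" step more concrete, here is a minimal sketch in C of what the emitting side might look like. It only reproduces the byte sequence listed above for the made-up machine. The names CodeBuffer, emit, emit_push, emit_call, emit_pop, emit_return and emit_myfunc are hypothetical helpers invented for this illustration, and the opcode values are simply the ones from the table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code buffer: the "memory for this function". */
typedef struct {
    uint8_t bytes[64];
    size_t  len;
} CodeBuffer;

/* Append one byte, silently ignoring overflow to keep the sketch short. */
static void emit(CodeBuffer *cb, uint8_t b)
{
    if (cb->len < sizeof cb->bytes)
        cb->bytes[cb->len++] = b;
}

/* One helper per instruction of the made-up machine from the table. */
static void emit_push(CodeBuffer *cb, uint8_t kind, uint8_t operand)
{
    emit(cb, 0x13);                    /* Push opcode */
    emit(cb, kind);                    /* 0x82 = string, 0x83 = integer */
    emit(cb, operand);
}

static void emit_call(CodeBuffer *cb, uint16_t target)
{
    emit(cb, 0x17);                    /* Call opcode */
    emit(cb, (uint8_t)(target >> 8));  /* 2-byte address, high byte first */
    emit(cb, (uint8_t)(target & 0xff));
}

static void emit_pop(CodeBuffer *cb, uint8_t count)
{
    emit(cb, 0x03);                    /* Pop opcode */
    emit(cb, count);
}

static void emit_return(CodeBuffer *cb)
{
    emit(cb, 0x02);                    /* Return opcode */
}

/* "Compiles" myfunc: fills the buffer and returns the number of bytes. */
size_t emit_myfunc(CodeBuffer *cb)
{
    cb->len = 0;
    emit_push(cb, 0x82, 0x6a);         /* pointer to the format string at 6a */
    emit_push(cb, 0x83, 0x00);         /* the integer argument */
    emit_call(cb, 0x1388);             /* printf lives at 1388 */
    emit_pop(cb, 2);                   /* drop the two pushed values */
    emit_return(cb);
    return cb->len;
}
```

After emit_myfunc runs, the buffer holds the 12 bytes 13 82 6a 13 83 00 17 13 88 03 02 02. A real JIT does the same kind of thing with real machine-code encodings, and then transfers control to the start of the buffer.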

Grozny answered 23/12, 2014 at 20:9 Comment(4)

Could you show a basic example (maybe along the lines of an assembly snippet, just updated the original question)? I am sorta starting to see what you're describing, but don't yet get how the assembly might look, and how the processor plays into it. – Pregnancy 23/12, 2014 at 20:14

@LancePollard: I haven't actually messed with any assembly in ages, and even then it's unlikely we knew the same one. I put in a "sample" from a bytecode I'm familiar with though. – Grozny 23/12, 2014 at 20:38

Just curious, what bytecode is it? Was it developed for learning/teaching purposes? – Dim 24/12, 2014 at 2:22

The bytecode is an unnamed proprietary code at my company, for use in localization. I can't really give any details. – Grozny 24/12, 2014 at 9:50

When code is executed, it all boils down to the code being loaded into a known part of memory, and the program counter being set to the start of the code, either by a direct register setting, or a jmp instruction, or similar. So what the JIT compiler will do is build the machine code in a known part of memory, and then execute from there.
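
As a rough illustration of "load the code into memory, then point the program counter at it", here is a toy model in C. It is a software simulation of a processor, not a JIT and not real hardware. The three opcodes (OP_HALT, OP_LOAD, OP_ADD) and the functions toy_cpu_run and toy_demo are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Made-up opcodes for a hypothetical toy machine (not a real instruction set). */
enum { OP_HALT = 0x00, OP_LOAD = 0x01, OP_ADD = 0x02 };

/* Models the processor: start fetching at `start` and execute until HALT.
 * Returns the accumulator, or -1 on a bad opcode or on running off memory. */
int toy_cpu_run(const uint8_t *memory, size_t size, size_t start)
{
    size_t pc = start;                  /* the "program counter" */
    int acc = 0;

    while (pc < size) {
        uint8_t op = memory[pc++];      /* fetch the opcode */
        if (op == OP_HALT)
            return acc;
        if (pc >= size)
            return -1;                  /* operand is missing */
        uint8_t operand = memory[pc++]; /* fetch the operand */
        switch (op) {                   /* execute */
        case OP_LOAD: acc = operand;  break;
        case OP_ADD:  acc += operand; break;
        default:      return -1;        /* unknown opcode */
        }
    }
    return -1;
}

/* Models the JIT side: build code in a known part of memory, then run it. */
int toy_demo(void)
{
    uint8_t memory[32] = {0};
    const uint8_t program[] = { OP_LOAD, 40, OP_ADD, 2, OP_HALT };
    const size_t load_address = 8;      /* the "known part of memory" */

    memcpy(memory + load_address, program, sizeof program);
    return toy_cpu_run(memory, sizeof memory, load_address);
}
```

Here toy_demo returns 42. On a real machine the fetch/execute loop is the hardware itself, so the JIT only has to do the two steps in toy_demo: write the bytes, then transfer control to their address.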

Decani answered 23/12, 2014 at 20:6 Comment(0)

I'll try to elaborate on @MooingDuck's answer. Let's take a C# hello-world example.

namespace Hello
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hello, world!");
        }
    }
}

The equivalent assembly code (here for 32-bit x86 Linux, in NASM syntax) is something like:

    mov     edx,len                             ;message length
    mov     ecx,msg                             ;message to write
    mov     ebx,1                               ;file descriptor (stdout)
    mov     eax,4                               ;system call number (sys_write)
    int     0x80                                ;call kernel

    mov     eax,1                               ;system call number (sys_exit)
    int     0x80                                ;call kernel


msg     db  'Hello, world!',0xa                 ;our dear string
len     equ $ - msg                             ;length of our dear string

(This code was taken from here).

Each of these instructions, and obviously the data itself, can be represented as numbers. Now, I can just put those numbers inside a buffer, tell the CPU to go to the buffer's position in memory, and start executing the code. Right?

Not so fast.

As you can see in this SO question, it doesn't work until you map the memory as executable. Once you do, you can cast the buffer's address to a function pointer and "call" that memory. It will run.

To summarize, as far as I understand, this is more or less how the JITTER works:

Reads the IL
Compiles it (i.e., determines which opcodes will do the job)
Allocates memory for them and maps it as executable code
Calls this memory as a function (by a cast or any other means)

Bipetalous answered 12/10, 2018 at 7:39 Comment(0)
