Before you down-vote or start saying that `goto`ing is evil and obsolete, please read the justification of why it is viable in this case. Before you mark it as duplicate, please read the full question.

I was reading about virtual machine interpreters, when I stumbled across computed gotos . Apparently they allow significant performance improvement of certain pieces of code. The most known example is the main VM interpreter loop.

Consider a (very) simple VM like this:

#include <iostream>

enum class Opcode
{
    HALT,
    INC,
    DEC,
    BIT_LEFT,
    BIT_RIGHT,
    RET
};

int main()
{
    Opcode program[] = { // an example program that returns 10
        Opcode::INC,
        Opcode::BIT_LEFT,
        Opcode::BIT_LEFT,
        Opcode::BIT_LEFT,
        Opcode::INC,
        Opcode::INC,
        Opcode::RET
    };
    
    int result = 0;

    for (Opcode instruction : program)
    {
        switch (instruction)
        {
        case Opcode::HALT:
            break;
        case Opcode::INC:
            ++result;
            break;
        case Opcode::DEC:
            --result;
            break;
        case Opcode::BIT_LEFT:
            result <<= 1;
            break;
        case Opcode::BIT_RIGHT:
            result >>= 1;
            break;
        case Opcode::RET:
            std::cout << result;
            return 0;
        }
    }
}

All this VM can do is a few simple operations on one number of type int and print it. In spite of its doubtable usefullness, it illustrates the subject nonetheless.

The critical part of the VM is obviously the switch statement in the for loop. Its performance is determined by many factors, of which the most inportant ones are most certainly branch prediction and the action of jumping to the appropriate point of execution (the case labels).

There is room for optimization here. In order to speed up the execution of this loop, one might use, so called, computed gotos.

Computed Gotos

Computed gotos are a construct well known to Fortran programmers and those using a certain (non-standard) GCC extension. I do not endorse the use of any non-standard, implementation-defined, and (obviously) undefined behavior. However to illustrate the concept in question, I will use the syntax of the mentioned GCC extension.

In standard C++ we are allowed to define labels that can later be jumped to by a goto statement:

goto some_label;

some_label:
    do_something();

Doing this isn't considered good code (and for a good reason!). Although there are good arguments against using goto (of which most are related to code maintainability) there is an application for this abominated feature. It is the improvement of performance.

Using a goto statement can be faster than a function invocation. This is because the amount of "paperwork", like setting up the stack and returning a value, that has to be done when invoking a function. Meanwhile a goto can sometimes be converted into a single jmp assembly instruction.

To exploit the full potential of goto an extension to the GCC compiler was made that allows goto to be more dynamic. That is, the label to jump to can be determined at run-time.

This extension allows one to obtain a label pointer, similar to a function pointer and gotoing to it:

    void* label_ptr = &&some_label;
    goto (*label_ptr);

some_label:
    do_something();

This is an interesting concept that allows us to further enhance our simple VM. Instead of using a switch statement we will use an array of label pointers (a so called jump table) and than goto to the appropriate one (the opcode will be used to index the array):

// [Courtesy of Eli Bendersky][4]
// This code is licensed with the [Unlicense][5]

int interp_cgoto(unsigned char* code, int initval) {
    /* The indices of labels in the dispatch_table are the relevant opcodes
    */
    static void* dispatch_table[] = {
        &&do_halt, &&do_inc, &&do_dec, &&do_mul2,
        &&do_div2, &&do_add7, &&do_neg};
    #define DISPATCH() goto *dispatch_table[code[pc++]]

    int pc = 0;
    int val = initval;

    DISPATCH();
    while (1) {
        do_halt:
            return val;
        do_inc:
            val++;
            DISPATCH();
        do_dec:
            val--;
            DISPATCH();
        do_mul2:
            val *= 2;
            DISPATCH();
        do_div2:
            val /= 2;
            DISPATCH();
        do_add7:
            val += 7;
            DISPATCH();
        do_neg:
            val = -val;
            DISPATCH();
    }
}

This version is about 25% faster than the one that uses a switch (the one on the linked blog post, not the one above). This is because there is only one jump performed after each operation, instead of two.

Control flow with switch: For example, if we wanted to execute Opcode::FOO and then Opcode::SOMETHING, it would look like this: As you can see, there are two jumps being performed after an instruction is executed. The first one is back to the switch code and the second is to the actual instruction.

In contrary, if we would go with an array of label pointers (as a reminder, they are non-standard), we would have only one jump:

It is worthwhile to note that in addition to saving cycles by doing less operations, we also enhance the quality of branch prediction by eliminating the additional jump.

Now, we know that by using an array of label pointers instead of a switch we can improve the performance of our VM significantly (by about 20%). I figured that maybe this could have some other applications too.

I came to the conclusion that this technique could be used in any program that has a loop in which it sequentially indirectly dispatches some logic. A simple example of this (apart from the VM) could be invoking a virtual method on every element of a container of polymorphic objects:

std::vector<Base*> objects;
objects = get_objects();
for (auto object : objects)
{
    object->foo();
}

Now, this has much more applications.

There is one problem though: There is nothing such as label pointers in standard C++. As such, the question is: Is there a way to simulate the behaviour of computed gotos in standard C++ that can match them in performance?.

Edit 1:

There is yet another down side to using the switch. I was reminded of it by user1937198. It is bound checking. In short, it checks if the value of the variable inside of the switch matches any of the cases. It adds redundant branching (this check is mandated by the standard).

Edit 2:

In response to cmaster, I will clarify what is my idea on reducing overhead of virtual function calls. A dirty approach to this would be to have an id in each derived instance representing its type, that would be used to index the jump table (label pointer array). The problem is that:

There are no jump tables is standard C++
It would require as to modify all jump tables when a new derived class is added.

I would be thankful, if someone came up with some type of template magic (or a macro as a last resort), that would allow to write it to be more clean, extensible and automated, like this:

#include <iostream> enum class Opcode { HALT, INC, DEC, BIT_LEFT, BIT_RIGHT, RET }; int run(Opcode* program) { int result = 0; for (int i = 0; true;i++) { auto instruction = program[i]; switch (instruction) { case Opcode::HALT: break; case Opcode::INC: ++result; break; case Opcode::DEC: --result; break; case Opcode::BIT_LEFT: result <<= 1; break; case Opcode::BIT_RIGHT: result >>= 1; break; case Opcode::RET: std::cout << result; return 0; default: __assume(0); } } }

Before you down-vote or start saying that `goto`ing is evil and obsolete, please read the justification of why it is viable in this case. Before you mark it as duplicate, please read the full question.

Computed Gotos

Edit 1:

Edit 2:

Recommended topics

Hot tags

Before you down-vote or start saying that gotoing is evil and obsolete, please read the justification of why it is viable in this case. Before you mark it as duplicate, please read the full question.

Computed Gotos

Edit 1:

Edit 2:

Recommended topics

Hot tags

Before you down-vote or start saying that `goto`ing is evil and obsolete, please read the justification of why it is viable in this case. Before you mark it as duplicate, please read the full question.