Compile time generated function dispatcher with minimal overhead

I'm trying to implement a fast function dispatcher, using arrays generated at compile time, so that I can dispatch at runtime in O(1).

Some lines of code just to clarify:

template<int i>
void f()
  {
  // do stuff 
  }

// specialized for every managed integer 
template<>
void f<1>()
  {
  // do stuff
  }

Dispatcher<1,5,100,300> dispatcher;  
dispatcher.execute(5); // this should call f<5>()

Let's call N the number of inputs of the dispatcher (4 in this case) and M the maximum value of the dispatcher input (300 in this case).

I've been able to make it work by creating an array of size M. This exploits the fact that at runtime you can do something like:

dispatcher.execute(5) -> internalArray[5]();

This works, of course, but it is not feasible when M is large.

The best thing would be to generate an array of only N elements, and use some arithmetic trick to transform the input value into an index into that smaller array.

In the example, something that translates 1, 5, 100, 300 into 0, 1, 2, 3 respectively. I have been able to write a kind of pre-processing method that transforms them, but I'm looking for a way to avoid this step.

In other words, I think I'm looking for some kind of minimal perfect hashing that can be generated at compile time for my specific inputs, in a very efficient way (ideally without any overhead, something like goto MyInstruction).

I'm not looking for alternatives that use virtual functions, std::map or complex operations.

Please ask if something is not clear.

PS I'm using C++11 but any idea is welcome

[Edit] I'm aware of the "labels as values" language extension of GCC. With it I might be able to achieve my goal, but I need a portable solution.

Serval answered 14/11, 2018 at 10:19 Comment(8)
What happens if you call dispatcher.execute(4)?Australorp
If the dispatcher is supposed to be compile-time, then it means that the .execute() argument is also supposed to be compile time, so why not just make it a template argument? Then execute<N>() would call f<N>(), though I still don't understand why you would want a dispatcher in this particular case instead of a direct call. If you want execute to accept run-time arguments then it's not clear what you mean by dispatch.Myles
does the execute method need to have runtime parameter? Or will it always be compile time?Concettaconcettina
And you can still create Perfect_hash_function.Australorp
@Australorp Either there's an input validation check that calls a default function for values bigger than N, or it assumes there are no bigger values (the client is responsible for that).Serval
By compile time I meant code generated at compile time, to avoid manually writing a switch. It must accept runtime values.Serval
Have you actually measured the performance of std::unordered_map? It is expressly designed to solve problems like this efficiently.Cilla
@PaulSanders yep, but I'm mainly interested in the worst-case scenario, where the unordered_map should perform poorly.Serval

Well, I don't know if you are going to be able to do what you want. Writing code that creates a perfect hash function for any input seems to me pretty much... not doable.

Anyway, here is a simple way to write the code. It's C++17, but it can be made to work with C++11 with a bit of trickery.

template<int i> void f();

template <int... Is>
struct Dispatcher
{
    template <int I> constexpr auto execute_if(int i)
    {
        if  (I == i)
            f<I>();
    }

    constexpr auto execute(int i)
    {
        (execute_if<Is>(i), ...);
    }
};

auto test()
{
    Dispatcher<1,5,100,300> dispatcher;  
    dispatcher.execute(5);
}

The above code compiles down to just a simple jump, because 5 is a compile-time constant:

test():                               # @test()
        jmp     void f<5>()            # TAILCALL

If the argument is a runtime variable, then it does a series of compares:

auto test(int i)
{
    Dispatcher<1,5,100,300> dispatcher;  
    dispatcher.execute(i);
}
test(int):                               # @test(int)
        cmp     edi, 99
        jg      .LBB0_4
        cmp     edi, 1
        je      .LBB0_7
        cmp     edi, 5
        jne     .LBB0_9
        jmp     void f<5>()            # TAILCALL
.LBB0_4:
        cmp     edi, 100
        je      .LBB0_8
        cmp     edi, 300
        jne     .LBB0_9
        jmp     void f<300>()          # TAILCALL
.LBB0_9:
        ret
.LBB0_7:
        jmp     void f<1>()            # TAILCALL
.LBB0_8:
        jmp     void f<100>()          # TAILCALL

The solution can be improved to perform a binary search, but it is not trivial.

Concettaconcettina answered 14/11, 2018 at 11:38 Comment(9)
Very interesting solution. I did something with tag dispatching that worked in a similar way. This approach provides a solution in O(N) with N defined as above. It is not bad per se, but I was looking for a O(1) solution. If no solution exists, I was thinking to automatically use a different solution based on the N,M values. Ideally the best solution would be a kind of hash function generator, that would be created provided the N inputs. For example for the inputs: 10, 100, 32, 45, 58, 126, 3, 29, 200, 400, 0 the function should be: (X ^ 28) % 13Serval
@Serval and that's what I meant in my first paragraph: I think it's not possible to create a code that, for any inputs will create a perfect hash function.Concettaconcettina
Yes, it's O(N). It can be made O(log N) with a binary search, but it's not as easy with metaprogramming.Concettaconcettina
I think that if that function exists it would be the holy grail. I think it would be acceptable to relax some constraints and instead of an array of M elements, it would be ok to use an helper array of X size with N < X < M and with the mapping not constrained to be in a sequential order.Serval
Intriguing solution; for a C++14 one, you can simply use the C-style array initialization trick (constexpr auto execute(int i) { using unused = int[]; (void)unused { 0, (execute_if<Is>(i), 0)... }; }); it works in C++11 too, but not as a constexpr methodChucho
@Chucho yep I used something like that for my tag dispatch approach. No way to make constexpr something like that in C++11?Serval
@Serval - maybe I'm wrong but... I don't think so; or rather: you can do it recursively (I can try to write an example, if you want) but compilers impose limits on template recursion, and this can be a problem if N is large.Chucho
Damn, the constexpr thing would have been nice to have ehheServal
It looks like the compiler optimizes the linear search into a kind of binary search: it checks > 100 first; if no, it checks 1 and 5; if yes, it checks 100 and 300. I'm curious how many levels deep it can go.Heteroecious

Building on @bolov's answer, it's possible to use an arbitrary dispatch algorithm when i is not a constant by changing:

constexpr auto execute(int i)
{
    (execute_if<Is>(i), ...);
}

To:

constexpr auto execute(unsigned i)
{
    (execute_if<Is>(i), ...);
}

And then adding:

constexpr auto execute (int& i)
{
    // Add arbitrary dispatch mechanism here
}

Complete example, C++11 compatible, using a rather clunky std::map (worst-case complexity O(log N)) when i is not a constant (I dropped the constexpr stuff to make life easy in C++11):

#include <map>
#include <iostream>

std::map <int, void (*) ()> map;

template <int i> void f ();
template <> void f <1> () { std::cout << "f1\n"; }
template <> void f <2> () { std::cout << "f2\n"; }
template <> void f <3> () { std::cout << "f3\n"; }
template <> void f <4> () { std::cout << "f4\n"; }
template <> void f <5> () { std::cout << "f5\n"; }

template <int ... Is>
struct Dispatcher
{
    template <int first> void execute_if (int i)
    {
        if (first == i)
        {            
            std::cout << "Execute f" << i << " via template\n";
            f <first> ();
        }
    }

    template <int first, int second, int... rest> void execute_if (int i)
    {
        if (first == i)
        {            
            std::cout << "Execute f" << i << " via template\n";
            f <first> ();
        }
        else
            execute_if <second, rest...> (i);
    }

    void execute (unsigned i)
    {
        execute_if <Is...> (i);
    }

    void execute (int& i)
    {
        std::cout << "Execute f" << i << " via map\n";
        map.at (i) ();
    }
};

int main()
{
    map [1] = f <1>;
    map [2] = f <2>;
    map [3] = f <3>;
    map [4] = f <4>;
    map [5] = f <5>;

    Dispatcher <1, 2, 4> dispatcher;  
    dispatcher.execute (2);
    int i = 4;
    dispatcher.execute (i);
}

Output:

Execute f2 via template
f2
Execute f4 via map
f4



Edit: as per the OP's request, here's a version using a binary search instead of std::map. The key to this is building the array to be searched in the Dispatcher constructor.

#include <algorithm>
#include <vector>
#include <iostream>

template <int i> void f ();
template <> void f <1> () { std::cout << "f1\n"; }
template <> void f <2> () { std::cout << "f2\n"; }
template <> void f <3> () { std::cout << "f3\n"; }
template <> void f <4> () { std::cout << "f4\n"; }
template <> void f <5> () { std::cout << "f5\n"; }

using ve = std::pair <int, void (*) ()>;

template <int ... Is>
struct Dispatcher
{
    template <int first> void execute_if (int i)
    {
        if (first == i)
        {            
            std::cout << "Execute f" << i << " via template\n";
            f <first> ();
        }
    }

    template <int first, int second, int... rest> void execute_if (int i)
    {
        if (first == i)
        {            
            std::cout << "Execute f" << i << " via template\n";
            f <first> ();
        }
        else
            execute_if <second, rest...> (i);
    }

    void execute (unsigned i)
    {
        execute_if <Is...> (i);
    }

    void execute (int& i)
    {
        std::cout << "Execute f" << i << " via binary search\n";
        auto lb = std::lower_bound (indexes.begin (), indexes.end (), ve (i, nullptr), 
            [] (ve p1, ve p2) { return p1.first < p2.first; });    
        if (lb != indexes.end () && lb->first == i)
            lb->second ();
    }

    template <int first> void append_index ()
    {
        indexes.emplace_back (ve (first, f <first>));
    }

    template <int first, int second, int... rest> void append_index ()
    {
        append_index <first> ();
        append_index <second, rest...> ();
    }

    Dispatcher ()
    {
        append_index <Is...> ();
    }

private:
    std::vector <ve> indexes;
};

int main()
{
    Dispatcher <1, 2, 4> dispatcher;  
    dispatcher.execute (2);
    int i = 4;
    dispatcher.execute (i);
}


Cilla answered 14/11, 2018 at 15:52 Comment(8)
Interesting alternative, thanks for sharing. This is a way to do the "solution can be improved to perform a binary search, but it is not trivial" proposed by bolov, but unfortunately, as I stated in the question, I'm looking for a solution that does not involve std::map or other "dynamic" approaches (I'm aware of allocators, but there is still some overhead).Serval
Added a binary search version to my answer.Cilla
Paul, thanks again for the feedback. I'm failing to see the difference between this and the std::map version. Performance should still be O(log N) as before, and it still uses a dynamic approach (std::vector) without being constexpr. It is however a really good alternative under relaxed constraints.Serval
If i is known at compile time, the solution can be O(1). If not, some kind of lookup table is needed; choose your poison. Simple.Cilla
another "poison" is to use the log(N) search as it has been suggested. But what I'm looking for is something in the middle of a lookup table and this log(N) solutionServal
There is no such thing. Both solutions I provided are O(log N). Your other alternatives, as we have already covered, are an unordered_map, a sparse array, or Bolov's solution. You have to choose now.Cilla
your solution is O(log N), sure, and it works. The problem is that it uses std containers and is not constexpr. I think however that there might be some other solution involving a mathematical function that, given the inputs, generates the specific function for the problem (maybe involving operations like modulo or division).Serval
Don't think so.Cilla

The OP asked for a C++11 solution that maintains the constexpr-ness, following bolov's solution.

Well... I don't know if it's a good idea, because a constexpr function/member in C++11 needs to be recursive and return a value. And compilers impose strict limits on template recursion, which can be a problem if N (the sizeof...(Is)) is large.

Anyway... the best I can imagine is the following

template <int... Is>
struct Dispatcher
{
    template <typename = void>
    constexpr int execute_h (int) const
     { /* wrong case; exception? */ return -1; }

    template <int J0, int ... Js>
    constexpr int execute_h (int i) const
     { return J0 == i ? (f<J0>(), 0) : execute_h<Js...>(i); }

    constexpr int execute (int i) const
     { return execute_h<Is...>(i); }
};

which can be used, evaluating f<>() at compile time, as follows

void test()
{
    constexpr Dispatcher<1,5,100,300> dispatcher;  
    constexpr auto val1 = dispatcher.execute(5);
    constexpr auto val2 = dispatcher.execute(6);

    std::cout << val1 << std::endl; // print 0 (5 is in the list)
    std::cout << val2 << std::endl; // print -1 (6 isn't in the list)
}

Note also that f<>() must be constexpr and, in C++11, can't return void; I've used the following

template <int i>
constexpr int f ()
 { return i; }
Chucho answered 14/11, 2018 at 14:15 Comment(4)
Thanks for sharing, indeed in C++11, this is for sure a really good solution for the O(N) caseServal
@Serval - yes: bolov's solution is intriguing but, unfortunately, O(N). IMHO, it's better to use a std::map or a std::unordered_map (maybe const; maybe compile-time initialized using a little template metaprogramming); but you have explicitly excluded this type of solution.Chucho
@Serval - I'm an idiot: there's no need for execute_if() in the recursive case: you can do it all in execute_h(). Answer simplified.Chucho
no worries max, thanks for the feedback. I'm looking for worst-case-scenario algorithms here, so std::unordered_map is excluded. A kind of compile-time binary tree would indeed be very interesting to seeServal

A little improvement (IMHO) on bolov's solution

Write execute_if to return true when f<I>() is executed, or false otherwise

template <int I>
constexpr auto execute_if (int i) const
{ return I == i ? f<I>(), true : false; }

instead of using the comma operator for template folding

template <int ... Is>
constexpr auto execute(int i) const
 { (execute_if<Is>(i), ...); }

we can use the or (||) operator

template <int ... Is>
constexpr auto execute(int i) const
 { (execute_if<Is>(i) || ...); }

Using the comma operator, execute_if<Is>(i) is called for every Is, even when an earlier Is is equal to i; using ||, we get short-circuiting, that is, execute_if<Is>(i) is called only until we find an Is that is equal to i.

Chucho answered 14/11, 2018 at 17:36 Comment(2)
your code produces the exact same generated assembly. This is because of the "as-if rule": the observable behavior of yours and mine is identical, and compilers can properly analyze this and optimize accordingly. I would call this premature optimization. I am sorry, I don't see the benefits of this version. To be clear, I am not saying it's bad or anything like that, just that there is no difference.Concettaconcettina
@Concettaconcettina - I see... I tend to underestimate the power of compiler optimizations :(Chucho

In theory(!) you could create a perfect hash function using C++ templating.

  • This question has code on how to create a perfect hash function (using brute force, so it only works for relatively small sets).
  • This question shows that C++ templating is Turing complete, so it should be possible to convert the above code to C++ templates.
  • It might even be possible with the C preprocessor, since the brute-force search doesn't need unbounded iteration.

But I assume doing it is rather hard.

Cryometer answered 9/5, 2019 at 9:37 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.