Variadic templates and switch statement?
Asked Answered
E

5

25

I have the following function which can take N arguments of different types, and forwards them to N functions templated on each individual type, in this manner (example with two arguments):

template <typename T1, typename T2>
bool func(int& counter, T1 x1, T2 x2) {
    switch (counter) {
        case 0:
            if (func2<T1>(x1)) {
                counter++;
                return true;
            } else {
                return false;
            }
        case 1:
            if (func2<T2>(x2)) {
                counter++;
                return true;
            } else {
                return false;
            }
        default:
            return true;
    }
}

I want to write this function with variadic templates so that it can handle any number of arguments in a type-safe way. I can see a solution using recursive functions, passing along the counter and the variadic index and comparing them for equality, but this would seem to yield far less efficient code than the switch statement above (a sequence of if-checks compared to a jump table).

Can this be done efficiently using template metaprogramming or do I need to provide overloads for each arity?

Escent answered 18/9, 2017 at 12:12 Comment(1)
Bottom line is that based on my experience, if you need absolute maximum efficiency, and need it to work with any compiler other than a recent clang, you have to provide the overloads because there is no variadic switch case. If you can live with slightly less optimal code or only need it to work with clang, solution given should work. If you do need to generate arity overloads I highly recommend using boost PP, the resulting code will be way more maintainable.Installment
I
24

Here's a solution similar to max's, but it: a) clearly separates out the generic parts from the parts specific to the solution, and b) I show that clang fully optimizes it. The basic idea is to build a switch case at compile time, from a contiguous integer sequence. We do that like this:

template <class T, T ... Is, class F>
auto compile_switch(T i, std::integer_sequence<T, Is...>, F f) {
  using return_type = std::common_type_t<decltype(f(std::integral_constant<T, Is>{}))...>;
  return_type ret;
  std::initializer_list<int> ({(i == Is ? (ret = f(std::integral_constant<T, Is>{})),0 : 0)...});
  return ret;
}

The idea is that integer gets passed into the lambda as an integral constant type, so its usable in compile time contexts. To use this with the current problem, all we have to do is forward the variadic pack a tuple, and apply the usual tricks with index sequence:

template <class T, std::size_t ... Is>
bool func_impl(std::size_t& counter, T&& t, std::index_sequence<Is...> is) {
  auto b = compile_switch(counter, is, [&] (auto i) -> bool {
    return func2(std::get<i>(std::move(t)));
  });
  if (b) ++counter;
  return b;
}

template <class ... Ts>
bool func(std::size_t & counter, Ts&& ... ts) {
  return func_impl(counter,
      std::forward_as_tuple(std::forward<Ts>(ts)...),
      std::index_sequence_for<Ts...>{});
}

We'll use this definition of func2 to look at some assembly:

template <class T>
bool func2(const T& t) { std::cerr << t; return std::is_trivial<T>::value; }

Looking here: https://godbolt.org/g/6idVPS we notice the following instructions:

auto compile_switch<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, bool func_impl<std::tuple<int&, double&, int&, unsigned long&, char const*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul>(unsigned long&, std::tuple<int&, double&, int&, unsigned long&, char const*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>&&, std::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul>)::{lambda(auto:1)#1}>(unsigned long, std::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul>, bool func_impl<std::tuple<int&, double&, int&, unsigned long&, char const*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul>(unsigned long&, std::tuple<int&, double&, int&, unsigned long&, char const*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>&&, std::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul>)::{lambda(auto:1)#1}): # @auto compile_switch<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, bool func_impl<std::tuple<int&, double&, int&, unsigned long&, char const*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul>(unsigned long&, std::tuple<int&, double&, int&, unsigned long&, char const*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>&&, std::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul>)::{lambda(auto:1)#1}>(unsigned long, std::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul>, bool func_impl<std::tuple<int&, double&, int&, unsigned long&, char const*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul>(unsigned long&, std::tuple<int&, double&, int&, unsigned long&, char const*&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>&&, std::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul>)::{lambda(auto:1)#1})
        push    r14
        push    rbx
        push    rax
        mov     bl, 1
        cmp     rdi, 5
        ja      .LBB2_11
        jmp     qword ptr [8*rdi + .LJTI2_0]

Looking down for that label, we find:

.LJTI2_0:
        .quad   .LBB2_2
        .quad   .LBB2_4
        .quad   .LBB2_5
        .quad   .LBB2_6
        .quad   .LBB2_7
        .quad   .LBB2_10

In other words, clang has both turned this into a jump table, and inlined all the calls to func2. This is not possible using a table of function pointers as some have suggested (at least I've never seen a compiler do it), in fact the only way to get assembly this good is via switch case, or with this technique + clang. Sadly gcc will not generate assembly quite as good, but still decent.

Installment answered 18/9, 2017 at 14:44 Comment(8)
You did it! If I could have voted twice, I would have done it!Patrizius
@Patrizius Thanks :-) Your previous comment, which it looks like you removed, brings up an interesting point though. I'm not actually sure why std::integral_constant can actually convert like that... its conversion is constexpr of course but anything passed into a function parameter is not constexpr, so it shouldn't work. Unless there's an exception for stateless things?Installment
@Patrizius Dug up the answer to that, it was not trivial (for me at least): #46289458.Installment
Indeed it is not that trivial, I read [expr.const] and [dcl.constexpr] back and forth. I believe these sections are inheritently difficult to apply because they do not state what is allowed, but what is disallowed. So we must check that every statement of a constexpr function and all the subexpressions of an expression does not fall on any of the bullets of these standard sections to evaluate if an expression could be a constant expression. Even after having performed the checks, I still have the doubt of having missed one, so I prefer to rely on compiler judgment!Patrizius
Why did you use std::move (t) in func_impl instead of std::forward, as in func?Susceptive
"This is not possible using a table of function pointers as some have suggested" - As of gcc 7.3 (or even earlier?) it is. It finds and inlines function bodies perfectly when a constexpr inline function pointer array is used.Tanner
@GézaTörök I don't know about this specific example as I haven't revisited, but it sure isn't perfect. Here's an example: gcc.godbolt.org/z/cAsVe4. visit is implemetned using tables of function pointers and you can see how badly gcc 9 still fails to remove a simple no-op. If you think gcc can do simpler cases, I'd suggest posting an answer with a godbolt link.Installment
@NirFriedman Thanks for this remark, I realized that elements of the fn ptr array were initialized using captureless lambdas generated in a fold expression. I don't know why the difference though; maybe immediate availability of fn body and lack of linkage constraints helps optimizationTanner
C
3

Just for fun, I propose the following way

template <typename ... Ts>
bool func (int & cnt, Ts ... xs)
 {
   using unused = int[];

   int  i   { -1 };
   bool ret { true };

   (void)unused { 0, ((++i == cnt ? (ret = func<Ts>(xs)) : true), 0)... };

   if ( ret && (cnt <= i) )
      ++cnt;

   return ret;
 }

but I don't think is a efficient way as the switch way.

Collayer answered 18/9, 2017 at 12:26 Comment(5)
I think it will be for clang, but not for gcc. See discussion here: reddit.com/r/cpp/comments/6vyqra/variadic_switch_case.Installment
You might do something like: std::function<bool()> fs = { [&](){ if (func<Ts>(xs)) { ++counter; return true; return false; } }... }; return fs[counter](); to have direct jump.Sind
@Sind I cannot get this to compile with g++ (6.3), it seems it does not accept it due to the parameter pack expansion of Ts and xs.Noctule
Shame about GCC not accepting it. Clang does however.Rhetic
@Holt: Normally, you can use regular (template) functions instead of lambda as work-around.Sind
P
2

This solution may be the most efficient:

template<size_t I,class...Args>
bool func2_b(Args...arg)
  {
  if (func2(std::get<I>(std::tuple<Args...>{arg...})))
    return true;
  else
   return false;
  }

template<class...Args,size_t...Is>
bool func_(int& counter,std::index_sequence<Is...>,Args...args)
  {
  using ft = bool(*)(Args...);
  ft table[]={func2_b<Is,Args...>...};
  if (counter<0 || counter>=(int)sizeof...(Args))
    return false;
  return table[counter](args...);
  }

template<class...Args>
bool func(int& counter,Args...xs)
  {
  return func_(counter,std::make_index_sequence<sizeof...(Args)>{},xs...);
  }
Patrizius answered 18/9, 2017 at 14:3 Comment(2)
Unlikely, anything involving a table of function pointers is going to prevent inlining. It's strictly slower than switch case in all situations, and strictly slower for max's solution if clang is doing the compiling since it can transform it into a switch case.Installment
@NirFriedman I remember an answer of you or maybe a comment, about a similar question. I know the problem with inlining since that. I was thinking about just putting a link to this question, but I do not find it anymore. Has someone find since last month to make a template switch statement?Patrizius
S
1

Also for fun, this might be a bit too convoluted

#include<type_traits>
#include<array>

template<typename T>
void g(T&& t)
{
    // This function gets called
}

template<typename T>
void entry(void* p)
{
    g(*(std::remove_reference_t<T>*)p);
}

template<size_t N>
using table_t = std::array<void (*)(void*), N>;

template<typename... Ts>
constexpr auto make_table()
{
    return table_t<sizeof...(Ts)>{
        entry<Ts>...
    };
}

template<size_t N>
void f_(const table_t<N>&, int)
{

}

template<size_t N, typename T, typename... Ts>
void f_(const table_t<N>& table, int select, T&& t, Ts&&... ts)
{
    if(select == N - sizeof...(Ts) - 1)
        table[select]((void*)&t);
    else
        f_(table, select, std::forward<Ts>(ts)...);
}

template<typename... Ts>
void f(int select, Ts&&... ts)
{
    static constexpr auto table = make_table<Ts...>();
    if(select < 0 || select >= int(sizeof...(Ts)))
        throw "out of bounds";
    f_(table, select, std::forward<Ts>(ts)...);
}

Which rolls a vtable in f and dispatches accordingly to g.

Live

Siderosis answered 18/9, 2017 at 13:2 Comment(0)
M
0

Theoretically you could do binary search of parameter index by yourself:

#include <type_traits>
#include <tuple>
#include <typeinfo>
#include <iostream>
#include <algorithm>


template <std::size_t I>
using ic = std::integral_constant<std::size_t, I>;

template <class T>
bool func2(T) {
    std::cout<<typeid(T).name()<<std::endl;
    return true;
}

template <std::size_t N, class T>
bool func_impl(ic<0>, ic<N>, std::size_t &, T &&tup) {
    constexpr int index = std::min(N - 1, std::tuple_size<T>{} - 1);
    if (func2<std::tuple_element_t<index, std::decay_t<T>>>(std::get<index>(tup))) 
        return true;
    return false;
}

template <std::size_t K, std::size_t N, class T>
bool func_impl(ic<K>, ic<N> n, std::size_t &counter, T &&tup) {
    if (counter == N - 1) {
        return func_impl(ic<0>{}, n, counter, std::forward<T>(tup));
    }
    if (counter < N) {
        return func_impl(ic<K/2>{}, ic<N - K>{}, counter, std::forward<T>(tup));
    } else {
        return func_impl(ic<K/2>{}, ic<N + K>{}, counter, std::forward<T>(tup));
    }
}

template <class... Ts>
bool func(std::size_t& counter, Ts&&... xs) {
    return func_impl(ic<sizeof...(Ts)/2>{}, ic<sizeof...(Ts)/2>{}, counter, std::forward_as_tuple(xs...));
}

int main() {
   std::size_t i = 0;
   func<int, float, double, char>(i, 1, 2, 3, 4); 
}

[live demo]

Mauk answered 18/9, 2017 at 14:44 Comment(1)
#define MIN(x, y) Why ? std::min is constexpr since C++14, and even if it is for C++11, you may write (inline) constexpr function instead.Sind

© 2022 - 2024 — McMap. All rights reserved.