N-dimensionally nested metaloops with templates

S

2

10

I am trying to do N-dimensionally nested metaloops with template metaprogramming. The nesting part is trivial, however passing all the arbitrary number of iteration indices as template parameters to the most-inner loop seems problematic.

A simple unnested metaloop looks like:

template <size_t I, size_t N>
struct meta_for
{
    template <typename Lambda>
    inline meta_for(Lambda &&iteration)
    {
        iteration(I);
        meta_for<I+1, N> next(static_cast<Lambda&&>(iteration));
    }
};

template <size_t N>
struct meta_for<N, N>
{
    template <typename Lambda>
    inline meta_for(Lambda &&iteration)
    {
        return;
    }
};

#include <iostream>

int main()
{
    meta_for<0, 10>([&](size_t i) // perform 10 iterations
    {
        std::cout << i << '\n';
    });

    return 0;
}

Now, I want to make a metaloop, which accepts an N parameter denoting the dimensionality (the level of nesting), using like:

#include <iostream>

int main()
{
    // perform 3 dimensionally nested iterations
    // each index goes from 0 to 10
    // so 10x10x10 iterations performed
    meta_for<3, 0, 10>([&](size_t i, size_t j, size_t k)
    {
        std::cout << i << ' ' << j << ' ' << k << '\n';
    });

    return 0;
}

Solent answered 30/12, 2015 at 19:19 Comment(7)

[OT]: prefer std::forward<Lambda> over static_cast<Lambda&&> – Bose 30/12, 2015 at 19:48

@Jarod42: No, I hate std::forward for multiple reasons. 1: actually it performs a cast, but it's name doesn't reflect that. 2: some compilers (like MSVC 14) generate additional instructions and/or won't be able to perform optimizations on the lambda if std::forward is used instead of the plain cast. It can cause serious performance problems. Maybe it's a bug maybe not, the plain cast is more clear and secure to me. I also made a custom C++11-like forward_cast function for it. – Solent 30/12, 2015 at 19:51

In your example you may linearise your index from 0..10*10*10 and recompute with modulo each part. See Completely enumerate indices of D-dimensional array at compile time – Bose 30/12, 2015 at 20:1

Well I guess it's not a duplicate but anyways I did something very similar to this with "metaloops" once, I think my template code was fairly clean: #32321253 – Armenian 30/12, 2015 at 20:4

So, for some explanation, the idea was, instead of trying to mimic exactly the semantics of N nested loops, I just implement cartesian product of an arbitrary number of type lists, and then have a single loop over the cartesian product result. For run-time code, this will require a lot more memory, and it's not a great idea. But for compile-time code, I think it's not asymptotically different since at compile-time, you need to instantiate a new type / function / something for each iteration of any looping computation anyways. And here, you are outputting a structure of that size anyways. – Armenian 30/12, 2015 at 20:23

Maybe in your case you really want to use meta-loops and not cartesian product though? I guess your example is not initialization but output. Hard to say. FWIW the cartesian product is really simple and natural in functional programming so at least IMO it is an attractive approach for TMP. – Armenian 30/12, 2015 at 20:26

@ChrisBeck The good ol' Cartesian product. Nice approach. No, it's not Cartesian product or initialization, it's a computation. – Solent 30/12, 2015 at 20:37

B

6

Someone better versed in this stuff can improve my answer.

Live Demo

The gist of my solution is that you declare N dimensions, with a start and an end.

It recurses on N-1 dimensions with the same start and end.

When it reaches the 1st dimension, it will actually begin incrementing the start, calling the passed function.

It will always attempt to pass a number of arguments identical to the number of dimensions (their indices).

So a call like this:

meta_for<2, 0, 2>::loop(
    [](size_t i, size_t j)
    {
        std::cout << i << " " << j << std::endl;
    });

Will result in output like this:

0 0

0 1

1 0

1 1

Here's the meta_for structure, which uses a helper, iterate:

template<size_t D, size_t B, size_t E>
struct meta_for
{
    template<typename Func>
    static void loop(Func&& func)
    {
        iterate<D, B, B, E>::apply(std::forward<Func>(func));
    }
};

And the helpers:

// a helper macro to avoid repeating myself too much
#define FN template<typename Func, typename... Args> \
             static void apply(Func&& func, Args&&... a)


// Outer loop. S="Self" or "Start". Indicating current index of outer loop. Intent is to iterate until S == E
template<int Dim, size_t S, size_t B, size_t E>
struct iterate
{
    static_assert(S < E && B < E, "Indices are wrong");
    FN
    {
        // outer loop recursive case. Recurse on lower Dimension (Dim-1), and then increment outer loop (S+1)
        iterate<Dim-1, B, B, E>::apply (func, a..., S);
        iterate<Dim, S+1, B, E>::apply (func, a...);
    }
};

// Outer loop base case
template<int Dim, size_t B, size_t E> 
struct iterate<Dim, E, B, E>
{
    FN
    {
        // outer loop base case, End == End. Terminate loop
    }
};

// innter loop. "S" is outer loop's current index, which we need to pass on to function
// "B" is inner loop's (this loop) current index, which needs to iterate until B == E
template<size_t S, size_t B, size_t E>
struct iterate<1, S, B, E>
{
    static_assert(S < E && B < E, "Indices are wrong");
    FN
    {
        // inner loop recursive case. Perform work, and then recurse on next index (B+1)
        func(a..., B);
        iterate<1, S, B+1, E>::apply(func, a...);
    }
};

// inner loop base case
template<size_t S, size_t E>
struct iterate<1, S, E, E>
{
    FN
    {
        // inner loop base case, End == End. Terminate loop
    }
};

// case where zero dimensions (no loop)
template<size_t S, size_t B, size_t E>
struct iterate<0, S, B, E>
{
    static_assert(sizeof(S) == 0, "Need more than 0 dimensions!");
};

More explanation

This solution, like any other involving variadic templates, relies on recursion.

I wanted to express recursion on an outer loop, so I started with a base case; the end of the loop. This is the case where the start is the same as the end :

template<int Dim, size_t B, size_t E> 
struct iterate<Dim, E, B, E>
{ /*..*/};

Notice here that this is a specialization for <Dim, E, B, E>. The second position indicates the outer loop's current index, and the last position indicates the index to iterate up to (but not including). So in this case, the current index is the same as the last, indicating we are finished looping (and hence a "do nothing" function).

The recursive case for the outer loop involves the scenario where the loop index is less than the index to iterate to. In template terms, the second position is less than the fourth position:

template<int Dim, size_t S, size_t B, size_t E>
struct iterate
{/*...*/}

Notice this is NOT a specialization.

The logic of this function is that an outer loop should signal an inner loop to begin executing from its start, and then the outer loop continues and starts the process all over again for inner loops:

iterate<Dim-1, B, B, E>::apply (func, a..., S);
iterate<Dim, S+1, B, E>::apply (func, a...);

Notice in the first line that the second template argument is again B, indicating to start at the beginning again. This is necessary because the other recursive case on the second line increments S (incrementing outer loop index).

The entire time, we are also accumulating arguments to pass to the function:

::apply(func, a..., S)

is passing the function on, along with indices of higher dimension loops, and then appending the current loop's index (S). a here is a variadic template.

The inner loop

When I say "inner loop", I mean the innermost loop. This loop needs to simply increment until the start index reaches the end index, and not attempt to recurse on any lower dimension. In our case, this is when our Dim (Dimension) parameter is 1:

template<size_t S, size_t B, size_t E>
struct iterate<1, S, B, E>
{/*...*/};

At this point, we finally want to call our passed function, along with all arguments we've accumulated thus far (the indices of the outer loops) PLUS, the index of the innermost loop:

func(a..., B);

And then recurse (increment index)

iterate<1, S, B+1, E>::apply(func, a...);

The base case here is when the innermost loop's index is the same as the end index (AND the dimension is 1):

template<size_t S, size_t E>
struct iterate<1, S, E, E>
{/*...*/};

Hence the "do nothing" function here; there shouldn't be any work performed because the loop is terminating.

Finally, I included one last specialization to catch a user error where they didn't specify any dimensions:

template<size_t S, size_t B, size_t E>
struct iterate<0, S, B, E>

Which uses static_assert to always fail because sizeof(size_t) is not zero:

static_assert(sizeof(S) == 0, "Need more than 0 dimensions!");

Conclusion

This is a specific use case template meta-program. Where we essentially generate N nested for loops that all have the same start and end indices AND we want to pass those indices to a function. We could do a little more work to make it such that the iterate structure could stand on its own without making the assumption that the outer loop's start and end indices are the same as an inner loop's.

My favorite application of this code is that we can use it to make an N-dimensional counter. For example, a binary counter for N-bits (found in the live demo).

Bureaucrat answered 4/1, 2016 at 23:8 Comment(4)

the only thing I don't get is why the length of a... depends on the first template parameter of the class. you never pass actual values to the variadic template parameter. is this standard behavior? usually std::integer_sequence is used for these purposes. an excellent answer by the way. – Solent 6/1, 2016 at 16:48

@plasmacel: I don't fully understand what you're asking, however I'll wager a guess and try to be thorough. The parameter pack expansion a... expands into a comma separated list of arguments. Initially the pack is empty, but as the templates are recursively instantiated, this parameter pack grows to include current loop indices a..., S becomes an a... at the next level of recursion, all the way down to a..., B. Effectively recursion causes the pack to expand first as (<none>), then (S), (S, S+1), ..., (S, S+1, ..., E-1), or concretely (0), (0, 1), ..., (0, 1, ..., 9) – Bureaucrat 6/1, 2016 at 16:59

you pass the same number of template arguments at every recursion so I don't see how it grows. I have never seen a template parameter pack expansion like this, however it works. – Solent 6/1, 2016 at 17:14

@plasmacel: Ah, now I understand what you're saying. It may appear that I'm passing the same number of template arguments, but actually that is not the case! As I mentioned in my previous comment, the parameter pack is growing by a single value each time it enters a lower level of recursion. When I call iterate<Dim-1, B, B, E>::apply (func, a..., S); we enter a lower level of recursion on iterate. At that point, a refers to the previous level of recursion's (a..., S), which is now effectively (S-1). Then we recurse again, and a takes on (S-1, S-2, S) Make sense? – Bureaucrat 6/1, 2016 at 18:10

B

11

Since this question appears to still be getting traffic, I thought it would be a good idea to show how much easier this is to do in C++17. First, the full code

Demo

template<size_t Dimensions, class Callable>
constexpr void meta_for_loop(size_t begin, size_t end, Callable&& c)
{
    static_assert(Dimensions > 0);
    for(size_t i = begin; i != end; ++i)
    {
        if constexpr(Dimensions == 1)
        {
            c(i);
        }
        else
        {
            auto bind_an_argument = [i, &c](auto... args)
            {
                c(i, args...);
            };
            meta_for_loop<Dimensions-1>(begin, end, bind_an_argument);
        }
    }
}

Explanation:

If the Dimensions is 1, we simply call the provided-lambda with the next index in a loop
Else, we create a new callable from the provided one, except that we bind the loop index to one of the callable arguments. Then we recurse on our meta for loop with 1 fewer dimension.

If you're familiar at all with functional programming, this is a bit easier to understand, as it's an application of currying.

How it works in more concrete terms:

You want a binary counter that goes

0 0
0 1
1 0
1 1

So you create a callable that can print two integers like so:

auto callable = [](size_t i, size_t j)
{
   std::cout << i << " " << j << std::endl;
};

And since we have two columns, we have two dimensions, so D = 2.

We call our meta for loop defined above like so:

meta_for_loop<2>(0, 2, callable);

The end argument to meta_for_loop is 2 instead of 1 because we are modeling a half-closed interval [start, end), which is common in programming because people often want the first index to be included in their loop, and then they want to iterate (end - start) times.

Let's step through the algorithm:

Dimensions == 2, so we don't fail our static assert
We begin to iterate, i = 0
Dimensions == 2, so we enter the "else" branch of our constexpr if statement
- We create a new callable that captures the passed in callable and name it bind_an_argument to reflect that we're binding one argument of the provided callable c.

So, bind_an_argument effectively looks like this:

void bind_an_argument(size_t j)
{
    c(i, j);
}

Note that i stays the same, but j is variable. This is useful in our meta for loop because we want to model the fact that an outer loop stays at the same index while an inner loop iterates over its whole range. For example

for(int i = 0; i < N; ++i)
{
    for (int j = 0; j < M; ++j)
    {
       /*...*/
    }
}

when i == 0 we iterate over all values of j from 0 to M, and then we repeat for i == 1, i == 2, etc.

We call meta_for_loop again, except that Dimensions is now 1 instead of 2, and our Callable is now bind_an_argument instead of c
Dimensions == 1 so our static_assert passes
We begin to loop for(size_t i = 0; i < 2; ++i)
Dimensions == 1 so we enter the if branch of our constexpr if
We call bind_an_argument with i = 1, which calls our callable from above with arguments (0, 0), the first of which was bound from the previous call to meta_for_loop. This produces output

0 0
We call bind_an_argument with i == 1, which calls our callable from above with arguments (0, 1), the first argument of which was bound during our previous call to meta_for_loop. This produces output

0 1
We finish iterating, so the stack unwinds to the parent calling function
We're back in our call to meta_for_loop with Dimensions == 2 and Callable == callable. We finish our first loop iteration and then increment i to 1
Since Dimensions == 2, we enter the else branch again
Repeat steps 4 through 10, except that the first argument to callable is bound to 1 instead of 0. This produces output

1 0
1 1

Bureaucrat answered 3/2, 2019 at 22:20 Comment(1)

Thanks, this is awesome. – Solent 3/2, 2019 at 22:39

B

6

Someone better versed in this stuff can improve my answer.

Live Demo

The gist of my solution is that you declare N dimensions, with a start and an end.

It recurses on N-1 dimensions with the same start and end.

When it reaches the 1st dimension, it will actually begin incrementing the start, calling the passed function.

It will always attempt to pass a number of arguments identical to the number of dimensions (their indices).

So a call like this:

meta_for<2, 0, 2>::loop(
    [](size_t i, size_t j)
    {
        std::cout << i << " " << j << std::endl;
    });

Will result in output like this:

0 0

0 1

1 0

1 1

Here's the meta_for structure, which uses a helper, iterate:

template<size_t D, size_t B, size_t E>
struct meta_for
{
    template<typename Func>
    static void loop(Func&& func)
    {
        iterate<D, B, B, E>::apply(std::forward<Func>(func));
    }
};

And the helpers:

// a helper macro to avoid repeating myself too much
#define FN template<typename Func, typename... Args> \
             static void apply(Func&& func, Args&&... a)


// Outer loop. S="Self" or "Start". Indicating current index of outer loop. Intent is to iterate until S == E
template<int Dim, size_t S, size_t B, size_t E>
struct iterate
{
    static_assert(S < E && B < E, "Indices are wrong");
    FN
    {
        // outer loop recursive case. Recurse on lower Dimension (Dim-1), and then increment outer loop (S+1)
        iterate<Dim-1, B, B, E>::apply (func, a..., S);
        iterate<Dim, S+1, B, E>::apply (func, a...);
    }
};

// Outer loop base case
template<int Dim, size_t B, size_t E> 
struct iterate<Dim, E, B, E>
{
    FN
    {
        // outer loop base case, End == End. Terminate loop
    }
};

// innter loop. "S" is outer loop's current index, which we need to pass on to function
// "B" is inner loop's (this loop) current index, which needs to iterate until B == E
template<size_t S, size_t B, size_t E>
struct iterate<1, S, B, E>
{
    static_assert(S < E && B < E, "Indices are wrong");
    FN
    {
        // inner loop recursive case. Perform work, and then recurse on next index (B+1)
        func(a..., B);
        iterate<1, S, B+1, E>::apply(func, a...);
    }
};

// inner loop base case
template<size_t S, size_t E>
struct iterate<1, S, E, E>
{
    FN
    {
        // inner loop base case, End == End. Terminate loop
    }
};

// case where zero dimensions (no loop)
template<size_t S, size_t B, size_t E>
struct iterate<0, S, B, E>
{
    static_assert(sizeof(S) == 0, "Need more than 0 dimensions!");
};

More explanation

This solution, like any other involving variadic templates, relies on recursion.

I wanted to express recursion on an outer loop, so I started with a base case; the end of the loop. This is the case where the start is the same as the end :

template<int Dim, size_t B, size_t E> 
struct iterate<Dim, E, B, E>
{ /*..*/};

Notice here that this is a specialization for <Dim, E, B, E>. The second position indicates the outer loop's current index, and the last position indicates the index to iterate up to (but not including). So in this case, the current index is the same as the last, indicating we are finished looping (and hence a "do nothing" function).

The recursive case for the outer loop involves the scenario where the loop index is less than the index to iterate to. In template terms, the second position is less than the fourth position:

template<int Dim, size_t S, size_t B, size_t E>
struct iterate
{/*...*/}

Notice this is NOT a specialization.

The logic of this function is that an outer loop should signal an inner loop to begin executing from its start, and then the outer loop continues and starts the process all over again for inner loops:

iterate<Dim-1, B, B, E>::apply (func, a..., S);
iterate<Dim, S+1, B, E>::apply (func, a...);

Notice in the first line that the second template argument is again B, indicating to start at the beginning again. This is necessary because the other recursive case on the second line increments S (incrementing outer loop index).

The entire time, we are also accumulating arguments to pass to the function:

::apply(func, a..., S)

is passing the function on, along with indices of higher dimension loops, and then appending the current loop's index (S). a here is a variadic template.

The inner loop

When I say "inner loop", I mean the innermost loop. This loop needs to simply increment until the start index reaches the end index, and not attempt to recurse on any lower dimension. In our case, this is when our Dim (Dimension) parameter is 1:

template<size_t S, size_t B, size_t E>
struct iterate<1, S, B, E>
{/*...*/};

At this point, we finally want to call our passed function, along with all arguments we've accumulated thus far (the indices of the outer loops) PLUS, the index of the innermost loop:

func(a..., B);

And then recurse (increment index)

iterate<1, S, B+1, E>::apply(func, a...);

The base case here is when the innermost loop's index is the same as the end index (AND the dimension is 1):

template<size_t S, size_t E>
struct iterate<1, S, E, E>
{/*...*/};

Hence the "do nothing" function here; there shouldn't be any work performed because the loop is terminating.

Finally, I included one last specialization to catch a user error where they didn't specify any dimensions:

template<size_t S, size_t B, size_t E>
struct iterate<0, S, B, E>

Which uses static_assert to always fail because sizeof(size_t) is not zero:

static_assert(sizeof(S) == 0, "Need more than 0 dimensions!");

Conclusion

This is a specific use case template meta-program. Where we essentially generate N nested for loops that all have the same start and end indices AND we want to pass those indices to a function. We could do a little more work to make it such that the iterate structure could stand on its own without making the assumption that the outer loop's start and end indices are the same as an inner loop's.

My favorite application of this code is that we can use it to make an N-dimensional counter. For example, a binary counter for N-bits (found in the live demo).

Bureaucrat answered 4/1, 2016 at 23:8 Comment(4)

the only thing I don't get is why the length of a... depends on the first template parameter of the class. you never pass actual values to the variadic template parameter. is this standard behavior? usually std::integer_sequence is used for these purposes. an excellent answer by the way. – Solent 6/1, 2016 at 16:48

@plasmacel: I don't fully understand what you're asking, however I'll wager a guess and try to be thorough. The parameter pack expansion a... expands into a comma separated list of arguments. Initially the pack is empty, but as the templates are recursively instantiated, this parameter pack grows to include current loop indices a..., S becomes an a... at the next level of recursion, all the way down to a..., B. Effectively recursion causes the pack to expand first as (<none>), then (S), (S, S+1), ..., (S, S+1, ..., E-1), or concretely (0), (0, 1), ..., (0, 1, ..., 9) – Bureaucrat 6/1, 2016 at 16:59

you pass the same number of template arguments at every recursion so I don't see how it grows. I have never seen a template parameter pack expansion like this, however it works. – Solent 6/1, 2016 at 17:14

@plasmacel: Ah, now I understand what you're saying. It may appear that I'm passing the same number of template arguments, but actually that is not the case! As I mentioned in my previous comment, the parameter pack is growing by a single value each time it enters a lower level of recursion. When I call iterate<Dim-1, B, B, E>::apply (func, a..., S); we enter a lower level of recursion on iterate. At that point, a refers to the previous level of recursion's (a..., S), which is now effectively (S-1). Then we recurse again, and a takes on (S-1, S-2, S) Make sense? – Bureaucrat 6/1, 2016 at 18:10

Live Demo

More explanation

The inner loop

Conclusion

Demo

Explanation:

How it works in more concrete terms:

Live Demo

More explanation

The inner loop

Conclusion

Recommended topics

Hot tags