Equivalent C++ to Python generator pattern
Asked Answered
T

13

158

I've got some example Python code that I need to mimic in C++. I do not require any specific solution (such as co-routine based yield solutions, although they would be acceptable answers as well), I simply need to reproduce the semantics in some manner.

Python

This is a basic sequence generator, clearly too large to store a materialized version.

def pair_sequence():
    for i in range(2**32):
        for j in range(2**32):
            yield (i, j)

The goal is to maintain two instances of the sequence above, and iterate over them in semi-lockstep, but in chunks. In the example below the first_pass uses the sequence of pairs to initialize the buffer, and the second_pass regenerates the same exact sequence and processes the buffer again.

def run():
    seq1 = pair_sequence()
    seq2 = pair_sequence()

    buffer = [0] * 1000
    first_pass(seq1, buffer)
    second_pass(seq2, buffer)
    ... repeat ...

C++

The only thing I can find for a solution in C++ is to mimic yield with C++ coroutines, but I haven't found any good reference on how to do this. I'm also interested in alternative (non general) solutions for this problem. I do not have enough memory budget to keep a copy of the sequence between passes.

Tercel answered 30/1, 2012 at 3:58 Comment(3)
As you can see from here #3864910 coroutine is not good idea to implement. Can't you do it with buffered reading? #4686362Sidecar
C++ iterators should support something like this.Introvert
Related: Equivalent in C++ of Yield in C#?Concierge
C
111

Generators exist in C++, just under another name: Input Iterators. For example, reading from std::cin is similar to having a generator of char.

You simply need to understand what a generator does:

  • there is a blob of data: the local variables define a state
  • there is an init method
  • there is a "next" method
  • there is a way to signal termination

In your trivial example, it's easy enough. Conceptually:

struct State { unsigned i, j; };

State make();

void next(State&);

bool isDone(State const&);

Of course, we wrap this as a proper class:

class PairSequence:
    // (implicit aliases)
    public std::iterator<
        std::input_iterator_tag,
        std::pair<unsigned, unsigned>
    >
{
  // C++03
  typedef void (PairSequence::*BoolLike)();
  void non_comparable();
public:
  // C++11 (explicit aliases)
  using iterator_category = std::input_iterator_tag;
  using value_type = std::pair<unsigned, unsigned>;
  using reference = value_type const&;
  using pointer = value_type const*;
  using difference_type = ptrdiff_t;

  // C++03 (explicit aliases)
  typedef std::input_iterator_tag iterator_category;
  typedef std::pair<unsigned, unsigned> value_type;
  typedef value_type const& reference;
  typedef value_type const* pointer;
  typedef ptrdiff_t difference_type;

  PairSequence(): done(false) {}

  // C++11
  explicit operator bool() const { return !done; }

  // C++03
  // Safe Bool idiom
  operator BoolLike() const {
    return done ? 0 : &PairSequence::non_comparable;
  }

  reference operator*() const { return ij; }
  pointer operator->() const { return &ij; }

  PairSequence& operator++() {
    static unsigned const Max = std::numeric_limts<unsigned>::max();

    assert(!done);

    if (ij.second != Max) { ++ij.second; return *this; }
    if (ij.first != Max) { ij.second = 0; ++ij.first; return *this; }

    done = true;
    return *this;
  }

  PairSequence operator++(int) {
    PairSequence const tmp(*this);
    ++*this;
    return tmp;
  }

private:
  bool done;
  value_type ij;
};

So hum yeah... might be that C++ is a tad more verbose :)

Cherish answered 30/1, 2012 at 7:34 Comment(22)
I accepted your answer (thanks!) because it is technically correct for the question i gave. Do you have any pointers for techniques in cases where the sequence that needs to be generated is more complex, or am I just beating a dead horse here with C++ and really coroutines are the only way for generality?Tercel
@NoahWatkins: coroutines make for an easy implementation when languages support them. Unfortunately C++ does not, so iteration is easier. If you really need coroutines, you actually need a full-blown thread to hold the "stack" of your function call on the side. It's definitely overkill to open such a can of worms just for that in this example, but your mileage may vary depending on your actual needs.Cherish
A non-thread-based coroutine implementation is available in Boost boost.org/doc/libs/1_57_0/libs/coroutine/doc/html/index.html with a proposal for standardization here: open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3985.pdfJournal
@boycy: There are actually multiple proposal for coroutines, notably one stack-less and the other stack-full. It's tough nut to crack, so for now I am waiting. In the meantime though, stack-less coroutines are implementable near directly as Input Iterators (just, without the sugar).Cherish
Yet similar, iterators are not the same as generators.Wash
This practice is out of date, we should use explicit operator bool() instead of Safe Bool idiom, using instead of typedef.Inhospitable
@acgtyrant: using vs typedef is mostly a matter of preference here, though I'll admit to preferring using because it lines up the defined names nicely. On the other hand, I totally agree with the explicit, from C++11 onward it's definitely the recommendation. I updated the answer, care to give it another review?Cherish
@MatthieuM. I think C++03 does not has nullptr yet...And you can replace static unsigned const by static unsigned constexpr in C++11 too.Inhospitable
@acgtyrant: Ugh! Good point on nullptr. Not sure constexpr gains us much here though, since we do not need a compile-time value.Cherish
This code would read MUCH nicer if you split it up into two separate C++03 and C++11 versions... (Or just get rid of C++03 altogether; people shouldn't be writing new code with it)Lagomorph
@TimRae: Oh boy, I don't know where you work, but my previous company is still stuck (for most of its code) with a compiler that does still has the experimental -std=c++0x flag. Needless to say, migrating to a newer compiler (which has been in the books for a year or two already) is a pre-requisite before being able to actually use C++11 (or later). As for splitting... might be a good idea, though then again there's not much difference.Cherish
I implemented this answer and it works fine. In the end, I have a loop like for(PairSequence s; s; ++s) {...}. But what is the point of inheriting from std::iterator? How would I use the generator with STL algorithms like for_each, for example?Exemplification
@FredSchoen: For for_each you would need some more work: a constructor for an "end" iterator (setting the done boolean to true) and a comparison function (== & !=) that compares this boolean. Then it should model an input_iterator appropriately and thus work with for_each.Cherish
Downmarked the answer: iterators are NOT generators. Generators are much more elegant than iterators. Python has iterators as well as generators. C++ is working on coroutines (as in boost::coroutine2 and experimental work in clang) but they will not be included in C++17. Coroutines would allow generators in C++.Malvin
@EvertW: From a client point of view, iterators are generators. Of course, they are much more unwieldy to write. Coroutines may require revisiting this answer finally, but I'll hold off until they are standard first... and there might be performance implication in the sugar that may warrant continuing hand-coding the iterators.Cherish
@MatthieuM.: True, but the question was about the provider-side of generators, not the client side... Anyway, I am looking forward to co-routines becoming standard. Implementing generators might be a useful bit of sugar coating, but replacing threads by coroutines in cases where this is feasible will be an enormous improvement!Malvin
Maybe you could abuse macros to get generator syntax in C++. But I won't do that. Sad C++ doesn't have this ability, of all things.Tenaille
Thanks for the example. std::iterator is deprecated in C++17, perhaps you'd like to update the code.Etoile
@Etoile iterators are not deprecated. Is there a source that suggested that?Nato
@JustinMeiners it says that here, for example: en.cppreference.com/w/cpp/iterator/iteratorEtoile
@Etoile it says the Distance type of iterator is deprecated, not the concept. I am only seeing one instance of "deprecated" on the page?Nato
@JustinMeiners isn't it exactly what you're using in your example? I just mentioned that it's deprecated and that perhaps you'd like to update the code. I didn't say anything about the concept of iterators.Etoile
S
64

In C++ there are iterators, but implementing an iterator isn't straightforward: one has to consult the iterator concepts and carefully design the new iterator class to implement them. Thankfully, Boost has an iterator_facade template which should help implementing the iterators and iterator-compatible generators.

Sometimes a stackless coroutine can be used to implement an iterator.

P.S. See also this article which mentions both a switch hack by Christopher M. Kohlhoff and Boost.Coroutine by Oliver Kowalke. Oliver Kowalke's work is a followup on Boost.Coroutine by Giovanni P. Deretta.

P.S. I think you can also write a kind of generator with lambdas:

std::function<int()> generator = []{
  int i = 0;
  return [=]() mutable {
    return i < 10 ? i++ : -1;
  };
}();
int ret = 0; while ((ret = generator()) != -1) std::cout << "generator: " << ret << std::endl;

Or with a functor:

struct generator_t {
  int i = 0;
  int operator() () {
    return i < 10 ? i++ : -1;
  }
} generator;
int ret = 0; while ((ret = generator()) != -1) std::cout << "generator: " << ret << std::endl;

P.S. Here's a generator implemented with the Mordor coroutines:

#include <iostream>
using std::cout; using std::endl;
#include <mordor/coroutine.h>
using Mordor::Coroutine; using Mordor::Fiber;

void testMordor() {
  Coroutine<int> coro ([](Coroutine<int>& self) {
    int i = 0; while (i < 9) self.yield (i++);
  });
  for (int i = coro.call(); coro.state() != Fiber::TERM; i = coro.call()) cout << i << endl;
}
Shirleeshirleen answered 4/10, 2012 at 21:8 Comment(0)
W
25

Since Boost.Coroutine2 now supports it very well (I found it because I wanted to solve exactly the same yield problem), I am posting the C++ code that matches your original intention:

#include <stdint.h>
#include <iostream>
#include <memory>
#include <boost/coroutine2/all.hpp>

typedef boost::coroutines2::coroutine<std::pair<uint16_t, uint16_t>> coro_t;

void pair_sequence(coro_t::push_type& yield)
{
    uint16_t i = 0;
    uint16_t j = 0;
    for (;;) {
        for (;;) {
            yield(std::make_pair(i, j));
            if (++j == 0)
                break;
        }
        if (++i == 0)
            break;
    }
}

int main()
{
    coro_t::pull_type seq(boost::coroutines2::fixedsize_stack(),
                          pair_sequence);
    for (auto pair : seq) {
        print_pair(pair);
    }
    //while (seq) {
    //    print_pair(seq.get());
    //    seq();
    //}
}

In this example, pair_sequence does not take additional arguments. If it needs to, std::bind or a lambda should be used to generate a function object that takes only one argument (of push_type), when it is passed to the coro_t::pull_type constructor.

Wines answered 6/8, 2016 at 8:51 Comment(1)
Note that Coroutine2 requires c++11, for which visual studio 2013 is insufficient as it is only partially supported.Dissidence
C
22

All answers that involve writing your own iterator are completely wrong. Such answers entirely miss the point of Python generators (one of the language's greatest and unique features). The most important thing about generators is that execution picks up where it left off. This does not happen to iterators. Instead, you must manually store state information such that when operator++ or operator* is called anew, the right information is in place at the very beginning of the next function call. This is why writing your own C++ iterator is a gigantic pain; whereas, generators are elegant, and easy to read+write.

I don't think there is a good analog for Python generators in native C++, at least not yet (there is a rummor that yield will land in C++17). You can get something similarish by resorting to third-party (e.g. Yongwei's Boost suggestion), or rolling your own.

I would say the closest thing in native C++ is threads. A thread can maintain a suspended set of local variables, and can continue execution where it left off, very much like generators, but you need to roll a little bit of additional infrastructure to support communication between the generator object and its caller. E.g.

// Infrastructure

template <typename Element>
class Channel { ... };

// Application

using IntPair = std::pair<int, int>;

void yield_pairs(int end_i, int end_j, Channel<IntPair>* out) {
  for (int i = 0; i < end_i; ++i) {
    for (int j = 0; j < end_j; ++j) {
      out->send(IntPair{i, j});  // "yield"
    }
  }
  out->close();
}

void MyApp() {
  Channel<IntPair> pairs;
  std::thread generator(yield_pairs, 32, 32, &pairs);
  for (IntPair pair : pairs) {
    UsePair(pair);
  }
  generator.join();
}

This solution has several downsides though:

  1. Threads are "expensive". Most people would consider this to be an "extravagant" use of threads, especially when your generator is so simple.
  2. There are a couple of clean up actions that you need to remember. These could be automated, but you'd need even more infrastructure, which again, is likely to be seen as "too extravagant". Anyway, the clean ups that you need are:
    1. out->close()
    2. generator.join()
  3. This does not allow you to stop generator. You could make some modifications to add that ability, but it adds clutter to the code. It would never be as clean as Python's yield statement.
  4. In addition to 2, there are other bits of boilerplate that are needed each time you want to "instantiate" a generator object:
    1. Channel* out parameter
    2. Additional variables in main: pairs, generator
Contrabassoon answered 1/1, 2017 at 0:9 Comment(7)
You are confusing syntax with functionality. A few answers above actually allow the C++ to pick up execution from where it's left off during the last call. It's nothing magical. As a matter of fact, Python is implemented in C, so whatever is possible in Python is possible in C, albeit not as convenient.Integument
@edy Isn’t that already addressed in the first paragraph? He’s not claiming that equivalent functionality can’t be created in conventional C++, only that it’s “a gigantic pain”.Eyewash
@Eyewash The question here isn't whether it's a pain to do generator in C++, but whether there is a pattern to do so. His claims that the approach "miss the point", that the "closest thing" is threads... are just misleading. Is it a pain? One could read through the other answers and decide for himself.Integument
@edy But doesn't this end up being a vacuous point, given that all Turing-complete languages are ultimately capable of the same functionality? "Whatever is possible in X is possible in Y" is guaranteed to be true for all such languages, but that doesn't seem to me to be a very illuminating observation.Eyewash
@Eyewash Precisely because all Turing-complete languages supposedly should have the same capability, thus the question of how to implement one feature in another language is legitimate. Nothing that Python has cannot be accomplished by another language; the question is efficiency and maintainability. In both regard, C++ would be a fine(r) choice.Integument
No, the key question is elegance.Eyewash
The whole point of Python's generators is the very convenient syntax. So this answer is indeed very correct: There is no equivalent in C++ and other answers do miss the point. Obviously, functionally equivalent things can be done in C++, but I believe that's not what this question is about.Thyrsus
B
5

Using range-v3:

#include <iostream>
#include <tuple>
#include <range/v3/all.hpp>

using namespace std;
using namespace ranges;

auto generator = [x = view::iota(0) | view::take(3)] {
    return view::cartesian_product(x, x);
};

int main () {
    for (auto x : generator()) {
        cout << get<0>(x) << ", " << get<1>(x) << endl;
    }

    return 0;
}
Bimonthly answered 26/7, 2018 at 19:14 Comment(0)
F
4

You should probably check generators in std::experimental in Visual Studio 2015 e.g: https://blogs.msdn.microsoft.com/vcblog/2014/11/12/resumable-functions-in-c/

I think it's exactly what you are looking for. Overall generators should be available in C++17 as this is only experimental Microsoft VC feature.

Fusion answered 8/2, 2016 at 5:30 Comment(1)
c++20 has coroutines but generators were not shipped. (but proposed) you can just create a generator on your own.Hothead
R
2

If you only need to do this for a relatively small number of specific generators, you can implement each as a class, where the member data is equivalent to the local variables of the Python generator function. Then you have a next function that returns the next thing the generator would yield, updating the internal state as it does so.

This is basically similar to how Python generators are implemented, I believe. The major difference being they can remember an offset into the bytecode for the generator function as part of the "internal state", which means the generators can be written as loops containing yields. You would have to instead calculate the next value from the previous. In the case of your pair_sequence, that's pretty trivial. It may not be for complex generators.

You also need some way of indicating termination. If what you're returning is "pointer-like", and NULL should not be a valid yieldable value you could use a NULL pointer as a termination indicator. Otherwise you need an out-of-band signal.

Retrospect answered 30/1, 2012 at 4:43 Comment(0)
G
1

Something like this is very similar:

struct pair_sequence
{
    typedef pair<unsigned int, unsigned int> result_type;
    static const unsigned int limit = numeric_limits<unsigned int>::max()

    pair_sequence() : i(0), j(0) {}

    result_type operator()()
    {
        result_type r(i, j);
        if(j < limit) j++;
        else if(i < limit)
        {
          j = 0;
          i++;
        }
        else throw out_of_range("end of iteration");
    }

    private:
        unsigned int i;
        unsigned int j;
}

Using the operator() is only a question of what you want to do with this generator, you could also build it as a stream and make sure it adapts to an istream_iterator, for example.

Graziano answered 3/7, 2013 at 0:17 Comment(0)
U
1

Well, today I also was looking for easy collection implementation under C++11. Actually I was disappointed, because everything I found is too far from things like python generators, or C# yield operator... or too complicated.

The purpose is to make collection which will emit its items only when it is required.

I wanted it to be like this:

auto emitter = on_range<int>(a, b).yield(
    [](int i) {
         /* do something with i */
         return i * 2;
    });

I found this post, IMHO best answer was about boost.coroutine2, by Yongwei Wu. Since it is the nearest to what author wanted.

It is worth learning boost couroutines.. And I'll perhaps do on weekends. But so far I'm using my very small implementation. Hope it helps to someone else.

Below is example of use, and then implementation.

Example.cpp

#include <iostream>
#include "Generator.h"
int main() {
    typedef std::pair<int, int> res_t;

    auto emitter = Generator<res_t, int>::on_range(0, 3)
        .yield([](int i) {
            return std::make_pair(i, i * i);
        });

    for (auto kv : emitter) {
        std::cout << kv.first << "^2 = " << kv.second << std::endl;
    }

    return 0;
}

Generator.h

template<typename ResTy, typename IndexTy>
struct yield_function{
    typedef std::function<ResTy(IndexTy)> type;
};

template<typename ResTy, typename IndexTy>
class YieldConstIterator {
public:
    typedef IndexTy index_t;
    typedef ResTy res_t;
    typedef typename yield_function<res_t, index_t>::type yield_function_t;

    typedef YieldConstIterator<ResTy, IndexTy> mytype_t;
    typedef ResTy value_type;

    YieldConstIterator(index_t index, yield_function_t yieldFunction) :
            mIndex(index),
            mYieldFunction(yieldFunction) {}

    mytype_t &operator++() {
        ++mIndex;
        return *this;
    }

    const value_type operator*() const {
        return mYieldFunction(mIndex);
    }

    bool operator!=(const mytype_t &r) const {
        return mIndex != r.mIndex;
    }

protected:

    index_t mIndex;
    yield_function_t mYieldFunction;
};

template<typename ResTy, typename IndexTy>
class YieldIterator : public YieldConstIterator<ResTy, IndexTy> {
public:

    typedef YieldConstIterator<ResTy, IndexTy> parent_t;

    typedef IndexTy index_t;
    typedef ResTy res_t;
    typedef typename yield_function<res_t, index_t>::type yield_function_t;
    typedef ResTy value_type;

    YieldIterator(index_t index, yield_function_t yieldFunction) :
            parent_t(index, yieldFunction) {}

    value_type operator*() {
        return parent_t::mYieldFunction(parent_t::mIndex);
    }
};

template<typename IndexTy>
struct Range {
public:
    typedef IndexTy index_t;
    typedef Range<IndexTy> mytype_t;

    index_t begin;
    index_t end;
};

template<typename ResTy, typename IndexTy>
class GeneratorCollection {
public:

    typedef Range<IndexTy> range_t;

    typedef IndexTy index_t;
    typedef ResTy res_t;
    typedef typename yield_function<res_t, index_t>::type yield_function_t;
    typedef YieldIterator<ResTy, IndexTy> iterator;
    typedef YieldConstIterator<ResTy, IndexTy> const_iterator;

    GeneratorCollection(range_t range, const yield_function_t &yieldF) :
            mRange(range),
            mYieldFunction(yieldF) {}

    iterator begin() {
        return iterator(mRange.begin, mYieldFunction);
    }

    iterator end() {
        return iterator(mRange.end, mYieldFunction);
    }

    const_iterator begin() const {
        return const_iterator(mRange.begin, mYieldFunction);
    }

    const_iterator end() const {
        return const_iterator(mRange.end, mYieldFunction);
    }

private:
    range_t mRange;
    yield_function_t mYieldFunction;
};

template<typename ResTy, typename IndexTy>
class Generator {
public:
    typedef IndexTy index_t;
    typedef ResTy res_t;
    typedef typename yield_function<res_t, index_t>::type yield_function_t;

    typedef Generator<ResTy, IndexTy> mytype_t;
    typedef Range<IndexTy> parent_t;
    typedef GeneratorCollection<ResTy, IndexTy> finalized_emitter_t;
    typedef  Range<IndexTy> range_t;

protected:
    Generator(range_t range) : mRange(range) {}
public:
    static mytype_t on_range(index_t begin, index_t end) {
        return mytype_t({ begin, end });
    }

    finalized_emitter_t yield(yield_function_t f) {
        return finalized_emitter_t(mRange, f);
    }
protected:

    range_t mRange;
};      
Ung answered 9/11, 2018 at 19:36 Comment(0)
S
1

This answer works in C (and hence I think works in C++ too)

#include<stdint.h>
//#include<stdio.h>

#define MAX (1ll << 32) //2^32

typedef struct {
    uint64_t i, j;
} Pair;

int generate_pairs(Pair* p)
{
    static uint64_t i = 0;
    static uint64_t j = 0;

    p->i = i;
    p->j = j;
    
    if(++j == MAX)
    {
        j = 0;
        if(++i == MAX)
        {
            return -1; // return -1 to indicate generator finished.
        }
    }
    
    return 1; // return non -1 to indicate generator not finished.
}

int main()
{
    while(1)
    {
        Pair p;
        int fin = generate_pairs(&p);
        
        //printf("%lld, %lld\n", p.i, p.j);
        
        if(fin == -1)
        {
            //printf("end");
            break;
        }
    }
    return 0;
}

This is simple, non object-oriented way to mimic a generator. This worked as expected for me.

Edit: Previous code was erroneous and I have updated it.

Note: This code can be improved to use just uint32_t instead of uint64_t for the given question.

Siple answered 6/7, 2020 at 3:35 Comment(0)
C
1

It is possible to have yield comportment with simple goto statement. As it is simple, I wrote it in C.

All you have to do in your generator function is :

  • all variables are declared as static
  • last yield exit is memorized with a label
  • variables are reinitialized at the end of function

example :

#include <stdio.h>

typedef struct {
    int i, j;
} Pair;

// the function generate_pairs  can generate values in successive calls.
// - all variables are declared as static
// - last yield exit is memorized with a label
// - variables are reinitialized at the end of function
Pair* generate_pairs(int imax, int jmax)
{
    // all local variable are declared static. So they are declared at the beginning
    static int i = 0;
    static int j = 0;
    static Pair p;
    // the exit position is marked with a label
    static enum {EBEGIN, EYIELD1} tag_goto = EBEGIN;
    
    // I goto to the last exit position
    if (tag_goto == EYIELD1)
        goto TYIELD1;
    
    
    for (i=0; i<imax; i++)   {
        for (j=0; j<jmax; j++)   {
            p.i = i;   p.j = -j;
            
            // I manage the yield comportment
            tag_goto = EYIELD1;
            return &p;
            TYIELD1 : ;
        }
        j = 0;
    }
    
    // reinitialization of variables
    i = 0;   j = 0;   // in fact this reinitialization is not useful in this example
    tag_goto = EBEGIN;
    
    // NULL means ends of generator
    return NULL; 
}

int main()
{
    for (Pair *p = generate_pairs(2,4); p != NULL; p = generate_pairs(2,4))
    {
        printf("%d,%d\n",p->i,p->j);
    }
    printf("end\n");
    return 0;
}
Castile answered 5/11, 2021 at 19:52 Comment(0)
M
0

Something like this:

Example use:

using ull = unsigned long long;

auto main() -> int {
    for (ull val : range_t<ull>(100)) {
        std::cout << val << std::endl;
    }

    return 0;
}

Will print the numbers from 0 to 99

Mound answered 9/5, 2017 at 2:20 Comment(0)
S
-1

Just as a function simulates the concept of a stack, generators simulate the concept of a queue. The rest is semantics.

As a side note, you can always simulate a queue with a stack by using a stack of operations instead of data. What that practically means is that you can implement a queue-like behavior by returning a pair, the second value of which either has the next function to be called or indicates that we are out of values. But this is more general than what yield vs return does. It allows to simulate a queue of any values rather than homogeneous values that you expect from a generator, but without keeping a full internal queue.

More specifically, since C++ does not have a natural abstraction for a queue, you need to use constructs which implement a queue internally. So the answer which gave the example with iterators is a decent implementation of the concept.

What this practically means is that you can implement something with bare-bones queue functionality if you just want something quick and then consume queue's values just as you would consume values yielded from a generator.

Sportswear answered 1/1, 2017 at 2:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.