"yield" keyword for C++, How to Return an Iterator from my Function?

Consider the following code.

std::vector<result_data> do_processing() 
{
    pqxx::result input_data = get_data_from_database();
    return process_data(input_data);
}

std::vector<result_data> process_data(pqxx::result const & input_data)
{
    std::vector<result_data> ret;
    pqxx::result::const_iterator row;
    for (row = input_data.begin(); row != input_data.end(); ++row)
    {
        // somehow populate output vector
    }
    return ret;
}

While I was thinking about whether or not I could expect Return Value Optimization (RVO) to happen, I found this answer by Jerry Coffin [emphasis mine]:

At least IMO, it's usually a poor idea, but not for efficiency reasons. It's a poor idea because the function in question should usually be written as a generic algorithm that produces its output via an iterator. Almost any code that accepts or returns a container instead of operating on iterators should be considered suspect.

Don't get me wrong: there are times it makes sense to pass around collection-like objects (e.g., strings) but for the example cited, I'd consider passing or returning the vector a poor idea.

Having some Python background, I like generators very much. If this were Python, I would have written the above function as a generator, to avoid having to process the entire data set before anything else can happen. For example:

def process_data(input_data):
    for item in input_data:
        # somehow process items
        yield result_data

If I interpreted Jerry Coffin's note correctly, this is what he suggested, isn't it? If so, how can I implement this in C++?

Sparky answered 10/8, 2012 at 9:17 Comment(2)
Just return the vector; it's perfectly fine. (N)RVO will most likely take care of this, and in C++11 move semantics will step in when (N)RVO does not. Also: return process_data(get_data_from_database());. Sadly, C++ doesn't have the yield functionality. :( - Scintillate
"Almost any code that accepts or returns a container instead of operating on iterators should be considered suspect." I would argue with this. It's certainly good general advice, but the desire to make code independent of the container type is often misguided, and containers are not very interchangeable except in syntax... - Autocratic
E
18

No, that’s not what Jerry means, at least not directly.

yield in Python implements coroutines. C++ doesn't have them built in as of C++11 (language-level coroutines only arrived later, with C++20). They can of course be emulated, but doing that cleanly is a bit involved.

But what Jerry meant is simply that you should pass in an output iterator which is then written to:

template <typename O>
void process_data(pqxx::result const & input_data, O iter) {
    for (pqxx::result::const_iterator row = input_data.begin();
         row != input_data.end(); ++row)
        *iter++ = some_value; // placeholder for the result computed from *row
}

And call it:

std::vector<result_data> result;
process_data(input, std::back_inserter(result));

I’m not convinced though that this is generally better than just returning the vector.
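
For comparison, here is a minimal sketch of the output-iterator pattern that compiles without libpqxx. It takes an iterator pair over plain ints instead of a pqxx::result, and process_row, collect_all and collect_distinct are hypothetical names invented for illustration. It only shows how the caller, not the algorithm, decides where the results go.

```cpp
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for whatever turns one input row into one result.
inline std::string process_row(int row) {
    return "bucket-" + std::to_string(row % 3);
}

// Generic algorithm: reads [first, last) and writes one result per row to out.
template <typename InputIt, typename OutputIt>
OutputIt process_data(InputIt first, InputIt last, OutputIt out) {
    for (; first != last; ++first)
        *out++ = process_row(*first);
    return out;
}

// The caller picks the destination: append every result to a vector...
inline std::vector<std::string> collect_all(const std::vector<int>& rows) {
    std::vector<std::string> result;
    process_data(rows.begin(), rows.end(), std::back_inserter(result));
    return result;
}

// ...or keep only the distinct results in a set.
inline std::set<std::string> collect_distinct(const std::vector<int>& rows) {
    std::set<std::string> result;
    process_data(rows.begin(), rows.end(), std::inserter(result, result.end()));
    return result;
}
```

Calling process_data a second time with the same std::back_inserter target appends to the existing container, so several inputs can be merged into one result without an intermediate copy.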

Erythromycin answered 10/8, 2012 at 9:26 Comment(3)
A big advantage of passing 'out' iterators is that the code becomes more efficient when you call process_data twice, maybe on different input data, but want the output to end up in the same result object. And efficiency is why you use C++ in the first place, right? That aside, you allow the caller to use a custom allocator for the container. - Microscopic
@Frerich I admit that this is a valid point that should be taken into consideration. Maybe Jerry’s advice is more generally applicable than I’ve made it sound. - Erythromycin
Another advantage of the iterator approach is that you are not forcing the caller to use the data structure you chose. result could be a set, for example. - Uraemia
S
12

There is a blog post by Boost.Asio author Chris Kohlhoff about this: http://blog.think-async.com/2009/08/secret-sauce-revealed.html

He simulates yield with a macro:

#define yield \
  if ((_coro_value = __LINE__) == 0) \
  { \
    case __LINE__: ; \
    (void)&you_forgot_to_add_the_entry_label; \
  } \
  else \
    for (bool _coro_bool = false;; \
         _coro_bool = !_coro_bool) \
      if (_coro_bool) \
        goto bail_out_of_coroutine; \
      else

This has to be used in conjunction with a coroutine class. See the blog for more details.
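
The macro relies on a switch statement that jumps back to a remembered position inside the function. Here is a minimal hand-written sketch of that resumption trick. It is not the coroutine class from the blog, and range_generator, next and state are hypothetical names chosen for illustration.

```cpp
// Hypothetical generator: hands out the integers in [first, last), one per call.
struct range_generator {
    int current;
    int last;
    int state; // 0 = not started, 1 = suspended inside the loop, 2 = finished

    range_generator(int first, int last_)
        : current(first), last(last_), state(0) {}

    // Stores the next value in out and returns true,
    // or returns false once the range is exhausted.
    bool next(int& out) {
        switch (state) {
        case 0:
            for (; current < last; ++current) {
                out = current;
                state = 1;
                return true; // the "yield": remember the position and return
        case 1:;             // the next call resumes here, inside the loop body
            }
        }
        state = 2; // no case label matches 2, so later calls skip the switch
        return false;
    }
};
```

All state that must survive a "yield" lives in members rather than locals, because a case label may not jump past the initialization of a local variable. The macro hides this bookkeeping by using __LINE__ as the resume label.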

Seagoing answered 10/8, 2012 at 9:24 Comment(0)
S
3

When you parse something recursively, or when the processing is stateful, the generator pattern can simplify the code greatly: plain iteration is hard to write in those cases, and callbacks are the usual alternative. I wanted yield, and Boost.Coroutine2 now seems to be a good way to get it.

The code below is an example that concatenates files, like cat. Of course it only becomes useful once you want to process the text lines further:

#include <fstream>
#include <functional>
#include <iostream>
#include <string>
#include <boost/coroutine2/all.hpp>

using namespace std;

typedef boost::coroutines2::coroutine<const string&> coro_t;

void cat(coro_t::push_type& yield, int argc, char* argv[])
{
    for (int i = 1; i < argc; ++i) {
        ifstream ifs(argv[i]);
        for (;;) {
            string line;
            if (getline(ifs, line)) {
                yield(line);
            } else {
                break;
            }
        }
    }
}

int main(int argc, char* argv[])
{
    using namespace std::placeholders;
    coro_t::pull_type seq(
            boost::coroutines2::fixedsize_stack(),
            bind(cat, _1, argc, argv));
    for (auto& line : seq) {
        cout << line << endl;
    }
}
Samora answered 6/8, 2016 at 9:12 Comment(2)
Looks interesting! Unfortunately I'm still stuck with Boost version 1.58 (Coroutine2 is available since v1.59). But I'm going to try that later. Thanks for sharing! - Sparky
@Sparky You can try boost::coroutine first. It should be compatible with more compilers, but you cannot use the C++11 range-based for loop over the content. You have to write something like: while (seq) { cout << seq.get() << endl; seq(); } - Samora
S
0

I found that an istream-like behavior would come close to what I had in mind. Consider the following (untested) code:

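struct data_source {
public:
    data_source() : ok(true) {}

    // for delivering data items; like an istream, a read from an empty
    // source fails, leaves i untouched and makes the source evaluate to false
    data_source& operator>>(input_data_t & i) {
        if (input_data.empty()) {
            ok = false;
        } else {
            i = input_data.front();
            input_data.pop_front();
        }
        return *this;
    }
    // for boolean evaluation: false once a read has failed
    operator void*() { return ok ? this : 0; }

private:
    std::deque<input_data_t> input_data;
    bool ok;

    // appends new data to private input_data
    // potentially asynchronously
    void get_data_from_database();
};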

Now I can do as the following example shows:

int main () {
    data_source d;
    input_data_t i;
    while (d >> i) {
        // somehow process items
        result_data_t r(i);
        cout << r << endl;
    }
}

This way the data acquisition is decoupled from the processing and can therefore happen lazily or asynchronously. That is, I can process the items as they arrive and don't have to wait until the vector is filled completely, as in the other example.
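
A self-contained sketch of the same idea, assuming in-memory rows instead of a database: input_data_t is a std::string stand-in, and the constructor and the drain helper are hypothetical additions for illustration. A sticky failure flag mirrors how std::istream reports a failed read, so the loop body still sees the last item and stops only on the read after it.

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical item type standing in for the post's input_data_t.
typedef std::string input_data_t;

class data_source {
public:
    // Assumption: the rows are preloaded here instead of fetched from a database.
    explicit data_source(const std::vector<input_data_t>& rows)
        : pending_(rows.begin(), rows.end()), ok_(true) {}

    // Extracts one item; reading from an empty source fails
    // and leaves item untouched.
    data_source& operator>>(input_data_t& item) {
        if (pending_.empty()) {
            ok_ = false;
        } else {
            item = pending_.front();
            pending_.pop_front();
        }
        return *this;
    }

    // True until a read has failed (C++11 explicit conversion instead of void*).
    explicit operator bool() const { return ok_; }

private:
    std::deque<input_data_t> pending_;
    bool ok_;
};

// Consumes the source with the while (d >> i) loop from the post.
inline std::vector<std::string> drain(data_source& d) {
    std::vector<std::string> out;
    input_data_t i;
    while (d >> i)
        out.push_back("processed:" + i);
    return out;
}
```

Checking only whether the queue is empty after the extraction would skip the final item, which is why the flag records the outcome of the read itself.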

Sparky answered 15/9, 2012 at 10:54 Comment(0)
