Perfect forwarding with multiple passes over input arguments

Consider the following function accept that takes a "universal reference" of type T and forwards that to a parse<T>() function object with an overload for lvalues and one for rvalues:

template<class T> class parse; // forward declaration; the class is defined below

template<class T>
void accept(T&& arg)
{
    parse<T>()(std::forward<T>(arg), 0); // copy or move, depending on rvaluedness of arg
}

template<class T>
class parse
{
public:
    // parse will modify a local copy or move of its input parameter
    void operator()(T const& arg, int n) const { /* optimized for lvalues */ }
    void operator()(T&& arg     , int n) const { /* optimized for rvalues */ }
};

Since forwarding an rvalue may leave the source object in a valid but unspecified (moved-from) state, it is not safe to perfectly forward the same argument again within the same scope. Below is my attempt to have as few copies as possible in a hypothetical split() function that takes an int representing the number of passes that have to be made over the input data:

template<class T>
void split(T&& arg, int n)
{
    for (auto i = 0; i < n - 1; ++i)
        parse<T>()(arg , i);                 // copy n-1 times
    parse<T>()(std::forward<T>(arg), n - 1); // possibly move the n-th time
}

Question: is this the recommended way to apply perfect forwarding for multiple passes over the same data? If not, what is a more idiomatic way to minimize the number of copies?

Indulgence answered 25/11, 2013 at 14:44 Comment(12)
What do you need the instance of T for, and does parse consume it? Can you for example move() the instance back on return from parse()? (my point being, if parse() consumes it, then you can only forward to it once, else, forward and move back, or better yet, always deal with the const reference...) - Stereophonic
@Stereophonic no, parse() will modify arg and then discard the result, maybe writing some output or returning an integer value, but not arg itself. - Indulgence
mmm, interesting, you modify arg which is passed via const reference? - Stereophonic
The question is misleading, you don't want perfect forwarding multiple times - Coca
@Stereophonic no, it will make a local copy of course, and modify the copy. With the T&& overload, I can modify on std::move(arg) rather than on a local copy. - Indulgence
@DavidRodríguez-dribeas sorry, modified to better cover the topic. - Indulgence
@TemplateRex, then what you have is the best you are going to get; on the last call, a copy is elided because of the forward - and you can only elide this copy... - Stereophonic
Difficult to make any further suggestions unless we get a feel for what T is, you could for example extract out the state you modify and see if you can make copying of this cheap, but it's hard to say... - Stereophonic
@Stereophonic for my current use cases, T is a std::array<uint64_t, 2>, so no optimized move constructor, but the parse() function is generic so I want to be able to pass say a std::vector<uint64_t> as well. - Indulgence
There is nothing you can do but copy. You're trying to make many values out of a single value. What other techniques are there for that? - Cover
@Cover I'm optimizing a piece of working code and haven't actually got much experience with perfect forwarding, so I wanted to make sure (as STL put it: "Write code only when you know how it'll behave [...] especially true for rvalue references. If you don't know, ask an expert.") - Indulgence
If you pass a non-const lvalue reference to split(), then T will deduce to an lvalue reference type and the compiler will reject the two overloads operator()(T const&, int) and operator()(T&&, int), since both T const& and T&& collapse to the same lvalue reference type when T is an lvalue reference (Example at Coliru). I think you want parse<typename std::decay<T>::type> instead of parse<T>. - Stitch

Question: is this the recommended way to apply perfect forwarding for multiple passes over the same data?

Yes, this is the recommended way to apply perfect forwarding (or move) when you need to pass the data multiple times. Only (potentially) move from it on your last access. Indeed, this scenario was foreseen in the original move paper, and is the very reason that "named" variables declared with type rvalue-reference are not implicitly moved from. From N1377:

Even though named rvalue references can bind to an rvalue, they are treated as lvalues when used. For example:

struct A {};

void h(const A&);
void h(A&&);

void g(const A&);
void g(A&&);

void f(A&& a)
{
    g(a);  // calls g(const A&)
    h(a);  // calls h(const A&)
}

Although an rvalue can bind to the "a" parameter of f(), once bound, a is now treated as an lvalue. In particular, calls to the overloaded functions g() and h() resolve to the const A& (lvalue) overloads. Treating "a" as an rvalue within f would lead to error prone code: First the "move version" of g() would be called, which would likely pilfer "a", and then the pilfered "a" would be sent to the move overload of h().

If you want h(a) to move in the above example, you have to do so explicitly:

    h(std::move(a));  // calls h(A&&);
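
The rule quoted from N1377 can be checked with a small sketch. The overloads below return a string naming themselves so the chosen overload is observable; the helper names call_plain and call_moved are hypothetical and not part of the paper.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct A {};

// Overloads that report which one overload resolution picked.
inline std::string g(const A&) { return "g(const A&)"; }
inline std::string g(A&&)      { return "g(A&&)"; }

// `a` is declared as A&&, but the expression `a` is an lvalue,
// so the const A& overload is selected.
inline std::string call_plain(A&& a) { return g(a); }

// An explicit std::move casts `a` back to an rvalue,
// so the A&& overload is selected.
inline std::string call_moved(A&& a) { return g(std::move(a)); }
```

Here call_plain(A{}) returns "g(const A&)" and call_moved(A{}) returns "g(A&&)", which is exactly the behaviour that makes the "copy first, move last" pattern safe.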

As Casey points out in the comments, you have an overloading problem when passing in lvalues:

#include <utility>
#include <type_traits>

template<class T>
class parse
{
    static_assert(!std::is_lvalue_reference<T>::value,
                               "parse: T can not be an lvalue-reference type");
public:
    // parse will modify a local copy or move of its input parameter
    void operator()(T const& arg, int n) const { /* optimized for lvalues */ }
    void operator()(T&& arg     , int n) const { /* optimized for rvalues */ }
};

template<class T>
void split(T&& arg, int n)
{
    typedef typename std::decay<T>::type Td;
    for (auto i = 0; i < n - 1; ++i)
        parse<Td>()(arg , i);                 // copy n-1 times
    parse<Td>()(std::forward<T>(arg), n - 1); // possibly move the n-th time
}

Above I've fixed it as Casey suggests, by instantiating parse<T> only on non-reference types using std::decay. I've also added a static_assert to ensure that the client does not accidentally make this mistake. The static_assert isn't strictly necessary because you will get a compile-time error regardless. However the static_assert can offer a more readable error message.
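
As a rough illustration of why the undecayed parse<T> breaks for lvalue arguments, the reference-collapsing rules can be tested at compile time. The helper overloads_collide is a hypothetical name introduced only for this sketch; it compares the two parameter types that parse<T> would declare.

```cpp
#include <type_traits>

// Hypothetical helper: true when parse<T>'s two operator() overloads
// would end up with the same first parameter type.
template<class T>
constexpr bool overloads_collide()
{
    return std::is_same<T const&, T&&>::value;
}

// For a non-reference T the two parameter types are distinct.
static_assert(!overloads_collide<int>(), "int const& and int&& differ");

// For an lvalue-reference T both collapse to the same lvalue reference:
// (int&) const& -> int&   and   (int&)&& -> int&
static_assert(overloads_collide<int&>(), "both collapse to int&");
static_assert(overloads_collide<int const&>(), "both collapse to int const&");

// std::decay strips the reference (and cv-qualifiers), avoiding the clash.
static_assert(std::is_same<std::decay<int&>::type, int>::value, "decays to int");
static_assert(std::is_same<std::decay<int const&>::type, int>::value, "decays to int");
```

If any of these assumptions were wrong, the translation unit would simply fail to compile.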

That is not the only way to fix the problem though. Another way, which would allow the client to instantiate parse with an lvalue reference type, is to partially specialize parse:

template<class T>
class parse<T&>
{
public:
    // parse will modify a local copy or move of its input parameter
    void operator()(T const& arg, int n) const { /* optimized for lvalues */ }
};

Now the client doesn't need to do the decay dance:

template<class T>
void split(T&& arg, int n)
{
    for (auto i = 0; i < n - 1; ++i)
        parse<T>()(arg , i);                 // copy n-1 times
    parse<T>()(std::forward<T>(arg), n - 1); // possibly move the n-th time
}

And you can apply special logic under parse<T&> if necessary.
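
To make the copy and move counts visible, here is a minimal sketch of the partial-specialization variant with an instrumented payload. The Tracked type and its counters are hypothetical, and the operator() bodies (which just build a local object) are assumptions standing in for the real parsing work.

```cpp
#include <cassert>
#include <utility>

// Hypothetical payload that counts its copy and move constructions.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(Tracked const&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Primary template: used when T is not an lvalue reference.
template<class T>
class parse {
public:
    void operator()(T const& arg, int) const { T local(arg); (void)local; }            // copy
    void operator()(T&& arg, int) const { T local(std::move(arg)); (void)local; }      // move
};

// Partial specialization: used when split() deduces an lvalue reference.
template<class T>
class parse<T&> {
public:
    void operator()(T const& arg, int) const { T local(arg); (void)local; }            // copy
};

template<class T>
void split(T&& arg, int n)
{
    for (int i = 0; i < n - 1; ++i)
        parse<T>()(arg, i);                   // arg is an lvalue here: copy
    parse<T>()(std::forward<T>(arg), n - 1);  // move only if arg was an rvalue
}
```

With n == 3, passing an lvalue produces three copies and no moves, while passing a temporary such as Tracked{} produces two copies and one move.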

Bianchi answered 25/11, 2013 at 17:52 Comment(3)
+1 and accepted, of course. The 2002 paper is really clear, thanks for linking to that. Just to make sure: this solution will not work for passing the same parameter twice to the same forwarding function because of the unspecified order of evaluation? I.e. foo(arg, std::forward<T>(arg)); does not have a deterministic last evaluated argument, right? So there one would always incur an extra copy. - Indulgence
@TemplateRex: If you are passing to something that is accepting by reference (lvalue or rvalue, as in your example), and if you know in what order the passed-to function will move its arguments, then you can move/forward the last one that the passed-to function will move. The forward/move doesn't actually do anything but cast to rvalue. Things only get moved (if passed by reference) when the passed-to function moves the argument. If you are passing to something that is accepting by value, then you do have a problem with the unspecified order of evaluation of argument binding. - Bianchi
OK, got it, very illuminating. - Indulgence
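
The point made in the comments above, that forward/move is only a cast and that nothing is moved until the callee acts on the reference, might be sketched as follows. Both function names are hypothetical, and the callee is assumed to read its first parameter before it touches the second.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical callee: both parameters are references, so binding them
// moves nothing. It reads `first` before it (possibly) moves from `last`.
template<class U>
std::string consume_last(std::string const& first, U&& last)
{
    std::string copy(first);                   // read `first` while it is intact
    std::string taken(std::forward<U>(last));  // the only place a move can happen
    return copy + taken;
}

// Passes the same object twice. The order in which the two arguments are
// evaluated does not matter, because std::forward is only a cast.
template<class T>
std::string pass_twice(T&& arg)
{
    return consume_last(arg, std::forward<T>(arg));
}
```

If consume_last took its parameters by value instead, the move would happen while the arguments are being initialized, and the unspecified evaluation order would matter.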

(I know, it is an old thread)

As stated in the comments, the data is a large array or vector of uint64_t. A better optimization than parameter passing to prevent a final copy would probably be to optimize the many copy operations to

  • read once
  • write many times (for each intended pass)

in one step instead of many independent copies.
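
A minimal sketch of the "read once, write many" idea, assuming the data is a std::array of uint64_t as mentioned in the comments. The function name replicate is hypothetical, and whether this beats repeated plain copies is untested and would have to be measured on the target machine.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: produce `count` copies of `src` while reading each
// source word only once.
template<std::size_t N>
std::vector<std::array<std::uint64_t, N>>
replicate(std::array<std::uint64_t, N> const& src, std::size_t count)
{
    std::vector<std::array<std::uint64_t, N>> out(count);
    for (std::size_t i = 0; i < N; ++i) {
        std::uint64_t const word = src[i];    // read once
        for (std::size_t k = 0; k < count; ++k)
            out[k][i] = word;                 // write once per intended pass
    }
    return out;
}
```

Each element of the returned vector can then be handed to one pass, which is free to modify it.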

A starting point could be the question "faster alternative to memcpy?", whose answers include memcpy-like code. You would have to duplicate the line that writes to the destination so that it writes several copies of the data instead.

You can also combine memset, which is optimized for writing the same value to memory over and over again, and memcpy, which is optimized for reading and writing blocks of memory once for each block. You could look into optimized source code here: https://github.com/KNNSpeed/AVX-Memmove

The best code will be specific to the architecture and processor used. So you would have to test and compare your achieved speed.

Sexpartite answered 3/6, 2020 at 6:11 Comment(0)
