C++0x rvalue references - lvalues-rvalue binding

Asked 1/5, 2010 at 8:9 Answered 1/5, 2010 at 11:2

This is a follow-on question to C++0x rvalue references and temporaries

In the previous question, I asked how this code should work:

void f(const std::string &); //less efficient
void f(std::string &&); //more efficient

void g(const char * arg)
{
    f(arg);
}

It seems that the move overload should probably be called because of the implicit temporary, and this happens in GCC but not MSVC (or the EDG front-end used in MSVC's Intellisense).

What about this code?

void f(std::string &&); //NB: No const string & overload supplied

void g1(const char * arg)
{
    f(arg);
}
void g2(const std::string & arg)
{
    f(arg);
}

Based on the answers to my previous question, it seems that function g1 is legal (it is accepted by GCC 4.3-4.5, but not by MSVC). However, GCC and MSVC both reject g2 because of clause 13.3.3.1.4/3, which prohibits lvalues from binding to rvalue ref arguments. I understand the rationale behind this - it is explained in N2831 "Fixing a safety problem with rvalue references". I also think that GCC is probably implementing this clause as intended by the authors of that paper, because the original patch to GCC was written by one of the authors (Doug Gregor).
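The two binding cases can be probed at compile time. The following is only a sketch: can_call_f is a hypothetical helper that is not part of the original code, and the results reflect the binding rules as they were finalized in C++11 (compilers from the C++0x draft era may differ).

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

void f(std::string&&);  // declaration only; it is probed, never called

// Hypothetical helper: true when the call f(arg) is well-formed for an
// argument with the type and value category described by Arg.
template <class Arg, class = void>
struct can_call_f : std::false_type {};

template <class Arg>
struct can_call_f<Arg, decltype(f(std::declval<Arg>()))> : std::true_type {};

// g1: const char* is converted to a temporary std::string, which is an rvalue.
static_assert(can_call_f<const char*>::value, "implicit temporary binds");

// g2: a const std::string lvalue cannot bind to std::string&&.
static_assert(!can_call_f<const std::string&>::value, "const lvalue rejected");

// A non-const lvalue is rejected too; an explicit rvalue is accepted.
static_assert(!can_call_f<std::string&>::value, "lvalue rejected");
static_assert(can_call_f<std::string>::value, "rvalue accepted");
```

Since f is only used inside decltype, it does not need a definition for this to compile and link.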

However, I don't think this is quite intuitive. To me, (a) a const string & is conceptually closer to a string && than a const char * is, and (b) the compiler could create a temporary string in g2, as if it were written like this:

void g2(const std::string & arg)
{
    f(std::string(arg));
}

Indeed, sometimes the copy constructor is considered to be an implicit conversion operator. Syntactically, this is suggested by the form of a copy constructor, and the standard even mentions this specifically in clause 13.3.3.1.2/4, where the copy constructor for derived-base conversions is given a higher conversion rank than other user-defined conversions:

A conversion of an expression of class type to the same class type is given Exact Match rank, and a conversion of an expression of class type to a base class of that type is given Conversion rank, in spite of the fact that a copy/move constructor (i.e., a user-defined conversion function) is called for those cases.

(I assume this is used when passing a derived class to a function like void h(Base), which takes a base class by value.)
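To illustrate the ranking rule quoted above, here is a small sketch. Base, Derived, Other, h and pick_overload are illustrative names chosen for this example only. The derived-to-base conversion is ranked as a standard conversion, so it is preferred over a conversion through an ordinary converting constructor.

```cpp
#include <cassert>

struct Base {};
struct Derived : Base {};

// Hypothetical type that is reachable from Derived only through a
// converting constructor, i.e. a genuine user-defined conversion.
struct Other {
    Other(const Derived&) {}
};

int h(Base)  { return 1; }  // derived-to-base: treated as Conversion rank
int h(Other) { return 2; }  // user-defined conversion sequence

// Overload resolution prefers the standard conversion sequence,
// even though a Base copy constructor is what actually runs.
int pick_overload() {
    Derived d;
    return h(d);
}
```

Calling pick_overload() selects h(Base), because a standard conversion sequence always beats a user-defined one.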

Motivation

My motivation for asking this is something like the question asked in "How to reduce redundant code when adding new c++0x rvalue reference operator overloads".

If you have a function that accepts a number of potentially-moveable arguments, and would move them if it can (e.g. a factory function/constructor: Object create_object(string, vector<string>, string) or the like), and want to move or copy each argument as appropriate, you quickly start writing a lot of code.

If the argument types are movable, then one could just write one version that accepts the arguments by value, as above. But if the arguments are (legacy) non-movable-but-swappable classes a la C++03, and you can't change them, then writing rvalue reference overloads is more efficient.

So if lvalues did bind to rvalues via an implicit copy, then you could write just one overload like create_object(legacy_string &&, legacy_vector<legacy_string> &&, legacy_string &&) and it would more or less work like providing all the combinations of rvalue/lvalue reference overloads - actual arguments that were lvalues would get copied and then bound to the arguments, actual arguments that were rvalues would get directly bound.

Clarification/edit: I realize this is virtually identical to accepting arguments by value for movable types, like C++0x std::string and std::vector (save for the number of times the move constructor is conceptually invoked). However, it is not identical for copyable, but non-movable types, which includes all C++03 classes with explicitly-defined copy constructors. Consider this example:

class legacy_string { public: legacy_string(const legacy_string &); }; //defined in a header somewhere; not modifiable.

void f(legacy_string s1, legacy_string s2); //A *new* (C++0x) function that wants to move from its arguments where possible, and avoid copying
void g() //A C++0x function as well
{
    legacy_string x(/*initialization*/);
    legacy_string y(/*initialization*/);

    f(std::move(x), std::move(y));
}

If g calls f, then x and y would be copied - I don't see how the compiler can move them. If f were instead declared as taking legacy_string && arguments, it could avoid those copies where the caller explicitly invoked std::move on the arguments. I don't see how these are equivalent.
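The difference can be made visible by counting copies. This is a sketch under stated assumptions: the legacy_string below is a stand-in I made up (a copy counter, a swap member, and a const char * constructor are not in the original class), and take_by_value / take_by_rref are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Stand-in for a C++03-style class: copyable and swappable, but with no
// move constructor (the user-declared copy constructor suppresses it).
class legacy_string {
public:
    static int copies;  // counts copy-constructor calls

    explicit legacy_string(const char* s = "") : data_(s) {}
    legacy_string(const legacy_string& other) : data_(other.data_) { ++copies; }
    void swap(legacy_string& other) { data_.swap(other.data_); }
    const std::string& str() const { return data_; }

private:
    std::string data_;
};
int legacy_string::copies = 0;

// By value: the parameter is copy-constructed, even from std::move(x),
// because there is no move constructor to pick instead.
std::string take_by_value(legacy_string s) {
    legacy_string stolen;
    stolen.swap(s);
    return stolen.str();
}

// By rvalue reference: binds directly to the caller's object, so the
// contents can be swapped out without any copy of legacy_string.
std::string take_by_rref(legacy_string&& s) {
    legacy_string stolen;
    stolen.swap(s);
    return stolen.str();
}
```

With this stand-in, take_by_value(std::move(x)) increments the copy counter once and leaves x intact, while take_by_rref(std::move(y)) performs no legacy_string copy and leaves y empty.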

Questions

My questions are then:

Is this a valid interpretation of the standard? It seems that it's not the conventional or intended one, at any rate.
Does it make intuitive sense?
Is there a problem with this idea that I'm not seeing? It seems like you could get copies being quietly created when that's not exactly expected, but that's the status quo in places in C++03 anyway. Also, it would make some overloads viable when they're currently not, but I don't see it being a problem in practice.
Is this a significant enough improvement that it would be worth making e.g. an experimental patch for GCC?

Egocentric answered 1/5, 2010 at 8:9 Comment(1)

Actually, according to the current rules (N3126), in your first example the function g invokes f(const string&) because the initializer is an lvalue. I think this is an unfortunate accident and I hope that this will be changed so that the other overload is picked. – Doubleedged 9/10, 2010 at 18:19

I don't quite see your point in this question. If you have a class that is movable, then you just need a T version:

struct A {
  T t;
  A(T t):t(move(t)) { }
};

And if the class is traditional but has an efficient swap, you can write the swap version, or you can fall back to the const T& way:

struct A {
  T t;
  A(T t) { swap(this->t, t); }
};

Regarding the swap version, I would rather go with the const T& way instead. The main advantages of the swap technique are exception safety and moving the copy closer to the caller, so that the compiler can optimize away copies of temporaries. But what do you have to save if you are just constructing the object anyway? And if the constructor is small, the compiler can look into it and optimize away copies too.

struct A {
  T t;
  A(T const& t):t(t) { }
};
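For a movable type, the by-value and const T& constructors above can be compared by counting operations. This is just a sketch: Tracked is a hypothetical instrumented type standing in for T, and ByValue / ByConstRef are illustrative names for the two styles of struct A.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented type standing in for a movable T.
struct Tracked {
    static int copies, moves;
    Tracked() {}
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// The "T version": take by value, then move into the member.
struct ByValue {
    Tracked t;
    explicit ByValue(Tracked arg) : t(std::move(arg)) {}
};

// The "const T&" version: always copies into the member.
struct ByConstRef {
    Tracked t;
    explicit ByConstRef(const Tracked& arg) : t(arg) {}
};

void reset() { Tracked::copies = Tracked::moves = 0; }
```

With these definitions, ByValue costs one copy plus one move for an lvalue argument and two moves for a std::move'd argument, while ByConstRef costs one copy in both cases.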

To me, it doesn't seem right to automatically convert a string lvalue to an rvalue copy of itself just to bind to an rvalue reference. An rvalue reference says it binds to rvalues. If you try binding it to an lvalue of the same type, it had better fail. Introducing hidden copies to allow that doesn't sound right to me, because when people see an X&& and you pass an X lvalue, I bet most will expect that there is no copy and that the binding is direct, if it works at all. Better to fail straight away so the user can fix his/her code.

Polyhedron answered 1/5, 2010 at 10:54 Comment(8)

Say the class (T) is traditional and has a swap, but no move. If you have a function which accepts a T&&, then you can still write A(std::move(t)), where t is an unwanted instance of T, and it will not do an unnecessary copy, like in the pass-by-value case. – Egocentric 1/5, 2010 at 11:5

Also, I see what you mean about references being aliases - but C++ already introduces hidden copies when binding to const lvalue ref arguments. I suppose I see a non-const rvalue reference argument as being similar to const lvalue reference arguments, which exhibit this "hidden copy" behavior, rather than non-const lvalue reference arguments. – Egocentric 1/5, 2010 at 11:20

The hidden copy when binding to a const lvalue reference is not done anymore. I.e. istream const& is = istringstream("foo"); is well-formed in C++0x. And the hidden copy for string const& s = "foo"; is essential for all the operator overloading to work. There is no similar pressing need to support string &&s = "foo";. If that would introduce a temporary string, it would go against the principle of least surprise to me. – Polyhedron 1/5, 2010 at 11:39

Firstly, thank you very much for your answers, they're really helpful. Secondly, I gave an example which violates least surprise for me on your answer to my previous question: that is, if vector<T>::push_back were declared with rvalue- and const lvalue-reference overloads (though I note this is no longer required), calling vector<string>::push_back with a const char * lvalue makes a conceptually-superfluous copy, because it would bind to the version accepting a const lvalue reference. (Maybe this simply means that push_back should accept its argument by value.) – Egocentric 1/5, 2010 at 11:58

@Egocentric I agree with you that ideally, push_back would take by value. It's not done for vector<T>, because that would do two copies for all the non-movable lvalues. But for vector<T> the solution is to use emplace_back which does zero copies and zero moves. – Polyhedron 1/5, 2010 at 12:33

Thanks - I'm rapidly coming to the conclusion that rvalue ref args are a lot less useful than I'd hoped. I hoped that they were like an "optimized" const lvalue ref argument where the caller gives permission for the callee to steal/move resources from the argument. That's apparently not what happens, though, but I can't really believe that this behavior was completely intentional. – Egocentric 1/5, 2010 at 14:0

this answer needs updating to reflect that this "issue" has been removed from the standard draft – Doubleedged 27/8, 2011 at 17:20

@Doubleedged thanks I removed the reference to the other outdated answer. – Polyhedron 27/8, 2011 at 18:16

What about this code?

void f(std::string &&); //NB: No const string & overload supplied

void g2(const std::string & arg)
{
    f(arg);
}

...However, GCC and MSVC both reject g2 because of clause 13.3.3.1.4/3, which prohibits lvalues from binding to rvalue ref arguments. I understand the rationale behind this - it is explained in N2831 "Fixing a safety problem with rvalue references". I also think that GCC is probably implementing this clause as intended by the authors of that paper, because the original patch to GCC was written by one of the authors (Doug Gregor)....

No, that's only half of the reason why both compilers reject your code. The other reason is that you can't initialize a reference to non-const with an expression referring to a const object. So, even before N2831 this didn't work. There is simply no need for a conversion because a string is already a string. It seems you want to use string&& like string. Then, simply write your function f so that it takes a string by value. If you want the compiler to create a temporary copy of a const string lvalue just so you can invoke a function taking a string&&, there wouldn't be a difference between taking the string by value or by rref, would it?

N2831 has little to do with this scenario.
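Both halves of that reasoning can be checked with standard type traits. This is a sketch; the constant names are made up for illustration, and std::is_convertible is used here as a proxy for "an argument of this type can be passed to a string&& parameter".

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

using std::string;

// std::move on a const lvalue yields a *const* rvalue, not a string&&.
static_assert(std::is_same<decltype(std::move(std::declval<const string&>())),
                           const string&&>::value,
              "moving a const lvalue keeps the const");

// Neither constness nor lvalue-ness can be dropped when binding to string&&.
constexpr bool const_lvalue_binds = std::is_convertible<const string&, string&&>::value;
constexpr bool const_rvalue_binds = std::is_convertible<const string&&, string&&>::value;
constexpr bool plain_lvalue_binds = std::is_convertible<string&, string&&>::value;
constexpr bool plain_rvalue_binds = std::is_convertible<string&&, string&&>::value;

// Passing by value is what performs the copy of a const lvalue.
constexpr bool const_lvalue_copies = std::is_convertible<const string&, string>::value;
```

Only plain_rvalue_binds and const_lvalue_copies are true; the const rvalue case shows that the constness alone is enough to rule out the binding, independently of the lvalue restriction.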

If you have a function that accepts a number of potentially-moveable arguments, and would move them if it can (e.g. a factory function/constructor: Object create_object(string, vector<string>, string) or the like), and want to move or copy each argument as appropriate, you quickly start writing a lot of code.

Not really. Why would you want to write a lot of code? There is little reason to clutter all your code with const&/&& overloads. You can still use a single function with a mix of pass-by-value and pass-by-ref-to-const -- depending on what you want to do with the parameters. As for factories, the idea is to use perfect forwarding:

template<class T, class... Args>
unique_ptr<T> make_unique(Args&&... args)
{
    T* ptr = new T(std::forward<Args>(args)...);
    return unique_ptr<T>(ptr);
}

...and all is well. A special template argument deduction rule helps differentiate between lvalue and rvalue arguments, and std::forward allows you to create expressions with the same "value-ness" as the actual arguments had. So, if you write something like this:

string foo();

int main() {
   auto ups = make_unique<string>(foo());
}

the string that foo returned is automatically moved to the heap.
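The forwarding behavior can be observed with an instrumented type. This is a sketch under assumptions: Tracked is a hypothetical type that counts its copies and moves, and the factory is renamed make_unique_sketch here only to avoid clashing with std::make_unique from C++14.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical instrumented type that counts copy and move constructions.
struct Tracked {
    static int copies, moves;
    Tracked() {}
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Same shape as the factory above: Args&& deduces the value category of
// each argument and std::forward passes it on unchanged.
template <class T, class... Args>
std::unique_ptr<T> make_unique_sketch(Args&&... args) {
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}
```

Passing an lvalue Tracked to make_unique_sketch<Tracked> results in one copy construction on the heap, while passing std::move of it results in one move construction and no copy.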

So if lvalues did bind to rvalues via an implicit copy, then you could write just one overload like create_object(legacy_string &&, legacy_vector<legacy_string> &&, legacy_string &&) and it would more or less work like providing all the combinations of rvalue/lvalue reference overloads...

Well, and it would be pretty much equivalent to a function taking the parameters by value. No kidding.

Is this a significant enough improvement that it would be worth making e.g. an experimental patch for GCC?

There's no improvement.

Doubleedged answered 1/5, 2010 at 11:2 Comment(6)

Ultimately, though, the factory function just forwards to e.g. a constructor, which still faces the same problem - to write lots of overloads, or to accept unnecessary copies. The constructor itself could be written as a template, but IMO changing every constructor that might want to do this into a template has other problems (including overloading ambiguities). Also, I agree that this would make rvalue refs similar to passing by value, except in the case where the type is a legacy type with no move constructor, and where you can't change that. – Egocentric 1/5, 2010 at 11:9

@Egocentric What "problem"? I have yet to see a real example. – Doubleedged 1/5, 2010 at 12:59

My problem is writing a constructor of an object that takes e.g. 3 strings, that are copyable, not movable, but swappable. The constructor will assign args to member vars. The string class is written by a third party that isn't updating their libraries to C++0x - think Qt 3.x. I want to be able to write this constructor in a way that takes advantage of the rvalue-ness of its arguments to minimize copies (by swapping them in) where feasible. I don't want to make the constructor a template, and I don't want to write 8 overloads for what should be simple code. That is not currently possible. – Egocentric 1/5, 2010 at 13:19

@Egocentric Interesting. I think I had a similar train of thought while learning what rvalue references are about. But I came to the conclusion that in those cases pass-by-value should suffice. I just didn't anticipate the existence of "legacy types" with optimized swap but no move constructors. Actually, an older GCC version treated pass-by-value arguments like you want rrefs to behave. But someone filed a bug report ( gcc.gnu.org/bugzilla/show_bug.cgi?id=36744 ) and this "feature" has been removed. – Doubleedged 1/5, 2010 at 15:19

@Egocentric Also, good compilers do copy elision. Then, the only difference between pass-by-value and your proposed approach with rrefs is that with your approach you can avoid an unnecessary copy when the argument was an "xvalue" (see N3055). In all other cases, there won't be a difference due to copy elision. – Doubleedged 1/5, 2010 at 15:24

@selibitze: That's an interesting bug. I see that bug as essentially, GCC trying to apply copy elision on a std::move'd, but non-movable argument, when it shouldn't. If f were instead declared with an rvalue ref argument, and the caller specifically moved y into f, then yes, I think f should modify y. In that case, the caller/callee have essentially agreed that the caller doesn't want y any more and so the callee can do what it pleases with it. I would not expect this when passing by value (which is what happens in the bug), though, for compatibility reasons. – Egocentric 2/5, 2010 at 1:20