Third attempt. The second became very long in the process of explaining every nook and cranny of the situation. But hey, I learned a lot too in the process, which I suppose is the point, no? :) Anyway. I'll re-address the question anew, keeping my longer answer as it in itself is a useful reference but falls short of a 'clear explanation'.
What are we dealing with here?
f and g are not trivial situations. They take time to understand and appreciate the first few times you encounter them. The issues at play are the lifetime of objects, Return Value Optimization, confusion around returning object values, and confusion around overloads of reference types. I'll address each and explain their relevance.
References
First things first. What's a reference? Aren't they just pointers without the syntax?
They are, but in an important way they're much more than that. Pointers are literally that: they refer to memory locations in general. There are few if any guarantees about the values located wherever the pointer happens to point. References, on the other hand, are bound to the addresses of real values - values that are guaranteed to exist for the duration they can be accessed, but which may not have a name available for accessing them any other way (such as temporaries).
As a rule of thumb, if you can 'take its address' then you're dealing with a reference, a rather special one known as an lvalue. You can assign to an lvalue. This is why *pointer = 3 works: the operator * yields a reference to the object at the address being pointed to.
This doesn't make the reference any more or less valid than the address it points to; however, the references you naturally find in C++ do carry this guarantee (as would well-written C++ code) - they refer to real values whose lifetime we don't need to worry about for the duration of our interactions with them.
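A minimal sketch of the difference (the names here are just for illustration):

#include <iostream>

int main() {
    int value = 1;
    int * pointer = &value;  // a pointer may be null, may be reseated, may point anywhere
    int & ref = value;       // a reference is bound to an existing object for its whole life

    *pointer = 3;            // dereferencing yields an lvalue, so we can assign through it
    ref = 5;                 // same idea, without the dereference syntax

    std::cout << value << '\n'; // prints 5
}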
Lifetime of Objects
We all should know by now when the c'tors and d'tors will be called for something like this:
{
A temp;
temp.property = value;
}
temp's scope is set. We know exactly when it's created and destroyed. One way we can be sure it's destroyed is that, once the block has ended, this is impossible:
A & ref_to_temp = temp; // nope
A * ptr_to_temp = &temp; // double nope
The compiler stops us from doing that because very clearly we should not expect that object to still exist. Dangling references can arise much more subtly than this whenever references are used, which is why people sometimes suggest avoiding references until you know what you're doing with them (or avoiding them entirely if they've given up understanding them and just want to move on with their lives).
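For instance, here is a subtler case the compiler will not catch for you (a small sketch using std::vector, not code from the question):

#include <vector>

int main() {
    std::vector<int> v { 1, 2, 3 };
    int & first = v[0];   // fine for now
    v.push_back(4);       // may reallocate; if it does, 'first' now dangles
    // int x = first;     // would be undefined behavior after a reallocation
}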
Scope of Expressions
On the other hand we also have to be mindful that temporaries exist until the outermost (full) expression they're found in has completed - that means up to the semicolon. A temporary created in the LHS of a comma operator, for example, doesn't get destroyed until the semicolon. I.e.:
#include <iostream>

struct scopetester {
    static int counter;  // can't be initialized in-class before C++17; defined below
    scopetester()  { ++counter; }
    ~scopetester() { --counter; }
};
int scopetester::counter = 0;

int main() {
    scopetester(), std::cout << scopetester::counter; // prints 1
    scopetester(), scopetester(), std::cout << scopetester::counter; // prints 2
}
This still does not avoid issues of sequencing of execution; you still have to deal with constructs like ++i++ - operator precedence - and the dreaded undefined behavior that can result from forcing ambiguous cases (e.g. i = ++i + i++). What is important is that all temporaries created exist until the semicolon and no longer.
There are two exceptions - elision / in-place-construction (aka RVO) and reference-assignment-from-temporary.
Returning by value and Elision
What is elision? Why use RVO and similar things? All of these come under a single term that's far easier to appreciate: "in-place construction". Suppose we were using the result of a function call to initialize or set an object. E.g.:
A x (void) {return A();}
A y( x() );
Let's consider the longest possible sequence of events that could happen here:
1. A new A is constructed in x.
2. The temporary value returned by x() is a new A, initialized using a reference to the previous one.
3. A new A - y - is initialized using the temporary value.
Where possible, the compiler should rearrange things so that as few intermediate A's as possible are constructed, wherever it's safe to assume the intermediate is inaccessible or otherwise unnecessary. The question is: which of these objects can we do without?
Case #1 is an explicit new object. If we are to avoid this being created, we need to have a reference to an object that already exists. This is the most straightforward one and nothing more needs to be said.
In #2 we cannot avoid constructing some result. After all, we are returning by value. However, there are two important exceptions (not including exceptions themselves which are also affected when thrown): NRVO and RVO. These affect what happens in #3, but there are important consequences and rules regarding #2...
This is due to an interesting quirk of elision:
Notes
Copy elision is the only allowed form of optimization that can change the observable side-effects. Because some compilers do not perform copy elision in every situation where it is allowed (e.g., in debug mode), programs that rely on the side-effects of copy/move constructors and destructors are not portable.
Even when copy elision takes place and the copy-/move-constructor is not called, it must be present and accessible (as if no optimization happened at all), otherwise the program is ill-formed.
(Since C++11)
In a return statement or a throw-expression, if the compiler cannot perform copy elision but the conditions for copy elision are met or would be met, except that the source is a function parameter, the compiler will attempt to use the move constructor even if the object is designated by an lvalue; see return statement for details.
And more on that in the return statement notes:
Notes
Returning by value may involve construction and copy/move of a temporary object, unless copy elision is used.
(Since C++11)
If expression is an lvalue expression and the conditions for copy elision are met, or would be met, except that expression names a function parameter, then overload resolution to select the constructor to use for initialization of the returned value is performed twice: first as if expression were an rvalue expression (thus it may select the move constructor or a copy constructor taking reference to const), and if no suitable conversion is available, overload resolution is performed the second time, with lvalue expression (so it may select the copy constructor taking a reference to non-const).
The above rule applies even if the function return type is different from the type of expression (copy elision requires same type).
The compiler is allowed to even chain together multiple elisions. All it means is that the two sides of a move / copy that would involve an intermediate object could potentially be made to refer directly to each other, or even be made to be the same object. We don't know, and shouldn't need to know, when the compiler chooses to do this - it's an optimization, for one, but more importantly you should think of move and copy constructors et al. as a "last resort".
We can agree the goal of any optimization is to reduce the number of unnecessary operations, provided the observable behavior is the same. Move and copy constructors are used wherever move and copy operations happen, so what about when the compiler sees fit to remove a move/copy operation itself as an optimization? Should functionally unnecessary intermediate objects exist in the final program just for the purposes of their side effects? The way the standard and compilers stand right now, the answer seems to be: no - the move and copy constructors define the how of those operations, not the when or why.
The short version: you have fewer temporary objects, which you ought not to care about to begin with, so why should you miss them? If you do miss them, it may just be that your code relies on intermediate copies and moves to do things beyond their stated purpose and context.
Lastly, you need to be aware that the elided object is always stored (and constructed) in the receiving location, not the location of its inception.
Quoting this reference -
Named Return Value Optimization
If a function returns a class type by value, and the return statement's expression is the name of a non-volatile object with automatic storage duration, which isn't the function parameter, or a catch clause parameter, and which has the same type (ignoring top-level cv-qualification) as the return type of the function, then copy/move is omitted. When that local object is constructed, it is constructed directly in the storage where the function's return value would otherwise be moved or copied to. This variant of copy elision is known as NRVO, "named return value optimization".
Return Value Optimization
When a nameless temporary, not bound to any references, would be moved or copied into an object of the same type (ignoring top-level cv-qualification), the copy/move is omitted. When that temporary is constructed, it is constructed directly in the storage where it would otherwise be moved or copied to. When the nameless temporary is the argument of a return statement, this variant of copy elision is known as RVO, "return value optimization".
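To make both variants visible, here is a small sketch with a tracer type of my own (Tracer, make_named and make_temp are illustrative names, not from the question). Whether the copy/move lines appear depends on the compiler and flags - under C++11, elision is permitted but not required:

#include <iostream>

struct Tracer {
    Tracer() { std::cout << "construct " << this << '\n'; }
    Tracer(const Tracer &) { std::cout << "copy\n"; }
    Tracer(Tracer &&) { std::cout << "move\n"; }
};

Tracer make_named() {  // NRVO candidate: returns a named local
    Tracer t;
    return t;
}

Tracer make_temp() {   // RVO candidate: returns a nameless temporary
    return Tracer();
}

int main() {
    Tracer a = make_named();  // with NRVO: one construction, no copy/move printed
    Tracer b = make_temp();   // with RVO: likewise
    std::cout << &a << ' ' << &b << '\n'; // when elided, these match the printed 'this' values
}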
Lifetime of References
One thing we should not do is this:
A & func() {
A result;
return result;
}
While tempting, because it would avoid implicit copying of anything (we're just passing an address, right?), it's also a short-sighted approach. Remember the compiler above preventing something like this with temp? Same thing here: result is gone once we're done with func; its storage could be reclaimed and could contain anything by now.
The reason is that we cannot pass an address to result out of func - whether as a reference or as a pointer - and consider it valid memory. We would get no further passing an A* out.
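The pointer version is the same mistake in a different spelling - a minimal sketch (func_ptr is just an illustrative name):

A * func_ptr() {
    A result;
    return &result;  // address of a local: invalid the moment func_ptr returns
}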
In this situation it is best to use an object-copy return type and rely on moves, elision or both to occur as the compiler finds suitable. Always think of copy and move constructors as 'measures of last resort' - you should not rely on the compiler to use them because the compiler can find ways to avoid copy and move operations entirely, and is allowed to do so even if it means the side effects of those constructors wouldn't happen any more.
There is however a special case, alluded to earlier.
Recall that references are guarantees to real values. This implies that the first occurrence of the reference initializes the object and the last (as far as known at compile time) destroys it when going out of scope.
Broadly this covers two situations: when we return a temporary from a function, and when we assign from a function result. The first, returning a temporary, is basically what elision does, but you can in effect elide explicitly with reference passing - like passing a pointer along a call chain. It constructs the object at the time of return; what changes is that the object is no longer destroyed after leaving scope (the return statement). On the other end, the second kind happens - the variable storing the result of the function call now has the honor of destroying the value when it goes out of scope.
The important point here is that elision and reference passing are related concepts. You can emulate elision by using pointers to uninitialized variables' storage location (of known type), for example, as you can with reference passing semantics (basically what they're for).
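A quick sketch of the second situation - binding a function result to a reference so the temporary's lifetime is extended (Noisy and make are my own names):

#include <iostream>

struct Noisy {
    ~Noisy() { std::cout << "destroyed\n"; }
};

Noisy make() { return Noisy(); }

int main() {
    {
        const Noisy & keep = make(); // the returned temporary's lifetime is extended to keep's scope
        std::cout << "still alive\n";
    } // with typical elision, the first "destroyed" prints here, when keep goes out of scope
    Noisy && also = make(); // an rvalue reference variable extends the lifetime the same way
    (void)also;
    std::cout << "end of main\n"; // the temporary bound to 'also' is destroyed at the end of main
}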
Overloads of Reference Types
References allow us to treat non-local variables as if they are local variables - to take their address, write to that address, read from that address, and importantly, be able to destroy the object at the right time - when the address can no longer be reached by anything.
Regular variables, when they leave scope, have their only reference to them disappear and are promptly destroyed at that point. Reference variables can refer to regular variables, but except in elision / RVO circumstances they do not affect the scope of the original object - not even if the object they referred to goes out of scope early, which can happen if you make references to dynamic memory and are not careful to manage those references yourself.
This means you can capture the results of an expression explicitly by reference. How? Well, this may seem odd at first but if you read the above it will make sense why this works:
class A {
/* assume rule-of-5 (inc const-overloads) has been followed but unless
* otherwise noted the members are private */
public:
A (void) { /* ... */ }
A operator+ ( const A & rhs ) {
A res;
// do something with `res`
return res;
}
};
A x = A() + A(); // doesn't compile
A & y = A() + A(); // doesn't compile
A && z = A() + A(); // compiles
Why? What's going on?
A x = ... - we can't, because the copy/move constructors and assignment operators are private.
A & y = ... - we can't, because we're returning a value, not a reference to a value whose lifetime is greater than or equal to our current scope.
A && z = ... - we can, because we're able to refer to xvalues. As a consequence of this binding, the lifetime of the temporary value is extended to this capturing variable, because it in effect has become an lvalue reference. Sound familiar? It's explicit elision, if I were to call it anything. This is more apparent when you consider that this syntax must involve a new value and must involve binding that value to a reference.
In all three cases, when all constructors and assignment operators are made public, there are always only three objects constructed, with the address of res always matching the variable storing the result (on my compiler anyway, optimizations disabled, -std=gnu++11, g++ 4.9.3).
Which means the differences really do come down to just the storage duration of the function arguments themselves. Elision and move operations cannot happen on anything but pure expressions, expiring values, or explicit targeting of the "expiring values" reference overload Type&&.
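A short sketch of how the reference overloads get targeted (B and take are illustrative names):

#include <iostream>
#include <utility>

struct B {};

void take(const B &) { std::cout << "lvalue / const & overload\n"; }
void take(B &&)      { std::cout << "expiring-value (&&) overload\n"; }

int main() {
    B named;
    take(named);            // lvalue: can only bind to the const & overload
    take(B());              // nameless temporary (prvalue): prefers the && overload
    take(std::move(named)); // xvalue: also prefers the && overload
}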
Re-examining f and g
I've annotated the situation in both functions to get things rolling - a shortlist of assumptions the compiler would note when generating (reusable) code for each.
A f( A && a ) {
    // a has storage duration exceeding f's scope.
    // a is already constructed.
    return a;
    // the return value can be constructed directly in the caller's storage (elided),
    // but it must be copy-constructed from a, because a exceeds f's scope
    // and the expression 'a' is an lvalue here.
}
A g( A a ) {
// has storage duration limited to this function's scope.
// was just constructed somehow, whether by elision, move or copy.
return a;
// elision may occur.
// can move-construct if can't elide.
// can copy-construct if can't move.
}
What we can say for sure about f's a is that it's expecting to capture moved or expression-type values. Because f can accept either expression-references (prvalues), or lvalue-references about to disappear (xvalues), or moved lvalue-references (converted to xvalues via std::move), and because f must be homogeneous in its treatment of a for all three cases, a is seen first and foremost as a reference to an area of memory whose lifetime exists for longer than a call to f. That is, it is not possible to distinguish from within f which of the three cases we were called with, so the compiler assumes the longest storage duration it needs for any of the cases, and finds it safest not to assume anything about the storage duration of a's data.
Unlike the situation in g. Here, a - however it happens upon its value - will cease to be accessible beyond a call to g. As such, returning it is tantamount to moving it, since it's seen as an xvalue in that case. We could still copy it, or more probably even elide it; it can depend on which is allowed / defined for A at the time.
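A sketch that makes the difference observable, under the C++11/14 rules this answer is written against (T is a tracer type of my own; the final construction of r1 / r2 is usually elided, so the interesting prints come from the parameter and the return statements):

#include <iostream>
#include <utility>

struct T {
    T() {}
    T(const T &) { std::cout << "copy\n"; }
    T(T &&)      { std::cout << "move\n"; }
};

T f(T && a) { return a; } // 'a' outlives f and is an lvalue here: the return copies
T g(T a)    { return a; } // 'a' dies with g: the return is treated as a move from an xvalue

int main() {
    T t;
    T r1 = f(std::move(t)); // typically prints "copy"
    T r2 = g(std::move(t)); // typically prints "move" (parameter), then "move" (return)
    (void)r1; (void)r2;
}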
The issues with f
// we can't tell these apart.
// `f` when compiled cannot assume either will always happen.
// case-by-case optimizations can only happen if `f` is
// inlined into the final code and then re-arranged, or if `f`
// is made into a template to specifically behave differently
// against differing types.
A case_1() {
// prvalues
return f( A() + A() );
}
A make_case_2() {
// xvalues
A temp;
return temp;
}
A case_2 = f( make_case_2() );
A case_3(A & other) {
// lvalues
return f( std::move( other ) );
}
Because of the ambiguity of usage, the compiler and standard are designed to make f usable consistently in all cases. There can be no assumption that A&& will always be a new expression, or that you will only ever use std::move for its argument, etc. Once f is made external to your code, leaving only its call signature, that cannot be the excuse any more. The function signature - which reference overload to target - is a clue to what the function should be doing with its argument and how much (or little) it can assume about the context.
rvalue references are not a panacea for targeting only "moved values"; they can target a good deal more, and can even be targeted incorrectly or unexpectedly if you assume that's all they do. In general, whatever a reference refers to should be expected to (and be made to) exist for longer than the reference itself does, with the one exception being rvalue reference variables.
rvalue reference variables are in essence, elision operators. Wherever they exist there is in-place construction going on of some description.
As regular variables, they extend the scope of any xvalue or rvalue they receive - they hold the result of the expression as it's constructed rather than by move or copy, and from thereon are equivalent to regular reference variables in usage.
As function variables they can also elide and construct objects in-place, but there is a very important difference between this:
A c = f( A() );
and this:
A && r = f( A() );
The difference is that there is no guarantee that c will be move-constructed rather than elided, but r definitely will be elided / constructed in place at some point, owing to the nature of what we're binding to. For this reason we can only bind r in situations where a new temporary value will be created.
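A sketch with a hypothetical tracer (A2 and f2 are my names, not the question's A and f). Whether any "move" line appears for c depends on the compiler and elision settings; the point is that r itself never triggers a copy or move - it just binds to the returned temporary:

#include <iostream>

struct A2 {
    A2() {}
    A2(const A2 &) { std::cout << "copy\n"; }
    A2(A2 &&)      { std::cout << "move\n"; }
};

A2 f2(A2 &&) { return A2(); }

int main() {
    A2 c = f2(A2());    // initializing c may invoke the move constructor, or be elided entirely
    A2 && r = f2(A2()); // no constructor runs for r itself: it binds the result in place
    (void)c; (void)r;
}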
But why is A && a not destroyed if it is captured?
Consider this:
void bad_free(A && a) {
A && clever = std::move( a );
// 'clever' should be the last reference to a?
}
This won't work. The reason is subtle: a's scope is longer, and rvalue reference assignments can only extend a lifetime, not control it. clever exists for less time than a, and therefore is not an xvalue itself (unless you use std::move again, but then you're back in the same situation, and so on).
lifetime extension
Remember that what makes lvalues different from rvalues is that they cannot be bound to objects that have a shorter lifetime than themselves. All lvalue references are either the original variable or a reference with a shorter lifetime than the original.
rvalues allow binding to reference variables that have longer lifetime than the original value - that's half the point. Consider:
A r = f( A() ); // v1
A && s = f( A() ); // v2
What happens? In both cases f is given a temporary value that outlives the call, and a result object (because f returns by value) is constructed somehow (it will not matter, as you shall see). In v1 we are constructing a new object, r, using the temporary result - we can do this in three ways: move, copy, elide. In v2 we are not constructing a new object; we are extending the lifetime of the result of f to the scope of s. Saying the same thing another way: s is constructed in place using f, and therefore the temporary returned by f has its lifetime extended rather than being moved or copied.
The main distinction is that v1 requires a move or copy constructor (at least one of them) to be accessible, even if the process is elided. For v2 you are not invoking any constructor: you are explicitly saying you want to reference and/or extend the lifetime of a temporary value, and because you don't invoke a move or copy constructor the compiler can only elide / construct in place!
Remember that this has nothing to do with the argument given to f. It works identically with g:
A r = g( A() ); // v1
A && s = g( A() ); // v2
g will create a temporary for its argument and move-construct it using A() in both cases. Like f, it also constructs a temporary for its return value, but it can use an xvalue because the result is constructed using a temporary (temporary to g). Again, this will not matter, because in v1 we have a new object that could be copy-constructed or move-constructed (either is required, but not both), while in v2 we are demanding a reference to something that's constructed but will disappear if we don't catch it.
Explicit xvalue capture
Example to show this is possible in theory (but useless):
A && x (void) {
A temp;
// return temp; // even though xvalue, can't do this
return std::move(temp);
}
A && y = x(); // y now refers to temp, which is destroyed
Which object does y refer to? We have left the compiler no choice: y must refer to the result of some function or expression, and we've given it temp, which works based on type. But no move has occurred, and temp will have been deallocated by the time we use it via y.
Why didn't lifetime extension kick in for temp like it did for a in g / f? Because of what we're returning: we can't specify a function to construct things in place, we can only specify a variable to be constructed in place. It also goes to show that the compiler does not look across function / call boundaries to determine lifetime; it will just look at which variables are on the calling side or local, how they're assigned to, and how they're initialized if local.
If you want to clear all doubts, try passing this as an rvalue reference: std::move(*(new A)) - what should happen is that nothing ever destroys it, because it isn't on the stack and because rvalue references do not alter the lifetime of anything but temporary objects (i.e. intermediates / expressions). xvalues are candidates for move construction / move assignment and can't be elided (they're already constructed), but all other move / copy operations can in theory be elided at the whim of the compiler; when using rvalue references the compiler has no choice but to elide or pass on the address.
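A small sketch along those lines (Leaky and sink are my own illustrative names) - the heap object is simply never destroyed:

#include <iostream>
#include <utility>

struct Leaky {
    ~Leaky() { std::cout << "destroyed\n"; }
};

void sink(Leaky &&) {} // accepts an rvalue reference and does nothing with it

int main() {
    sink(Leaky());                 // temporary: "destroyed" prints at the semicolon
    sink(std::move(*(new Leaky))); // heap object: no one destroys it, nothing prints
    std::cout << "done\n";         // the heap Leaky has been leaked
}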