RVO force compilation error on failure

Lots of discussions here cover when RVO can be done, but not many cover when it is actually done. As stated many times, RVO cannot be guaranteed according to the Standard, but is there a way to guarantee that either the RVO succeeds or the corresponding code fails to compile?

So far I have partially succeeded in making the code issue link errors when RVO fails. For this I declare the copy constructors without defining them. Obviously this is neither robust nor feasible in the not-so-rare cases where I need to implement one or both copy constructors, i.e. x(x&&) and x(x const&).

This brings me to my second question: Why have the compiler writers chosen to enable RVO when user defined copy constructors are in place but not when only default copy constructors are present?

Third question: Is there some other way to enable RVO for plain data structures?

Last question (promise): Do you know any compiler that makes my test code behave other than I observed with gcc and clang?

Here is some example code for gcc 4.6, gcc 4.8 and clang 3.3 that shows the problem. The behavior does not depend on general optimization or debug settings. Of course the option -fno-elide-constructors does what it says, i.e. turns RVO off.

#include <iostream>
using namespace std;

struct x
{
    x () { cout << "original x address" << this << endl; }
};
x make_x ()
{
    return x();
}

struct y
{
    y () { cout << "original y address" << this << endl; }
    // Any of the next two constructors will enable RVO even if only
    // declared but not defined. Default constructors will not do!
    y(y const & rhs);
    y(y && rhs);
};
y make_y ()
{
    return y();
}

int main ()
{
    auto x1 = make_x();
    cout << "copy of  x address" << &x1 << endl;
    auto y1 = make_y();
    cout << "copy of  y address" << &y1 << endl;
}

Output:

original x address0x7fff8ef01dff
copy of  x address0x7fff8ef01e2e
original y address0x7fff8ef01e2f
copy of  y address0x7fff8ef01e2f

RVO also seems not to work with plain data structures:

#include <iostream>

using namespace std;

struct x
{
    int a;
};

x make_x ()
{
    x tmp;
    cout << "original x address" << &tmp << endl;
    return tmp;
}

int main ()
{
    auto x1 = make_x();
    cout << "copy of  x address" << &x1 << endl;
}

Output:

original x address0x7fffe7bb2320
copy of  x address0x7fffe7bb2350

UPDATE: Note that some optimizations are very easily confused with RVO. Constructor helpers like make_x are an example. See this example where the optimization is actually enforced by the standard.

Overdress answered 9/10, 2013 at 2:27 Comment(4)
I think make_x doesn't use RVO because x is so small it's more efficient to just copy it. If you make x contain a larger array it does RVO on my machine.Monorail
Now that C++ has move semantics, why would you even care about enforcing RVO?Equiponderate
@DanielKO, in those cases where RVO/NRVO is more performant than moving the object, which is often, the compiler will use RVO instead of moving.Surfboat
@DavidBrown, you are kind of correct. I have updated my post and added a reference to another post which I believe you may like.Overdress

The problem is that the compiler is doing too many optimizations :)

First of all, I disabled the inlining of make_x(); otherwise we cannot distinguish between RVO and inlining. I did, however, put the rest into an anonymous namespace so that external linkage does not interfere with any other compiler optimizations. (As evidence shows, external linkage can prevent inlining, for example, and who knows what else...) I also rewrote the input/output to use printf(); otherwise the generated assembly code would be cluttered with all the iostream stuff. So the code:

#include <cstdio>
using namespace std;

namespace {

struct x {
    //int dummy[1024];
    x() { printf("original x address %p\n", this); }
};

__attribute__((noinline)) x make_x() {
    return x();
}

} // namespace

int main() {
    auto x1 = make_x();
    printf("copy  of x address %p\n", &x1);
}

I analyzed the generated assembly code with a colleague of mine as my understanding of the gcc generated assembly is very limited. Later today, I used clang with the -S -emit-llvm flags to generate LLVM assembly which I personally find much nicer and easier to read than the X86 Assembly/GAS Syntax. It didn't matter which compiler was used, the conclusions are the same.

I rewrote the generated assembly in C++. It roughly looks like this if x is empty:

#include <cstdio>
using namespace std;

struct x { };

void make_x() {
    x tmp;
    printf("original x address %p\n", &tmp);
}

int main() {
    x x1;
    make_x();
    printf("copy  of x address %p\n", &x1);
}

If x is big (the int dummy[1024]; member uncommented):

#include <cstdio>
using namespace std;

struct x { int dummy[1024]; };

void make_x(x* x1) {

    printf("original x address %p\n", x1);
}

int main() {
    x x1;
    make_x(&x1);
    printf("copy  of x address %p\n", &x1);
}

It turns out that if the object is empty, make_x() only has to print some valid, unique address, and it is at liberty to use one pointing into its own stack. There is also nothing to be copied and nothing to return from make_x().

If you make the object bigger (add the int dummy[1024]; member for example), it gets constructed in place, so RVO does kick in, and only the object's address is passed to make_x() to be printed. No object gets copied, nothing gets moved.

If the object is empty, the compiler can decide not to pass an address to make_x() (What a waste of resources would that be? :) ) but let make_x() make up a unique, valid address from its own stack. When this optimization happens is somewhat fuzzy and hard to reason about (that is what you see with y) but it really doesn't matter.

RVO appears to happen consistently in those cases where it matters. And, as my earlier confusion shows, even the whole make_x() function can get inlined, so there is no return value to be optimized away in the first place.

Barmaid answered 9/10, 2013 at 11:56 Comment(6)
Good point. In my test code I would not even see a difference between successful RVO and successful inlining. Strangely, using the compiler option -finline-small-functions also does not elide the copy. The make_x function should be small enough though.Overdress
In that case I think it is better if we first tag this question as compiler-error to attract the compiler writing wizards and see what they suggest. I will do that now.Overdress
Can you quote where the standard indicates that an external linkage function can't be inlined in some places and called as a function in others?Purview
@MarkB Unfortunately no, you are likely to be correct. I will revise this answer. Today, with fresher eyes, I see that I need to inspect this issue further. Sorry for the false alarm.Barmaid
@Ali, that answers my question, gives additional insights and shows some tricks 8-)Overdress
@PatrickFromberg Oh, I have learned a lot by trying to answer your question! I have updated the answer with the sample code as promised. The most important lesson for me was to look at the generated assembly code. The executable's behavior can be hard to understand and quite misleading. Everything became quickly clear when we looked at the assembly code. By the way, the LLVM assembly language is nice and easy to read, you can generate it with clang with the -S -emit-llvm flags. Thanks for your patience & sorry it took me so long to get the answer right.Barmaid
  1. I don't believe there's any way to make such a guarantee. RVO is an optimization and as such the compiler may determine in a particular case that using it is actually a de-optimization and elect to not do so.

  2. I'm assuming you're referring to your first code snippet. In 32 bit compilation I'm unable to reproduce your assertion on g++ 4.4, 4.5, or 4.8 (through ideone.com) even with no optimization enabled at all. In 64 bit compilation I can reproduce your no-RVO behavior. This smells like a 64 bit code generation bug in g++.

  3. If in fact what I observed in (2) is a bug then once the bug is fixed it will just work.

  4. I can confirm that Sun CC also does not RVO your specific examples even in 32 bit compilation.

I do wonder however if somehow your introspection code to print out the addresses is causing the compiler to inhibit the optimization (for example it may need to inhibit the optimization to prevent possible aliasing problems).

Purview answered 9/10, 2013 at 15:17 Comment(0)

Why have the compiler writers chosen to enable RVO when user defined copy constructors are in place but not when only default copy constructors are present?

Because the standard says so:

C++14, 12.8/31:

When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, even if the constructor selected for the copy/move operation and/or the destructor for the object have side effects.

C++14, 12.8/32

When the criteria for elision of a copy operation are met or would be met save for the fact that the source object is a function parameter, and the object to be copied is designated by an lvalue, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue. If overload resolution fails, or if the type of the first parameter of the selected constructor is not an rvalue reference to the object’s type (possibly cv-qualified), overload resolution is performed again, considering the object as an lvalue. [ Note: This two-stage overload resolution must be performed regardless of whether copy elision will occur. It determines the constructor to be called if elision is not performed, and the selected constructor must be accessible even if the call is elided. —end note ]

You must remember that the RVO (and other copy elisions) are optional.

Imagine code with deleted copy/move constructors/assignments that compiles on your compiler because the RVO kicks in. Then you move that perfectly compiling code to another compiler, where it legally fails to compile. This is not acceptable.

This means the code must always be valid even if the compiler, for some reason, decides to NOT do the RVO optimization.

Gasparo answered 19/5, 2015 at 15:36 Comment(0)
