C++: returning by reference and copy constructors

9

22

References in C++ are baffling me. :)

The basic idea is that I'm trying to return an object from a function. I'd like to do it without returning a pointer (because then I'd have to manually delete it), and without calling the copy constructor, if possible (for efficiency, naturally; added: also because I wonder whether I can avoid writing a copy constructor altogether).

So, all in all, here are the options for doing this that I have found:

  • The function return type can be either the class itself (MyClass fun() { ... }) or a reference to the class (MyClass& fun() { ... }).
  • The function can either construct the variable at the line of return (return MyClass(a,b,c);) or return an existing variable (MyClass x(a,b,c); return x;).
  • The code that receives the variable can also have a variable of either type: (MyClass x = fun(); or MyClass& x = fun();)
  • The code which receives the variable can either create a new variable on the fly (MyClass x = fun();) or assign it to an existing variable (MyClass x; x = fun();)

And some thoughts on that:

  • It seems to be a bad idea to have the return type MyClass& because that always results in the variable being destroyed before it gets returned.
  • The copy constructor only seems to get involved when I return an existing variable. When returning a variable constructed in the line of return, it never gets called.
  • When I assign the result to an existing variable, the destructor also always kicks in before the value is returned. Also, no copy constructor gets called, yet the target variable does receive the member values of the object returned from the function.

These results are so inconsistent that I feel totally confused. So, what EXACTLY is happening here? How should I properly construct and return an object from a function?

Havana answered 16/2, 2010 at 15:8 Comment(3)
The reason the copy constructor isn't being called in the second case is called return-value optimization, or RVO. The compiler is allowed to omit the temporary + copy, even if this changes the behavior of the program. - Loader
@Havana What about the returning to an existing variable issue? I've faced it but I couldn't figure out why the destructor kicks in and no copy constructor is called. Did you ever find out what the cause of it is? - Steelhead
@ArthurNunes - Sorry, no, I haven't worked much with C++ and I don't know the answer. I think I abandoned this project or rewrote it differently. (That was 5 years ago, you know.) - Havana
16

The best way to understand copying in C++ is often NOT to try to produce an artificial example and instrument it - the compiler is allowed to both remove and add copy constructor calls, more or less as it sees fit.

Bottom line - if you need to return a value, return a value and don't worry about any "expense".

Insignificant answered 16/2, 2010 at 15:14 Comment(3)
OK, I accept this. Still I must wonder - what about objects that shouldn't be clone-able? Like a network socket, which it is in my case. Am I forced to return a pointer then? Or can I depend on the optimized-out copy-constructor? - Havana
@Havana That's another question. And the answer is that such things should not be copied, or be copyable. - Insignificant
@Vilx: This is fixed in the next C++ standard with move semantics. Instead of making a copy and then losing one to transfer an object, you merely move the internal data; there never exists more than one copy. - Loader
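
The move semantics mentioned in the last comment arrived with C++11. As a rough sketch of how they apply to the socket case, a type can forbid copying and still be returned by value. The `Socket` class, its `fd_` member, and `open_socket` below are hypothetical names invented for illustration; a real wrapper would also close the descriptor in its destructor and in move assignment.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a socket wrapper: owns a handle, cannot be copied.
class Socket {
public:
    explicit Socket(int fd) : fd_(fd) {}
    ~Socket() { /* a real class would close fd_ here if it is valid */ }

    Socket(const Socket&) = delete;             // no copies
    Socket& operator=(const Socket&) = delete;

    // Moving transfers the handle and leaves the source empty.
    Socket(Socket&& other) noexcept : fd_(other.fd_) { other.fd_ = -1; }
    Socket& operator=(Socket&& other) noexcept {
        if (this != &other) {
            fd_ = other.fd_;
            other.fd_ = -1;
        }
        return *this;
    }

    int fd() const { return fd_; }
    bool valid() const { return fd_ >= 0; }

private:
    int fd_;
};

// Returned by value even though Socket is not copyable.
Socket open_socket(int fd) {
    Socket s(fd);
    return s;  // moved (or elided), never copied
}

void demo_move_only() {
    Socket a = open_socket(7);
    assert(a.valid() && a.fd() == 7);

    Socket b = std::move(a);  // explicit transfer of ownership
    assert(b.fd() == 7);
    assert(!a.valid());       // the moved-from object no longer owns the handle
}
```

With this shape, there is only ever one object that owns the handle, which is the property the comment describes.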
14

Recommended reading: Effective C++ by Scott Meyers. You'll find a very good explanation of this topic (and a lot more) in there.

In brief, if you return by value, the copy constructor and the destructor will be involved by default (unless the compiler optimizes them away - that's what happens in some of your cases).

If you return by reference (or pointer) a variable which is local (constructed on the stack), you invite trouble because the object is destructed upon return, so you have a dangling reference as a result.

The canonical way to construct an object in a function and return it is by value, like:

MyClass fun() {
    return MyClass(a, b, c);
}

MyClass x = fun();

If you use this, you don't need to worry about ownership issues, dangling references etc. And the compiler will most likely optimize out the extra copy constructor / destructor calls for you, so you don't need to worry about performance either.
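
To see which special member functions actually run, here is a small instrumented sketch. It compares the canonical form above with assigning the result to an existing variable, which is the case the question found puzzling. `Tracer` and its counters are hypothetical names invented for this example, and the zero-copy assertions assume a compiler that elides the copy (guaranteed from C++17 for this return form, and common in practice before that).

```cpp
#include <cassert>

// Hypothetical instrumented class: counts which special members run.
struct Tracer {
    static int ctor, copy_ctor, copy_assign, dtor;
    int a, b, c;

    Tracer(int a_, int b_, int c_) : a(a_), b(b_), c(c_) { ++ctor; }
    Tracer(const Tracer& o) : a(o.a), b(o.b), c(o.c) { ++copy_ctor; }
    Tracer& operator=(const Tracer& o) {
        a = o.a; b = o.b; c = o.c;
        ++copy_assign;
        return *this;
    }
    ~Tracer() { ++dtor; }
};
int Tracer::ctor = 0;
int Tracer::copy_ctor = 0;
int Tracer::copy_assign = 0;
int Tracer::dtor = 0;

Tracer fun(int a, int b, int c) {
    return Tracer(a, b, c);
}

void demo_init_vs_assign() {
    // Initialization: the result is constructed directly in x.
    Tracer x = fun(1, 2, 3);
    assert(Tracer::copy_ctor == 0);   // assumes the copy is elided
    assert(Tracer::dtor == 0);

    // Assignment to an existing object: fun() yields a temporary,
    // operator= copies its members into y, then the temporary is destroyed.
    Tracer y(0, 0, 0);
    y = fun(4, 5, 6);
    assert(Tracer::copy_assign == 1);
    assert(Tracer::copy_ctor == 0);
    assert(Tracer::dtor == 1);        // the temporary, destroyed at the end of the statement
    assert(y.a == 4 && y.b == 5 && y.c == 6);
}
```

In the assignment case the values arrive through the assignment operator, not the copy constructor, and the destructor that runs belongs to the temporary returned by `fun`, not to `y`.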

It is possible to return by reference an object constructed by new (i.e. on the heap) - this object will not be destroyed upon returning from the function. However, you have to destroy it explicitly somewhere later by calling delete.

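It is also technically possible to bind an object returned by value to a reference. Standard C++ only permits this when the reference is const (const MyClass& x = fun();); some compilers, such as Visual C++, accept the non-const form below as an extension: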

MyClass& x = fun();

However, AFAIK there is not much point in doing this. Especially because one can easily pass on this reference to other parts of the program which are outside of the current scope; however, the object referenced by x is a local object which will be destroyed as soon as you leave the current scope. So this style can lead to nasty bugs.
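
A minimal sketch of how long the referenced object lives when the result is bound to a const reference. `Probe` and its `alive` counter are hypothetical names used only to observe lifetimes; the counter tracks live objects, so the assertions do not depend on whether the compiler elides copies.

```cpp
#include <cassert>

struct Probe {
    static int alive;          // number of Probe objects currently alive
    int value;
    explicit Probe(int v) : value(v) { ++alive; }
    Probe(const Probe& o) : value(o.value) { ++alive; }
    ~Probe() { --alive; }
};
int Probe::alive = 0;

Probe make_probe() { return Probe(42); }

void demo_lifetime_extension() {
    {
        const Probe& r = make_probe();  // the temporary lives as long as r does
        assert(Probe::alive == 1);
        assert(r.value == 42);
    }                                   // r goes out of scope: temporary destroyed
    assert(Probe::alive == 0);
}
```

The temporary is kept alive only for the scope of the reference it was bound to directly, so copying that reference elsewhere does not extend the lifetime further.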

Eos answered 16/2, 2010 at 15:10 Comment(1)
What about using const MyClass& x = fun();? Then the variable is not destroyed when returned / leaving the scope. Is this OK? - Lilian
9

Read about RVO and NRVO (in short, these stand for Return Value Optimization and Named Return Value Optimization, and are optimization techniques the compiler uses to do what you're trying to achieve).

You'll find many questions on the subject here on Stack Overflow.

Ohm answered 16/2, 2010 at 15:13 Comment(0)
4

If you create an object like this:

MyClass foo(a, b, c);

then it will be on the stack in the function's frame. When that function ends, its frame is popped off the stack and all the objects in that frame are destructed. There is no way to avoid this.

So if you want to return an object to a caller, your only options are:

  • Return by value - a copy constructor is required (but the call to the copy constructor may be optimised out).
  • Return a pointer and make sure you either use smart pointers to deal with it or carefully delete it yourself when done with it.
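
The smart-pointer variant of the second option might look like the sketch below. It uses `std::unique_ptr`, which requires C++11 and so postdates this answer. `Connection`, `connect_to`, and the `live` counter are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical non-copyable resource.
class Connection {
public:
    explicit Connection(int port) : port_(port) { ++live; }
    ~Connection() { --live; }
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;
    int port() const { return port_; }
    static int live;   // number of Connection objects currently alive
private:
    int port_;
};
int Connection::live = 0;

// The factory returns ownership; the caller never writes delete.
std::unique_ptr<Connection> connect_to(int port) {
    return std::unique_ptr<Connection>(new Connection(port));
}

void demo_smart_pointer() {
    {
        std::unique_ptr<Connection> c = connect_to(8080);
        assert(c->port() == 8080);
        assert(Connection::live == 1);
    }   // c goes out of scope and deletes the Connection
    assert(Connection::live == 0);
}
```

The object itself is never copied; only the pointer that owns it is handed to the caller.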

Attempting to construct a local object and then return a reference to that local memory to a calling context is not coherent - a calling scope cannot access memory that is local to the called scope. That local memory is only valid for the duration of the function that owns it - or, put another way, while execution remains in that scope. You must understand this to program in C++.

Juggler answered 16/2, 2010 at 16:1 Comment(2)
I do, but the "optimized out copy constructor" was confusing me. The object would be kinda... messy to clone, what with it wrapping a network socket and all. - Havana
Not necessarily - a socket is just a file descriptor at heart, so when you clone the object you can 'clone' the network socket by just copying the FD. - Juggler
3

About the only time it makes sense to return a reference is if you're returning a reference to a pre-existing object. For an obvious example, nearly every iostream member function returns a reference to the iostream. The iostream itself exists before any of the member functions is called, and continues to exist after they're called.
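
A small sketch of that pattern, in the same spirit as the iostream operators: each member function returns a reference to the object it was called on, which already existed before the call and continues to exist afterwards. `Accumulator` is a hypothetical class invented for this example.

```cpp
#include <cassert>

// Hypothetical accumulator whose member functions return *this for chaining.
class Accumulator {
public:
    Accumulator() : total_(0) {}
    Accumulator& add(int n) {
        total_ += n;
        return *this;   // refers to an object that outlives the call
    }
    int total() const { return total_; }
private:
    int total_;
};

void demo_chaining() {
    Accumulator acc;
    Accumulator& same = acc.add(1).add(2).add(3);
    assert(&same == &acc);      // no new object was created
    assert(acc.total() == 6);
}
```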

The standard allows "copy elision", which means the copy constructor doesn't need to be called when you return an object. This comes in two forms: Named Return Value Optimization (NRVO) and anonymous Return Value Optimization (usually just RVO).
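
The two forms can be told apart with a sketch like the following. `Widget` and its `copies` counter are hypothetical names for illustration. The first assertion assumes the unnamed return is elided (mandatory since C++17, permitted and widely implemented before that); NRVO remains optional, so the second check only bounds the number of copies.

```cpp
#include <cassert>

struct Widget {
    static int copies;   // copy constructions observed so far
    int id;
    explicit Widget(int i) : id(i) {}
    Widget(const Widget& o) : id(o.id) { ++copies; }
};
int Widget::copies = 0;

// RVO: the returned object is an unnamed temporary.
Widget make_unnamed() { return Widget(1); }

// NRVO: the returned object is a named local variable.
Widget make_named() {
    Widget w(2);
    w.id += 10;
    return w;
}

void demo_elision() {
    Widget a = make_unnamed();
    assert(a.id == 1);
    assert(Widget::copies == 0);   // assumes the unnamed return is elided

    Widget b = make_named();
    assert(b.id == 12);
    assert(Widget::copies <= 1);   // NRVO is optional: zero copies if applied, else one
}
```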

From what you're saying, your compiler implements RVO but not NRVO -- which means it's probably a somewhat older compiler. Most current compilers implement both. The un-matched dtor in this case means it's probably something like gcc 3.4 or thereabouts -- though I don't remember the version for sure, there was one around then that had a bug like this. Of course, it's also possible that your instrumentation isn't quite right, so a ctor that you didn't instrument is being used, and a matching dtor is being invoked for that object.

In the end, you're stuck with one simple fact though: if you need to return an object, you need to return an object. In particular, a reference can only give access to a (possibly modified version of) an existing object -- but that object had to be constructed at some point as well. If you can modify some existing object without causing a problem, that's fine and well, go ahead and do it. If you need a new object different and separate from those you already have, go ahead and do that -- pre-creating the object and passing in a reference to it may make the return itself faster, but won't save any time overall. Creating the object has about the same cost whether done inside or outside the function. Any reasonably modern compiler will include RVO, so you won't pay any extra cost for creating it in the function, then returning it -- the compiler will just automate allocating space for the object where it's going to be returned, and have the function construct it "in place", where it'll still be accessible after the function returns.

Biquadrate answered 16/2, 2010 at 15:26 Comment(2)
Actually it's Visual Studio 2008. :) Though it's a debug compilation, so many optimizations are turned off. - Havana
Ah, that would explain lack of NRVO anyway. Not so sure about the unmatched dtor -- you might have a compiler bug, but an un-instrumented ctor is probably more likely. - Biquadrate
2

Basically, returning a reference only makes sense if the object still exists after leaving the method. The compiler will warn you if you return a reference to something that is being destroyed.

Returning a reference rather than an object by value saves copying the object which might be significant.

References are safer than pointers because they have different semantics, but behind the scenes they are pointers.

Pyroclastic answered 16/2, 2010 at 15:12 Comment(4)
The compiler might warn you (and usually does) but complex cases are not caught and not all compilers might be analyzing it in the first place. - Hagioscope
Which compiler messes up here? An example? - Pyroclastic
@nodan: Visual Studio 2010 does not warn with: int& f(bool f, int& x) { int y = 0; return f ? x : y; } int main() { int x; int& i = f(false, x); }. - Loader
Returning by value can be just as cheap if the compiler applies N/RVO. And in Ultra Pedant Mode: all known compilers implement references using the same machine code constructs as pointers, but the standard does not mandate this, so someone could use another method if they wanted (which would probably be very silly, but is nonetheless possible). - Vicissitude
1

One potential solution, depending on your use case, is to default-construct the object outside of the function, take in a reference to it, and initialize the referenced object within the function, like so:

void initFoo(Foo& foo) 
{
  foo.setN(3);
  foo.setBar("bar");
  // ... etc ...
}

int main() 
{
  Foo foo;
  initFoo(foo);

  return 0;
}

Now this of course does not work if it is not possible (or does not make sense) to default-construct a Foo object and then initialize it later. If that is the case, then your only real option to avoid copy-construction is to return a pointer to a heap-allocated object.

But then think about why you are trying to avoid copy-construction in the first place. Is the "expense" of copy construction really affecting your program, or is this a case of premature optimization?

Adelladella answered 16/2, 2010 at 15:18 Comment(4)
Actually it's a case of do-I-really-need-to-write-the-damn-copy-constructor. Less code, less bugs. :P - Havana
@Vilx: You should almost never really need to. Your members should copy themselves. - Loader
Unless there is some dynamically allocated memory which gets deallocated on destruction. And I don't think that's very rare, is it? - Havana
@Vilx GMan is implying that you should use smart pointers (which automatically deallocate on destruction) in preference to raw pointers when that sort of thing is needed. - Adelladella
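
The "members should copy themselves" advice from the comments above can be sketched as follows: when every member is a type that manages its own memory, the compiler-generated copy constructor is already correct. `Packet` and its members are hypothetical names invented for this example, and standard containers stand in here for the smart pointers the comment mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical class with dynamically sized data but no hand-written
// copy constructor, assignment operator, or destructor.
class Packet {
public:
    Packet(const std::string& label, std::size_t size)
        : label_(label), payload_(size, 0) {}

    void set(std::size_t i, int v) { payload_[i] = v; }
    int get(std::size_t i) const { return payload_[i]; }
    const std::string& label() const { return label_; }

private:
    std::string label_;
    std::vector<int> payload_;   // owns its heap memory and copies it itself
};

void demo_members_copy_themselves() {
    Packet original("ping", 4);
    original.set(0, 7);

    Packet copy = original;      // compiler-generated memberwise copy
    copy.set(0, 99);

    assert(original.get(0) == 7);   // the copy has its own buffer
    assert(copy.get(0) == 99);
    assert(copy.label() == "ping");
}
```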
0

You are stuck with either:

1) returning a pointer

MyClass* func() {
    // some stuff
    return new MyClass(a, b, c);
}

2) returning a copy of the object

MyClass func() {
    return MyClass(a, b, c);
}

Returning a reference to a local object is not valid, because the object is destroyed when func's scope exits. The exception is a member function that returns a reference to a member variable of its own object.
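
A minimal sketch of the member-function case, where the returned reference stays valid for as long as the owning object does. `User` and `name_` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

class User {
public:
    explicit User(const std::string& name) : name_(name) {}
    // Safe: name_ lives as long as the User object does.
    const std::string& name() const { return name_; }
    std::string& name() { return name_; }
private:
    std::string name_;
};

void demo_member_reference() {
    User u("alice");
    const std::string& n = u.name();   // no copy of the string
    assert(&n == &u.name());
    u.name() = "bob";                  // non-const overload allows in-place edits
    assert(n == "bob");                // n still refers to the same member
}
```

The reference becomes dangling if it outlives the `User` it came from, so the caller still has to keep the owner alive.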

Nondisjunction answered 16/2, 2010 at 15:18 Comment(0)
0

Not a direct answer, but a viable suggestion: You could also return a pointer wrapped in a smart pointer such as auto_ptr (or, since C++11, std::unique_ptr or std::shared_ptr, which supersede auto_ptr). Then you'll be in control of what constructors and destructors get called and when.

Nicol answered 16/2, 2010 at 15:28 Comment(0)
