What's the motivation behind having copy and direct initialization behave differently?

Asked 27/6, 2012 at 9:37 Answered 17/6, 2013 at 3:51

c++ initialization history language-design

There are two syntaxes for initialization, direct- and copy-initialization:

A a(b);
A a = b;

I want to know the motivation for them having different defined behavior. For copy initialization, an extra copy is involved, and I can't think of any purpose for that copy. Since it's a copy from a temp, it can and probably will be optimized out, so the user can't rely on it happening - ergo the extra copy itself isn't reason enough for the different behavior. So... why?

Pastose answered 27/6, 2012 at 9:37 Comment(9)

I've had a go at an edit. If I've mangled anything, please put it back :-) – Cleruchy 27/6, 2012 at 9:44

@SteveJessop: Should be tagged Language lawyer. – Mig 27/6, 2012 at 9:45

@Als: maybe, I don't know the purpose of that tag. The idea of the question isn't to split fine hairs about what the spec actually says, though, and that's what I think of as language-lawyering. I'd prefer to tag it language-design, if that's permitted. – Cleruchy 27/6, 2012 at 9:46

@SteveJessop: What the spec says is definitive and clear; there is no need to split hairs on that. The question asks why, which is what the tag accounts for. – Mig 27/6, 2012 at 9:49

+1. Good question. Even I want to know the reason as to why they're designed to behave differently. – Worry 27/6, 2012 at 9:50

possible duplicate: https://mcmap.net/q/17178/-is-there-a-difference-between-copy-initialization-and-direct-initialization/1025391 – Disadvantageous 27/6, 2012 at 10:1

@Disadvantageous not really. I know there's a difference, I'm asking why. – Pastose 27/6, 2012 at 10:2

@Luchian: the top answer over on that question goes into detail about when it matters which one you use, though. It says in effect that direct initialization can perform more conversions than copy initialization, because direct initialization can convert b to any type for which A has a constructor, whereas copy initialization must attempt to convert b specifically to A or a derived class of A. So, a plausible motivation for the difference is that copy initialization exists in order to suppress any explicit non-copy constructors of A. – Cleruchy 27/6, 2012 at 10:17

Oh, and also to suppress conversion chains from b to A that involved two user-defined conversions, the second of those being a constructor of A. What the answer over there doesn't address, is which of the differences was the actual motivation. – Cleruchy 27/6, 2012 at 10:21

Since it's a copy from a temp, it can and probably will be optimized out

The keyword here is probably. The standard allows, but does not require, a compiler to optimize the copy away. If some compilers allowed this code (optimized), but others rejected it (non-optimized), this would be very inconsistent.

So the standard prescribes a consistent way of handling this - everyone must check that the copy constructor is accessible, whether they then use it or not.

The idea is that all compilers should either accept the code or reject it. Otherwise it will be non-portable.

As another example, consider:

A a;
B b;

A a1 = a;
A a2 = b;

It would be equally inconsistent to allow a2 but forbid a1 when A's copy constructor is private.

We can also see from the Standard text that the two methods of initializing a class object were intended to be different (8.5/16):

If the initialization is direct-initialization, or if it is copy-initialization where the cv-unqualified version of the source type is the same class as, or a derived class of, the class of the destination, constructors are considered. The applicable constructors are enumerated (13.3.1.3), and the best one is chosen through overload resolution (13.3). The constructor so selected is called to initialize the object, with the initializer expression or expression-list as its argument(s). If no constructor applies, or the overload resolution is ambiguous, the initialization is ill-formed.

Otherwise (i.e., for the remaining copy-initialization cases), user-defined conversion sequences that can convert from the source type to the destination type or (when a conversion function is used) to a derived class thereof are enumerated as described in 13.3.1.4, and the best one is chosen through overload resolution (13.3). If the conversion cannot be done or is ambiguous, the initialization is ill-formed. The function selected is called with the initializer expression as its argument; if the function is a constructor, the call initializes a temporary of the cv-unqualified version of the destination type. The temporary is a prvalue. The result of the call (which is the temporary for the constructor case) is then used to direct-initialize, according to the rules above, the object that is the destination of the copy-initialization. In certain cases, an implementation is permitted to eliminate the copying inherent in this direct-initialization by constructing the intermediate result directly into the object being initialized; see 12.2, 12.8.

A difference is that the direct-initialization uses the constructors of the constructed class directly. With copy-initialization, other conversion functions are considered and these may produce a temporary that has to be copied.
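One case where the two forms visibly disagree is the conversion chain mentioned in the comments on the question. The following is only a sketch: the types A, B and C and their members are hypothetical, and the standard type traits std::is_constructible and std::is_convertible stand in for writing out A a(b); and A a = b; by hand.

```cpp
#include <type_traits>

// Hypothetical types for illustration; none of them come from the thread.
struct C {};

struct B {
    operator C() const { return C(); }  // user-defined conversion B -> C
};

struct A {
    A(const C&) {}                      // converting constructor C -> A
};

// Direct-initialization, A a(b): the constructors of A are the candidates,
// and the argument b may use one user-defined conversion (B -> C) to match
// A(const C&).
static_assert(std::is_constructible<A, B>::value,
              "A a(b) is well-formed");

// Copy-initialization, A a = b: the whole B -> A conversion has to be one
// implicit conversion sequence. Here it would need two user-defined
// conversions (B -> C, then C -> A), which is not allowed.
static_assert(!std::is_convertible<B, A>::value,
              "A a = b is ill-formed");
```

If both assertions compile, the compiler agrees that direct-initialization accepts this chain while copy-initialization rejects it.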

Leuctra answered 27/6, 2012 at 9:59 Comment(8)

But the potential extra copy is the reason for the difference. :-) – Leuctra 27/6, 2012 at 10:2

Sorry, I still don't see how. Can you be a bit more descriptive in the answer? (not the downvoter) – Pastose 27/6, 2012 at 10:3

This answers the question, "why does the copy constructor need to be accessible even when copy elision is applied?". It does not answer the question, "why is the copy-initialization syntax defined to do a copy that can be elided?" – Cleruchy 27/6, 2012 at 10:12

True, it addresses the point of "ergo the extra copy itself isn't reason enough for the different behavior". The potential extra copy is the reason. Why it was decided to have the copy there eludes me. – Leuctra 27/6, 2012 at 10:22

Because the copying might have extra side effects and this is the only way to differentiate both cases? – Spiccato 27/6, 2012 at 10:42

@Spiccato - No, the compiler is explicitly allowed to elide the copying even if there would be side effects. – Leuctra 27/6, 2012 at 10:50

That's exactly the question - WHY have a copy? – Pastose 27/6, 2012 at 12:29

Regarding your last sentence -- other conversion functions are considered by direct-initialization too, because the arguments can be coerced to match whatever constructors the class has, just like any function call. So for example Foo f(0) for a class Foo that has a void* constructor results in a temporary too. I think everyone is along the right lines, that the different initializations are both useful, but it's actually quite complicated to explain all the consequences of the differences, let alone figure out whether there's some killer use-case that makes the need obvious... – Cleruchy 27/6, 2012 at 12:30

Only a speculation, but I am afraid it will be hard to be more certain without Bjarne Stroustrup confirming how it really was:

It was designed this way on the assumption that programmers would expect this behaviour: a copy when the = sign is used, and no copy with the direct-initializer syntax.

I think the possible copy elision was only added in later versions of the standard, but I am not sure. Somebody may be able to tell for certain by checking the history of the standard.

Leucopenia answered 27/6, 2012 at 10:27 Comment(3)

Ok, but why would the programmer want the extra copy? It seems like it doesn't serve any purpose. – Pastose 1/7, 2012 at 16:34

I agree. It does not make any sense to me, however this is the only possible explanation I can imagine. I think only those who designed the language might be able to tell their reasons. – Leucopenia 1/7, 2012 at 19:31

Yup. Or someone who actually wanted a copy and can tell us the reasons. – Pastose 1/7, 2012 at 19:48

Take the following example:

struct X
{
    X(int) {}
    X(const X&) {}
};

int foo(X x) { /* Do stuff */ return 1; }

int main()
{
    X x(1);
    foo(x);  // x is an lvalue, so the parameter is copy-constructed from it
}

In the compilers I tested, the argument to foo was always copied even with full optimization turned on. From this, we can gather that copies will not/must not be eliminated in all situations.
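To check this observation rather than take it on trust, the copy constructor can be instrumented. This is a minimal sketch: CountedX, its copies counter and the helper functions are hypothetical names added for illustration, not part of the example above.

```cpp
#include <cassert>

// Hypothetical instrumented variant of X that counts copy-constructor calls.
struct CountedX {
    static int copies;
    CountedX(int) {}
    CountedX(const CountedX&) { ++copies; }
};
int CountedX::copies = 0;

int foo(CountedX) { /* Do stuff */ return 1; }

// Counts the copies made by one by-value call with an lvalue argument.
int copies_for_lvalue_call() {
    CountedX::copies = 0;
    CountedX x(1);
    foo(x);              // x is a named object, so this copy cannot be elided
    return CountedX::copies;
}

// Counts the copies made by the same call with a temporary argument.
int copies_for_temporary_call() {
    CountedX::copies = 0;
    foo(CountedX(1));    // elision is permitted here (required from C++17 on)
    return CountedX::copies;
}

// Runs both measurements and checks the one result that is guaranteed.
void check_copy_counts() {
    assert(copies_for_lvalue_call() == 1);
    assert(copies_for_temporary_call() <= 1);
}
```

Calling check_copy_counts() from main should pass on any conforming compiler. Only the lvalue count is fixed by the language; the count for the temporary depends on whether the compiler elides the copy.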

Now let's think from a language-design perspective. Imagine all the scenarios you would have to consider if you wanted to make rules for exactly when a copy is needed and when it isn't. This would be very difficult. Even if you could come up with such rules, they would be very complex and almost impossible for people to comprehend. At the same time, forcing copies everywhere would be very inefficient. This is why the rules are the way they are: they stay comprehensible while still not forcing copies to be made where they can be avoided.

I have to admit now, this answer is very similar to Suma's answer. The idea is that you can expect the behavior with the current rules, and anything else would be too hard for people to follow.

Swoosh answered 29/6, 2012 at 9:40 Comment(2)

foo takes its argument by value so it is clear that there will be a copy involved. I don't think this is in any way relevant to the question. – Klemens 17/6, 2013 at 1:9

@Tony: The whole point of the example is to show that not all copies can be eliminated. Although in some circumstances they can be elided, it is easier to make a more general rule than to make individual rules for each use case. – Swoosh 17/6, 2013 at 3:8

Initialization of built-in types like:

int i = 2;

is very natural syntax, in part due to historical reasons (remember your high school math). It is more natural than:

int i(2);

even if some mathematicians may argue this point. After all, there is nothing unnatural in calling a function (a constructor in this case) and passing it an argument.

For built-in types these two types of initialization are identical. There is no extra copy in the former case. That is the reason for having both types of initialization and originally there was no specific intention to make them behave differently.

However, there are user-defined types and one of the stated goals of the language is to allow them to behave as built-in types as closely as possible.

Thus, copy construction (taking input from some conversion function, for example) is the natural implementation of the first syntax.

The fact that you may have extra copies and that they may be elided is an optimization for user-defined types. Both copy elision and explicit constructors came much later into the language. It is not surprising that the standard allows optimizations after a certain period of use. Also, now you can eliminate explicit constructors from the overload resolution candidates.
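The last point, that explicit constructors drop out of the candidates for the = form, can be checked with the standard type traits. This is a sketch under assumptions: Source, Implicit and Explicit are hypothetical types, std::is_constructible models T t(s); and std::is_convertible models T t = s;.

```cpp
#include <type_traits>

// Hypothetical types for illustration only.
struct Source {};

struct Implicit {
    Implicit(const Source&) {}           // usable by both initialization forms
};

struct Explicit {
    explicit Explicit(const Source&) {}  // usable by direct-initialization only
};

// Without explicit, both forms are accepted.
static_assert(std::is_constructible<Implicit, Source>::value,
              "Implicit t(s) is well-formed");
static_assert(std::is_convertible<Source, Implicit>::value,
              "Implicit t = s is well-formed");

// With explicit, only direct-initialization is accepted.
static_assert(std::is_constructible<Explicit, Source>::value,
              "Explicit t(s) is well-formed");
static_assert(!std::is_convertible<Source, Explicit>::value,
              "Explicit t = s is ill-formed");
```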

Klemens answered 17/6, 2013 at 3:51 Comment(0)
