Why copy constructor called twice in heap array initialization?

For the following C++14 code, why does g++'s generated code for new A[1]{x} seem to invoke the copy constructor twice?

#include <iostream>
using namespace std;

class A {
public:
    A()           { cout << "default ctor" << endl; }
    A(const A& o) { cout << "copy ctor" << endl;    }
    ~A()          { cout << "dtor" << endl;         }
};

int main()
{
    A x;
    cout << "=========" << endl;
    A* y = new A[1]{x};
    cout << "=========" << endl;
    delete[] y;
    return 0;
}

Compilation and output:

$ g++ -fno-elide-constructors -std=c++14 test.cpp && ./a.out
default ctor
=========
copy ctor
copy ctor
dtor
=========
dtor
dtor

Interestingly, for the same code, clang++ only invokes the copy constructor once:

$ clang++ -fno-elide-constructors -std=c++14 test.cpp && ./a.out
default ctor
=========
copy ctor
=========
dtor
dtor

Furthermore, when using g++, changing the A* y = new A[1]{x}; line to either of the following causes the copy constructor to be called only once:

  • A* y = new A {x}; - normal heap object instead of heap array of size 1
  • A y[1] {x}; - array on stack instead of heap

So it appears that the double copy constructor behavior is only exhibited in heap-array initialization.
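To compare the three forms by count instead of by reading the log, a minimal sketch could look like the following. Counted and the copies_* functions are hypothetical names introduced here for illustration; they are not part of the original test program.

```cpp
// Hypothetical helper type (not from the original program):
// counts copy-constructor calls instead of printing them.
struct Counted {
    static int copies;
    Counted() {}
    Counted(const Counted&) { ++copies; }
};
int Counted::copies = 0;

// Heap array of size 1, initialized from an lvalue.
int copies_heap_array() {
    Counted x;
    Counted::copies = 0;
    Counted* y = new Counted[1]{x};
    int n = Counted::copies;
    delete[] y;
    return n;
}

// Single heap object, initialized from an lvalue.
int copies_heap_object() {
    Counted x;
    Counted::copies = 0;
    Counted* y = new Counted{x};
    int n = Counted::copies;
    delete y;
    return n;
}

// Stack array of size 1, initialized from an lvalue.
int copies_stack_array() {
    Counted x;
    Counted::copies = 0;
    Counted y[1]{x};
    (void)y;  // silence unused-variable warnings
    return Counted::copies;
}
```

Based on the output shown above, copies_heap_object() and copies_stack_array() should each return 1, while copies_heap_array() is the one that may return 2 with g++ in C++11/C++14 mode and -fno-elide-constructors.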

Internode answered 15/5, 2021 at 3:47 Comment(3)
Interestingly, this gcc behavior seems to be specific to C++11/C++14 and does not reproduce when compiling with -std=c++17: godbolt.org/z/538fqd9KM - Marlow
Also interesting (possibly) but unhelpful (maybe) is that, if you add a move c'tor, then GCC calls that in place of the second copy c'tor call. Possibly, the beast is creating an initializer list and then using that to initialize the new array data? - Godard
@Marlow I suspect that has something to do with C++17's newly introduced guaranteed copy elision, which can cause certain copy constructions to be elided even with -fno-elide-constructors. - Internode

TL;DR: It's likely a GCC defect: a misinterpretation of {x} as a temporary in this context. For each element in new A[N]{x1, x2, ... xN}, the copy constructor should be called exactly once according to [dcl.init] and [expr.new]. Instead, GCC likely treats it as an initializer list and thus, in part, as an intermediate rvalue. We can force GCC to interpret it otherwise, though.


why does g++'s generated code for new A[1]{x} seem to invoke the copy constructor twice?

Because there is no move constructor. If we add a move constructor and some more output, we get a better picture of the situation (Compiler Explorer):

#include <iostream>
using namespace std;

class A {
public:
    A()           { cout << "default ctor @" << this << endl; }
    A(A&& o)      { cout << "move ctor: " << &o << " to " << this << endl;    }
    A(const A& o) { cout << "copy ctor: " << &o << " to " << this << endl;    }
    ~A()          { cout << "dtor @" << this << endl;         }
};

int main()
{
    A x;
    cout << "=========" << endl;
    A* y = new A[1]{x};
    cout << "=========" << endl;
    delete[] y;
    return 0;
}

Note that adding the A(A&&) constructor reveals the intermediate temporary:

default ctor @0x7ffec28b5476
=========
copy ctor: 0x7ffec28b5476 to 0x7ffec28b5477
move ctor: 0x7ffec28b5477 to 0x55d0a7fa6288
dtor @0x7ffec28b5477
=========
dtor @0x55d0a7fa6288
dtor @0x7ffec28b5476

Indeed, if we delete the move constructor with A(A&&) = delete, g++ won't even compile the program anymore (but Clang still accepts it).

It seems like g++ misinterprets the braced-init-list. IMHO, [expr.new] may allow that kind of interpretation, but this seems like a g++ defect and should probably get reported as such.

However, the whole ordeal reminds me of an older question of mine (Are curly braces really required around initialization?). So let's introduce more braces to make sure that g++ cannot misinterpret our initializer:

int main()
{
    A x;
    cout << "=========" << endl;
    A* y = new A[1]{{{x}}};
    cout << "=========" << endl;
    delete[] y;
    return 0;
}

This variant circumvents g++'s behaviour:

initializer for T[1]     start : {
initializer for first element  : {
actual initializer for A       : {x}

The program output is then (Explorer)

default ctor @0x7ffede3d9967
=========
copy ctor: 0x7ffede3d9967 to 0x1eb0ec8
=========
dtor @0x1eb0ec8
dtor @0x7ffede3d9967

So for multiple elements, we end up in brace-hell (Compiler Explorer):

int main()
{
    A x;
    cout << "=========" << endl;
    A* y = new A[2]{{{x}},{{x}}};
    cout << "=========" << endl;
    delete[] y;
    return 0;
}

Again, no additional constructors are called:

default ctor @0x7fff3a2a7a27
=========
copy ctor: 0x7fff3a2a7a27 to 0x1f49ec8
copy ctor: 0x7fff3a2a7a27 to 0x1f49ec9
=========
dtor @0x1f49ec9
dtor @0x1f49ec8
dtor @0x7fff3a2a7a27
Littlejohn answered 21/5, 2021 at 8:30 Comment(0)

After doing some research in the standard, I came to the conclusion that g++ is wrong and there should be only one copy constructor invocation. Interestingly, there seem to be two possible interpretations of which kind of initialization occurs here. Both lead to the same conclusion, though.

First interpretation - direct initialization

From the C++14 Standard (Working Draft), [expr.new] 17:

A new-expression that creates an object of type T initializes that object as follows:

  • (17.1) — If the new-initializer is omitted, the object is default-initialized (8.5). [ Note: If no initialization is performed, the object has an indeterminate value. — end note ]
  • (17.2) — Otherwise, the new-initializer is interpreted according to the initialization rules of 8.5 for direct initialization.

In our case the new-initializer is present, so (according to 17.2) new A[1]{x} is interpreted using direct initialization rules. Let's look at [dcl.init] 16:

The initialization that occurs in the forms

  • T x(a);
  • T x{a};

as well as in new expressions (5.3.4), static_cast expressions (5.2.9), functional notation type conversions (5.2.3), mem-initializers (12.6.2), and the braced-init-list form of a condition is called direct-initialization

Ok, this further confirms that we are dealing with direct initialization. Now let's see how direct initialization works in [dcl.init] 17:

The semantics of initializers are as follows. The destination type is the type of the object or reference being initialized and the source type is the type of the initializer expression. If the initializer is not a single (possibly parenthesized) expression, the source type is not defined.

  • [... 17.1 through 17.5 omitted ...]
  • (17.6) — If the destination type is a (possibly cv-qualified) class type:
    • (17.6.1) — If the initialization is direct-initialization, or if it is copy-initialization where the cv-unqualified version of the source type is the same class as, or a derived class of, the class of the destination, constructors are considered. The applicable constructors are enumerated (13.3.1.3), and the best one is chosen through overload resolution (13.3). The constructor so selected is called to initialize the object, with the initializer expression or expression-list as its argument(s). If no constructor applies, or the overload resolution is ambiguous, the initialization is ill-formed.

According to the excerpt above, when the object being initialized is a class type (as is the case here) and when dealing with direct initialization (as is the case here) the destination object is initialized using the most suitable constructor.

I won't cite the rules about how the constructor is selected, as in this case when there is only the default A::A() constructor and the copy A::A(const A&) constructor, the copy constructor is obviously the better choice when initializing with x of type A. This is the source of one of the copy constructor invocations.
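As a small illustration of that selection, here is a sketch under assumptions: Probe and the three functions are hypothetical names, and a move constructor is added (which the class in the question does not have) only to show that an lvalue argument still selects the copy constructor.

```cpp
#include <utility>

// Hypothetical type for illustration: records which constructor
// overload resolution picked for each object.
struct Probe {
    enum Kind { Default, Copy, Move };
    Kind made_by;
    Probe() : made_by(Default) {}
    Probe(const Probe&) : made_by(Copy) {}
    Probe(Probe&&) : made_by(Move) {}
};

// Direct-initialization with parentheses from an lvalue.
Probe::Kind direct_init_from_lvalue() {
    Probe x;
    Probe y(x);
    return y.made_by;
}

// Direct-list-initialization from an lvalue.
Probe::Kind direct_list_init_from_lvalue() {
    Probe x;
    Probe y{x};
    return y.made_by;
}

// Direct-initialization from an xvalue, for contrast.
Probe::Kind direct_init_from_xvalue() {
    Probe x;
    Probe y(std::move(x));
    return y.made_by;
}
```

In each case exactly one constructor runs for y; nothing in these forms calls for an intermediate object.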

I didn't find any remarks about the initialization of arrays in particular in section [expr.new] and why it should cause a second constructor invocation.

Second interpretation - copy initialization

Here, we can start from [dcl.init.list] 1:

List-initialization is initialization of an object or reference from a braced-init-list. Such an initializer is called an initializer list, and the comma-separated initializer-clauses of the list are called the elements of the initializer list. An initializer list may be empty. List-initialization can occur in direct-initialization or copy initialization contexts; list-initialization in a direct-initialization context is called direct-list-initialization and list-initialization in a copy-initialization context is called copy-list-initialization. [ Note: List-initialization can be used

  • (1.1) — as the initializer in a variable definition (8.5)
  • (1.2) — as the initializer in a new-expression (5.3.4)
  • [... 1.3 through 1.10 omitted ...]

— end note ]

This excerpt can be understood to say that new A[1]{x} is actually a form of list-initialization rather than plain direct-initialization, as a braced-init-list {x} is used. Assuming this is the case, let's look at how it works in [dcl.init.list] 3:

List-initialization of an object or reference of type T is defined as follows:

  • [... 3.1 through 3.2 omitted ...]
  • (3.3) — Otherwise, if T is an aggregate, aggregate initialization is performed (8.5.1).
  • [... 3.4 through 3.10 omitted ...]

In our case, point 3.3 applies as we are initializing an array which is an aggregate, according to [dcl.init.aggr] 1:

An aggregate is an array or a class (Clause 9) with no user-provided constructors (12.1), no private or protected non-static data members (Clause 11), no base classes (Clause 10), and no virtual functions (10.3).

As such let's look at how aggregate initialization is performed in [dcl.init.aggr] 2:

When an aggregate is initialized by an initializer list, as specified in 8.5.4, the elements of the initializer list are taken as initializers for the members of the aggregate, in increasing subscript or member order. Each member is copy-initialized from the corresponding initializer-clause. If the initializer-clause is an expression and a narrowing conversion (8.5.4) is required to convert the expression, the program is ill-formed.

This fragment tells us that elements are copy initialized. As such y[0] will be copy initialized from x. Now let's look at how copy initialization works in [dcl.init] 17:

The semantics of initializers are as follows. The destination type is the type of the object or reference being initialized and the source type is the type of the initializer expression. If the initializer is not a single (possibly parenthesized) expression, the source type is not defined.

  • [... 17.1 through 17.5 omitted ...]
  • (17.6) — If the destination type is a (possibly cv-qualified) class type:
    • (17.6.1) — If the initialization is direct-initialization, or if it is copy-initialization where the cv-unqualified version of the source type is the same class as, or a derived class of, the class of the destination, constructors are considered. The applicable constructors are enumerated (13.3.1.3), and the best one is chosen through overload resolution (13.3). The constructor so selected is called to initialize the object, with the initializer expression or expression-list as its argument(s). If no constructor applies, or the overload resolution is ambiguous, the initialization is ill-formed.

Just like last time, this initialization fulfills the requirements for point 17.6.1 as it is copy-initialization where the source type (A of x) is the same as the destination type (A of y[0]). This means that in this case the copy constructor will be called as well.
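The same reasoning can be checked on a stack array, where g++ was reported above to behave as expected. This is only a sketch: Elem, Counts and init_stack_array are hypothetical names, and the third element is left without an initializer-clause to show that it is not copied from anything.

```cpp
// Hypothetical element type: counts default and copy constructions.
struct Elem {
    static int copies;
    static int defaults;
    Elem() { ++defaults; }
    Elem(const Elem&) { ++copies; }
};
int Elem::copies = 0;
int Elem::defaults = 0;

struct Counts {
    int copies;
    int defaults;
};

// Aggregate-initializes an array of three elements from two clauses.
Counts init_stack_array() {
    Elem x;
    Elem::copies = 0;
    Elem::defaults = 0;
    // arr[0] and arr[1] are copy-initialized from x;
    // arr[2] has no clause and is initialized from an empty list.
    Elem arr[3] = {x, x};
    (void)arr;  // silence unused-variable warnings
    return Counts{Elem::copies, Elem::defaults};
}
```

Each element with an initializer-clause accounts for one copy constructor call, and the remaining element is constructed by the default constructor.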

Conclusion

It seems that regardless of which interpretation is chosen, the copy constructor should be called only once, and that Clang is right. I was unable to find any evidence that a temporary should be created. For some more example-based evidence, other compilers such as icc and the (admittedly Clang-based) zapcc and ellcc agree with Clang, all producing only one copy constructor invocation.

I don't know much about g++'s internal workings, but I have a theory about why it performs two copy constructor invocations. It is possible that g++ internally uses some helper constructor invocations that are normally always optimized out, and that the -fno-elide-constructors flag breaks the invariant that they will always be optimized out. This is, however, pure speculation about g++ on my side, so please correct me if I'm wrong.

Child answered 17/5, 2021 at 19:59 Comment(7)
All of your quotes (only) apply when the item being initialized is an object of class type, but in your code, you're initializing an array of objects of class type. The fact that the array has a size of 1 doesn't change the fact that it is an object of array type and not an object of class type. - Executory
Weird. On one hand, the standard says in two places that when using a new expression, direct initialization is used. At the same time, [dcl.init.list] seems to say that in this case it is actually list-initialization that applies rather than direct initialization, as a braced initializer is used. If this were the case, according to 8.5.4.3.3 aggregate initialization is actually performed, which in turn according to 8.5.1.2 causes copy initialization. However, even if this is the case, this still falls under 8.5.17.6.1, which still should mean that only one copy constructor should be invoked. - Child
The {x} here is an initializer for the array, so another, temporary A is created to form the initializer for the A[1] type. Then A[1] is copy-initialized, part of which would be the second call of A's copy constructor. That may be elided for more than one reason (including the as-if rule), but is it guaranteed to be elided? - Azpurua
According to gcc, -fno-elide-constructors disables all constructor elision, including the cases described in the standard. It's an option that creates non-compliant behaviour. - Azpurua
@Swift-FridayPie Could you perhaps give an excerpt from the Standard that explains the creation of the temporary? In my research, which I showed in the answer and in the comment above, I was unable to find any mention of a temporary being created. - Child
@Swift-FridayPie I would agree with your comment, but if you were right then A* y = new A[]{x, {}}; would produce the output copy ctor\ncopy ctor\ndefault ctor\ncopy ctor\ndtor\ndtor, but this expression does not make a copy of the second item: copy ctor\ncopy ctor\ndtor\ndefault ctor. - Formalin
Also A* y = new A[]{x, x}; is confusing me. I expect copy ctor\ncopy ctor\ncopy ctor\ncopy ctor\ndtor\ndtor, but the actual output is copy ctor\ncopy ctor\ndtor\ncopy ctor\ncopy ctor\ndtor. This does not look like creating an initializer for A[2]. - Formalin
