Why is C++11's POD "standard layout" definition the way it is?

I'm looking into the new, relaxed POD definition in C++11 (§9, paragraph 7):

A standard-layout class is a class that:

  • has no non-static data members of type non-standard-layout class (or array of such types) or reference,
  • has no virtual functions (10.3) and no virtual base classes (10.1),
  • has the same access control (Clause 11) for all non-static data members,
  • has no non-standard-layout base classes,
  • either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
  • has no base classes of the same type as the first non-static data member.

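The bits that surprised me are the access-control rule, the rule about which classes may have non-static data members, and the rule about the first non-static data member.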

What would go wrong if we tolerated data members with varying access controls?

What would go wrong if the first data member were also a base class? For example:

struct Foo {};
struct Good : Foo {int x; Foo y;};
struct Bad  : Foo {Foo y; int x;};

I admit it's a weird construction, but why should Bad be prohibited but not Good?

Finally, what would go wrong if more than one constituent class had data members?

Putrid asked 23/8, 2011 at 12:16 Comment(4)
struct has always had all of its members public. C++11 has private now? – Neurogram
@Mu: Yes, by default members of a struct are public. Class members are private by default, conversely. – Neurogram
@Code It has always been possible to have private members in a struct in C++ (but not in C). The default is public, though. – Electrodynamometer
@Code Monkey: It's always been legal to define protected and private members in a struct; the difference was only the default. – Few

One of the later paragraphs allows you to convert the address of a standard-layout class object to a pointer to its first member and back, something that is also often done in C:

struct A { int x; };
A a;

// "px" is guaranteed to point to a.x
int *px = (int*) &a;

// guaranteed to point to a
A *pa = (A*)px; 

For that to work, the first member and the complete object have to have the same address (the compiler cannot adjust the int pointer by some offset, because it can't know whether the int is a member of an A or not).

Finally, what would go wrong if more than one constituent class had data members?

Within a class, members (at least those with the same access control) are allocated at increasing addresses in declaration order. However, C++ doesn't dictate the order of allocation for data members across classes. If both the derived class and a base class had data members, the Standard deliberately leaves the order of their addresses undefined, to give an implementation full flexibility in layouting memory. But for the above cast to work, you need to know which member is "first" in allocation order!

What would go wrong if the first data member was also a base class?

If the base class has the same type as the first data member, an implementation that places base classes before the derived class's own data members would need a padding byte before those data members (the base class would have size one), so that the base class subobject and the first data member do not share an address (in C++, two distinct objects of the same type always have different addresses). But that padding would again make it impossible to cast the address of the derived class object to a pointer to its first data member.

Ajar answered 23/8, 2011 at 19:45 Comment(13)
This is a great answer, except that I'm not sure about the last paragraph: is this padding a base class which has no data members? I understand the mechanics of cast-to-member, but I don't accept that it would hurt "standard layout" to relax this, because we only need to support this old trick for aggregates. – Putrid
@Putrid the spec allows it for standard-layout classes, not just for aggregates. A base class with only functions and no data members can occur in traits or SFINAE use cases. In such cases it would be a pity to weaken the guarantee that there is no padding before the first data member, I think. – Ajar
How about relaxing the distinct-objects-have-distinct-pointers requirement for non-data-bearing base classes vs. the same class as a member? Would this cause nasty side effects? (AFAIK, under the diamond problem a duplicated empty base class would already have one pointer value for both "instances" because of the no-padding guarantee.) – Putrid
@Putrid no, a duplicated empty base class object never has the same address. That's a fundamental guarantee in the language, stated very early in the spec. Unfortunately I'm not the right person to ask about the pros and cons of relaxing this restriction, as I know too little about its use cases. – Ajar
Doesn't that contradict "the guarantee that there is no padding before the first data member"? – Putrid
@spraff, I don't see a contradiction. Can you cook up an example? – Ajar
struct Duplicated{}; struct A:Duplicated{}; struct B:Duplicated{}; struct Derived:A,B{Duplicated d;} If there is to be no padding before Derived::d, then Derived::A::Duplicated and Derived::B::Duplicated must have the same address even though Duplicated is not virtual. – Putrid
@Putrid your example contradicts "has no base classes of the same type as the first non-static data member", so there is no such guarantee as "no padding before Derived::d" for it. – Ajar
+1, this is better than my answer, since you give a specific example of how a program might rely on a class having standard layout. – Highway
Btw, if it wasn't a deliberate misuse, then it's "laying out" rather than "layouting". If it was deliberate, I quite like it :-) – Highway
I understand now, thank you all. (Although I'm not convinced that cast-to-member-and-back is a legitimate operation worth supporting, but that's another story...) – Putrid
@litb Your nuanced understanding of this kind of topic (as well as the entirety of C++, apparently) makes me want to draw your attention to this question, which I suspect invokes standard layout rules. I'm sure if I could read between the lines better I could figure out whether what I'm asking is legal or not. But alas I am confused, so if you can give me the definitive smack-down I'll gladly award you the bounty. More points! :-) – Salahi
But in the derived class, if the base class is empty, then there is no base class subobject, so the base class will not have size one. – Goggin

It's basically about compatibility with C++03 and C:

  • same access control - C++03 implementations are allowed to use access control specifiers as an opportunity to re-order the (groups of) members of a class, for example in order to pack it better.
  • more than one class in the hierarchy with non-static data members - C++03 doesn't say where base classes are located, or whether padding is elided in base class subobjects that would be present in a complete object of the same type.
  • base class and first member of the same type - because of the second rule, if the base class type is used for a data member, then it must be an empty class. Many compilers do implement the empty base class optimization, so what Andreas says about the sub-objects having the same address would be true. I'm not sure though what it is about standard-layout classes that means it's bad for the base class subobject to have the same address as a first data member of the same type, but it doesn't matter when the base class subobject has the same address as a first data member of a different type. [Edit: it's because different objects of the same type have different addresses, even if they're empty sub-objects. Thanks to Johannes]

C++0x probably could have defined that those things are standard-layout types too, in which case it would also define how they're laid out, to the same extent it does for standard-layout types. Johannes's answer goes into this further; look at his example of a nice property of standard-layout classes that these things interfere with.

But if it did that, then some implementations would be forced to change how they lay out the classes to match the new requirements, which is a nuisance for struct compatibility between different versions of that compiler pre- and post- C++0x. It breaks the C++ ABI, basically.

My understanding of how standard layout was defined is that they looked at which POD requirements could be relaxed without breaking existing implementations. So I assume, without checking, that the above are examples where some existing C++03 implementation does use the non-POD nature of the class to do something that's incompatible with standard layout.

Highway answered 23/8, 2011 at 12:47 Comment(3)
At first I accepted this, but the more I think about it the more it seems not to matter how the compiler orders/aligns the members of a class/struct, only that base and member structs remain contained within themselves as the class of which they are members/bases gets laid out by the compiler. Nothing which was POD would cease to be so if we allowed heterogeneous access controls and multiple member-bearing bases! – Putrid
It's worth noting that C++11 clarified 'same access control'/'reorder by access control specifier' to mean only when the specifiers are different, in case for some reason you want to include multiple redundant identical specifiers, whereas C++03 allowed reordering around these. Whether any compiler actually did the latter, I dunno. – Cohe
@Steve Jessop "so what Andreas says about the sub-objects having the": I believe you meant member sub-object. – Inunction

What would go wrong if we tolerated data members with varying access controls?

The current language says that the compiler cannot reorder members that have the same access control. For example:

struct x
{
public:
    int x;
    int y;
private:
    int z;
};

Here x must be allocated before y, but there is no restriction on z relative to x and y.

struct y
{
public:
    int x;
public:
    int y;
};

The new wording says that y is still a POD despite the two public: specifiers. This is actually a relaxation of the rules.

Deplete answered 23/8, 2011 at 12:50 Comment(0)

As to why Bad isn't allowed, let me quote from an article I found:

This ensures that two subobjects that have the same class type and that belong to the same most-derived object are not allocated at the same address.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2172.html

Lazos answered 23/8, 2011 at 12:30 Comment(3)
If sizeof(Foo) is nonzero, then struct Bar:Foo{Foo f;}; will be composed of two distinct Foo at two distinct locations. That's a nice link, but I fail to see how these two Foo could have the same address. – Putrid
Note that if there is a first data member, then the base necessarily has zero size (per the 5th bullet: if the class has non-static data members, no base class may have them, and thus all base classes have zero size). – Frederickafredericks
@spraff: that doesn't follow. sizeof(Foo) is non-zero for any class Foo, but if the class is empty then, even though it has non-zero size, when used as a subobject it can occupy no space. – Highway

From bullet 5, it seems that both are non-POD: since the most derived class has a non-static data member (the int), it can't have a base class with non-static data members.

I understand it as: "only one of the 'base' classes (i.e. the class itself or one of the classes it inherits from) can have non-static data members".

Mcgrody answered 23/8, 2011 at 12:26 Comment(2)
"only one of the 'base'...": yes, but why? – Putrid
From the paper cited above, it seems to be because the standard does not place any restriction on where the data of the base class is allocated "in relation with the data of the derived class", i.e. the order of the layout (base data and derived data) is not specified. Thus, it would break the layout-compatibility guarantee on POD types (if I get it right). – Mcgrody

struct Good is not a standard-layout class either, since both Foo and Good have non-static data members.

So Good would have to be something like:

struct Foo {int foo;};
struct Good : public Foo {Foo y;};

which fails to satisfy the 6th bullet. Hence the 6th bullet?

Prism answered 23/8, 2011 at 12:31 Comment(1)
Your version of Good doesn't satisfy the fifth bullet, either. A class with non-static data members cannot have a base class that also has non-static data members. The fifth bullet says that there can only be one block of non-static data members. That block can be either in the class itself or in exactly one of its base classes. The sixth bullet is for when the first non-static data member is an empty class. If it's empty, and there's a base class of the same type, then the address of the base object and the address of the first data member might be equal, which isn't allowed. – Adverse
