Does C++ guarantee identical binary layout for "trivial" structs with a single trivial member?

We have some strictly typed integer types in our project:

struct FooIdentifier {
  int raw_id; // the only data member

  // ... more shenanigans, but it stays a "trivial" type.
};

struct BarIdentifier {
  int raw_id; // the only data member

  // ... more shenanigans, but it stays a "trivial" type.
};

Basically something as proposed here or similar to things used in a Unit Library.

These structs basically are integers, except to the type system.

My question now is: does the C++ language guarantee that these types are laid out in memory 100% equivalently to a regular int?

Note: Since I can statically check whether the types have the same size (i.e. no padding), I'm really only interested in the no-surprising-padding case. I should have added this note from the beginning.

// Precondition. If the platform would yield false here, I'm not interested in the result.
static_assert(sizeof(int) == sizeof(ID_t)); 

That is, does the following hold from a C++ Standard POV:

#include <cassert>
#include <cstring>

int integer_array[42] = {}; // zero init
ID_t id_array[42] = {}; // zero init

static_assert(sizeof(int) == sizeof(ID_t)); // Precondition. If the platform would yield false here, I'm not interested in the result.

const char* const pIntArrMem = static_cast<const char*>(static_cast<const void*>(integer_array));
const char* const pIdArrMem = static_cast<const char*>(static_cast<const void*>(id_array));
assert(0 == std::memcmp(pIntArrMem, pIdArrMem, sizeof(int))); // Always ???
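For reference, a minimal sketch of how the precondition, plus a few related ones, can be checked at compile time. The alias `ID_t = FooIdentifier` is an assumption (the question uses ID_t for either identifier type), and the selection of checks is the editor's, not the asker's. Passing all of them rules out padding and alignment surprises, but as the answers below discuss, it does not by itself make the standard promise that the bytes match those of an int.

```cpp
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_standard_layout_v, std::is_trivially_copyable_v

struct FooIdentifier {
  int raw_id; // the only data member
};

using ID_t = FooIdentifier; // assumption: ID_t names one of the identifier types

// The first-member rules (and offsetof) are specified for standard-layout types.
static_assert(std::is_standard_layout_v<ID_t>, "ID_t must be standard-layout");
// Byte-wise copying and comparison are meaningful for trivially copyable types.
static_assert(std::is_trivially_copyable_v<ID_t>, "ID_t must be trivially copyable");
// No padding before the member.
static_assert(offsetof(ID_t, raw_id) == 0, "raw_id must start at offset 0");
// No padding after the member (the question's precondition).
static_assert(sizeof(ID_t) == sizeof(int), "ID_t must be exactly int-sized");
// Same alignment requirement as a plain int.
static_assert(alignof(ID_t) == alignof(int), "ID_t must be int-aligned");
```

If any of these fails on a given platform, the build stops instead of silently relying on the layout.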
Neogene answered 12/3, 2021 at 9:24 Comment(10)
There is a near dupe, but over there it asks about an array as a member, and here we have a single value. - Neogene
I think we are looking at an XY problem (meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Please take a logical step back and explain why you are trying this and what you want to achieve. - Zurheide
@Zurheide - the question is tagged language-lawyer! I kept it extremely focused on purpose. What I want to do in practice will work if this holds, and won't work if it doesn't. / And if it does hold, I plan to do a practical follow-up question anyway :-) - Neogene
Though there may be an XY problem here, the question is not devoid of merit, and the language-lawyer tag is there exactly for that purpose. I'm curious to know the answer to the original question :) - Donndonna
Yes, it is language-lawyer, point taken. You do not stay theoretical, however, when referring to "we have ...". So I thought that there might be a different way after all. I do see your point and your plan ahead. Interesting, by the way. - Zurheide
In a limited sense, yes. If both of those are standard-layout, then the common initial sequence guarantee holds when they are part of the same union. But I don't believe there's anything quite as far-reaching as what you ask about. - Trolley
See [basic.types.general]: "Two types cv1 T1 and cv2 T2 are layout-compatible types if T1 and T2 are the same type, layout-compatible enumerations (9.7.1), or layout-compatible standard-layout class types (11.4)." Looks like no. You can compare two class types for layout compatibility, but not a scalar type and a class type. - Tricho
No guarantee. BUT: if the types are standard-layout types and sizeof(int) == alignof(int), then you should expect them to be compatible, as the compiler is not allowed to add padding at the beginning of the object and has no need to add padding at the end. And I have not come across a system where alignof(int) differs from sizeof(int). - Tricho
Though these two types are not guaranteed to be the same as int, they are guaranteed to be the same as each other, assuming they fulfill the requirements of a standard-layout class. Also, if the two types were in a union, then the member raw_id could be accessed legally from either object if set via either of them. - Tricho
The answer is NO. And though you may assume it's true on some specific compilers, it may also behave unexpectedly, because converting int* to ID_t* may affect or be affected by alias analysis. - Artillery

Challenging eerorika's answer, I believe you are guaranteed binary compatibility. I'll reference the C++11 spec for this.

Key pieces: [class/7] This defines a standard-layout class. It's pretty clear we all agree that these are standard layout.

[intro.object/5] and [intro.object/6]

An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage.

Unless an object is a bit-field or a base class subobject of zero size, the address of that object is the address of the first byte it occupies.

This bounds the shapes that a standard-layout object can have, and specifies what we can call "the address of" an object.

[class.mem/20]

A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. —end note ]

This says that we can at least convert an ID_t* to an int* via reinterpret_cast.

Now, you assert that sizeof(ID_t) == sizeof(int). This is good news because it limits your options. int* someIdAsInt = reinterpret_cast<int*>(&someId) is guaranteed to succeed, and it will point at the first member, per [class.mem]. So the question is: what are the possible addresses that can be returned? Obviously, there is only one address within the object which can be the first byte of sizeof(int) bytes, which is, of course, the address of someId.

So we can be certain that &someId and someIdAsInt refer to the same address. And, in particular, someIdAsInt must point at the initial member per [class.mem].
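A minimal sketch of the conversion described above, assuming ID_t is a standard-layout struct whose only member is int raw_id. The helper names as_int_ptr, write_through and same_address are hypothetical, chosen only for illustration.

```cpp
#include <type_traits>

// Assumption: ID_t is the single-int struct from the question.
struct ID_t {
  int raw_id; // the only data member
};
static_assert(std::is_standard_layout_v<ID_t>, "the [class.mem] rule needs standard layout");

// [class.mem]: reinterpret_cast'ing a pointer to a standard-layout struct to
// the type of its first member yields a pointer to that member.
int* as_int_ptr(ID_t& someId) {
  return reinterpret_cast<int*>(&someId);
}

// Writes through the converted int* and reads the value back via the member.
int write_through(ID_t& someId, int value) {
  int* someIdAsInt = as_int_ptr(someId);
  *someIdAsInt = value;   // modifies someId.raw_id, the same object
  return someId.raw_id;
}

// The converted pointer, the member's address and the struct's address coincide.
bool same_address(ID_t& someId) {
  const void* viaCast = as_int_ptr(someId);
  return viaCast == &someId.raw_id && viaCast == &someId;
}
```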

If I were to do *someIdAsInt = 43, the result must be the same as if I did someId.raw_id = 43, because someIdAsInt points at someId.raw_id. This statement must be true no matter what I do with this pointer to obscure it.

This says that *someIdAsInt and someId must either have the same layout (permitting the assignment), or the compiler must track the value of someIdAsInt, treating it differently from a normal int*. This is where I depart from eerorika's answer. This information could not be handled in the type system with type tagging (it would force the compiler to be able to track tags, even if you did brutal things like pass an int* between threads). So any tagging information must be baked into the bytes forming the value of the int*. The C++ spec does not say anything about the format of a pointer's value.

However, there are limits to how different an int* can be, and these are, generally speaking, undisputed. The key one is that I can use std::memcpy to copy the bytes of one int into another, and the resulting integer must have the same value. To the best of my knowledge, this is not actually written into the spec, but it is accepted by (basically?) all programmers as a common-law rule of C and C++. Indeed, this sort of thing is further emphasized by the inclusion of std::bit_cast in C++20. Having two integer formats that cannot be distinguished by their bytes would break all sorts of things.

So, if you accept this common law ruling in a language-lawyer argument, then the layout of your ID_t must be identical to the layout of int if sizeof(ID_t) == sizeof(int). If that common law ruling is not accepted then... well... I'd just say some soul searching is in order =D

Note that this does not mean that you can safely go the other way. If you have an int array, you cannot cast it to ID_t* and then access those. That would be a violation of strict aliasing, as there was never an ID_t in that memory address in the first place. However, because they are identical layouts, using std::memcpy or std::bit_cast to convert to an ID_t with an equivalent bit pattern would still be fair game.
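A minimal sketch of that "fair game" direction, copying bytes instead of casting pointers. ID_t is assumed to be the single-int struct from the question, and the helper names are hypothetical. std::memcpy is used so the snippet also builds as C++17; in C++20, std::bit_cast<ID_t>(raw) is the single-value equivalent. The resulting raw_id values match the source ints only if the layouts really are identical, which is exactly what this answer argues.

```cpp
#include <cstddef>      // std::size_t
#include <cstring>      // std::memcpy
#include <type_traits>

// Assumption: ID_t is the single-int struct from the question.
struct ID_t {
  int raw_id; // the only data member
};
static_assert(sizeof(ID_t) == sizeof(int), "the question's precondition");
static_assert(std::is_trivially_copyable_v<ID_t>, "required for memcpy");

// Copies the bytes of one int into a real ID_t object; no pointer aliasing.
// (In C++20, std::bit_cast<ID_t>(raw) does the same for a single value.)
ID_t id_from_int_bits(int raw) {
  ID_t id{};
  std::memcpy(&id, &raw, sizeof id);
  return id;
}

// Array version: dst must already hold n ID_t objects. Reinterpreting src
// as const ID_t* instead would access ID_t objects that never existed.
void ids_from_ints(const int* src, ID_t* dst, std::size_t n) {
  std::memcpy(dst, src, n * sizeof(ID_t));
}
```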

Delacruz answered 12/3, 2021 at 19:29 Comment(0)

TL;DR No, the standard seems to not guarantee it (as far as I can tell). You technically have to rely on having a sane ABI.

You may need to give up supporting ds9k.


The standard doesn't explicitly guarantee much about layout. At best we can make some reasonable assumptions about what practical implementations could do based on guarantees that we do have.

[basic.compound]

Two objects a and b are pointer-interconvertible if:

  • ...
  • one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, any base class subobject of that object ([class.mem]), or
  • there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.

If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast.

From this, we transitively know that there practically cannot be padding in the standard layout class before the first member.
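To illustrate the pointer-interconvertibility rule, including the transitive case, here is a small sketch. Wrapper, owner_of and first_int are hypothetical names; the casts are only valid because the pointers really do point at the objects described in the comments.

```cpp
struct FooIdentifier {
  int raw_id; // the only data member
};

// Hypothetical enclosing type, used to show the transitive case.
struct Wrapper {
  FooIdentifier id; // first (and only) member
};

// Member -> enclosing object: valid only if `member` really points at the
// raw_id of some FooIdentifier, which makes the two pointer-interconvertible.
FooIdentifier* owner_of(int* member) {
  return reinterpret_cast<FooIdentifier*>(member);
}

// Transitive case: w is interconvertible with w.id, and w.id with
// w.id.raw_id, so w is interconvertible with w.id.raw_id.
int* first_int(Wrapper& w) {
  return reinterpret_cast<int*>(&w);
}
```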

[expr.sizeof]

... When applied to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array. ... When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n elements is n times the size of an element.

This implies that neither integer_array nor id_array (nor any other array) has padding before, between, or after its elements.
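The [expr.sizeof] consequence can be written down as compile-time checks. A sketch using the question's array declarations, with FooIdentifier standing in for ID_t (an assumption):

```cpp
struct FooIdentifier {
  int raw_id; // the only data member
};

int integer_array[42] = {};       // zero init
FooIdentifier id_array[42] = {};  // zero init (FooIdentifier stands in for ID_t)

// [expr.sizeof]: an array of n elements is exactly n times the element size,
// which leaves no room for padding before, between or after the elements.
static_assert(sizeof integer_array == 42 * sizeof(int), "no gaps in int[42]");
static_assert(sizeof id_array == 42 * sizeof(FooIdentifier), "no gaps in FooIdentifier[42]");

// Hence, whenever sizeof(FooIdentifier) == sizeof(int), both arrays span the same number of bytes.
constexpr bool same_total_size = sizeof integer_array == sizeof id_array;
```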

Given the lack of padding before the int subobject, your second assert would be a reasonable assumption, unless an object could have one representation in one context and another representation in another context (free object vs. subobject, or subobject of a different enclosing type), for example big endian in one and little endian in the other. I cannot find anything in the standard disallowing that, but I also cannot imagine how such an implementation could work in practice, given that the compiler cannot always know whether a particular glvalue refers to a subobject (and within which enclosing object) or not.

Given the above assumptions, the first assert boils down to: "could the standard-layout class have padding after its only member?" Actually, this is entirely possible if alignas or some layout-affecting language extension is involved, but can we assume the negative if that is not the case? The standard doesn't say much, and I don't think it would even be impossible in practice for a language implementation to add some padding; it just wouldn't be very useful.
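As an illustration of the alignas case, a sketch using standard syntax; PaddedIdentifier is a hypothetical name. It is the portable counterpart of the GCC __attribute__((aligned(8))) counter-example given in another answer.

```cpp
#include <cstddef>  // std::size_t

// Hypothetical over-aligned single-int struct.
struct alignas(8) PaddedIdentifier {
  int raw_id; // the only data member, at offset 0
};

static_assert(alignof(PaddedIdentifier) == 8, "alignas raised the alignment");
// sizeof is always a multiple of alignof, so on a platform with a 4-byte int
// the struct grows to 8 bytes; the extra bytes are trailing padding.
static_assert(sizeof(PaddedIdentifier) % alignof(PaddedIdentifier) == 0,
              "array elements must stay aligned");

constexpr std::size_t trailing_padding_bytes() {
  return sizeof(PaddedIdentifier) - sizeof(int);
}
```

The question's static_assert(sizeof(int) == sizeof(ID_t)) would reject such a type at compile time.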

What little the standard says about object representation:

[basic.types.general]

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object of type T is the set of bits that participate in representing a value of type T. Bits in the object representation that are not part of the value representation are padding bits. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values. 35

35) The intent is that the memory model of C++ is compatible with that of ISO/IEC 9899 Programming Language C.


A little bit on whether FooIdentifier and BarIdentifier are guaranteed to have the same representation as each other:

[class.mem.general]

The common initial sequence of two standard-layout struct ([class.prop]) types is the longest sequence of non-static data members and bit-fields in declaration order, starting with the first such entity in each of the structs, such that corresponding entities have layout-compatible types, either both entities are declared with the no_unique_address attribute ([dcl.attr.nouniqueaddr]) or neither is, and either both entities are bit-fields with the same width or neither is a bit-field.

Two standard-layout struct ([class.prop]) types are layout-compatible classes if their common initial sequence comprises all members and bit-fields of both classes ([basic.types]).

[basic.compound]

... Pointers to layout-compatible types shall have the same value representation and alignment requirements

The classes are layout-compatible, which sounds promising as a description, but has little effect on rules of the language.
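The one place where this relationship clearly matters is union member access, as also noted in the comments under the question. A minimal sketch, where the union AnyIdentifier and the function read_as_bar are hypothetical names:

```cpp
struct FooIdentifier {
  int raw_id; // the only data member
};
struct BarIdentifier {
  int raw_id; // the only data member
};

// Standard-layout union of two standard-layout structs whose common
// initial sequence is their single int member.
union AnyIdentifier {
  FooIdentifier foo;
  BarIdentifier bar;
};

// [class.mem]: with foo active, reading bar.raw_id is permitted because
// raw_id is part of the common initial sequence of both structs.
int read_as_bar(int value) {
  AnyIdentifier u{};     // initializes the first member, so foo is active
  u.foo.raw_id = value;  // write through FooIdentifier
  return u.bar.raw_id;   // read through BarIdentifier
}
```

There is no corresponding rule for a union of a plain int and FooIdentifier, since the rule only covers members of struct type, so this does not settle the int question.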

Forecast answered 12/3, 2021 at 11:30 Comment(2)
"From this, we transitively know that there practically cannot be padding in the standard layout class before the first member." (1) Why transitively? The pointer-cast rule guarantees it immediately. (2) Why practically? The pointer-cast rule guarantees it as a language feature. (Hm. Or would the standard permit the cast to change the address?) - Carcinogen
@Peter-ReinstateMonica 1. Simply because we don't rely on the standard saying "there is no padding"; we rely on the pointer-cast rules to make that deduction. 2. Indeed, "practically" because the standard doesn't say anything about the address changing or remaining the same during the cast. In fact, some static_casts do change the address. I simply assume that changing the address is not practical for reinterpret_casts. - Forecast

No, it is not guaranteed. Simple counter-example:

#include <cstdio>
struct S {
    int s;
} __attribute__ ((aligned (8)));
// sizeof yields std::size_t, so %zu is the matching printf conversion.
int main() { std::printf("%zu %zu\n", sizeof(S), sizeof(int)); }

Prints 8 and 4 on my machine. __attribute__ is non-standard syntax, but there is no guarantee that GCC won't switch to eight-byte alignment by default in the future.

Edit: Given the precondition that the struct and the int are always the same size, identical binary layout is indeed guaranteed, at least in any implementation that is the least bit sensible.

Windrow answered 12/3, 2021 at 10:29 Comment(3)
That's answering a different question: can you make it different if you annotate it? - Tricho
No, it's using the annotation to demonstrate that the alignment requirements of the bare type and the struct are permitted to differ. See [basic.align]/2: although the example there has virtual inheritance, the language only says "The alignment required for a type may be different when it is used as the type of a complete object and when it is used as the type of a subobject". - Roband
You may have answered before the OP mentioned that they can assert sizeof(int) == sizeof(S). As an aside, instead of the attribute, different compiler options in different translation units (optimizing for size in one, for speed in the other) can even give you two identically defined structs with different sizes; the problem is related to, but not restricted to, int vs. struct. - Carcinogen

© 2022 - 2024 — McMap. All rights reserved.