C++ data alignment / member order & inheritance
G

9

34

How do data members get aligned / ordered if inheritance / multiple inheritance is used? Is this compiler specific?

Is there a way to specify in a derived class how the members (including the members from the base class) shall be ordered / aligned?

Gervais answered 5/1, 2010 at 14:12 Comment(3)
related: #2006716Easterling
Remember that the size of a structure (or class) may not be equal to the sum of the size of its members, primarily because compilers are allowed to add padding between members. A more robust method is to read packed data into an unsigned char buffer, then load the members from that buffer. Similarly, write members to a buffer, then output the buffer. This will prevent any alignment or packing issues from wreaking havoc with your program.Casaba
i don't worry about padding - padding is fine - i just want to be able to predict the raw data format of a simple struct that is a descendant of multiple other simple structsThermobarograph
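The buffer approach suggested in the comments above could look roughly like this. It is only a sketch: the `record` struct, the 5-byte wire format and the little-endian byte order are assumptions made up for illustration, not something from the question.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record. Assumed wire format: 1 byte tag followed by a
// 4-byte little-endian value (5 bytes in total, no padding).
struct record {
    std::uint8_t  tag;
    std::uint32_t value;
};

constexpr std::size_t wire_size = 5;

// Write the members one by one, so struct padding never reaches the buffer.
inline void to_wire(const record& r, unsigned char* out) {
    out[0] = r.tag;
    for (int i = 0; i < 4; ++i)
        out[1 + i] = static_cast<unsigned char>((r.value >> (8 * i)) & 0xFF);
}

// Rebuild the members from the buffer, independent of host endianness.
inline record from_wire(const unsigned char* in) {
    record r{};
    r.tag = in[0];
    for (int i = 0; i < 4; ++i)
        r.value |= static_cast<std::uint32_t>(in[1 + i]) << (8 * i);
    return r;
}
```

Because every byte is placed explicitly, the in-memory layout of `record` (padding, inheritance, compiler options) no longer matters for the external format.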
G
80

Really you’re asking a lot of different questions here, so I’m going to do my best to answer each one in turn.

First you want to know how data members are aligned. Member alignment is compiler defined, but because of how CPUs deal with misaligned data, compilers all tend to follow the same guideline: a structure is aligned according to its most restrictive member (which is usually, but not always, the largest intrinsic type), and its size is padded so that all elements of an array of that structure are aligned the same way.

For example:

struct some_object
{
    char c;
    double d;
    int i;
};

This struct would typically be 24 bytes on a 64-bit platform. Because the struct contains a double it will be 8-byte aligned: the char is followed by 7 bytes of padding so that the double starts at offset 8, and the int is followed by 4 bytes of padding so that in an array of some_object every element is 8-byte aligned (the size of an object is always a multiple of its alignment). Strictly speaking this is compiler dependent, although you will find that for a given processor architecture, most compilers align data the same way.
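If you would rather measure than predict, `offsetof`, `sizeof` and `alignof` report what your compiler actually did. A small sketch (the helper name `print_some_object_layout` is mine):

```cpp
#include <cstddef>
#include <cstdio>

struct some_object {
    char c;
    double d;
    int i;
};

// Print where each member lands; the numbers depend on compiler and target.
inline void print_some_object_layout() {
    std::printf("offset c=%zu d=%zu i=%zu\n",
                offsetof(some_object, c),
                offsetof(some_object, d),
                offsetof(some_object, i));
    std::printf("sizeof=%zu alignof=%zu\n",
                sizeof(some_object), alignof(some_object));
}

// These two hold on every conforming compiler, whatever the padding is.
static_assert(sizeof(some_object) % alignof(some_object) == 0,
              "size is a multiple of alignment");
static_assert(offsetof(some_object, c) == 0,
              "first member of a standard-layout struct is at offset 0");
```

On x86-64 with GCC or Clang I would expect offsets 0, 8 and 16 and a size of 24; other targets (for example 32-bit x86 Linux, where double is only 4-byte aligned inside structs) can differ.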

The second thing you mention is derived class members. Ordering and alignment of derived classes is kinda a pain. Classes individually follow the rules I described above for structs, but when you start talking about inheritance you get into messy turf. Given the following classes:

class base
{
    int i;
};

class derived : public base // same for private inheritance
{
    int k;
};

class derived2 : public derived
{
    int l;
};

class derived3 : public derived, public derived2
{
    int m;
};

class derived4 : public virtual base
{
    int n;
};

class derived5 : public virtual base
{
    int o;
};

class derived6 : public derived4, public derived5
{
    int p;
};

The memory layout for base would be:

int i // base

The memory layout for derived would be:

int i // base
int k // derived

The memory layout for derived2 would be:

int i // base
int k // derived
int l // derived2

The memory layout for derived3 would be:

int i // base
int k // derived
int i // base
int k // derived
int l // derived2
int m // derived3

You may note that base and derived each appear twice here. That is the wonder of multiple inheritance.
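The duplication can be observed directly by comparing the addresses of the base subobjects reached through each path. The names below (`root`, `left`, `right`, `join`) are hypothetical stand-ins; I used a plain diamond rather than derived3 itself, because inheriting from both derived and derived2 makes compilers warn that the direct base is inaccessible.

```cpp
#include <cstddef>

// Hypothetical names: a diamond WITHOUT virtual inheritance.
struct root  { int i; };
struct left  : root { int k; };
struct right : root { int l; };
struct join  : left, right { int m; };

// True when the two inheritance paths lead to two distinct root subobjects.
inline bool has_two_roots(join& j) {
    root* via_left  = static_cast<left*>(&j);   // root inside left
    root* via_right = static_cast<right*>(&j);  // root inside right
    return via_left != via_right;
}
```

Here `j.i` alone would be ambiguous; you have to write `j.left::i` or `j.right::i`, and the two are independent variables.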

To get around that we have virtual inheritance.

The memory layout for derived4 would be:

void* base_ptr // implementation defined ptr that allows to find base
int n // derived4
int i // base

The memory layout for derived5 would be:

void* base_ptr // implementation defined ptr that allows to find base
int o // derived5
int i // base

The memory layout for derived6 would be:

void* base_ptr // implementation defined ptr that allows to find base
int n // derived4
void* base_ptr2 // implementation defined ptr that allows to find base
int o // derived5
int i // base

You will note that derived4, derived5, and derived6 all have a pointer to the base object. This is necessary so that when calling any of base's functions it has an object to pass to those functions. This structure is compiler dependent because it isn't specified in the language spec, but almost all compilers implement it the same.
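A sketch for checking this on your own compiler, using hypothetical public-member equivalents of derived4/5/6 (`va`, `vb`, `vjoin`). Note that the derived6 listing above leaves out its own member p; on the compilers I am sure about (GCC and Clang on x86-64) it lands after o and before the shared base, which is what the offsets below let you verify.

```cpp
#include <cstddef>

// Hypothetical names mirroring base/derived4/derived5/derived6.
struct vroot { int i; };
struct va : virtual vroot { int n; };
struct vb : virtual vroot { int o; };
struct vjoin : va, vb { int p; };

// Byte offset of a member or subobject inside a live vjoin object.
template <class T>
std::ptrdiff_t offset_in(const vjoin& obj, const T* part) {
    return reinterpret_cast<const char*>(part) -
           reinterpret_cast<const char*>(&obj);
}

// With virtual inheritance both paths reach the same vroot subobject.
inline bool shares_one_root(vjoin& j) {
    vroot* via_a = static_cast<va*>(&j);
    vroot* via_b = static_cast<vb*>(&j);
    return via_a == via_b;
}
```

Offsets are measured on a live object because `offsetof` is not guaranteed to work for classes with virtual bases.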

Things get more complicated when you start talking about virtual functions, but again, most compilers implement them the same as well. Take the following classes:

class vbase
{
    virtual void foo() {}
};

class vbase2
{
    virtual void bar() {}
};

class vderived : public vbase
{
    virtual void bar() {}
    virtual void bar2() {}
};

class vderived2 : public vbase, public vbase2
{
};

Each of these classes contains at least one virtual function.

The memory layout for vbase would be:

void* vfptr // vbase

The memory layout for vbase2 would be:

void* vfptr // vbase2

The memory layout for vderived would be:

void* vfptr // vderived

The memory layout for vderived2 would be:

void* vfptr // vbase
void* vfptr // vbase2

There are a lot of things people don't understand about how vftables work. The first thing to understand is that classes only store pointers to vftables, not whole vftables.

What that means is that no matter how many virtual functions a class has, it will only have one vftable pointer, unless it inherits another one from somewhere else via multiple inheritance. Pretty much all compilers put the vftable pointer before the rest of the members of the class. That means that you may have some padding between the vftable pointer and the class's members.
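The "one pointer, not one table" point shows up in `sizeof`. A minimal sketch with made-up class names:

```cpp
#include <cstddef>

struct one_virtual   { virtual void foo() {} };
struct many_virtual  { virtual void a() {} virtual void b() {} virtual void c() {} };
struct other_virtual { virtual void bar() {} };
struct two_bases : one_virtual, other_virtual {};

// Adding virtual functions grows the table, not the object.
inline bool same_size_regardless_of_count() {
    return sizeof(one_virtual) == sizeof(many_virtual);
}

// One hidden pointer per polymorphic base that brings its own table.
inline std::size_t hidden_pointer_count() {
    return sizeof(two_bases) / sizeof(void*);
}
```

On the mainstream ABIs I would expect `sizeof(one_virtual)` to equal `sizeof(void*)` and `hidden_pointer_count()` to return 2, but neither is promised by the standard.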

I can also tell you that almost all compilers implement #pragma pack, which allows you to manually force structure alignment. Generally you don't want to do that unless you really know what you are doing, but it is there, and sometimes it is necessary.
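For reference, the push/pop form of the pragma is accepted by GCC, Clang and MSVC. It is a compiler extension, not standard C++, so treat this as a sketch for those compilers only:

```cpp
#include <cstddef>

struct natural_layout {
    char c;
    double d;
    int i;
};

// Compiler extension: limit member alignment to 1 byte for this struct only.
#pragma pack(push, 1)
struct packed_layout {
    char c;
    double d;
    int i;
};
#pragma pack(pop)

// Packed: members sit back to back, so the size is the plain sum.
static_assert(sizeof(packed_layout) ==
                  sizeof(char) + sizeof(double) + sizeof(int),
              "no padding inside the packed struct");
```

Access packed members by name or through memcpy; taking a pointer or reference to a misaligned member and using it elsewhere can fault or be slow on some CPUs.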

The last thing you asked is if you can control ordering. You do: the compiler lays out the data members of a class in the order you declare them (strictly speaking, older standards only guarantee this for members that share the same access control). I hope this long-winded explanation hits everything you need to know.
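Since the declaration order is yours to choose, it is also the portable way to reduce padding. A small sketch comparing the earlier struct with a reordered copy (the struct names are hypothetical):

```cpp
#include <cstddef>

struct as_written { char c; double d; int i; };
struct reordered  { double d; int i; char c; };  // largest alignment first

// Padding = total size minus the sum of the member sizes.
constexpr std::size_t payload = sizeof(char) + sizeof(double) + sizeof(int);
constexpr std::size_t padding_as_written = sizeof(as_written) - payload;
constexpr std::size_t padding_reordered  = sizeof(reordered) - payload;
```

On a typical 64-bit target the first struct carries 11 bytes of padding and the second only 3, with no pragmas involved.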

Gambetta answered 5/1, 2010 at 17:48 Comment(8)
yees - this is very very good! thanks a lot! just one more question - do those rules also hold if non-default constructors and destructors are defined?Gervais
Absolutely. Constructors and destructors have no effect on class layout unless the destructor is virtual, in which case a vfptr must be present. Also, I didn't go into it because it was a bit outside the scope of the question, but be careful of the order of initialization. Initialization always occurs from low memory address to high except in the case of virtual inheritance, where the virtually inherited objects are constructed first. Similarly, destructors are called from high address to low, except in the case of virtual inheritance, where the virtually inherited class's destructor is called last.Gambetta
i've just read the following (in another context): "the language standard says that byte-for-byte copies are guaranteed to work only for PODs. std::pair<T,U> isn't a class aggregate, since it has a user-defined constructor, and that means it also isn't a POD." is this just a bad explanation of why one can't predict std::pair<T,U>'s memory layout or is something changing once a user defined constructor is present?Thermobarograph
This is a complicated question but I'll try to answer it briefly. Once a user-declared (it doesn't even need to be defined) constructor is specified there is no longer a guarantee that the object has a default constructor, which means it is not a POD. That doesn't mean that it can't be copied via a memcpy, it just means that the language doesn't guarantee that the memcpy will make a true copy of the object. The reason for this (I believe) is that guaranteeing byte-for-byte copies of complex or polymorphic objects would be a royal pain and probably result in slower code execution.Gambetta
do the rules also apply if a template is used? let's say base is templated and stores a value of type T instead of int (let's assume only POD types are used as template parameter) - will the int simply be replaced by a type of the templatetype or is there some additional magic going on then?Gervais
The use of templates should not alter this behavior at all. Template classes are generated at compile time and there will be a unique class generated for each combination of template parameters used.Gambetta
There is one exception about ordering. Ordering of bit fields apparently is implementation defined (and usually depends on endianness)Volumetric
The memory layout for derived6 does not include the local field p - presumably this is unintentional and it should appear immediately after the base ptr, but can someone more qualified than me confirm/refute this and update the post to correct/clarify?Bratwurst
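On the POD question raised in these comments: since C++11 the standard splits the old POD notion into "trivial" / "trivially copyable" (is memcpy safe?) and "standard-layout" (is the layout pinned down?), and `<type_traits>` lets you ask the compiler directly. This postdates most of the thread, so read it as a supplementary sketch; the struct names are made up.

```cpp
#include <type_traits>

struct plain_pod { int a; double b; };

// A user-declared constructor suppresses the implicit default constructor.
struct with_ctor {
    int a;
    explicit with_ctor(int v) : a(v) {}
};

// Data members in both base and derived class: not standard-layout.
struct derived_pod : plain_pod { int c; };

static_assert(std::is_trivial<plain_pod>::value &&
                  std::is_standard_layout<plain_pod>::value,
              "POD in the C++11 sense");
static_assert(!std::is_trivial<with_ctor>::value,
              "no trivial default constructor, so not trivial");
static_assert(std::is_trivially_copyable<with_ctor>::value,
              "byte-wise copies are still well defined");
static_assert(!std::is_standard_layout<derived_pod>::value,
              "layout across base and derived is not pinned down");
```

So a constructor by itself does not change where the bytes go; it only changes which guarantees the standard is willing to give.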
H
3

It's not just compiler specific - it's likely to be affected by compiler options. I'm not aware of any compilers that give you fine grained control over how members and bases are packed and ordered with multiple inheritance.

If you're doing something that relies on order and packing, try storing a POD struct inside your class and using that.

Heteronym answered 5/1, 2010 at 14:21 Comment(3)
the datastructures in question ARE POD structs (at least if multiple inheritance from other POD structs still yields a POD struct). will then the members just be ordered something like 'basePODStruct1_members'-padding-'basePODStruct2_members'-padding-...'derivedPODStruct_members'-padding ?Gervais
@genesys: if there's any inheritance (or virtual functions, or constructors or destructors), then the structure is not POD.Hegarty
The padding between members is compiler specific. The ordering of the members is defined by the language specification.Casaba
H
1

It is compiler specific.

Edit: basically it comes down to where the virtual table is placed and that can be different depending on which compiler is used.

Hesperides answered 5/1, 2010 at 14:17 Comment(2)
Nope, only the padding between members is compiler specific. The ordering of the members is defined by the language specifications.Casaba
Virtual table placement is "undefined". It just needs to be there. In the past there was a difference between how VC and GCC placed virtual tables in multiple inheritance cases ...Hesperides
M
1

As soon as your class is not POD (Plain old data) all bets are off. There are probably compiler-specific directives you can use to pack / align data.

Mccullum answered 5/1, 2010 at 14:17 Comment(2)
The preference is not to use compiler directives to pack the structure, but use methods to read and write members to a packed buffer. This gives a more robust program, especially when compiler vendor or versions change.Casaba
Yes, but you have more code to maintain and introduce scope for errors if the structures change. Likelihood is your structures change more often than your compiler does ;) Of course, there are no end of serialisation/deserialisation strategies when you get started, designed to solve both the problems.Mccullum
V
1

Compilers generally align data members in structs to allow for easy access. This means that data elements will normally start on word boundaries and gaps will normally be left in a struct to ensure that word boundaries are not straddled.

so

struct foo
{
    char a;
    int b;
    char c;
};

will normally take up more than 6 bytes on a 32-bit machine (typically 12: three padding bytes after a and three after c).

The base class is normally laid out first and the derived class is laid out after the base class. This allows the address of the base class to equal the address of the derived class.

In multiple inheritance there is an offset between the address of a class and the address of the second base class. static_cast and dynamic_cast will calculate the offset. reinterpret_cast does not. C style casts do a static cast if possible, otherwise a reinterpret cast.
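The difference between the casts is easy to see by printing or comparing addresses. A minimal sketch with hypothetical types:

```cpp
#include <cstdint>

struct first  { int a; };
struct second { int b; };
struct both : first, second { int c; };

// static_cast knows the layout and adjusts the pointer to the second base.
inline std::uintptr_t address_via_static_cast(both& obj) {
    return reinterpret_cast<std::uintptr_t>(static_cast<second*>(&obj));
}

// reinterpret_cast keeps the address unchanged; dereferencing the result
// as a `second` would read the wrong bytes.
inline std::uintptr_t address_via_reinterpret_cast(both& obj) {
    return reinterpret_cast<std::uintptr_t>(reinterpret_cast<second*>(&obj));
}
```

For a `both` object the first function returns an address past the `first` subobject, while the second returns the address of the object itself.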

As others have mentioned, all this is compiler specific but the above should give you a rough guide of what normally happens.

Voracious answered 5/1, 2010 at 14:27 Comment(2)
structs are treated no differently to classes for alignment and packing. What's important is whether the struct/class is POD or not.Heteronym
There can be an offset even with single inheritance. When a class with virtual functions derives from a POD type, popular compilers will arrange the memory as vtableptr + POD + derivedUntitled
M
1

The order of objects in multiple inheritance is not always what you specify. From what I've experienced, the compiler will use the specified order unless it can't. It can't use the specified order when the first base class does not have virtual functions and another base class has virtual functions. In this case, the first bytes of the class have to be a virtual function table pointer, but the first base class doesn't have one. The compiler will rearrange the base classes so that the first one has a virtual function table pointer.

I've tested this with both msdev and g++ and both of them rearrange the classes. Annoyingly, they seem to have different rules for how they do it. If you have 3 or more base classes and the first one doesn't have virtual functions, these compilers will come up with different layouts.

To be safe, pick two and avoid the other.

  1. Don't rely on the ordering of base classes when using multiple inheritance.

  2. When using multiple inheritance, put all base classes with virtual functions before any base classes without virtual functions.

  3. Use 2 or fewer base classes (since the compilers both rearrange in the same way in this case)
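The rearrangement described in this answer can be checked by measuring where each base subobject ends up. A sketch with made-up class names; the result is what I would expect from GCC and Clang, not something the standard promises:

```cpp
#include <cstddef>

struct plain_base { int a; };
struct poly_base  { virtual ~poly_base() = default; int b; };

// Declared with the non-polymorphic base first.
struct mixed : plain_base, poly_base { int c; };

// Offset of a base subobject inside `mixed`, measured on a live object.
template <class Base>
std::ptrdiff_t base_offset(mixed& obj) {
    return reinterpret_cast<char*>(static_cast<Base*>(&obj)) -
           reinterpret_cast<char*>(&obj);
}
```

If `base_offset<poly_base>` comes back as 0 and `base_offset<plain_base>` is greater than 0, the compiler has moved the polymorphic base to the front even though it was listed second.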

Mydriatic answered 27/6, 2013 at 19:17 Comment(0)
H
0

All compilers I know put the base class object before data members in a derived class object. Data members are in order as given in the class declaration. There might be gaps due to alignment. I'm not saying that it has to be this way though.

Helvetia answered 5/1, 2010 at 14:25 Comment(2)
The ordering of members is defined by the language specification, including inheritance. Padding between members and classes during inheritance are implementation defined.Casaba
just out of curiosity - how come the padding is not specified by the language?Thermobarograph
V
0

I can answer one of the questions.

How do data members get aligned / ordered if inheritance / multiple inheritance is used?

I've created a tool to visualize the memory layout of classes, stack frames of functions and other ABI information (Linux, GCC). You can look at the result for mysqlpp::Connection class (inherits OptionalExceptions) from MySQL++ library here.


Vitriolic answered 12/9, 2015 at 21:5 Comment(0)
S
0

The order of the members in memory is equal to the order in which they are specified in the program. Elements of non-virtual base classes come before elements of the derived class. In the case of multiple inheritance, the elements of the first (left-most) class come first (and so on). Virtual base classes come last.

Each class/struct that is derived from a virtual base class has a pointer type prepended for its elements (theoretically implementation dependent).

The alignment of a class/struct is equal to the largest alignment of its members (theoretically implementation dependent).

Padding happens when the next element in memory needs it (for the sake of its alignment) (theoretically implementation dependent).

Trailing padding is added to make the size of an object a multiple of its alignment.
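The alignment and trailing-padding rules can be checked at compile time. A minimal sketch with hypothetical struct names, using one over-aligned member:

```cpp
#include <cstddef>

struct header_part {
    char tag;
    int  id;
};

struct aligned_part : header_part {
    char tag;
    alignas(16) int value;   // one over-aligned member
};

// The strictest member alignment becomes the alignment of the whole struct,
// and trailing padding rounds the size up to a multiple of it.
static_assert(alignof(aligned_part) >= 16,
              "struct inherits the member's alignment");
static_assert(sizeof(aligned_part) % alignof(aligned_part) == 0,
              "size is a multiple of alignment");
```

On x86-64 with GCC or Clang I would expect `sizeof(aligned_part)` to be 32: 8 bytes for the base, 1 byte for tag plus 7 bytes of padding, 4 bytes for value and 12 bytes of trailing padding.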

Complex example:

struct base1 {
  char m_tag;
  int m_base1;
  base1() : m_tag(0x11), m_base1(0x1b1b1b1b) { }
};

struct derived1 : public base1 {
  char m_tag;
  alignas(16) int m_derived1;
  derived1() : m_tag(0x21), m_derived1(0x1d1d1d1d) { }
};

struct derived2 : virtual public derived1 {
  char m_tag;
  int m_derived2_a;
  int m_derived2_b;
  derived2() : m_tag(0x31), m_derived2_a(0x2d2daa2d), m_derived2_b(0x2d2dbb2d) { }
};

struct derived3 : virtual public derived1 {
  char m_tag;
  int m_derived3;
  virtual ~derived3() { }
  derived3() : m_tag(0x41), m_derived3(0x3d3d3d3d) { }
};

struct base2 {
  char m_tag;
  int m_base2;
  virtual ~base2() { }
  base2() : m_tag(0x51), m_base2(0x2b2b2b2b) { }
};

struct derived4 : public derived2, public base2, public derived3 {
  char m_tag;    
  int m_derived4;
  derived4() : m_tag(0x61), m_derived4(0x4d4d4d4d) { }
};

Has the following memory layout:

  derived4 = derived2 -> ....P....O....I....N....T....E....R....
   subobject derived2 -> 0x31 padd padd padd 0x2d 0xaa 0x2d 0x2d
                         0x2d 0xbb 0x2d 0x2d padd padd padd padd
virtual table = base2 -> ....P....O....I....N....T....E....R....
      subobject base2 -> 0x51 padd padd padd 0x2b 0x2b 0x2b 0x2b
             derived3 -> ....P....O....I....N....T....E....R....
   subobject derived3 -> 0x41 padd padd padd 0x3d 0x3d 0x3d 0x3d
   subobject derived4 -> 0x61 padd padd padd 0x4d 0x4d 0x4d 0x4d
     derived1 = base1 -> 0x11 padd padd padd 0x1b 0x1b 0x1b 0x1b
   subobject derived1 -> 0x21 padd padd padd padd padd padd padd
                         0x1d 0x1d 0x1d 0x1d padd padd padd padd
                         padd padd padd padd padd padd padd padd

Note that after casting a derived4 object to a derived2 or derived3, the new object starts with a pointer to the virtual base class, which is somewhere down below in the image of derived4, just like a real derived2 or derived3 object would.

Casting this derived4 to a base2 gives us an object that has a virtual table pointer, as it should (base2 has a virtual destructor).

The order of the elements is: first the (virtual base class pointer and) elements of derived2, then the (virtual table pointer and) elements of base2, then the (virtual base class pointer and) elements of derived3 and finally the elements of (the subobject of) derived4 -- all of that followed by the virtual base class derived1.

Also note that a complete derived3 object must be aligned at 16 bytes, because it "contains" (at the end) the virtual base class derived1, which is aligned at 16 because it has a member that is aligned at 16. The derived3 subobject used in the multiple inheritance here, however, is NOT aligned at 16 bytes. This is OK, because derived3 without the virtual base class has a maximum alignment of just 8 (that of its virtual base class pointer; this is on a 64-bit machine).
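A byte listing like the one above can be produced by walking over the object's storage as unsigned char. This is only a sketch of one way to do it (the helper name `hex_dump` is mine, and it is not necessarily how the listing was generated):

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Format the raw bytes of any object as hex, eight bytes per line.
inline std::string hex_dump(const void* object, std::size_t size) {
    const unsigned char* bytes = static_cast<const unsigned char*>(object);
    std::string out;
    char cell[4];
    for (std::size_t i = 0; i < size; ++i) {
        std::snprintf(cell, sizeof cell, "%02x",
                      static_cast<unsigned>(bytes[i]));
        out += cell;
        // Newline after every eighth byte and after the last byte.
        out += ((i + 1) % 8 == 0 || i + 1 == size) ? '\n' : ' ';
    }
    return out;
}
```

Calling `hex_dump(&obj, sizeof obj)` on a derived4 object shows the member values at their offsets. Padding bytes hold whatever was in memory, so filling the storage with a known pattern before constructing the object in it makes them easier to spot.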

Sita answered 16/1, 2020 at 15:13 Comment(0)
