memory layout C++ objects [closed]
H

5

50

I am basically wondering how C++ lays out objects in memory. I have heard that dynamic_cast simply adjusts the object's pointer by an offset, and that reinterpret_cast more or less lets us do anything with the pointer. I don't really understand this. Details would be appreciated!

Hepplewhite answered 27/10, 2009 at 18:0 Comment(0)
U
16

Each class lays out its data members in the order of declaration.
The compiler is allowed to place padding between members to make access efficient (but it is not allowed to re-order).

How dynamic_cast<> works is a compiler implementation detail and not defined by the standard. It will all depend on the ABI used by the compiler.

reinterpret_cast<> works by just changing the type of the pointer. The only thing you can guarantee works is that casting a pointer to void* and back to the same pointer-to-class type will give you the original pointer.
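A minimal sketch of the points above. The struct Sample and the helper functions are made up for illustration; the actual offsets and the amount of padding are implementation-defined, so the code only checks properties that should hold on any conforming compiler.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

// Hypothetical standard-layout struct: members of mixed size invite padding.
struct Sample {
    char   c;   // usually followed by padding so that i is suitably aligned
    int    i;
    char   d;   // usually followed by padding so that x is suitably aligned
    double x;
};

// Bytes in Sample that are not occupied by any member (implementation-defined).
constexpr std::size_t padding_bytes() {
    return sizeof(Sample) -
           (sizeof(char) + sizeof(int) + sizeof(char) + sizeof(double));
}

// True when the members appear in memory in declaration order.
constexpr bool members_in_declaration_order() {
    return offsetof(Sample, c) < offsetof(Sample, i) &&
           offsetof(Sample, i) < offsetof(Sample, d) &&
           offsetof(Sample, d) < offsetof(Sample, x);
}

// Round trip through void*: the original pointer value comes back.
bool void_round_trip() {
    Sample s{};
    void* raw = reinterpret_cast<void*>(&s);
    Sample* back = reinterpret_cast<Sample*>(raw);
    return back == &s;
}
```

Printing padding_bytes() and the individual offsetof values from a main function shows where a particular compiler inserted padding.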

Urien answered 27/10, 2009 at 18:24 Comment(5)
Your first point isn't completely correct. The only guarantee you have is that members in the same access block will have a defined order. If you wanted to take this to its extreme, you could say that even where the access is the same that the order is no longer guaranteed.Finbar
@Richard. I am not sure I understand you. The compiler is not allowed to re-order elements (this is for backwards compatibility with C). What is an access block? Can you point me at the part of the standard that you are getting your information from?Urien
An access block (really, access specifier) is the public:, private:, and protected: in a class. I found this article: embedded.com/design/218600150?pgno=1 extremely useful, and that's where I first learned what Richard mentioned in his comment. I looked up the C++ standard linked there, and on pg. 198 (sec. 9.2 clause 12) it states: "The order of allocation of non-static data members with different access control is unspecified"Steamheated
Does "order of allocation" refer to the order of memory address location or to the order in time at which allocation takes place?Lixivium
Compatibility with C applies only to types with "standard layout". A "standard layout" type has (among other restrictions) all public, all private, or all protected member variables. The compiler is allowed to, for example, lay out all of the private members first, then protected, then public, but within any block they must be in order (though not necessarily contiguous). So yes, compilers are allowed to re-order, just not within an access-specifier block. In practice, I don't think any compiler does re-order the variables.Grosberg
I
35

Memory layout is mostly left to the implementation. The key exception is that member variables for a given access specifier will be in order of their declaration.

§ 9.2.14

Nonstatic data members of a (non-union) class with the same access control (Clause 11) are allocated so that later members have higher addresses within a class object. The order of allocation of non-static data members with different access control is unspecified (11). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions (10.3) and virtual base classes (10.1).
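As a rough illustration of the clause above, the standard library can report whether a type is "standard layout", which among other things requires all non-static data members to share one access control. The three types below are hypothetical examples, not taken from any real code base.

```cpp
#include <type_traits>

// All data members share one access specifier: standard layout.
struct SameAccess {
    int    a;
    char   b;
    double c;
};

// Members split across public and private: not standard layout, so the
// clause quoted above leaves the relative order of the groups unspecified.
class MixedAccess {
public:
    int a;
    char get_b() const { return b; }
private:
    char b;
public:
    double c;
};

// A virtual function also rules out standard layout (typical implementations
// add a hidden vtable pointer to the object).
struct Polymorphic {
    virtual ~Polymorphic() = default;
    int a;
};

static_assert(std::is_standard_layout<SameAccess>::value,
              "single access specifier, no virtuals");
static_assert(!std::is_standard_layout<MixedAccess>::value,
              "mixed access control");
static_assert(!std::is_standard_layout<Polymorphic>::value,
              "has a virtual function");
```

The checks run at compile time, so the file only needs to compile for the claims to be verified.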

Besides its own member variables, a class or struct needs to provide space for subobjects of base classes, virtual function management (e.g. a pointer to a virtual table), and the padding and alignment of all of these. This is up to the implementation, but the Itanium ABI specification is a popular choice. gcc and clang adhere to it (at least to a degree).

http://mentorembedded.github.io/cxx-abi/abi.html#layout

The Itanium ABI is of course not part of the C++ standard and is not binding. For more detail you need to turn to your implementation's documentation and tools. clang provides a tool to view the memory layout of classes. As an example, the following:

class VBase {
    virtual void corge();
    int j;
};

class SBase1 {
    virtual void grault();
    int k;
};

class SBase2 {
    virtual void grault();
    int k;
};

class SBase3 {
    void grault();
    int k;
};

class Class : public SBase1, SBase2, SBase3, virtual VBase {
public:
    void bar();
    virtual void baz();
    // virtual member function templates not allowed, thinking about memory
    // layout and vtables will tell you why
    // template<typename T>
    // virtual void quux();
private:
    int i;
    char c;
public:
    float f;
private:
    double d;
public:
    short s;
};

class Derived : public Class {
    virtual void qux();
};

int main() {
    return sizeof(Derived);
}

Given a source file that actually uses the class (here through sizeof in main, which requires its layout), clang will dump the memory layout.

$ clang -cc1 -fdump-record-layouts layout.cpp

The layout for Class:

*** Dumping AST Record Layout
   0 | class Class
   0 |   class SBase1 (primary base)
   0 |     (SBase1 vtable pointer)
   8 |     int k
  16 |   class SBase2 (base)
  16 |     (SBase2 vtable pointer)
  24 |     int k
  28 |   class SBase3 (base)
  28 |     int k
  32 |   int i
  36 |   char c
  40 |   float f
  48 |   double d
  56 |   short s
  64 |   class VBase (virtual base)
  64 |     (VBase vtable pointer)
  72 |     int j
     | [sizeof=80, dsize=76, align=8
     |  nvsize=58, nvalign=8]

More on this clang feature can be found on Eli Bendersky's blog:

http://eli.thegreenplace.net/2012/12/17/dumping-a-c-objects-memory-layout-with-clang/

gcc provides a similar tool, -fdump-class-hierarchy (renamed -fdump-lang-class in GCC 8 and later). For the class given above, it prints (among other things):

Class Class
   size=80 align=8
   base size=58 base align=8
Class (0x0x141f81280) 0
    vptridx=0u vptr=((& Class::_ZTV5Class) + 24u)
  SBase1 (0x0x141f78840) 0
      primary-for Class (0x0x141f81280)
  SBase2 (0x0x141f788a0) 16
      vptr=((& Class::_ZTV5Class) + 56u)
  SBase3 (0x0x141f78900) 28
  VBase (0x0x141f78960) 64 virtual
      vptridx=8u vbaseoffset=-24 vptr=((& Class::_ZTV5Class) + 88u)

It doesn't itemize the member variables (or at least I don't know how to get it to) but you can tell they would have to be between offset 28 and 64, just as in the clang layout.

You can see that one base class is singled out as primary. This removes the need for adjustment of the this pointer when Class is accessed as an SBase1.
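The effect of the primary base can be observed at runtime by measuring where each base subobject starts inside the complete object. This is only a sketch: First, Second, Both and base_offset are hypothetical names, and the concrete offsets depend on the ABI.

```cpp
#include <cassert>
#include <cstddef>   // std::ptrdiff_t

// Hypothetical hierarchy echoing SBase1/SBase2 above: two polymorphic bases.
struct First  { virtual ~First()  = default; int k = 1; };
struct Second { virtual ~Second() = default; int k = 2; };
struct Both : First, Second { int i = 3; };

// Byte distance from the start of the complete object to a base subobject.
template <typename Base>
std::ptrdiff_t base_offset(const Both& obj) {
    const Base* base = &obj;  // implicit upcast; the compiler applies any adjustment
    return reinterpret_cast<const char*>(base) -
           reinterpret_cast<const char*>(&obj);
}
```

On compilers that follow the Itanium ABI, base_offset<First> is expected to report 0 (no adjustment for the primary base), while base_offset<Second> reports a positive offset.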

The equivalent for gcc is:

$ g++ -fdump-class-hierarchy -c layout.cpp

The equivalent for Visual C++ is:

cl main.cpp /c /d1reportSingleClassLayoutTest_A

see: https://blogs.msdn.microsoft.com/vcblog/2007/05/17/diagnosing-hidden-odr-violations-in-visual-c-and-fixing-lnk2022/

Ilise answered 29/12, 2014 at 1:26 Comment(2)
The information for g++ does not seem correct: error: unrecognized command line option ‘-fdump-class-hierarchy’Ironclad
Note that since GCC 8, the -fdump-class-hierarchy option has been replaced with -fdump-lang-class. (Source)Zoologist
C
4

The answer is, "it's complicated". Dynamic cast does not simply adjust pointers with an offset; it may actually retrieve internal pointers inside the object in order to do its work. GCC follows an ABI designed for Itanium but implemented more broadly. You can find the gory details here: Itanium C++ ABI.
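A small sketch of why a fixed compile-time offset is not enough: dynamic_cast consults the dynamic type of the object, so the same expression can succeed, fail, or land on a sibling base. All type and function names here are invented for the example, and it assumes RTTI is enabled (the default on mainstream compilers).

```cpp
#include <cassert>

struct Left      { virtual ~Left()      = default; int l = 0; };
struct Right     { virtual ~Right()     = default; int r = 0; };
struct Joined : Left, Right             { int j = 0; };
struct Unrelated { virtual ~Unrelated() = default; };

// Cross-cast: from one base to a sibling base, resolved through the dynamic type.
// Yields nullptr if the object p points at has no Right subobject.
Right* to_right(Left* p) { return dynamic_cast<Right*>(p); }

// Address of the most-derived object, whichever subobject p points at.
void* most_derived(Right* p) { return dynamic_cast<void*>(p); }

// Yields nullptr when the dynamic type has no Unrelated subobject.
Unrelated* to_unrelated(Left* p) { return dynamic_cast<Unrelated*>(p); }
```

Given a Joined object viewed through a Left*, to_right finds the Right subobject; given a plain Left object, the same call returns a null pointer.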

Corson answered 27/10, 2009 at 18:10 Comment(0)
O
4

As stated previously, the full details are complicated, painful to read, vary between compilers, and are really only useful to compiler developers. Basically, each object contains the following (usually laid out in this order):

  1. Runtime type information
  2. Non-Virtual base objects and their data (probably in order of declaration).
  3. Member variables
  4. Virtual base objects and their data (Probably in some DFS tree search order).

These pieces of data may or may not be padded to make memory alignment easier etc. Hidden in the runtime type information is stuff about the type, v-tables for virtual parent classes etc, all of which is compiler specific.

When it comes to casts, reinterpret_cast simply changes the C++ data type of the pointer and does nothing else, so you had better be sure you know what you're doing when you use it, otherwise you're liable to mess things up badly. dynamic_cast does very much the same thing as static_cast (in altering the pointer) except that it uses the runtime type information to figure out whether it can cast to the given type, and how to do so. Again, all of that is compiler specific. Note that you can't dynamic_cast from a void*, because the cast needs to know where to find the runtime type information so it can do all its wonderful runtime checks.
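The difference between relabelling a pointer and adjusting it can be sketched as follows. Alpha, Beta, Gamma and the helper functions are hypothetical; the example never dereferences the reinterpret_cast result, because doing so would be undefined behaviour.

```cpp
#include <cassert>

struct Alpha { virtual ~Alpha() = default; int a = 1; };
struct Beta  { virtual ~Beta()  = default; int b = 2; };
struct Gamma : Alpha, Beta { int g = 3; };

// static_cast knows the hierarchy and yields the address of the Beta subobject.
Beta* upcast_checked(Gamma* p) { return static_cast<Beta*>(p); }

// reinterpret_cast only relabels the pointer: same address, new static type.
// The result does not point at a Beta here, so it must not be dereferenced.
Beta* upcast_unchecked(Gamma* p) { return reinterpret_cast<Beta*>(p); }

// Compare raw addresses without dereferencing anything.
bool same_address(const void* x, const void* y) { return x == y; }
```

With a Gamma object g, upcast_unchecked(&g) keeps the address of g itself, whereas on common ABIs upcast_checked(&g) points some bytes further in, at the start of the Beta subobject.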

Oblige answered 27/10, 2009 at 18:49 Comment(0)
A
1

This question is already answered at http://dieharddeveloper.blogspot.in/2013/07/c-memory-layout-and-process-image.html. Here is an excerpt from there: In the middle of the process's address space, a region is reserved for shared objects. When a new process is created, the process manager first maps the two segments from the executable into memory. It then decodes the program's ELF header. If the program header indicates that the executable was linked against a shared library, the process manager (PM) will extract the name of the dynamic interpreter from the program header. The dynamic interpreter points to a shared library that contains the runtime linker code.

Archiepiscopal answered 23/6, 2014 at 17:34 Comment(1)
While it is valuable info, it answers a different question.Palmary
