How is access for private variables implemented in C++ under the hood?

How does the compiler control protection of variables in memory? Is there a tag bit associated with private variables inside the memory? How does it work?

Ellipsoid answered 14/7, 2012 at 19:10 Comment(1)
compiler = compile-time, memory = run-time, big differencePinkie

If you mean private members of instances, then there's no protection whatsoever at run-time. All protection takes place at compile-time and you can always get at the private members of a class if you know how they are laid out in memory. That requires knowledge of the platform and the compiler, and in some cases may even depend on compiler settings such as the optimization level.

E.g., on my Linux/x86-64 w/GCC 4.6, the following program prints exactly what you expect. It is by no means portable and might print unexpected things on exotic compilers, but even those compilers will have their own specific ways to get to the private members.

#include <iostream>

class FourChars {
  private:
    char a, b, c, d;

  public:
    FourChars(char a_, char b_, char c_, char d_)
      : a(a_), b(b_), c(c_), d(d_)
    {
    }
};

int main()
{
    FourChars fc('h', 'a', 'c', 'k');

    char const *p = static_cast<char const *>(static_cast<const void *>(&fc));

    std::cout << p[0] << p[1] << p[2] << p[3] << std::endl;
}

(The two-step cast is there because any object pointer can be converted to void* with a static_cast, and the resulting void* can then be converted to char*. Reading an object's bytes through a char pointer is explicitly permitted by the aliasing rules, so this does not violate strict aliasing. A single reinterpret_cast would probably work as well; in practice, I never play these kinds of dirty tricks, so I'm not too familiar with the quickest way to do them :)

Achaemenid answered 14/7, 2012 at 19:12 Comment(25)
@LuchianGrigore: Why would it be?Solvable
@LuchianGrigore: I don't think so. Given that a FourChars must have at least size four, the result of printing its first four bytes has implementation-dependent, but not undefined behavior.Achaemenid
@Mehrdad: reinterpret_cast<>?Balboa
@Vlad: I lazily used the first cast that worked. A C-style cast would have the same result.Achaemenid
But if you had a virtual function in there, it wouldn't work any more. Can you really rely on the layout of a class?Workable
I agree with you that it works in 100% of the cases (for this exact sample), I'm just curious whether it's defined or not...Workable
@Mehrdad: Imagine that the compiler decides to keep data somewhere, leaving the memory of the class with just a pointer to actual data -- this is perhaps not prohibited? reinterpret_cast is perhaps not a problem, accessing the pointer is.Balboa
@Balboa I don't think the compiler has that kind of freedom - just store the members somewhere else.Workable
@Vlad: yes, that's prohibited. You can always cast a pointer to a POD type to void*, char* or unsigned char*. If you couldn't, you could never implement std::memcpy and friends. (Other casts may cause aliasing problems.)Achaemenid
@luchian: in fact, gcc sometimes optimizes whole classes out, not to mention individual members.Balboa
@Balboa that's not a good example, and it most certainly won't do that in this case.Workable
@luchian: it won't, of course, but I think it has the right to.Balboa
@Balboa not if it affects observable behavior, which in this case (or any similar case) it would.Workable
@Mehrdad: well, given that the code uses offset_of, the layout may be fixed by that fact alone. But in the absence of any valid access to the members, isn't the compiler free to relocate or even remove them?Balboa
@luchian: I wonder if the semantics of observable behaviour is preserved even for reinterpret_cast.Balboa
@Vlad: You may be right, I guess I don't know enough about C++ to answer this. But in practice I don't expect it will cause trouble.Solvable
@Vlad: you may be right. I've replaced the reinterpret_cast by a static_cast via const void*.Achaemenid
It is perfectly well defined for reinterpret_cast.Hayward
@larsmans: I did not mean that your example is somehow "invalid", so from my POV both examples are right. Your code just proves that there is no run-time protection.Balboa
So could you also call private member functions?Milissa
Draft Standard N3242 9.2.18 Two standard-layout struct (Clause 9) types are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in declaration order) have layout-compatible typesMilissa
Draft Standard N3242 9.2.20 A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. end note ]Milissa
@rhalbersma: thanks for the reference. As for the private member functions, that's an interesting question to which I don't immediately know the answer. If you ask it and post the link here, I'll upvote.Achaemenid
@rhalbersma This isn't a standard-layout struct because it has private members (quite relevant to the Q) and the standard was ratified well over a year ago so you should get the final N3290 and dispose of the draft.Schatz
@Potatoswatter: Sorry for the late comment, but a standard-layout struct has all members with the same accessibility. It does not need to be public, it may as well be private or protected. See this question.Lesotho

It's the compiler's job to see that some members are private and disallow you from using them. They aren't much different from other members after compilation.

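There is, however, an important caveat: data members aren't required to be laid out in memory in the order in which they appear in the class definition. That order is only guaranteed among members with the same access level, where each later member gets a higher address; the relative order of members with different access levels is unspecified.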

Workable answered 14/7, 2012 at 19:15 Comment(4)
From a pure C++ standpoint, the sample here https://mcmap.net/q/808177/-how-is-access-for-private-variables-implemented-in-c-under-the-hood is UB, since the variables can be padded to get a given alignment (hence, (&a)[1] is not necessarily b, even if it is guaranteed that &b > &a). But once the compiler is given and the compiling options defined, the behavior does not depend on the execution. If it works, it will always work; if it doesn't, it never will. In fact, it is non-portable code.Pernickety
@EmilioGaravaglia: if there's padding, then my example does not display UB; its behavior is just implementation-dependent. I.e., it will print some arbitrary byte, but it won't crash. And yes, it's non-portable, maybe I should state that even more explicitly.Achaemenid
@larsmans: According to the C++ specification's definition of UB (because UB itself is defined :-) ), UB does not necessarily mean "crash". It just means "not defined by the C++ specification itself". If it is "compiler dependent", then for ISO C++ it is UB. In fact we are just expressing the same concept with different wording, because of a different perspective (the language or the compiler)Pernickety
@EmilioGaravaglia: I know what UB means. I wasn't saying that UB causes crashes, but that my program doesn't because it doesn't cause UB. The C++ standard also leaves some things implementation-defined, where compiler writers are given a choice as to what behavior to implement. Alignment is an example; see draft Standard pp. 1318 ff. for a complete list.Achaemenid
