Does memory layout in ABI specifications apply only across ABI boundaries?

Do the memory-layout rules in ABI specifications generally apply only across ABI boundaries, or also within, for example, a single translation unit? If they apply only across boundaries, do compilers generally make such additional guarantees anyway?

If "generally" is too broad, consider e.g. GCC/Clang with the System V x64 and Itanium C++ ABIs.

Here are two examples for what I mean:

  1. The System V x64 ABI specifies that arrays of size at least 16 bytes have an alignment of at least 16 bytes, even if the alignment of the element type is smaller, so they are aligned more strictly than alignof would suggest. It also specifies that the alignment of long double is 16. So is the following function, which has undefined behavior under the C++ standard if called, safe to use under the System V x64 ABI, even though the storage array is never exposed across translation unit boundaries?

    #include <new>
    
    void f() {
        char storage[16]; // Only guaranteed to have alignment `1` by the C++ standard.
        using T = long double;
        auto p = new(storage) T;
    }
    
  2. The Itanium C++ ABI specifies the layout of classes. For example:

    #include <new>
    
    struct A {
        int i;
        virtual ~A() {}
    };
    
    struct B : A {
        int j;
    };
    
    void f() {
        B b;
        std::launder(reinterpret_cast<A*>(&b))->i = 1;
    }
    

    When called, f has undefined behavior under the C++ standard: B and A are not standard-layout, so it is unspecified whether the A subobject is located at the same address as b, and if it is not, the std::launder call has undefined behavior. Under the Itanium C++ ABI, however, the A subobject is guaranteed to have the same address as b, so the std::launder will succeed. So under the Itanium C++ ABI, is this safe, even though b is never passed across translation unit boundaries?

I assume that both of my examples are safe, but is this specified anywhere, either in the referenced standards or by the policies of the compilers?

Preconscious answered 31/3, 2020 at 5:40 Comment(1)
From the Itanium C++ ABI: "In this document, we specify the Application Binary Interface for C++ programs, that is, the object code interfaces between user C++ code and the implementation-provided system and libraries." To me, that says nothing about data objects that are used only inside a binary. So it might be possible for the memory layout of "inner" classes to differ, but in practice I doubt that any compiler uses different binary layouts "inside" and "outside". - Localize

Yes, both instances are safe according to my reading.

I cannot point you to a section of the Itanium C++ ABI, but you seem to be firm about what it has to say anyway.

But I do know:

One possible manifestation of behavior that is undefined according to the C++ standard is that an implementation of the language, or a further specification it conforms to such as the Itanium C++ ABI, guarantees a certain behavior for that construct.

That is, if one standard says "That's not defined", and another standard says "That's defined to do Y", then, if your implementation conforms to both standards, you should be able to assume that "Y" happens.

(Side note: On the other hand, if one standard says "That's defined to do X", and another standard says "That's defined to do Y", then, if "X" != "Y", your implementation cannot conform to both standards.)
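
As a rough illustration of the "defined to do Y" case, here is a minimal sketch that only observes where the A base subobject of the question's B ends up. The function name base_shares_address is made up for this example, and the code uses only conversions that are well-defined in standard C++. On an implementation that follows the Itanium C++ ABI I would expect it to return true, but observing the layout this way only shows what one compiler does; it does not by itself prove a guarantee.

```cpp
struct A {
    int i;
    virtual ~A() {}
};

struct B : A {
    int j;
};

// Hypothetical helper (not from the question): reports whether the A base
// subobject of a B object starts at the same address as the B object itself.
// Both conversions below are ordinary, well-defined pointer conversions.
bool base_shares_address() {
    B b;
    const void* whole = static_cast<const void*>(&b);
    const void* base = static_cast<const void*>(static_cast<A*>(&b));
    return whole == base;
}
```

If this returns false on some platform, the reinterpret_cast in the question would not point at the A subobject there.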

Prayer answered 2/4, 2020 at 6:41 Comment(3)
"but you seem to be firm about what it has to say anyway": I am not really. The section of the introduction that I think is relevant is not formulated clearly enough for me to be sure whether it is intended to apply only between libraries or also inside individual translation units. That is basically why I am asking. - Preconscious
"then, if your implementation conforms to both standards, you should be able to assume that "Y" happens.": Yes, that is clear, and I think it applies here, because it would otherwise be difficult to use such implementation details in programs. But I am not sure whether the standards specify this themselves, whether compilers just extend the specifications in these ABI standards, or whether I am mistaken and it is not safe to use these constructs after all. I have repeatedly read comments claiming that relying on ABI specifications in scenarios like those in the question is not safe. - Preconscious
The C++ Standard places certain restrictions on how the program text behaves. Additional standards do not "extend" the behavior, but place additional restrictions on how the program text behaves. For example, C++ places no restrictions on what a function named getuid() does; but if the implementation also conforms to POSIX, the function has a very particular meaning. That's another restriction! With this in mind, you can say that your constructs are not safe under C++ alone, but under both C++ and Itanium ABI, the constructs are safe. - Prayer

You are asking two questions:

[...], do compilers generally make such additional guarantees?:

They would be pretty much unusable otherwise, but I don't think it is possible to give a definitive answer to this. We would have to prove that no compiler exists that does not make such guarantees.

Thus, "generally" is too broad.

When looking at specific compilers/platforms, your examples lead to the second question:

Will my undefined-behavior examples work on certain platforms?

You already know that the examples have UB. If the compiler detects that, it can do whatever it wants with that code, e.g.

  1. drop the whole thing
  2. generate code that formats your SSD
  3. generate code that does what you are hoping for

The problem with UB is: even if today your compiler goes for the third option, there is no guarantee whatsoever that it will do so again tomorrow.

See also cppreference

So the answer to the second question is: well, maybe, but there is no guarantee.
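
For what it's worth, both examples from the question can be rewritten so that they need no guarantee beyond the C++ standard. A minimal sketch, in which the function names are invented for illustration and the stored value 1.5L is arbitrary:

```cpp
#include <new>

struct A {
    int i;
    virtual ~A() {}
};

struct B : A {
    int j;
};

// Example 1 without relying on the ABI: request the alignment explicitly,
// so the buffer is suitably aligned for T on every conforming implementation.
long double placement_with_alignas() {
    using T = long double;
    alignas(T) unsigned char storage[sizeof(T)];
    T* p = new (storage) T(1.5L);
    return *p; // T is trivially destructible, so no destructor call is needed
}

// Example 2 without relying on the ABI: a derived-to-base static_cast lets
// the compiler apply whatever offset the A subobject actually has.
int base_access_with_static_cast() {
    B b;
    static_cast<A*>(&b)->i = 1;
    return b.i;
}
```

With these versions the question of what the ABI promises inside a translation unit does not come up, because nothing depends on the array's default alignment or on the base subobject's offset.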

Dorking answered 5/4, 2020 at 9:6 Comment(3)
I wrote in my question: "If "generally" is too broad, consider e.g. GCC/Clang with the System V x64 and Itanium C++ ABIs.". For the second part: The code is UB by the C++ standard, but if another standard that the compiler adheres to makes it defined, then the compiler will have to output code to that effect. The additional standards I am asking about are the ABI specifications, which at least GCC and Clang definitively adhere to. But I don't know whether the ABI specifications themselves (with the C++ standard) actually define the behavior of the programs I mentioned. That is the question. - Preconscious
@Preconscious Got it, and I confirmed that (at least IMHO) "generally" is too broad :-) As for the specific case, you explain your intention/hope with examples. For those, "The code is UB by the C++ standard" is your answer. The compiler is allowed to do something sensible, but there is no guarantee. Edited my answer a bit. - Dorking
@Preconscious To phrase it differently, I think your examples ask for something that I would not have read from the first question. You seem to hope that the C++ standard is a template that behaves according to the target platform. That would be super frightening, though. How could anyone write any platform independent code? Thus: If it is UB according to the standard, then it is UB. - Dorking
