reinterpret_cast creating a trivially default-constructible object

3

58

cppreference states that:

Objects with trivial default constructors can be created by using reinterpret_cast on any suitably aligned storage, e.g. on memory allocated with std::malloc.

This implies that the following is well-defined code:

struct X { int x; };
alignas(X) char buffer[sizeof(X)];    // (A)
reinterpret_cast<X*>(buffer)->x = 42; // (B)

Three questions follow:

  1. Is that quote correct?
  2. If yes, at what point does the lifetime of the X begin? If on line (B), is it the cast itself that is considered acquiring storage? If on line (A), what if there were a branch between (A) and (B) that would conditionally construct an X or some other POD, Y?
  3. Does anything change between C++11 and C++1z in this regard?

Note that this is an old link. The wording was changed in response to this question. It now reads:

Unlike in C, however, objects with trivial default constructors cannot be created by simply reinterpreting suitably aligned storage, such as memory allocated with std::malloc: placement-new is required to formally introduce a new object and avoid potential undefined behavior.
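
For reference, the placement-new approach that the updated wording asks for could look roughly like the sketch below. The helper names make_x_in and placement_new_demo are hypothetical and exist only for illustration, and the buffer is declared as unsigned char on the assumption that this is the element type the standard has in mind for arrays that provide storage.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

struct X { int x; };

// Hypothetical helper: formally creates an X in caller-provided storage.
inline X* make_x_in(void* storage) {
    // The storage must be suitably aligned for X.
    assert(reinterpret_cast<std::uintptr_t>(storage) % alignof(X) == 0);
    X* p = ::new (storage) X;  // new-expression: the lifetime of an X begins here
    p->x = 42;                 // fine, p points to a living X
    return p;
}

// Hypothetical driver mirroring lines (A) and (B) from the question.
inline int placement_new_demo() {
    alignas(X) unsigned char buffer[sizeof(X)];  // (A) storage only, no X yet
    X* p = make_x_in(buffer);                    // (B) replaced by placement-new
    return p->x;  // X is trivially destructible, so no destructor call is needed
}
```

Since X is trivial, compilers will typically generate the same machine code as for the reinterpret_cast version; the difference is only in what the abstract machine is told.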

Pongee answered 29/11, 2016 at 18:48 Comment(11)
I actually tried to figure out the question of when the lifetime of those objects begins. I was not able to find a definitive answer in the standard, and I believe it is vague in this regard. As for the first question, I doubt the quote is correct, since there is an aliasing rule to pay attention to.Blanchette
@Blanchette as long as the buffer is a char buffer, strict aliasing is not an issue.Skin
No, and I thought we went over this multiple times already? [intro.object]/1 exhaustively enumerates which language constructs can create objects.Betteanne
@RichardHodges, nope. char* can alias anything, but anything can't alias char*Blanchette
@Blanchette if that were true, it would not be allowable to alias the memory of a variant.Skin
@Betteanne Do you mind writing a good canonical answer for this? Help me, T.C., you're my only hope.Pongee
@RichardHodges, not sure what you mean by variant in this context.Blanchette
@Blanchette std::variant or boost::variant for example. The storage can't be allocated with a union because there's no way to build a union from a type list. So you use a std::aligned_storage, which is simply an aligned char buffer that is at least as big and as aligned as the most restrictive type in the type list.Skin
@RichardHodges Actually you can use a (recursive) union, and must use one if you want constexpr.Betteanne
@M.M That's because it just got fixed a few minutes agoBetteanne
@M.M I fixed the question wording.Pongee
38

There is no X object, living or otherwise, so pretending that there is one results in undefined behavior.

[intro.object]/1 spells out exhaustively when objects are created:

An object is created by a definition ([basic.def]), by a new-expression ([expr.new]), when implicitly changing the active member of a union ([class.union]), or when a temporary object is created ([conv.rval], [class.temporary]).

With the adoption of P0137R1, this paragraph is the definition of the term "object".

Is there a definition of an X object? No. Is there a new-expression? No. Is there a union? No. Is there a language construct in your code that creates a temporary X object? No.

Whatever [basic.life] says about the lifetime of an object with vacuous initialization is irrelevant. For that to apply, you have to have an object in the first place. You don't.

C++11 has roughly the same paragraph, but doesn't use it as the definition of "object". Nonetheless, the interpretation is the same. The alternative interpretation - treating [basic.life] as creating an object as soon as suitable storage is obtained - means that you are creating Schrödinger's objects*, which contradicts N3337 [intro.object]/6:

Two objects that are not bit-fields may have the same address if one is a subobject of the other, or if at least one is a base class subobject of zero size and they are of different types; otherwise, they shall have distinct addresses.


* Storage with the proper alignment and size for a type T is by definition storage with the proper alignment and size for every other type whose size and alignment requirements are equal to or less than those of T. Thus, that interpretation means that obtaining the storage simultaneously creates an infinite set of objects with different types in said storage, all having the same address.
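
As a rough illustration of the four constructs listed in [intro.object]/1, here is a minimal sketch. The function name object_creation_demo and the union U are made up for this example; none of it comes from the question's code.

```cpp
struct X { int x; };
union U { int i; float f; };

// Hypothetical demo: each statement uses one of the constructs that
// [intro.object]/1 names as creating an object.
inline int object_creation_demo() {
    X a{1};             // a definition creates an X object
    X* b = new X{2};    // a new-expression creates an X object
    U u;
    u.i = 3;            // assignment makes i the active member
    u.f = 4.0f;         // implicitly changes the active member: a float object is created
    int t = X{5}.x;     // a temporary X object is created for the member access
    int sum = a.x + b->x + static_cast<int>(u.f) + t;
    delete b;
    return sum;
}
```

Note that nothing in this list looks like a reinterpret_cast on a char buffer, which is the point of the answer.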

Betteanne answered 29/11, 2016 at 19:29 Comment(29)
So nothing in [basic.lval]/8 is relevant because there is no "type of the object" via which we're accessing because there's no object?Pongee
@Pongee Well, there is an object - buffer, the char array.Betteanne
With the greatest respect (this is not tongue in cheek, your answers are very informative) if this were the case, then allocating an object via malloc would be undefined behaviour. Yet §3.8 explicitly allows it. There seems to be a disconnect in the wording of the standard.Skin
@RichardHodges First, footnotes are non-normative. Second, that footnote pertains to the definition of "safely-derived pointer", which is completely unrelated - that concept is there for GC support. Third, it is fairly well-established that malloc alone is not sufficient to create an object under the current wording - P0137 explicitly refers to that as the status quo.Betteanne
@RichardHodges Where in §3.8? What version of the standard?Billion
@Betteanne While I understand that, the inability to make using auto p_int = (int*)malloc(sizeof(int)); defined behavior seems like a really bad idea. I get that making it defined behavior is hard, but the alternative is horrible. Reams and reams of legacy code gone from "in practice working" to "anathema". If the standard did not permit that, the standard was wrong; the way to fix it is to fix the standard, not make the standard's error more explicit.Billion
I think "it's there for GC" seems a little glib. The footnote specifically mentions "other languages" and "C". GC is not mentioned at all. If we accept that the X-pointer is not pointing to an X, but to memory of the correct alignment and size to accommodate an X, and an X is a POD, then I struggle to see how the code can possibly be UB. This would make interfacing with C libraries UB. Patently it is not. The standard seems to be contradicting itself.Skin
@Yakk I lifted it from N4527, pages 68-69Skin
@Yakk Well, that's undefined anyway for reading an uninitialized object :) The current state of affairs is certainly suboptimal - the formal object model makes std::vector unimplementable in standard C++ - but making it work is nontrivial.Betteanne
@RichardHodges Airlifting a footnote out of context doesn't help your case. That footnote is attached to [basic.stc.dynamic.safety]/2.1, and by happenstance [basic.life] started on the same page in that particular version of the working draft. "safely-derived pointer" is only relevant on implementations with strict pointer safety (aka GC'd implementations), which is an empty set AFAIK. It sheds absolutely zero light on the meaning of [basic.life], because it is dealing with a completely different subject.Betteanne
@Betteanne I'm not sure that I have a case to make, other than the sure and certain knowledge that a POD that has been malloced and then cast is safe to use and will yield expected behaviour in all cases. This is the foundation of C interoperability. In addition, there is std::aligned_storage et al., which are specifically there to allow this kind of gerrymandering. It simply is not correct to say that an object can only be born by definition, new, union or temporary. It can also be forced into existence through these means. I'm not saying you are wrong - the standard is.Skin
@Betteanne I'll be explicit: auto p_int = (int*)malloc(sizeof(int)); *p_int = 0; std::cout << *p_int << "\n"; -- anything that doesn't make that standards compliant should be a non-starter. That is legacy C-style memory handling, and it exists in massive legacy code bases that have compiled and worked in C++ for 30+ years. If the C++ standard says "that isn't defined", it is a flaw in the standard. I get why it is hard, but leaving it ambiguous or poorly worded is better than explicitly stating that is undefined.Billion
@Yakk The relevant wording has been around in every standard. It's of course a problem, but a long-standing one.Betteanne
@Yakk IMO having the standard be clear is better than having it be ambiguous or poorly worded. Then discussion can at least move onto fixing it instead of having endless threads like this where people apply their own interpretation and we argue about whose interpretation is [better | was the intent | etc.]Dabble
@T.c. prior to 1776, "by the implementation (12.2) when needed." left a lot of latitude. "when needed". Other clauses referring to object lifetime would imply an object was needed. After 1776, the moment of object creation was pinned down. Prior ambiguity on when an object actually exists meant that the standard was ambiguous about whether int_p could be used; this change seems to make it explicitly illegal to use it as there is no object there. That seems wrong. Or am I reading it incorrectly?Billion
@Dabble No; if it is unambiguously undefined behavior to use that int_p, some idiot on a compiler team might actually break code that uses it and get people on side (after all, the standard is clear!). If it requires convoluted reasoning that is ambiguously correct to justify the same, other people are more likely to smack them upside the head for being an idiot. Anything that "clarifies" that the int_p use is illegal is either changing the standard to be broken, or polishing a standard defect.Billion
@Yakk Not really, the cross-reference to 12.2 means it's only talking about the cases in that section ([class.temporary]).Betteanne
@Yakk compilers can offer extensions; if a compiler previously supported creating an object in this way, and now decides not to support that, that's a business decision on their part. Compilers are supposed to help their users to achieve programming goals, not break working code on purpose.Dabble
@Dabble Yet that's what gcc 6 ended up doing with the null pointer check that broke Qt/Chromium.Pongee
@Pongee Code that relies on the behaviour of dereferencing null pointers is a ticking timebomb, the problem would arise sooner or later anyway. I would argue that gcc never explicitly supported defined behaviour of dereferencing null pointers - what you got was just happenstance. In terms of the userbase, there's a conflict between those who want defined behaviour of dereferencing null, and those who don't want their code slowed down by runtime null pointer checks being inserted (etc.). But I don't see any similar conflict in this case.Dabble
@Dabble Code that relies on UB is a ticking timebomb in general. I don't think there's anything in particular about one form of UB or another.Pongee
@GundolfGundelfinger I'd say that the behaviour of mmap (and the status of any memory "returned" by it) is outside of what is covered by the standard. In a vacuum the compiler would have to assume that it might have had objects created correctly in it.Dabble
Thank you for taking the time to update the answer. In the light of re-reading the draft standard, plus P0137 I have posted a new, extremely carefully worded, question - complete with compilable code. I would be truly grateful if you could give it a careful look. I believe an informed answer will be of benefit to the community. #40930975Skin
"Is there a language construct in your code that creates a temporary X object?" yes, the definition of the buffer object. Anyway, this definition of an object is broken.Flier
"Thus, that interpretation means that obtaining the storage simultaneously creates an infinite set of objects with different types in said storage, all having the same address." Yes, it does. Do you have a problem with that?Flier
Please consider these four events: (1) an object comes into existence as per your answer ("you have to have an object") (2) an object is created as per intro.object (3) an object lifetime begins as per basic.life (4) storage suitable for an object is obtained as per basic.life. Which of these can be considered separate independent events? In what order are they sequenced?Centrosphere
@n.m.: 2 is the means by which 1 takes place. 4 happens before 3. So the order of operations is always 2, 4, 3. Now, 4&3 may happen simultaneously (acquiring the memory starts its lifetime), but you cannot start the lifetime of an object before you've acquired storage for it. After all, [basic.life]/1 says that vacuous initialization happens when you acquire storage for the object. So that has to already have happened.Laktasic
@Betteanne It is now defined behavior in C++20. According to [intro.object]/13, beginning a lifetime of an array of chars implicitly creates another object in the storage occupied by the array, provided that another object is of an implicit-lifetime type. It is time to update the answer.Disendow
@kalaider: What must one do to cause a region of storage to revert to being an "array of chars", thus allowing the implicit creation of a new object within it?Lina
7

Based on P0593R6, I believe the code in the OP is valid and should be well defined. The new wording, adopted as a DR and applied retroactively to all versions from C++98 onward, allows implicit object creation as long as the created object is well defined (tautology is sometimes the rescue for complicated definitions); see § 6.7.2.11 Object model [intro.object]:

[...] implicitly-created objects whose address is the address of the start of the region of storage, and produce a pointer value that points to that object, if that value would result in the program having defined behavior [...]

See also: https://mcmap.net/q/23300/-is-circumventing-a-class-39-constructor-legal-or-does-it-result-in-undefined-behaviour
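
To make the implicit-creation rule concrete, here is a minimal sketch using std::malloc, which is one of the operations that P0593 describes as implicitly creating objects. The function name malloc_demo is hypothetical. I am deliberately using malloc rather than the OP's char buffer plus reinterpret_cast, because whether the cast alone yields a pointer to the implicitly created object is a separate question this sketch does not try to settle.

```cpp
#include <cstdlib>

struct X { int x; };  // an aggregate with trivial members: an implicit-lifetime type

// Hypothetical demo: under the P0593 wording, malloc implicitly creates an X
// in the returned storage if that is what gives the program defined behavior.
inline int malloc_demo() {
    void* raw = std::malloc(sizeof(X));
    if (raw == nullptr) {
        return -1;  // allocation failed; nothing to demonstrate
    }
    X* p = static_cast<X*>(raw);  // points to the implicitly created X
    p->x = 42;                    // no placement-new needed under this wording
    int result = p->x;
    std::free(raw);
    return result;
}
```

Under the pre-P0593 reading given in the accepted answer, the same code has no X object and would be undefined, even though it behaves identically in practice.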

Accustom answered 25/5, 2020 at 10:33 Comment(0)
3

This analysis is based on n4567, and uses section numbers from it.

§5.2.10/7: When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)).

So, in this case, the reinterpret_cast<X*>(buffer) is the same as static_cast<X *>(static_cast<void *>(buffer)). That leads us to look at the relevant parts about static_cast:

§5.2.9/13: A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. The null pointer value is converted to the null pointer value of the destination type. If the original pointer value represents the address A of a byte in memory and A satisfies the alignment requirement of T, then the resulting pointer value represents the same address as the original pointer value, that is, A.

I believe that's enough to say that the original quote is sort of correct--this conversion gives defined results.

As to lifetime, it depends on what lifetime you're talking about. The cast creates a new object of pointer type--a temporary, which has a lifetime starting from the line where the cast is located, and ending whenever it goes out of scope. If you have two different conversions that happen conditionally, each pointer has a lifetime that starts from the location of the cast that created it.

Neither of these affects the lifetime of the object providing the underlying storage, which is still buffer, and has exactly the same lifetime, regardless of whether you create a pointer (of the same or converted type) to that storage or not.
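
A small sketch of the pointer-conversion part only, assuming nothing beyond the two quoted paragraphs. The function name casts_agree is made up for illustration. It forms the pointers and compares them, but never dereferences them, so it does not depend on whether an X exists in the buffer.

```cpp
struct X { int x; };

// Hypothetical check: reinterpret_cast between object pointer types is
// specified as a static_cast through void*, and the address is preserved
// when the alignment requirement is met.
inline bool casts_agree() {
    alignas(X) char buffer[sizeof(X)];
    X* via_reinterpret = reinterpret_cast<X*>(buffer);
    X* via_static = static_cast<X*>(static_cast<void*>(buffer));
    // Only pointer values are compared; neither pointer is dereferenced.
    return via_reinterpret == via_static &&
           static_cast<void*>(via_reinterpret) == static_cast<void*>(buffer);
}
```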

Octad answered 29/11, 2016 at 19:15 Comment(5)
What's the conclusion though? Is your claim that the pointer to X is created legally, but that it can't actually be dereferenced (e.g., the ->x is UB) because it doesn't point to a created X object? It isn't clear to me what the relevance of the lifetime of the pointers themselves is, and it's hard to understand which side of the debate this answer comes down on.Karolkarola
Yes, creating the pointer has defined behavior, but dereferencing the pointer gives UB. I considered his question about lifetime somewhat ambiguous, so I pointed out the lifetime of every object in the code, even though I agree that the lifetime of the pointers themselves probably isn't what he cared about. He asked about the lifetime of the X, and there is no actual X involved, just a pointer to X initialized with the address of a buffer of chars.Octad
Right, but at the end, the code dereferences the pointer as if there was an X - if that isn't going to work (the crux of the question, really), maybe point it out?Karolkarola
@BeeOnRope: I'm hesitant to say that. The reality is that it's officially undefined behavior, but it will work (for almost any reasonable definition of the word) on every known implementation, and I'd expect it to continue working essentially permanently. The simple fact is that breaking this breaks essentially all C compatibility, and I doubt there's even one compiler vendor that's willing to throw that away.Octad
Fair enough - it's that exact "problem" that caused me to come here, since I find it hard to believe (for example) that memcpying a trivially copyable type into suitably aligned uninitialized storage isn't allowed by the standard, but that seems to be the place we're in today :(.Karolkarola
