All programs were UB before C++20?

Asked 13/9, 2023 at 5:44 Answered 14/9, 2023 at 1:17

c++ language-lawyer c++20 undefined-behavior

Yesterday I heard this in a talk from David Stone

Prior to C++20 it was not possible to implement std::vector, all vector implementations, if they are written in C++, had undefined behavior.

But I didn't really pay too much attention to it.

Until today, when I watched Quantum Interpretations of the C++ Object Model, which starts with him saying

Prior to C++20 all C++ programs were undefined. Since C++20 they're probably still undefined, but at least not for lifetime reasons.

What is he talking about?

I know there are things that can't be implemented just by using what's available in the language (I guess std::mutex definitely can't?); and I know that this is obviously a problem that exists in all (?) programming languages, e.g. Haskell's seq requires compiler support.

But is anybody really telling me that I can't implement, say std::monostate?

My point is that there are things that can and things that can't be implemented just by using language tools. What makes std::vector belong to the latter category?

Tornado answered 13/9, 2023 at 5:44 Comment(5)

Aren't the STL and vendors excluded from what invokes UB? The STL is allowed to use compiler extensions and is thus exempt from strict standard practices. The MSVC runtime, for example, uses anonymous structs, which are not allowed by the C or C++ standard. – Electronic 13/9, 2023 at 5:54

Is it really undefined behaviour if it concerns implementation done by the compiler's vendor? – Monogyny 13/9, 2023 at 5:55

These statements basically mean that you cannot implement your own version of std::vector using C++. The implementations provided by the compiler (vendor) have well-defined behavior by the mere fact that it is defined by the Standard. Vendors can use whatever magic they need to make it work including the magic knowledge that their compiler actually does "the right thing" when it sees an implementation of std::vector that follows C++ syntax. – Apollonian 13/9, 2023 at 6:3

Undefined behaviour isn't voodoo, it means the standard imposes no restrictions on behaviour. The library doesn't rely on the guarantees made by the standard, it relies on the compiler team being nice people. So yes, the standard imposes no restrictions on what the compiler does to std::vector, but the standard imposes restrictions on what the resulting std::vector does to you. – Aceae 13/9, 2023 at 8:6

@Electronic nested anonymous structs are legal in C11. Function-scope anonymous structs existed before that as mainstream as a byproduct of syntax definition, I'm not sure if they were made illegal by ISO C. – Goetz 13/9, 2023 at 8:24

What is he talking about?

He is not saying that literally all C++ programs were undefined.

For example, setting aside that there are almost certainly oversights in the standard that make it self-contradictory, I am pretty sure that the following program does not have undefined behavior:

int main() {}

What he is talking about is that practically all actual non-trivial C++ projects will use one of the many techniques he lists in the following slides, and for most of them everyone just assumes defined behavior (even if possibly implementation-defined or unspecified). But all of the cases he lists have been UB, and some still are.

In particular any use of dynamic memory as a certain type without an explicit new expression had to be UB, because there was no implicit object creation and that's the first main point of the talk. With C++20 implicit object creation rules have been added that should cover most of these cases where defined behavior was commonly assumed with regards to the existence and lifetime of objects.
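To make the point about implicit object creation concrete, here is a minimal sketch. The `Point` struct and the function name `sum_via_malloc` are made up for illustration. The code is the kind of thing everyone assumed was fine: before C++20 no rule in the standard put a `Point` object into the `malloc`'ed storage, so the member accesses were formally UB; with the C++20 rules, `std::malloc` implicitly creates objects of implicit-lifetime types as needed.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical implicit-lifetime type (a plain aggregate of ints).
struct Point {
    int x;
    int y;
};

// Writes to and reads from storage obtained from std::malloc without any
// new-expression, then returns x + y.
int sum_via_malloc() {
    // No "new Point" anywhere. Pre-C++20 wording: no Point object exists
    // here, so p->x is formally UB. C++20 wording: malloc implicitly creates
    // a Point because doing so gives the program defined behavior.
    auto* p = static_cast<Point*>(std::malloc(sizeof(Point)));
    assert(p != nullptr && "allocation failed");

    p->x = 3;
    p->y = 4;
    const int sum = p->x + p->y;

    std::free(p);
    return sum;
}
```

Note that this only covers implicit-lifetime types; for a type with a non-trivial constructor and destructor you would still need a new-expression (or an equivalent such as placement new) to start the object's lifetime.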

It is also true that without implicit object creation (+ special casing of std::allocator::allocate) it is impossible to implement a std::vector in C++ that has defined behavior according to the standard, because that's the only way you can create an array object on which pointer arithmetic is allowed without starting the lifetime of the individual array elements. But that's not a problem for the standard library implementation, because nothing requires it to be written in portable C++ that has defined behavior according to the standard. The standard library implementer knows the compiler on which it will be used and what additional guarantees it will make (e.g. that it is permitted to do pointer arithmetic on adjacent objects of the same type, even if there is no encompassing array object).
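As a rough illustration of the vector case, here is a sketch of a fixed-capacity buffer. `TinyBuffer` and `join_demo` are hypothetical names, and this is nowhere near a full std::vector. It gets raw storage from `std::allocator<T>::allocate`, does pointer arithmetic on the result, and only starts each element's lifetime when `push_back` is called. Under the C++20 wording, `allocate` creates the array object without starting the lifetime of its elements, which is what makes `data_ + size_` defined; before that, no wording provided the encompassing array.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Hypothetical fixed-capacity buffer, for illustration only.
template <class T>
class TinyBuffer {
public:
    explicit TinyBuffer(std::size_t capacity)
        : data_(alloc_.allocate(capacity)), capacity_(capacity) {}

    TinyBuffer(const TinyBuffer&) = delete;
    TinyBuffer& operator=(const TinyBuffer&) = delete;

    ~TinyBuffer() {
        // End the lifetime of the live elements in reverse order.
        for (std::size_t i = size_; i > 0; --i) {
            data_[i - 1].~T();
        }
        alloc_.deallocate(data_, capacity_);
    }

    void push_back(T value) {
        assert(size_ < capacity_);
        // Pointer arithmetic into storage whose elements are not alive yet;
        // placement new starts the lifetime of exactly one element.
        ::new (static_cast<void*>(data_ + size_)) T(std::move(value));
        ++size_;
    }

    T* data() { return data_; }
    std::size_t size() const { return size_; }

private:
    std::allocator<T> alloc_;
    T* data_;
    std::size_t capacity_;
    std::size_t size_ = 0;
};

// Walks the elements with raw pointer arithmetic, as a caller holding
// data() and size() would.
std::string join_demo() {
    TinyBuffer<std::string> buf(3);
    buf.push_back("a");
    buf.push_back("b");
    buf.push_back("c");

    std::string out;
    for (std::string* p = buf.data(); p != buf.data() + buf.size(); ++p) {
        out += *p;
    }
    return out;
}
```

Compilers have always done the expected thing with code like this; the question in the talk is only whether the standard's wording guaranteed it.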

Trangtranquada answered 14/9, 2023 at 1:17 Comment(8)

Formally, the problems he described with vector began with C++14. Before that, the rules were different and std::vector had fewer features (and the C++98 std::vector is well-defined...) – Goetz 14/9, 2023 at 8:8

@Swift-FridayPie C++98 also has the array object problem (at least after cplusplus.github.io/LWG/issue69). In C++98 pointer arithmetic is also defined only in array objects, but an array object can be created only by an explicit new of the whole array in dynamic memory. So it is impossible in general to implement std::vector so that the contiguity property from the linked issue can be satisfied. – Trangtranquada 14/9, 2023 at 9:59

The array object problem I'm talking about isn't that; it is the question of whether we are allowed to do pointer arithmetic on a pointer to char if that's within an object's storage (CWG1314). Since the beginning, vector used placement new and an array of chars. And it always started with at least one element, usually more. Note it is always allowed to do pointer arithmetic up to the size of the array plus one. And with a separate iterator base type, nothing prohibited using a "pointer to array and offset" implementation. Arithmetic on iterators didn't necessarily mean pointer arithmetic. – Goetz 14/9, 2023 at 10:35

@Swift-FridayPie: If you have a std::vector<T> v; (not showing code to add contents), you can pass &v[0] or v.data() and v.size() to some other function and it can use pointer arithmetic to access all elements, just as if you passed in an array (which decays to a pointer to its first element). vectors are not always accessed using std::vector<T>::iterator. – Envoi 20/9, 2023 at 19:39

@BenVoigt v[0] was/is defined as UB in that case, as access to non-existing element, so that's fair. Vector started allocated, so v.data()+1 was appearing well-defined from compiler's point of view albeit UB from standard's, only v.data() + size() is valid (so one should take in account if size() is 0, then doing something with data() is UB). – Goetz 21/9, 2023 at 5:54

@Swift-FridayPie: v[0] is perfectly well defined, that's a custom operator[] not transformed to *(v+0). Were you assuming that v is empty despite the fact I told you it wasn't? – Envoi 21/9, 2023 at 16:29

@BenVoigt I guess you said that in a roundabout way I didn't parse well (ESL, heh). In that case I don't see anything that is UB there. user17732522 was claiming it's not. Nothing required implicit object creation in a C++98 context. – Goetz 21/9, 2023 at 17:52

Also note that my initial comment was clarifying your comment I replied to, not disagreeing. Yes, iterators can have special handling and not rely on pointer arithmetic. I was just pointing out that this doesn't excuse the std::vector author from also making pointer arithmetic work correctly. (As a further note, while the iterator can store the element index it is pointing to, it cannot store a pointer to the vector itself, because that wouldn't behave correctly when swap() is called on the vector) – Envoi 21/9, 2023 at 17:59

In C++11 and later there are plenty more. std::any, std::optional, std::future, std::thread, std::atomic, std::shared_ptr and many more... you can't implement them in a fully compliant way.

Most subsystem controllers (thread API, time API, etc.), containers and thread-safety/thread-synchronization components aren't implementable using strictly the language's capabilities. And they are not required to be: they are abstractions of the compiler's or run-time's built-in functionality.

Undefined behaviour means a behaviour not described or restricted in any end-user documentation: of the OS, of the language standard or of the compiler. It doesn't necessarily mean catastrophic behaviour, it's undocumented behaviour.

But the program's behaviour is not undefined. On the contrary, the standard

defines the behaviour of those components, and
states that those components do not necessarily have an implementation at all. These headers might not exist, or might contain tokens understandable only by this particular compiler implementation.

As a by-product of this, defining your own version of std::vector would be UB. It's undefined how the compiler would parse code which tries to ODR-use such a custom definition.
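A small sketch of the distinction, with made-up names (`my`, `count_equal_pairs`): adding your own declaration of `vector` to namespace std is what the standard makes undefined, while reusing the standard name from your own namespace, or defining your own type there, is ordinary code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// namespace std { template <class T> class vector {}; }
// ^ Deliberately left commented out: adding such a declaration to
//   namespace std is undefined behavior.

namespace my {        // hypothetical user namespace
using std::vector;    // option 1: pull the standard name in

struct monostate {};  // option 2: define your own type in your namespace
constexpr bool operator==(monostate, monostate) noexcept { return true; }
}  // namespace my

// Counts adjacent pairs that compare equal; all of them do for monostate.
std::size_t count_equal_pairs() {
    my::vector<my::monostate> v(3);
    std::size_t n = 0;
    for (std::size_t i = 0; i + 1 < v.size(); ++i) {
        if (v[i] == v[i + 1]) ++n;
    }
    assert(n == 2);
    return n;
}
```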

Goetz answered 13/9, 2023 at 8:15 Comment(21)

"not described or restricted in any end-user documentation" -- Hm... The term "undefined behavior", as used in the C++ standard, refers only to the behavior mandated by the C++ standard, not any other documentation, I think. For example, I remember someone gave an example saying that if you use some non-standard format specifier with printf(), you'll have undefined behavior, but sometimes this is totally OK, because the behavior could be described by some other document, a POSIX standard, for example. And MSVC allows %I64u and such, which is UB by the standard, but not by the MS specs. – Otti 13/9, 2023 at 8:32

@heapunderrun in the case of implementation-defined or unspecified behavior, the standard explicitly states so. "Implementation-defined behavior, for a well-formed program construct and correct data, that depends on the implementation and that each implementation documents." Undefined behaviour is where ISO "imposes no requirements", i.e. it is NOT required to be implementation-defined or unspecified and IS NOT described in the standard. This leaves random behaviour, "circumstance-defined" behaviour and the internal implementation of the compiler and runtime under the umbrella of UB. – Goetz 13/9, 2023 at 8:58

@heapunderrun your example is an example of implementation-defined behavior. Also, POSIX describes macro definitions that make those portable, as originally it marks specific descriptors as platform-dependent (the MS implementation doesn't require I64 or ll, at least in older versions. And they implement that little-known header with those macros). – Goetz 13/9, 2023 at 9:4

Exactly. In case of UB, the C++ does not describe it and does not require it to be documented anywhere, as you note. However, that doesn't mean it couldn't be defined elsewhere. So, it's not correct to claim that UB is never described/restricted in some other document (like for OS or compiler). It could be, it could be not. There could even be an extension to the standard, defining something that is UB in the standard. – Otti 13/9, 2023 at 9:7

@heapunderrun That's where my specification "end-user documentation" came from. As far as developers are concerned, the correct behaviour is well-defined by the documentation they use to test the compiler's compliance. The end user just has to use that as a "black box", which is compliant with the standard on input and output; anything that is UB in the standard can be defined or not by additional documentation. And if that's OSS, then code is documentation too. What are we arguing about? – Goetz 13/9, 2023 at 9:15

I disagree on that last sentence. The code is not documentation, unless you set in stone that this is the only version you will use, ever. – Thinia 14/9, 2023 at 1:25

@Thinia "Code is documentation" is part of OSS and Agile philosophy, as well as of some state standards (I recall ANSI having something like that, and source code is defined as "a document written in programming language" in GOST). It doesn't mean that a particular codebase is "good documentation". It means that code is the major part of the information about how a program works. The other way is true too. Bjarne said that a document (standard) is also software, and software can have bugs. – Goetz 14/9, 2023 at 6:3

@Swift-FridayPie agree that it describes how the program works. Where it falls short though is to describe the design intent. ⇒ Is the program working in this specific way by design, with the guarantee it will keep working like that, or is this just a technical detail of the current implementation that can change at every release? – Thinia 14/9, 2023 at 17:5

…or if you prefer to think of it in that way: at best, the code only describes how the past and current version work. It's enough only if you never plan to use a future version, or if you are okay with revisiting everything at each and every update. The whole point of guarantees and documentation is to eliminate that very costly cost to upgrade. – Thinia 14/9, 2023 at 17:9

@Thinia It's indeed how most of technical documentation outside of programming works - describing what already exist. The intent, the task, the design principles are separate documents. And sometimes they are a more guarded secret than the result. I worked in metallurgy where it is very so. Resulting document is like: follow the numbers and procedure. Why those numbers, how they were obtained, why we even making this? That's science, trade secret or state secret. And in some cases modelling documentation or original goals are getting invalidated by implementation or are written afterwards. – Goetz 14/9, 2023 at 17:52

@Swift-FridayPie you are overgeneralizing there. I can definitely tell that you do not set bridge load limits by inferring what they were from looking at the built bridge or its components. And tbh even for your domain I would be very surprised that it is routine to take your assembly apart and guess what the manufacturer guarantees are based on how it is built. "Code is documentation" is pretty much the same idea as "disassembling is documentation". It definitely teaches you how it's built, but not what it guarantees. – Thinia 15/9, 2023 at 13:49

@Thinia Assembling and disassembling, and maintenance, both. But yeah, you got it there. When a military company supplies documents for parts, they give requirements for the parts, not for the whole thing. And maintenance documentation doesn't tell you why you have to flip that switch first (sadly... I knew a story where that mattered... because a guy didn't do that and flew away with a missile) – Goetz 15/9, 2023 at 16:9

Sorry but what does defining your own version of std::vector would be an UB even mean? What's special in std::vector which prevents me from defining my::std::vector? Or, if it's not special (wrt other std:: things), then what makes struct Foo {}; non UB? – Tornado 17/9, 2023 at 13:56

@Enlico: my::std::vector is not std::vector. If you write your own namespace std { class vector { ... in your code, attempting to use that is defined by the standard as UB. An implementation (the compiler's documentation) may allow you to do so, but it is not required to allow that and not required to diagnose it; the standard doesn't impose any requirements. Doing that with another namespace is fine. A typical scheme, e.g. used by boost, is to pull a name from std into a custom namespace (boost) with a using-declaration OR define your own class in it. – Goetz 17/9, 2023 at 14:13

@Swift-FridayPie, so your point is basically that David Stone is stating the obvious, i.e. that implementing my stuff in std:: is UB (well, except in the few cases I can, e.g. this)? I mean, why would he make this comment about std::vector, given that the same comment applies basically to almost everything in std::? – Tornado 17/9, 2023 at 14:24

@Tornado Partially, yes. The standard doesn't require a component to be implemented in any certain way, it requires guarantees. That's why Stroustrup and the like state that the standard component library (not STL!) is a PART of the language. If you cannot implement it in terms of the available language tools, it's another part of it. Technically it allows an implementation where the headers wouldn't exist and their existence can be emulated by the compiler itself. An IDE may still require some kind of correct-ish header for static code analysis. His comment is rather about the fact that if you wanted to create it, you cannot. – Goetz 17/9, 2023 at 15:55

@Tornado consider how you could possibly implement, say, std::launder. There simply is no way for a developer to obtain a valid pointer from an invalid pointer using code. std::launder is able to do that only because the language says so. Which means implementors need to provide that guarantee somehow (they have to make it special in the eye of the compiler) – Thinia 20/9, 2023 at 17:22

@spectras, I understand the point in the general sense. I don't think I can implement + for ints myself in a standard-compliant way, can I? And I guess I can't implement std::mutex. But what's special in std::vector, other than being in namespace std? I can copy all of it, sanitizing the reserved identifiers and other stuff, put it in my namespace, and use it. Would this lead to UB? – Tornado 20/9, 2023 at 17:42

To go to an extreme, @spectras, are you really saying I can't implement std::monostate in a standard-compliant way? – Tornado 20/9, 2023 at 17:42

@Tornado you would need to implement std::variant in that case, otherwise it is useless (it's a tag type for std::variant, an equivalent of the null-object pattern. Probably. IF implemented. Compilers can do special magic there.). std::variant is very problematic to implement in a standard-defined way though, and they have to have different names, e.g. nonstd::monostate and nonstd::variant. And nonstd::hash<std::monostate>. And everything that can use that. – Goetz 20/9, 2023 at 18:53

@Enlico> yes it would lead to UB because there is one specific step you cannot perform yourself, which is starting the lifetime of the underlying array of T. C++23 finally brings std::start_lifetime_as_array specifically for this, so starting in C++23 you are able to have your own vector without invoking UB (well, you also have std::allocator<T>::allocate, which has special language blessing for that, so it is an option too). – Thinia 21/9, 2023 at 22:14
