C++26 has introduced erroneous behavior in addition to undefined, unspecified, and implementation-defined behavior (see Undefined, unspecified and implementation-defined behavior). How is this new construct different from the existing kinds of behavior, and why has it been added to the C++ standard?
Erroneous behavior is "buggy" or "incorrect" behavior, as explained by P2795: Erroneous behavior for uninitialized reads. This proposal introduced erroneous behavior into C++26, turning previously undefined behavior into erroneous behavior.
The most notable difference is that undefined behavior has no limit as to what the program may do, including jumping into "random" functions, accessing memory that shouldn't be accessed, and other effects which are detrimental to security. Erroneous behavior is formally ([defns.erroneous]):
well-defined behavior that the implementation is recommended to diagnose
It may be diagnosed through warnings, run-time errors, etc.; formally, [intro.abstract] p5, sentence 3 explains:
If the execution contains an operation specified as having erroneous behavior, the implementation is permitted to issue a diagnostic and is permitted to terminate the execution at an unspecified time after that operation.
Motivation
Unfortunately, a substantial amount of C++ code is not bug-free, and many bugs can be harmful to security. An obvious example is this:
void (*f)(); // uninitialized function pointer;
// basically an abstraction for an instruction address
// ...
f(); // what address do we jump to?
If f takes up some space on the stack, an attacker could make sure that the memory on the stack prior to executing this code has a value of their choice. f() would thus allow an attacker to jump to any instruction in the program they want.
There are many more such cases of CWE-457: Use of Uninitialized Variable.
Simply making this code "correct" by initializing the function pointer to nullptr by default also wouldn't make sense, since there is clearly a bug here. f should have been initialized, and the compiler should bring our attention to that fact if we forgot to initialize it before calling f, storing it somewhere, etc. We wouldn't want this bug to simply be "swept under the rug".
How does erroneous behavior work?
Erroneous behavior starts with an erroneous value, which is produced, for example, when a local variable is left uninitialized.
On a side note, the pre-C++26 behavior can be reproduced using the [[indeterminate]] attribute:
void f(int);
int indet [[indeterminate]]; // indet has indeterminate value
int erron; // erron has erroneous value ([basic.indet])
f(indet); // undefined behavior
f(erron); // erroneous behavior
As explained above, undefined behavior could do anything here, including jumping into a function other than f, whereas f(erron) should always have defined behavior, but should be diagnosed at some point.
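To make the distinction concrete, here is a minimal sketch of declarations that never produce an erroneous (or indeterminate) value in the first place, so they behave the same before and after C++26. The function names are hypothetical and only serve as illustration; none of them come from P2795 or the standard.

```cpp
// Hypothetical helpers for illustration only.

// Value-initialized local: always 0, so there is nothing erroneous to read.
int read_value_initialized() {
    int v{};
    return v;
}

// Declared without an initializer, but assigned before the first read:
// the erroneous (C++26) or indeterminate (C++23) value is never observed.
int read_after_assignment() {
    int v;   // no initializer
    v = 42;  // overwritten before any read
    return v;
}

// Static storage duration: zero-initialized, unlike a plain local variable.
int read_static() {
    static int v;
    return v;
}
```

Only a read of a local that was neither initialized nor assigned, such as f(erron) above, runs into erroneous behavior.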
Erroneous vs. ill-formed
Erroneous behavior may seem similar to ill-formed programs, since both should result in a diagnostic (see also [intro.compliance.general] p8).
However, erroneous behavior comes into effect during program execution, whereas a program is ill-formed during translation (compilation). For example:
int x = float; // ill-formed; not valid C++ code,
// shall be diagnosed
int y; // well-formed (valid C++ code) but y has erroneous value
int z = y; // erroneous behavior, should be diagnosed
Erroneous behavior in constant expressions
Unlike undefined behavior, erroneous behavior always disqualifies expressions from being constant expressions ([expr.const] p5.8).
Note that undefined behavior behaves the same in most cases, but e.g. failed Preconditions in most of the standard library or failed [[assume]] attributes can still result in UB inside of constant expressions.
Also, a constexpr object cannot have erroneous value:
constexpr int x; // error: x has erroneous value
In C++23, this would have also been ill-formed because x would have had indeterminate value.
The broader picture
In general, C++ developers and the C++ committee are pushing the language in a "safer" direction. As part of that, a large amount of undefined behavior could be turned into erroneous behavior over the coming years.
In some cases, there is already a well-motivated proposal for it, such as P2973: Erroneous behavior for missing return from assignment. Some other cases of undefined behavior like signed integer overflow, division by zero, etc. could be made erroneous.
Harder-to-diagnose forms of UB such as data races or invalid down-casts (with static_cast) will likely remain undefined, perhaps indefinitely.
The cost of erroneous behavior
Compilers increasingly rely on undefined behavior for the purpose of optimizations. For example:
int f(int i) {
    int arr[1] { 123 };
    return arr[i];
}
The compiler can optimize this down to:
f(int):
    mov eax, 123
    ret
If i was anything other than 0, the array arr would be accessed out of bounds, which is undefined behavior.
The compiler is allowed to assume that UB simply doesn't happen and optimize accordingly.
If accessing arrays out of bounds were turned into erroneous behavior, the compiler would be encouraged to add a run-time bounds check to the array access and terminate the program if i is not 0.
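As a rough illustration of that cost, the following is a hand-written sketch of what such a check could amount to. It is an assumption about one possible outcome, not what any particular compiler emits, and f_checked is a hypothetical name.

```cpp
#include <cstdlib>

// Hypothetical hand-written equivalent of f with an explicit bounds check.
int f_checked(int i) {
    int arr[1] { 123 };
    if (i != 0) {
        std::abort();  // terminate instead of reading out of bounds
    }
    return arr[i];
}
```

Compared to the optimized version above, the function now needs a comparison and a branch before it can return 123.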
In conclusion, erroneous behavior isn't "free"; it comes at a performance cost. Erroneous behavior is typically added to the C++ standard where the security risk of undefined behavior is significant, and where the cause of undefined behavior is not commonly used for optimizations.
int y; erroneous when y is not even odr-used? Also, is "erroneous value" the same as "indeterminate value"? – Grivation
int z = y; would be erroneous behavior. I've updated the example to showcase that. And erroneous values are not the same as indeterminate values; using indeterminate values in most ways results in undefined behavior whereas using erroneous values results in erroneous behavior. – Equi
y would have had indeterminate value up until C++23, and would have indeterminate value when using the [[indeterminate]] attribute. Starting with C++26, default-initialized objects with automatic storage duration (i.e. "local variables") have erroneous value. – Equi
int x; int y = x; assert(x == y); In C++23, x has indeterminate value, initializing y is undefined behavior, and the assert has undefined behavior (and, in practice, could fail). In C++26, x has an erroneous value, initializing y has well-defined behavior - it has the same erroneous value, and the assertion definitely passes. – Nilgai
[[assume]] would apply anyway for erroneous behavior? It's the same bullet. – Nilgai