What is erroneous behavior? How is it different from undefined behavior?

C++26 has introduced erroneous behavior in addition to undefined, unspecified, and implementation-defined behavior (see Undefined, unspecified and implementation-defined behavior). How is this new construct different from the existing kinds of behavior, and why has it been added to the C++ standard?

Equi answered 28/3 at 14:0 Comment(3)
Oh boy, just when you think you are getting the hang of C++.Smithers
@user541686 it has been voted into the standard by plenary vote at the Tokyo meeting. See en.cppreference.com/w/cpp/26 for a list of proposals that have already made it into the standard.Equi
Oh wow I see, this is super recent. Thanks.Coyne

Erroneous behavior is "buggy" or "incorrect" behavior, as explained by P2795: Erroneous behavior for uninitialized reads. This proposal introduced erroneous behavior into C++26, turning previously undefined behavior into erroneous behavior.

The most notable difference is that undefined behavior has no limit as to what the program may do, including jumping into "random" functions, accessing memory that shouldn't be accessed, and other effects which are detrimental to security. Erroneous behavior is formally ([defns.erroneous]):

well-defined behavior that the implementation is recommended to diagnose

It may be diagnosed through warnings, run-time errors, etc.; formally, [intro.abstract] p5, sentence 3 explains:

If the execution contains an operation specified as having erroneous behavior, the implementation is permitted to issue a diagnostic and is permitted to terminate the execution at an unspecified time after that operation.

Motivation

Unfortunately, a substantial amount of C++ code is not bug-free, and many bugs can be harmful to security. An obvious example is this:

void (*f)(); // uninitialized function pointer;
             // basically an abstraction for an instruction address
// ...
f();         // what address do we jump to?

If f takes up some space on the stack, an attacker could make sure that the memory on the stack prior to executing this code has a value of their choice. f() would thus allow an attacker to jump to any instruction in the program they want. There are many more such cases of CWE-457: Use of Uninitialized Variable.

Simply making this code "correct" by initializing the function pointer to nullptr by default also wouldn't make sense, since there is clearly a bug here. f should have been initialized and the compiler should bring our attention to that fact if we forgot to initialize it before calling f, storing it somewhere, etc. We wouldn't want this bug to simply be "swept under the rug".

How does erroneous behavior work?

Erroneous behavior starts with an erroneous value, which is produced, for example, when a local variable is left uninitialized. As a side note, the pre-C++26 behavior can be requested with the [[indeterminate]] attribute:

void f(int);

int indet [[indeterminate]]; // indet has indeterminate value
int erron;                   // erron has erroneous value ([basic.indet])

f(indet); // undefined behavior
f(erron); // erroneous behavior

As explained above, undefined behavior could do anything here, including jumping into a function other than f, whereas f(erron) should always have defined behavior, but should be diagnosed at some point.
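
One way to picture the difference is to model an erroneous value as an ordinary, fixed value with a "reading this is a bug" flag attached. The sketch below is only a mental model under that assumption: ModelledInt, kFillPattern and diagnostics_issued are hypothetical names, not standard facilities, and a real implementation is free to pick any value and to diagnose differently (or not at all).

```cpp
#include <cstdio>

// Hypothetical model of a C++26 local `int`; nothing here is a standard API.
constexpr int kFillPattern = 0x2A2A2A2A; // assumed value; real ones are implementation-chosen
int diagnostics_issued = 0;              // counts reads that a tool could report

class ModelledInt {
public:
    ModelledInt() = default;                                       // models `int x;`
    explicit ModelledInt(int v) : value_(v), erroneous_(false) {}  // models `int x = v;`

    // A read always yields a well-defined value. Reading an erroneous
    // value is additionally recorded as a bug, which is the part that
    // an implementation is recommended to diagnose.
    int read() const {
        if (erroneous_) {
            ++diagnostics_issued;
            std::fputs("diagnostic: read of erroneous value\n", stderr);
        }
        return value_;
    }

private:
    int value_ = kFillPattern;
    bool erroneous_ = true;
};
```

In this model, a default-constructed ModelledInt returns the same value on every read() and bumps diagnostics_issued each time. An indeterminate value offers no comparable guarantee, because reading it is undefined behavior.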

Erroneous vs. ill-formed

Erroneous behavior may seem similar to ill-formed programs, since both should result in a diagnostic (see also [intro.compliance.general] p8).

However, erroneous behavior comes into effect during program execution, whereas a program is ill-formed during translation (compilation). For example:

int x = float; // ill-formed; not valid C++ code,
               // shall be diagnosed

int y;         // well-formed (valid C++ code) but y has erroneous value
int z = y;     // erroneous behavior, should be diagnosed

Erroneous behavior in constant expressions

Unlike undefined behavior, erroneous behavior always disqualifies expressions from being constant expressions ([expr.const] p5.8). Undefined behavior is treated the same way in most cases, but there are exceptions: violated Preconditions in most of the standard library or failed [[assume]] attributes can still result in UB inside of constant expressions.

Also, a constexpr object cannot have erroneous value:

constexpr int x; // error: x has erroneous value

In C++23, this would have also been ill-formed because x would have had indeterminate value.
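
The mechanism is the same one that already rejects core-language undefined behavior during constant evaluation. As a rough illustration that compiles with pre-C++26 compilers, the sketch below uses an out-of-bounds array read instead of an erroneous value; element_at is a hypothetical name chosen for this example.

```cpp
// Hypothetical helper: reads from a one-element local array.
constexpr int element_at(int i) {
    int arr[1] { 123 };
    return arr[i]; // undefined behavior at run time unless i == 0
}

// Fine: the evaluation stays within bounds, so it is a constant expression.
static_assert(element_at(0) == 123, "in-bounds read is a constant expression");

// Uncommenting the next line should be rejected by the compiler, because the
// out-of-bounds read disqualifies the call from being a constant expression:
// constexpr int bad = element_at(1);
```

A read of an erroneous value during constant evaluation is expected to be rejected in the same way in C++26.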

The broader picture

In general, C++ developers and the C++ committee are pushing the language in a "safer" direction. As part of that, a large amount of undefined behavior could be turned into erroneous behavior over the coming years.

In some cases, there is already a highly motivated proposal for it, such as P2973: Erroneous behavior for missing return from assignment. Some other cases of undefined behavior like signed integer overflow, division by zero, etc. could be made erroneous.

Harder-to-diagnose forms of UB such as data races or invalid down-casts (with static_cast) will likely remain undefined, perhaps indefinitely.

The cost of erroneous behavior

Compilers increasingly rely on undefined behavior for the purpose of optimizations. For example:

int f(int i) {
    int arr[1] { 123 };
    return arr[i];
}

The compiler can optimize this down to:

f(int):
    mov eax, 123
    ret

If i was anything other than 0, the array arr would be accessed out of bounds, which is undefined behavior. The compiler is allowed to assume that UB simply doesn't happen and optimize accordingly. If accessing arrays out of bounds was turned into erroneous behavior, the compiler would be encouraged to add a run-time bounds check to the array access and terminate the program if i is not 0.
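
As a rough sketch of what such an inserted check could amount to if it were written by hand: f_checked and report_out_of_bounds are hypothetical names, and calling std::abort is an assumption, since erroneous behavior permits termination but doesn't require it.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical diagnostic routine: reports the bug, then terminates.
[[noreturn]] inline void report_out_of_bounds(int index) {
    std::fprintf(stderr, "diagnostic: index %d is out of bounds\n", index);
    std::abort();
}

// Hand-written equivalent of f with an explicit run-time bounds check.
int f_checked(int i) {
    int arr[1] { 123 };
    if (i != 0) {
        report_out_of_bounds(i); // never returns
    }
    return arr[i];
}
```

f_checked(0) still returns 123, but the function can no longer be reduced to just loading 123, because the comparison and branch on i have to stay.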

In conclusion, erroneous behavior isn't "free"; it comes at a performance cost. Erroneous behavior is typically added to the C++ standard where the security risk of undefined behavior is significant, and where the cause of undefined behavior is not commonly used for optimizations.

Equi answered 28/3 at 14:0 Comment(16)
"well-defined behavior that the implementation is recommended to diagnose.." So the same as ill-formed with the exception that the standard just changed the word "required" to "recommended".Grivation
@Grivation I've added a section that discusses the difference between ill-formed programs and erroneous behavior to the answer.Equi
I see. But why is int y; erroneous when y is not even odr-used? Also is "erroneous value" the same as "indeterminate value" ?Grivation
@Grivation it has erroneous value, which isn't erroneous behavior yet, but then e.g. doing int z = y; would be erroneous behavior. I've updated the example to showcase that. And erroneous values are not the same as indeterminate values; using indeterminate values in most ways results in undefined behavior whereas using erroneous values results in erroneous behavior.Equi
So it should be indeterminate value instead of erroneous value, right?Grivation
@Grivation y would have had indeterminate value up until C++23, and would have indeterminate value when using the [[indeterminate]] attribute. Starting with C++26, default-initialized objects with automatic storage duration (i.e. "local variables") have erroneous value.Equi
"Starting with C++26, default-initialized objects with automatic storage duration (i.e. "local variables") have erroneous value." I suppose one motivation for that was to make initialization simpler .... ;)Percale
@Grivation No, erroneous. For instance, int x; int y = x; assert(x == y); In C++23, x has indeterminate value, initializing y is undefined behavior, and the assert has undefined behavior (and, in practice, could fail). In C++26, x has an erroneous value, initializing y has well-defined behavior - it has the same erroneous value, and the assertion definitely passes.Nilgai
"Unlike undefined behavior, erroneous behavior always disqualifies expressions from being constant expressions ([expr.const])." - huh? You cannot have UB in constant expressions.Nilgai
@Nilgai that's true in most cases but there are some exceptions stated in eel.is/c++draft/expr.const#5.8. Undefined behavior from failed assumptions and from the standard library is generally not caught and just results in UB within constant evaluation.Equi
@JanSchultke I think that's just because the implementation might not even evaluate the assumption - not because it could evaluate the assumption and have undefined behavior.Nilgai
@JanSchultke Besides, the same exception for [[assume]] would apply for erroneous behavior anyway? It's the same bullet.Nilgai
"the compiler would be encouraged to add a run-time bounds check to the array access and terminate the program if i is not 0." - Where in the paper do you see a suggestion that erroneous behavior should lead to termination (or bounds checking)?Nilgai
@Nilgai in principle yes, but assumptions don't produce erroneous behavior and neither does anything in the standard library ... yet. For the time being, there should be no case where erroneous behavior remains uncaught within constant evaluation. As for the termination, it's part of the proposed wording for [intro.abstract] p4.1.2. "... and is permitted to terminate the execution at an unspecified time after that operation.", which presumably includes immediately after, or even just before, assuming there is no observable difference.Equi
@JanSchultke 'Erroneous behaviour is well defined behaviour that the implementation is recommended to diagnose' But who is defining the well defined behaviour. The standard or the implementation?Smithers
@Smithers the C++ standard defines it. Use of erroneous values has the same semantics as use of non-erroneous values, except that this is additionally considered "erroneous behavior", where diagnostics are encouraged and termination is possible.Equi
