What is undefined behavior (UB) in C and C++? What about unspecified behavior and implementation-defined behavior? What is the difference between them?
Undefined behavior is one of those aspects of the C and C++ languages that can be surprising to programmers coming from other languages (other languages try harder to hide it). Basically, it is possible to write C++ programs that do not behave in a predictable way, even though many C++ compilers will not report any errors in the program!
Let's look at a classic example:
#include <iostream>
int main()
{
char* p = "hello!\n"; // yes I know, deprecated conversion
p[0] = 'y';
p[5] = 'w';
std::cout << p;
}
The variable p points to the string literal "hello!\n", and the two assignments below it try to modify that string literal. What does this program do? According to the C++ standard, [lex.string] note 4, it invokes undefined behavior:
The effect of attempting to modify a string literal is undefined.
I can hear people screaming "But wait, I can compile this no problem and get the output yellow" or "What do you mean undefined, string literals are stored in read-only memory, so the first assignment attempt results in a core dump". This is exactly the problem with undefined behavior. Basically, the standard allows anything to happen once you invoke undefined behavior (even nasal demons). If there is a "correct" behavior according to your mental model of the language, that model is simply wrong; the C++ standard has the only vote, period.
Other examples of undefined behavior include
- accessing an array beyond its bounds,
- division by zero,
- dereferencing a null pointer,
- accessing objects after their lifetime ended, or
- writing allegedly clever expressions like i++ + ++i.
[intro.defs] also defines undefined behavior's two less dangerous brothers, unspecified behavior and implementation-defined behavior:
implementation-defined behavior [defns.impl.defined]
behavior, for a well-formed program construct and correct data, that depends on the implementation and that each implementation documents
unspecified behavior [defns.unspecified]
behavior, for a well-formed program construct and correct data, that depends on the implementation
[Note: The implementation is not required to document which behavior occurs. The range of possible behaviors is usually delineated by this document. — end note]
undefined behavior [defns.undefined]
behavior for which this document imposes no requirements
[Note: Undefined behavior may be expected when this document omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). [...] — end note]
What can you do to avoid running into undefined behavior? Basically, you have to read good C++ books by authors who know what they're talking about. Avoid internet tutorials. Avoid bullschildt.
int f(){int a; return a;}: the value of a may change between function calls. – Marlenmarlena

auto q = "hello!\n"; — q now may or may not be "hello!\n" or "yellow\n" and worse; q may or may not be in the same function as p. – Latta

int a; char s[20]; cin>>a; cin>>s; with input 23bonapart + Enter key hit, the code cout<<a; cout<<s; outputs 23 and bonapart. Here I am not separating the input data with delimiters, but the output is correct. So my question is: is it really distinguishing the inputs, or is this a case of undefined behavior where "If there is a 'correct' behavior according to your mental model of the language, that model is simply wrong"? – Mcardle

[...] the f function, because it invokes undefined behaviour: reading an uninitialized variable. Your example doesn't seem non-deterministic to me. – Brockbrocken

[...] an uninitialized int will simply yield an unspecified value of that type without side effects; given a program that exploits that guarantee, a compiler may be able to generate more efficient code to accomplish a task than could be generated for a program that didn't exploit it. – Symbology

Well, this is basically a straight copy-paste from the C standard:
3.4.1 implementation-defined behavior
unspecified behavior where each implementation documents how the choice is made
EXAMPLE An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right.
3.4.3 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
EXAMPLE An example of undefined behavior is the behavior on integer overflow.
3.4.4 unspecified behavior
use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance
EXAMPLE An example of unspecified behavior is the order in which the arguments to a function are evaluated.
a[i] = i++ is undefined. Is it possible for a compiler to be sophisticated enough to be programmed for this event? What would happen, would it just output random assembly? – Duffel

Given int foo(int x) { if (x >= 0) launch_missiles(); return x << 1; }, a compiler can determine that since all means of invoking the function that don't launch the missiles invoke undefined behavior, it can make the call to launch_missiles() unconditional. – Symbology

Maybe simpler wording could be easier to understand than the rigorous definitions of the standards.
implementation-defined behavior:
The language says that we have data types. The compiler vendors specify what sizes they shall use, and provide documentation of what they did.
undefined behavior:
You are doing something wrong. For example, you have a very large value in an int that doesn't fit in a char. How do you put that value in a char? Actually, there is no way! Anything could happen, but the most sensible thing would be to take the first byte of that int and put it in the char. It is just wrong to assign only the first byte, but that's what happens under the hood.
unspecified behavior:
Which of these two functions is executed first?

void fun(int n, int m);

int fun1() {
    std::cout << "fun1";
    return 1;
}

int fun2() {
    std::cout << "fun2";
    return 2;
}

//...
fun(fun1(), fun2()); // which one is executed first?
The language doesn't specify the evaluation order, left to right or right to left! So unspecified behavior may or may not result in undefined behavior, but your program certainly should not rely on unspecified behavior.
@eSKay I think your question is worth editing the answer to clarify more :)

For fun(fun1(), fun2());, isn't the behaviour "implementation defined"? The compiler has to choose one or the other course, after all?
The difference between implementation-defined and unspecified is that the compiler is supposed to pick a behavior in the first case, but it doesn't have to in the second case. For example, an implementation must have one and only one definition of sizeof(int). So it can't say that sizeof(int) is 4 for some portion of the program and 8 for others. Unlike unspecified behavior, where the compiler can say: "OK, I am gonna evaluate these arguments left-to-right and the next function's arguments are evaluated right-to-left." It can happen in the same program; that's why it is called unspecified. In fact, C++ could have been made easier if some of the unspecified behaviors were specified. Take a look at Dr. Stroustrup's answer for that:

It is claimed that the difference between what can be produced giving the compiler this freedom and requiring "ordinary left-to-right evaluation" can be significant. I'm unconvinced, but with innumerable compilers "out there" taking advantage of the freedom and some people passionately defending that freedom, a change would be difficult and could take decades to penetrate to the distant corners of the C and C++ worlds. I am disappointed that not all compilers warn against code such as ++i+i++. Similarly, the order of evaluation of arguments is unspecified. IMO far too many "things" are left undefined or unspecified; that's easy to say and even to give examples of, but hard to fix. It should also be noted that it is not all that difficult to avoid most of the problems and produce portable code.
For fun(fun1(), fun2());, isn't the behaviour "implementation defined"? The compiler has to choose one or the other course, after all? – Peso

"I am gonna evaluate these arguments left-to-right and the next function's arguments are evaluated right-to-left": I understand this can happen. Does it really, with compilers that we use these days? – Peso

From the official C Rationale Document:
The terms unspecified behavior, undefined behavior, and implementation-defined behavior are used to categorize the result of writing programs whose properties the Standard does not, or cannot, completely describe. The goal of adopting this categorization is to allow a certain variety among implementations which permits quality of implementation to be an active force in the marketplace as well as to allow certain popular extensions, without removing the cachet of conformance to the Standard. Appendix F to the Standard catalogs those behaviors which fall into one of these three categories.
Unspecified behavior gives the implementor some latitude in translating programs. This latitude does not extend as far as failing to translate the program.
Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior.
Implementation-defined behavior gives an implementor the freedom to choose the appropriate approach, but requires that this choice be explained to the user. Behaviors designated as implementation-defined are generally those in which a user could make meaningful coding decisions based on the implementation definition. Implementors should bear in mind this criterion when deciding how extensive an implementation definition ought to be. As with unspecified behavior, simply failing to translate the source containing the implementation-defined behavior is not an adequate response.
Undefined Behavior vs. Unspecified Behavior has a short description of it.
Their final summary:
To sum up, unspecified behavior is usually something you shouldn't worry about, unless your software is required to be portable. Conversely, undefined behavior is always undesirable and should never occur.
Implementation-defined -
The implementor's choice; should be well documented; the standard gives choices, but the program is sure to compile.
Unspecified -
Same as implementation-defined, but not documented.
Undefined -
Anything might happen; take care of it.
[...] given uint32_t s;, evaluating 1u<<s when s is 33 could be expected to maybe yield 0 or maybe yield 2, but not do anything else wacky. Newer compilers, however, may determine from evaluating 1u<<s that s must have been less than 32 beforehand, so any code before or after that expression which would only be relevant if s had been 32 or greater may be omitted. – Symbology
Symbology Historically, both Implementation-Defined Behavior and Undefined Behavior represented situations in which the authors of the Standard expected that people writing quality implementations would use judgment to decide what behavioral guarantees, if any, would be useful for programs in the intended application field running on the intended targets. The needs of high-end number-crunching code are quite different from those of low-level systems code, and both UB and IDB give compiler writers flexibility to meet those different needs. Neither category mandates that implementations behave in a way that's useful for any particular purpose, or even for any purpose whatsoever. Quality implementations that claim to be suitable for a particular purpose, however, should behave in a manner befitting such purpose whether the Standard requires it or not.
The only difference between Implementation-Defined Behavior and Undefined Behavior is that the former requires that implementations define and document a consistent behavior even in cases where nothing the implementation could possibly do would be useful. The dividing line between them is not whether it would generally be useful for implementations to define behaviors (compiler writers should define useful behaviors when practical whether the Standard requires them to or not) but whether there might be implementations where defining a behavior would be simultaneously costly and useless. A judgment that such implementations might exist does not in any way, shape, or form, imply any judgment about the usefulness of supporting a defined behavior on other platforms.
Unfortunately, since the mid-1990s compiler writers have started to interpret the lack of behavioral mandates as a judgment that behavioral guarantees aren't worth the cost even in application fields where they're vital, and even on systems where they cost practically nothing. Instead of treating UB as an invitation to exercise reasonable judgment, compiler writers have started treating it as an excuse not to do so.
For example, given the following code:
int scaled_velocity(int v, unsigned char pow)
{
    if (v > 250)
        v = 250;
    if (v < -250)
        v = -250;
    return v << pow;
}
a two's-complement implementation would not have to expend any effort whatsoever to treat the expression v << pow as a two's-complement shift without regard for whether v was positive or negative.

The preferred philosophy among some of today's compiler writers, however, would suggest that because v can only be negative if the program is going to engage in undefined behavior, there's no reason to have the program clip the negative range of v. Even though left-shifting of negative values used to be supported on every single compiler of significance, and a large amount of existing code relies upon that behavior, modern philosophy would interpret the fact that the Standard says that left-shifting negative values is UB as implying that compiler writers should feel free to ignore that.
That << is UB on negative numbers is a nasty little trap, and I'm glad to be reminded of that! – Dewan

[...] if i+j>k yields 1 or 0 in cases where the addition overflows, provided it has no other side effects, a compiler may be able to make some massive optimizations that would not be possible if the programmer wrote the code as (int)((unsigned)i+j) > k. – Symbology

What massive optimizations could be made with i+j>k in such a case? – Jacquie

Suppose i, j, and k are arguments to a function a compiler is expanding inline for a function call foo(x, y, x). In that scenario, a compiler could replace i+j > k with x+y > x, which it could in turn replace with y > 0, skipping the addition entirely, eliminating any dependency on the value of x, and possibly allowing the compiler to eliminate the comparison and any dependency upon the exact value of y if it can determine that y will always be positive. – Symbology
Symbology v << pow
can be losslesly rewritten into (int)((unsigned)v<<pow)
)). –
Jacquie v << pow
, should an implementation be allowed to, or forbidden from, replacing e.g. (v << pow) > 0
with v > 0
if v
is signed? If the programmer had cast v
to unsigned, such substitution would be forbidden, but all cases where the former expression's behavior wouldn't match the latter are regarded as UB under C99 and later. I think that allowing programmers to write v << pow
in cases where C89 would define the behavior but the substitution would be useful, would make more sense than requiring them to block the optimization by using the unsigned cast. –
Symbology C++ standard n3337 § 1.3.10 implementation-defined behavior
behavior, for a well-formed program construct and correct data, that depends on the implementation and that each implementation documents
Sometimes the C++ Standard doesn't impose particular behavior on some constructs but says instead that a particular, well-defined behavior has to be chosen and described by the particular implementation (version of library). So the user can still know exactly how the program will behave even though the Standard doesn't describe it.
C++ standard n3337 § 1.3.24 undefined behavior
behavior for which this International Standard imposes no requirements [ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. — end note ]
When the program encounters a construct that is not defined according to the C++ Standard, it is allowed to do whatever it wants to do (maybe send an email to me, or maybe send an email to you, or maybe ignore the code completely).
C++ standard n3337 § 1.3.25 unspecified behavior
behavior, for a well-formed program construct and correct data, that depends on the implementation [ Note: The implementation is not required to document which behavior occurs. The range of possible behaviors is usually delineated by this International Standard. — end note ]
The C++ Standard doesn't impose particular behavior on some constructs but says instead that a particular, well-defined behavior has to be chosen (but not necessarily described) by the particular implementation (version of library). So in the case when no description has been provided, it can be difficult for the user to know exactly how the program will behave.
Undefined behavior is ugly -- as in, "The good, the bad, and the ugly".
Good: a program that compiles and works, for the right reasons.
Bad: a program that has an error, of a kind that the compiler can detect and complain about.
Ugly: a program that has an error, that the compiler cannot detect and warn about, meaning that the program compiles, and may seem to work correctly some of the time, but also fails bizarrely some of the time. That's what undefined behavior is.
Some program languages and other formal systems try hard to limit the "gulf of undefinedness" -- that is, they try to arrange things so that most or all programs are either "good" or "bad", and that very few are "ugly". It's a characteristic feature of C, however, that its "gulf of undefinedness" is quite wide.