Has the C++ standard changed with respect to the use of indeterminate values and undefined behavior in C++14?

As covered in "Does initialization entail lvalue-to-rvalue conversion? Is int x = x; UB?", the C++ standard has a surprising example in section 3.3.2 Point of declaration, in which an int is initialized with its own indeterminate value:

int x = 12;
{ int x = x; }

Here the second x is initialized with its own (indeterminate) value. — end example ]

Johannes' answer to that question indicates that this is undefined behavior, since it requires an lvalue-to-rvalue conversion.
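
The point-of-declaration rule itself can be observed without reading an indeterminate value. The following is only a minimal sketch (the function name `inner_refers_to_itself` is made up for illustration): it takes the address of the inner variable inside its own initializer, which needs no lvalue-to-rvalue conversion on that variable.

```cpp
// Hypothetical helper (not from the standard): returns true if the name used
// inside the inner initializer denotes the inner variable, not the outer one.
bool inner_refers_to_itself() {
    int x = 12;
    const void* outer = &x;  // address of the outer x
    {
        // The inner x is already in scope in its own initializer, so &x is
        // the address of the inner object. Only the address is taken; the
        // (indeterminate) value of the inner x is never read.
        const void* x = &x;
        return x != outer && x == static_cast<const void*>(&x);
    }
}
```

Calling `inner_refers_to_itself()` should yield `true`, since the inner and outer objects are distinct and the initializer names the inner one.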

In the latest C++14 draft standard, N3936, this example has changed to:

unsigned char x = 12;
{ unsigned char x = x; }

Here the second x is initialized with its own (indeterminate) value. — end example ]

Has something changed in C++14 with respect to indeterminate values and undefined behavior that has driven this change in the example?

Prevalent answered 1/5, 2014 at 20:4 Comment(4)
Relevant paper for when the question comes up of why not just zero out uninitialized memory: Why Nothing Matters: The Impact of Zeroing.Prevalent
While the paper is interesting, its conclusions don't necessarily apply to an ahead-of-time compiled language where static analyses could potentially remove most or all of the cost.Naturally
@Naturally This is a more practical example, and all the other articles I found on this reported similar costs. I don't think it has been proven we can remove this cost, although perhaps it is possible.Prevalent
Example showing how the cost can definitely be removed in one trivial case: godbolt.org/g/Kh9xsp - I agree that it certainly won't always be possible/practical to remove all cost, but it certainly has been proven that compilers can remove the cost in at least some cases, and there don't seem to be any hard numbers attempting to assess the average/potential cost for an optimizing AOT compiler, which is my main point.Naturally

Yes, this change was driven by changes in the language which make it undefined behavior if an indeterminate value is produced by an evaluation, with some exceptions for unsigned narrow character types.

Defect report 1787, whose proposed text can be found in N3914 [1], was accepted in 2014 and is incorporated in the latest working draft, N3936.

The most interesting change with respect to indeterminate values would be to section 8.5 paragraph 12 which goes from:

If no initializer is specified for an object, the object is default-initialized; if no initialization is performed, an object with automatic or dynamic storage duration has indeterminate value. [ Note: Objects with static or thread storage duration are zero-initialized, see 3.6.2. — end note ]

to (emphasis mine):

If no initializer is specified for an object, the object is default-initialized. When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced (5.17 [expr.ass]). [Note: Objects with static or thread storage duration are zero-initialized, see 3.6.2 [basic.start.init]. —end note] If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:

  • If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of:

    • the second or third operand of a conditional expression (5.16 [expr.cond]),

    • the right operand of a comma (5.18 [expr.comma]),

    • the operand of a cast or conversion to an unsigned narrow character type (4.7 [conv.integral], 5.2.3 [expr.type.conv], 5.2.9 [expr.static.cast], 5.4 [expr.cast]), or

    • a discarded-value expression (Clause 5 [expr]),

    then the result of the operation is an indeterminate value.

  • If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of the right operand of a simple assignment operator (5.17 [expr.ass]) whose first operand is an lvalue of unsigned narrow character type, an indeterminate value replaces the value of the object referred to by the left operand.

  • If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of the initialization expression when initializing an object of unsigned narrow character type, that object is initialized to an indeterminate value.

and included the following example:

[ Example:

int f(bool b) {
  unsigned char c;
  unsigned char d = c; // OK, d has an indeterminate value
  int e = d;           // undefined behavior
  return b ? d : 0;    // undefined behavior if b is true
}

end example ]
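
As a rough sketch of how the listed exceptions play out in practice (assuming the C++14 wording quoted above; the function name `propagate_then_overwrite` is purely illustrative, and compilers may still warn about the uninitialized variable), each statement below only moves an indeterminate unsigned char around, and the value is replaced before anything reads it as a number:

```cpp
// Hypothetical helper illustrating the C++14 exceptions for unsigned char.
unsigned char propagate_then_overwrite(bool b) {
    unsigned char c;                      // no initializer: indeterminate value
    unsigned char d = c;                  // initialization: d is indeterminate
    unsigned char e;
    e = d;                                // simple assignment: e is indeterminate
    e = b ? c : d;                        // both operands are unsigned char
    e = static_cast<unsigned char>(d);    // cast to unsigned narrow char type
    (void)c;                              // discarded-value expression
    e = 7;                                // indeterminate value is replaced
    return e;                             // reads a determinate value
}
```

By contrast, converting `d` to `int` at any point before the assignment of `7`, as in the standard's own example, would fall outside the exceptions.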

We can find this text in N3936, which is the current working draft; N3937 is the C++14 DIS.

Prior to C++1y

It is interesting to note that prior to this draft, unlike C, which has always had a well-specified notion of which uses of indeterminate values were undefined, C++ used the term indeterminate value without even defining it (assuming we cannot borrow the definition from C99); see also defect report 616. We had to rely on the underspecified lvalue-to-rvalue conversion, which in the draft C++11 standard is covered in section 4.1 Lvalue-to-rvalue conversion, paragraph 1, which says:

[...]if the object is uninitialized, a program that necessitates this conversion has undefined behavior.[...]


Footnotes:

  1. 1787 is a revision of defect report 616; we can find that information in N3903.
Prevalent answered 1/5, 2014 at 20:4 Comment(18)
But why does the example (in the question) change int to unsigned char for both variables?Typify
@Typify b/c in the int case it would now be undefined behavior. It is now only well defined in the case of narrow character types.Prevalent
IMO it's easier to do proper formatting with code and quotations by just writing it w/o spaces and > and then selecting the text and using the buttons or the shortcuts (CTRL-K for Kode, CTRL-Q for Quotations).Skyward
@Typify To be specific, fundamental types may have trap representations (e.g., signaling NaN) that do terrible things to the running program. In C and C++, this is represented as a type of undefined behavior. unsigned char is forbidden to have a trap representation, so the new examples have defined behavior.Tiossem
@Skyward awesome tip, thank you for the formatting changes. I always struggle with formatting, but I try to learn from everyone's fixes.Prevalent
@Casey: While that's true, it isn't the rationale for this rule. In particular, all integral unsigned types are indirectly (by the modulo arithmetic rules) forbidden to have trap representations. But only the unsigned narrow character type(s) fall into this special exemption.Multiple
I'm glad to see this DR, it was annoying how C++ always was unclear on the issue of accessing indeterminate ints.Heraclitus
@BenVoigt: Any operations on legitimate values of unsigned integral types are required to yield legitimate values of unsigned literal types, but at least in C it has always been legal for the number of usable bits in an unsigned type larger than unsigned char to be less than the number of bits in the chars that it occupies. Are unsigned types in C++ not allowed to have corrupt or trap representations (which could be created only by operations on the underlying storage or on corrupt values of those types, and not by operations on legitimate values the types themselves)?Subchaser
@supercat: Both rules use essentially the same verbiage: "For unsigned narrow character types, each possible bit pattern of the value representation represents a distinct number." vs "Unsigned integers shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer." I guess you're saying character types are unique because of the extra rule "For narrow character types, all bits of the object representation participate in the value representation."Multiple
@supercat: But the language makes evaluation of an indeterminate value of all types other than unsigned char into undefined behavior, regardless of whether all bits of the object representation participate in the value representation.Multiple
@BenVoigt: I don't know whether the intention was to avoid extra verbiage "unless all the bits of the ...", though with the evolution of UB in compilers perhaps hyper-modern compiler authors are merely seeking excuses for wacky behavior.Subchaser
@BenVoigt: Upon further consideration, I think one at-least-theoretical issue may be that compilers are allowed to use things like CPU registers for local or cached variables, and that even types which wouldn't have any padding if stored in main memory might have padding or trap bits in other legitimate representations. Still, I wish a standards committee would formalize a definition of "implementation-constrained behavior" which would be a cross between UB and implementation-defined behavior: implementations would be required to document what the consequences of something could be, and...Subchaser
...would be allowed to have those consequences include UB if the documentation expressly stated that, but would be encouraged to state consequences as narrowly as practical. For example, a useful definition of the consequences of reading an uninitialized auto-variable would be to say that it may trap in debug builds and will otherwise yield an arbitrary value which will poison any variable into which it is stored, such that the variable's value may forevermore appear to change at any time for any reason. Pretty severe consequences, but not nearly as severe as having a compiler...Subchaser
...make reverse-causal inferences that would cause it to ignore any conditions which might cause the variable to be read without having been initialized, especially if the only "use" of the value was to return it down a call chain and ultimately discard it. Unfortunately, I am unaware of any plans to codify such things; trends seem to be going in the reverse direction.Subchaser
What's the rationale for not keeping it undefined? How does it interact with the rule that says using variables that could have been declared register without initializing them results in UB?Iconoclast
@PSkocik: Consider volatile unsigned char x,y; unsigned char test(uint32_t p, uint32_t q) { unsigned char result; if (q & 1) result = x; if (q & 2) result = y; return result; } On some platforms like the ARM, a function that returns unsigned char must exit with a value 0-255 in the 32-bit register R0, but the most efficient conforming machine code for that function would result in R0 holding whatever 32-bit value was passed in p if neither if condition is satisfied.Subchaser
If the calling code wouldn't care about whether the return value is outside the range 0-255 except in cases where it passes a q value of 1, 2, or 3, requiring that the programmer set the value of result in those cases would result in less efficient code than would be necessary if neither the programmer nor the compiler had any obligation to ensure that the value is 0-255 in such cases.Subchaser
In section "Prior to C++ 1y", a link is broken. Could it be frama-c.com/2013/03/13/… ?Sohn
