In the C++14 standard (n3797), the section on lvalue to rvalue conversions reads as follows (emphasis mine):
4.1 Lvalue-to-rvalue conversion [conv.lval]
A glvalue (3.10) of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If T is a non-class type, the type of the prvalue is the cv-unqualified version of T. Otherwise the type of the prvalue is T.
When an lvalue-to-rvalue conversion occurs in an unevaluated operand or a subexpression thereof (Clause 5) the value contained in the referenced object is not accessed. In all other cases, the result of the conversion is determined according to the following rules:
- If T is a (possibly cv-qualified) std::nullptr_t then the result is a null pointer constant.
- Otherwise, if T has class type, the conversion copy-initializes a temporary of type T from the glvalue and the result of the conversion is a prvalue for the temporary.
- Otherwise, if the object to which the glvalue refers contains an invalid pointer value, the behavior is implementation-defined.
- Otherwise, if T is a (possibly cv-qualified) unsigned character type, and the object to which the glvalue refers contains an indeterminate value, and that object does not have automatic storage duration or the glvalue was the operand of a unary & operator or it was bound to a reference, the result is an unspecified value.
- Otherwise, if the object to which the glvalue refers has an indeterminate value, the behavior is undefined.
- Otherwise, the object indicated by the glvalue is the prvalue result.
[Note: See also 3.10]
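As a side note, two of the quoted provisions can be checked without touching any indeterminate value. The sketch below is only an illustration, and the names `size_without_access` and `ci` are made up for it. It assumes that `sizeof` is a suitable unevaluated context and that unary `+` is an acceptable way to force a prvalue whose type can be inspected.

```cpp
#include <cassert>      // only needed for checks like the one in the note below
#include <cstddef>
#include <type_traits>

// Hypothetical helper: the operand of sizeof is unevaluated, so the
// lvalue-to-rvalue conversion applied to x never accesses the object,
// even though x is uninitialized.
std::size_t size_without_access() {
    int x;                  // indeterminate value, never read
    return sizeof(x + 1);   // only the type of the expression matters
}

// For a non-class type the resulting prvalue has the cv-unqualified type:
// +ci converts the const int lvalue to a prvalue of type int.
const int ci = 42;
static_assert(std::is_same<decltype(+ci), int>::value,
              "prvalue of a non-class type drops cv-qualifiers");
```

A check such as `assert(size_without_access() == sizeof(int));` should hold, since `x + 1` has type `int`.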
What's the significance of the emphasized paragraph, i.e. the rule about unsigned character types with an indeterminate value?
If this paragraph were not here, then the situations in which it applies would lead to undefined behavior. Normally, I would expect that accessing an unsigned char value while it has an indeterminate value leads to undefined behavior. But, with this paragraph it means that
- If I'm not actually accessing the character value, i.e. I'm immediately passing it to & or binding it to a reference, or
- If the unsigned char does not have automatic storage duration,
then the conversion yields an unspecified value, and not undefined behavior.
Am I correct to conclude that this program:
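    #include <new>
    #include <iostream>

    // using T = int;
    using T = unsigned char;

    int main() {
        // Default-initialized: the elements have indeterminate values.
        T * array = new T[500];
        for (int i = 0; i < 500; ++i) {
            // Reads each element without ever having written to it.
            std::cout << static_cast<int>(array[i]) << std::endl;
        }
        delete[] array;
    }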
is well-defined by the standard, and must output a sequence of 500 unspecified ints, while the same program where T = int would have undefined behavior?
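For contrast, here is a sketch of a variant whose behavior does not depend on the rule in question. Value-initializing the array with `new T[n]()` zero-initializes every element, so no indeterminate value is ever read, whichever alias is chosen for `T`. The function name `zeroed_output`, the use of `std::unique_ptr`, and returning the text as a string instead of printing it are choices made for this illustration, not part of the program above.

```cpp
#include <cassert>      // only needed for checks like the one in the note below
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>

// using T = int;
using T = unsigned char;

// Hypothetical variant of the program above: formats n elements, one per
// line. The trailing () value-initializes the array, so every element is
// zero and the loop never reads an indeterminate value.
std::string zeroed_output(std::size_t n) {
    std::unique_ptr<T[]> array(new T[n]());
    std::ostringstream out;
    for (std::size_t i = 0; i < n; ++i) {
        out << static_cast<int>(array[i]) << '\n';
    }
    return out.str();
}
```

With either alias for `T`, `assert(zeroed_output(3) == "0\n0\n0\n");` should hold.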
IIUC, one of the reasons to make it UB to read things with indeterminate values is to allow aggressive dead store elimination by the optimizer. So, this paragraph may mean that a conforming compiler can't do as much optimization when working with unsigned char or arrays of unsigned char.
Assuming I understand correctly, what is the rationale for this rule? When is it useful to be able to read unsigned char objects that have indeterminate values, and get unspecified results instead of UB? I have the feeling that if they put this much effort into crafting this part of the rule, they had some motivation: to help certain code examples that they cared about, to be consistent with some other part of the standard, or to simplify some other issue. But I have no idea what that might be.
uint16_t volatile vv; uint16_t test1(uint32_t x, uint32_t mode) { int16_t temp; if (mode) temp = vv; return temp; } the simplest code for 32-bit processors like the ARM would return the passed-in value of x when mode is zero, even if it's greater than 65535. The Standard has no way to describe the consequences of that except to call the whole situation UB. – Timoteo
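For comparison, a sketch of the same function with `temp` given an initial value, so that every path returns a determinate result. The name `test1_initialized` and the fallback value 0 are assumptions made for illustration. The comment above does not say what the function is meant to return when `mode` is zero.

```cpp
#include <cassert>      // only needed for checks like the one in the note below
#include <cstdint>

volatile std::uint16_t vv = 0;

// Hypothetical variant of test1 from the comment: temp starts at 0, so the
// function returns a value in the uint16_t range on every path and never
// reads an uninitialized variable.
std::uint16_t test1_initialized(std::uint32_t x, std::uint32_t mode) {
    (void)x;                  // unused, kept to match the original signature
    std::uint16_t temp = 0;   // assumed fallback when mode is zero
    if (mode) temp = vv;      // single volatile read, as in the original
    return temp;
}
```

Here `assert(test1_initialized(70000u, 0u) == 0);` should hold regardless of the value passed as `x`.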