When is an erroneous value not valid for the object's type?

In [conv.lval] p3.4, the result of lvalue-to-rvalue conversion is described as follows:

Otherwise, the object indicated by the glvalue is read ([defns.access]), and the value contained in the object is the prvalue result. If the result is an erroneous value ([basic.indet]) and the bits in the value representation are not valid for the object's type, the behavior is undefined.

When or why would this situation happen? For example, say we have the following code:

bool x;      // x has erroneous value (since C++26)
bool y = x;  // lvalue-to-rvalue conversion

Can't the compiler always choose the erroneous value of x so that it's valid for bool? If this were undefined behavior, it would defeat the purpose of erroneous behavior.

Keep in mind that P2795R5: Erroneous behaviour for uninitialized reads says:

The proposed change of behaviour has a runtime cost for existing code, since in general additional initialization of memory is now required.

Or in standardese ([basic.indet] p1.2):

otherwise, the bytes have erroneous values, where each value is determined by the implementation independently of the state of the program.

Presumably, this means that the compiler can and should initialize the memory (e.g. by zeroing) of x so that it would have a valid (erroneous) value. After all, erroneous behavior is:

well-defined behavior that the implementation is recommended to diagnose

Darcee answered 5/6 at 5:34 Comment(3)
Yes, this can be undefined. The paper you linked literally says that. - Oscillatory
@Oscillatory where specifically? It's not obvious, and that's the point of this Q&A. - Darcee
Right before the section you linked to? - Oscillatory

When or why would this situation happen? For example, say we have the following code:

bool x;      // x has erroneous value (since C++26)
bool y = x;  // lvalue-to-rvalue conversion

Can't the compiler always choose the erroneous value of x so that it's valid for bool?

It could, but it doesn't have to. You already quoted the section in [basic.indet] that talks about bytes having erroneous values, and the erroneous value in the byte representation of x might be 42, which is not valid for bool.

Same as for:

T* a;
T* b = a;

So if such a pattern is chosen as the erroneous value of an uninitialized object (e.g. for bool or a pointer), then such a read still has undefined behavior.
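
To make "not valid for bool" concrete, here is a minimal sketch that compares a byte against the representations of false and true on the current implementation. It assumes a one-byte bool in which each value has exactly one representation, as on mainstream ABIs; byte_of and is_valid_bool_byte are hypothetical helper names, not anything from the standard.

```cpp
#include <cstring>

// Assumption: bool is a single byte, as on mainstream ABIs.
static_assert(sizeof(bool) == 1, "this sketch assumes a one-byte bool");

// Hypothetical helper: returns the byte that represents a valid bool value.
unsigned char byte_of(bool value) {
    unsigned char byte = 0;
    std::memcpy(&byte, &value, 1);  // inspecting a valid bool as a byte is fine
    return byte;
}

// Hypothetical helper: true only if the byte matches the representation of
// false or true on this implementation.
bool is_valid_bool_byte(unsigned char byte) {
    return byte == byte_of(false) || byte == byte_of(true);
}
```

On such an implementation, a fill byte like 42 or 0xCC fails this check, which is exactly the situation in which reading the bool would be undefined.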

Levin answered 5/6 at 18:40 Comment(1)
I initially thought that @anatolyg was wrong, but my assumptions were all totally wrong. I thought that only a malicious implementation would choose a bit pattern for bool that wouldn't be true or false, but that's really not the case. You're right, and even the benign case of bool x; can produce a bool that is neither true nor false. After making this Q&A, I've opened a number of issues and submitted a bunch of PRs related to this. Most notably, the example (provided in part by Jens Maurer) github.com/cplusplus/draft/pull/7049 confirms that your answer is correct. - Darcee

Your example isn't undefined behaviour; as you have noted, the implementation can choose a valid representation for x.

The following could be, depending on what the implementation picks for erroneous char values:

char c; // erroneous value, determined independently of the state of the program
bool b = true;
std::memcpy(&b, &c, 1); // no erroneous behaviour, but b now has an erroneous value
int d = b ? 0 : 1; // reading an erroneous value with a potentially invalid representation

If there is an implementation which has a pair of types for which all valid representations of the first type correspond to invalid representations of the second, then memcpying the value representation of an erroneous value from an instance of the first to an instance of the second would always lead to undefined behaviour.
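
If you do need to turn an arbitrary byte into a bool, one defensive option is to validate the byte before it ever reaches the bool. A rough sketch, assuming a one-byte bool whose only valid representations are 0x00 and 0x01 (typical, but not guaranteed by the standard); bool_from_byte is a hypothetical name:

```cpp
#include <cstring>
#include <optional>

// Assumption: one-byte bool with 0x00 == false and 0x01 == true.
static_assert(sizeof(bool) == 1, "this sketch assumes a one-byte bool");

// Hypothetical helper: copies the byte into a bool only after checking that
// it is a valid representation, so an invalid bool is never read.
std::optional<bool> bool_from_byte(unsigned char byte) {
    if (byte > 0x01) {
        return std::nullopt;  // e.g. 0xCC or 42: not a valid bool here
    }
    bool result = false;
    std::memcpy(&result, &byte, 1);
    return result;
}
```

The check is done on the unsigned char, whose every bit pattern is a valid value, and never on the bool itself.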

Pouched answered 5/6 at 8:5 Comment(9)
Why isn't OP's example UB? - Dogfight
Well, even bool x; could in itself have an invalid value if the implementation stupidly chooses erroneous values so that they're invalid for bool. However, just like that case, this issue with memcpying into a bool seems totally avoidable. I don't think the answer is totally wrong, but I also doubt that the wording was really designed to accommodate this case, and if zeroing would avoid UB for erroneous values completely, I doubt that the implementation freedom not to zero would exist. - Darcee
Also note that a slightly smart implementation could see that c is memcpied into a bool and deliberately choose zero as the erroneous value for that byte. Just like in the basic example in the question, you can avoid UB by just being clever (although that cannot be generalized; there's a limit to how clever you can be). - Darcee
@JanSchultke the problem with picking zero is that it's a commonly occurring value. Current debug implementations use specific bit patterns to signpost use of uninitialized values. - Pouched
I think this answer is simply wrong. bool is trivially copyable, but that only means you can use memcpy to copy the value of an existing bool out and back into another bool and preserve the value. What you're doing is overwriting the object representation, and this is UB by omission; the standard neither permits you to modify the object representation like that nor says what happens then. Presumably, memcpy would just end the lifetime of the bool without transparent replacement, making access to b UB later. This case is already UB before C++26. - Darcee
@JanSchultke you can memcpy the values 0 or 1 into a bool in C++23, even if they reside in a non-bool object. You can especially do that via char, because you can reinterpret_cast any pointer to a char * and inspect the object representation that way. - Pouched
"you can memcpy the values 0 or 1 into a bool in C++23, even if they reside in a non-bool object" where does it say that in the standard? Also, even if this is true, that doesn't mean you can memcpy the (erroneous) value 2 into bool. Either the value is valid for bool and then accessing b is well-defined, or the effect of the memcpy on b is UB by omission (the effect on the value representation isn't defined in any way), meaning that you run into UB, but not the UB that is described in [conv.lval]. - Darcee
@JanSchultke see [basic.types.general], noting that chars allow implicitly created objects. - Pouched
I now think there's a good chance that you're right, but the wording is defective. I think what currently happens is that memcpy begins the lifetime of a new bool within b, although the value representation 0x02 doesn't correspond to any value of bool. Because there is no value, by definition, the attempted "read" of the bool is undefined behavior, since there is no value to access. The premise of [conv.lval] "If the result is an erroneous value" is therefore impossible because no value was read, and no result exists. I need to talk to some committee members about this ... - Darcee

Since asking this question, I have talked to CWG. I had originally misunderstood a few crucial things:

  1. Compilers are totally free to choose values for erroneous bytes (e.g. 0xCC...) which wouldn't be a valid bool representation, so the example in the question may have undefined behavior. This is intentional.

  2. Compilers could avoid such a "bad representation" in the given example, but there are cases (such as creating a bool in existing memory using placement-new) where the memory cannot be overwritten at all.

CWG Issue 2899 is also related to this and provides an example of when the value representation is not valid for an object's type. Notably, this can occur as the result of memcpy or other means of implicitly beginning lifetimes, not just through erroneous values.
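
For the placement-new case, user code can still avoid depending on whatever bytes are already in the storage by value-initializing the new object. A minimal sketch; the 0xCC fill merely simulates stale memory and make_bool_in_dirty_storage is a made-up name:

```cpp
#include <cstring>
#include <new>

// Hypothetical helper: creates a bool in storage that holds stale bytes,
// and gives it a defined value regardless of what was there before.
bool make_bool_in_dirty_storage() {
    alignas(bool) unsigned char storage[sizeof(bool)];
    std::memset(storage, 0xCC, sizeof storage);  // simulate leftover bytes

    // The braces request value-initialization, so the new bool is false.
    // Plain `new (storage) bool` would default-initialize and give the
    // object no defined value to read.
    bool* p = new (storage) bool{};
    return *p;
}
```

This only helps code that can choose how it initializes; it doesn't change what the implementation is allowed to do for default-initialized objects.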

Darcee answered 2/10 at 7:47 Comment(0)
