Does initialization entail lvalue-to-rvalue conversion? Is `int x = x;` UB?

The C++ standard contains a semi-famous example of "surprising" name lookup in 3.3.2, "Point of declaration":

int x = x;

This initializes x with itself. Since x has a primitive type, it is still uninitialized at that point and thus has an indeterminate value (assuming it is an automatic variable).
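The name-lookup part of the example can be observed without reading any indeterminate value. The following is a minimal sketch, not taken from the Standard: the function name `initializer_sees_inner_name` and the variable `outer` are made up for illustration. It only takes the address of the inner `x` inside its own initializer, which does not require the object's value.

```cpp
#include <cassert>

// Hypothetical helper: returns true if the name used in the initializer
// refers to the variable being declared rather than to the outer one.
bool initializer_sees_inner_name() {
    int x = 12;          // outer x
    void* outer = &x;    // remember where the outer x lives
    {
        // The point of declaration of the inner x is right after its
        // declarator, so `&x` here names the inner x. Only its address
        // is taken; its (not yet initialized) value is never read.
        void* x = &x;
        return x == static_cast<void*>(&x) && x != outer;
    }
}
```

Calling `initializer_sees_inner_name()` should return `true`, which matches the lookup rule quoted from 3.3.2. It says nothing about whether `int x = x;` itself is undefined, since that form does need the value.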

Is this actually undefined behaviour?

According to 4.1 "Lvalue-to-rvalue conversion", it is undefined behaviour to perform lvalue-to-rvalue conversion on an uninitialized value. Does the right-hand x undergo this conversion? If so, would the example actually have undefined behaviour?

Matchboard answered 18/2, 2013 at 11:55 Comment(24)
I feel the behaviour is quite defined. The value of x will not change. The value of x, however, is undefined.Malkamalkah
@Bingo: If you feel that, can you formulate an argument derived from the language standard and post it as an answer? :-)Matchboard
Isn't it the same as asking whether int y; int x = y; is UB? [edit: Hm, probably no. This is about computing the value of an uninitialized variable, not a default-initialized one]Breezy
I'm wondering whether the lifetime of x necessarily began in the moment it gets evaluated on the right side. Is it specified that the part on the left side (which, I guess, allocates storage for x) is sequenced before the part on the right side?Breezy
@AndyProwl: Since int is a scalar type, the lifetime of x begins after int x, as described by 3.8/1.Matchboard
@KerrekSB: What I'm saying (I might be wrong, I'm just trying to figure it out) is that the storage allocation specified in 3.8/1 as the beginning of x's lifetime, which is done by int x on the left side of the copy-initialization, is not necessarily sequenced before the value computation of x on the right side of the copy-initialization.Breezy
@AndyProwl: Automatic storage allocation isn't sequenced at all. Only expression evaluation is something that can be sequenced. A declaration statement isn't an expression.Matchboard
@AndyProwl Regarding your first remark (to which nobody replied): Since y in your example has a built-in type, default-initialized is the same as uninitialized. And yes, I think the question is the same as asking whether int y; int x = y; is UB. I also think that the answer is no, they are not UB. The Standard actually uses a very similar example at the very end of 8.5/16 and there is no word of UB. It also seems to follow from Jesse Good's answer.Codel
@jogojapan: I see what you mean, and the example seems to prove you right, on the other hand consider this case: int y; int x = y; would not be UB, while int x; int y; x = y; would be (the assignment operator probably accepts a prvalue - see my answer to figure out what I mean). Does that make sense? To me, it doesn't. It makes much more sense to assume that wherever a value is required, a prvalue is expected unless specified otherwise. Jesse Good correctly points out that 8.5/16 is the key paragraph, but hurries into the interpretation that no conversion is required. [continues...]Breezy
@jogojapan: [...follows] However, one cannot really tell that, because although a type conversion is not required obviously, nothing is said about whether a category conversion is needed. In fact, without knowing which value category copy-initialization expects, we won't come up with a definite answer (at least IMO). So the first step to formulate a meaningful answer is to figure out what value category is expected by copy-initialization. Considering what I wrote in my answer, I believe prvalue is the most reasonable expectation. Of course I may be wrong, but it makes sense to me.Breezy
@AndyProwl Yes, that's true. There is also the problem that while int x; int y = x; does not seem to be UB, short x; int y = x; is, because the integral promotion requires an implicit conversion, and according to §4/3 this implies an operation equivalent to the creation of a prvalue temporary and therefore lvalue-to-rvalue conversion.Codel
@jogojapan: Indeed. Mind if I add these comments to my answer?Breezy
@AndyProwl Of course not, please go ahead.Codel
General comment: There seems to be a proposal which covers this case.Matchboard
This changes a lot in C++1y. Let me know if you want me to add an answer here or if you want to edit a link into your question etc...Superfamily
@ShafikYaghmour: Very interesting, but is there any difference for int? E.g. will it be expressly undefined behaviour?Matchboard
@KerrekSB it is UB for anything besides a narrow char. The language produced by an evaluation seems to be pretty encompassing.Superfamily
@Codel "default-initialized is the same as uninitialized." I don't believe there's anything in the C++11 spec or earlier that indicates this. My reading of the spec is that default-initialized and uninitialized are mutually exclusive categories.Ribbon
@Ribbon The relevant statement in C++11 is 8.5/6. Note that my earlier comment was only about built-in types, i.e. the third bullet-point in 8.5/6.Codel
Admittedly my use of the word "built-in" in this context is somewhat imprecise. I use it in analogy to how the standard uses it when talking about "built-in operators".Codel
@Codel I believe it's incorrect to interpret 'no initialization is performed' as implying the variable is uninitialized. The spec doesn't define it this way and I read it as simply saying that no initialization needs to be performed in order to default initialize the object; The object is default initialized without such performance. In any case C++14 thankfully closes this loophole.Ribbon
@Ribbon I am not sure I follow. You are saying that in a case like int x;, no initialization has been performed on x, but it is not uninitialized? So performing an lvalue-to-rvalue conversion and accessing its value does not cause undefined behavior? Or in what way is its status different from "uninitialized"?Codel
@Codel Exactly. Its status is 'uninitialized' in the way programmers generally mean, but under the spec it is in a category 'default initialized' which I interpret as being mutually exclusive with the spec's usage of 'uninitialized'. An uninitialized object under the spec's usage would be limited to objects that do not fall under any of the initialized categories, e.g. in int *x = malloc(sizeof(int)); the object *x is uninitialized.Ribbon
@Ribbon Ok, I see. I agree that's a possible way to interpret the C++11 Standard (as far as I know).Codel
B
22

UPDATE: Following the discussion in the comments, I added some more evidence at the end of this answer.


Disclaimer: I admit this answer is rather speculative. The current formulation of the C++11 Standard, on the other hand, does not seem to allow for a more formal answer.


In the context of this Q&A, it has emerged that the C++11 Standard fails to formally specify what value categories are expected by each language construct. In the following I will mostly focus on built-in operators, although the question is about initializers. Eventually, I will end up extending the conclusions I drew for the case of operators to the case of initializers.

In the case of built-in operators, in spite of the lack of a formal specification, (non-normative) evidence can be found in the Standard that the intended specification is for prvalues to be expected wherever a value is needed, unless specified otherwise.

For instance, a note in Paragraph 3.10/1 says:

The discussion of each built-in operator in Clause 5 indicates the category of the value it yields and the value categories of the operands it expects. For example, the built-in assignment operators expect that the left operand is an lvalue and that the right operand is a prvalue and yield an lvalue as the result. User-defined operators are functions, and the categories of values they expect and yield are determined by their parameter and return types

Section 5.17 on assignment operators, on the other hand, does not mention this. However, the possibility of performing an lvalue-to-rvalue conversion is mentioned, again in a note (Paragraph 5.17/1):

Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single compound assignment operator

Of course, if no rvalue were expected, this note would be meaningless.

Another piece of evidence is found in 4/8, as pointed out by Johannes Schaub in the comments to the linked Q&A:

There are some contexts where certain conversions are suppressed. For example, the lvalue-to-rvalue conversion is not done on the operand of the unary & operator. Specific exceptions are given in the descriptions of those operators and contexts.

This seems to imply that lvalue-to-rvalue conversion is performed on all operands of built-in operators, except when specified otherwise. This would mean, in turn, that rvalues are expected as operands of built-in operators unless specified otherwise.
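To make the "suppressed" contexts concrete, here is a small sketch of uses of an uninitialized `int` that do not need its value. The function name `no_read_contexts` is invented for this illustration, and the list is not claimed to be exhaustive. All three uses keep the operand as an lvalue or leave it unevaluated.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper: touches an uninitialized int only in contexts
// where no lvalue-to-rvalue conversion is applied to it.
bool no_read_contexts() {
    int x;                      // deliberately uninitialized, never read below
    int* p = &x;                // unary &: the operand stays an lvalue
    int& r = x;                 // direct reference binding: no conversion
    std::size_t n = sizeof x;   // unevaluated operand: x is not accessed
    static_assert(std::is_same<decltype((x)), int&>::value,
                  "the expression x is an lvalue of type int");
    return p == &r && n == sizeof(int);
}
```

By contrast, `int y = x;` in the same function would need the value of `x`, which is exactly the case the question is about.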


CONJECTURE:

Even though initialization is not assignment, and therefore operators do not enter the discussion, my suspicion is that this area of the specification is affected by the very same problem described above.

Traces supporting this belief can be found even in Paragraph 8.5.2/5, about the initialization of references (for which the value of the lvalue initializer expression is not needed):

The usual lvalue-to-rvalue (4.1), array-to-pointer (4.2), and function-to-pointer (4.3) standard conversions are not needed, and therefore are suppressed, when such direct bindings to lvalues are done.

The word "usual" seems to imply that when initializing objects which are not of a reference type, lvalue-to-rvalue conversion is meant to apply.

Therefore, I believe that although requirements on the expected value category of initializers are ill-specified (if not completely missing), on the grounds of the evidence provided it makes sense to assume that the intended specification is that:

Wherever a value is required by a language construct, a prvalue is expected unless specified otherwise.

Under this assumption, an lvalue-to-rvalue conversion would be required in your example, and that would lead to Undefined Behavior.


ADDITIONAL EVIDENCE:

Just to provide further evidence to support this conjecture, let's assume it is wrong, so that no lvalue-to-rvalue conversion is required for copy-initialization, and consider the following code (thanks to jogojapan for contributing):

int y;
int x = y; // No UB
short t;
int u = t; // UB! (Do not like this non-uniformity, but could accept it)
int z;
z = x; // No UB (x is not uninitialized)
z = y; // UB! (Assuming assignment operators expect a prvalue, see above)
       // This would be very counterintuitive, since x == y

This non-uniform behavior does not make a lot of sense to me. What makes more sense IMO is that wherever a value is required, a prvalue is expected.

Moreover, as Jesse Good correctly points out in his answer, the key Paragraph of the C++ Standard is 8.5/16:

— Otherwise, the initial value of the object being initialized is the (possibly converted) value of the initializer expression. Standard conversions (Clause 4) will be used, if necessary, to convert the initializer expression to the cv-unqualified version of the destination type; no user-defined conversions are considered. If the conversion cannot be done, the initialization is ill-formed. [ Note: An expression of type “cv1 T” can initialize an object of type “cv2 T” independently of the cv-qualifiers cv1 and cv2.

However, while Jesse mainly focuses on the "if necessary" bit, I would also like to stress the word "type". The paragraph above mentions that standard conversions will be used "if necessary" to convert to the destination type, but does not say anything about category conversions:

  1. Will category conversions be performed if needed?
  2. Are they needed?

Regarding the second question, as discussed in the original part of the answer, the C++11 Standard currently does not specify whether category conversions are needed, because it nowhere states whether copy-initialization expects a prvalue as an initializer. Thus, a clear-cut answer is impossible to give. However, I believe I provided enough evidence to assume this to be the intended specification, so that the answer would be "Yes".

As for the first question, it seems reasonable to me that the answer is "Yes" as well. If it were "No", obviously correct programs would be ill-formed:

int y = 0;
int x = y; // y is lvalue, prvalue expected (assuming the conjecture is correct)

To sum it up (A1 = "Answer to question 1", A2 = "Answer to question 2"):

          | A2 = Yes   | A2 = No |
 ---------|------------|---------|
 A1 = Yes |     UB     |  No UB  | 
 A1 = No  | ill-formed |  No UB  |
 ---------------------------------

If A2 is "No", A1 does not matter: there's no UB, but the bizarre situations of the first example (e.g. z = y giving UB, but not z = x even though x == y) show up. If A2 is "Yes", on the other hand, A1 becomes crucial; yet, enough evidence has been given to prove it would be "Yes".

Therefore, my thesis is that A1 = "Yes" and A2 = "Yes", and we should have Undefined Behavior.


FURTHER EVIDENCE:

This defect report (courtesy of Jesse Good) proposes a change that is aimed at giving Undefined Behavior in this case:

[...] In addition, 4.1 [conv.lval] paragraph 1 says that applying the lvalue-to-rvalue conversion to an “object [that] is uninitialized” results in undefined behavior; this should be rephrased in terms of an object with an indeterminate value.

In particular, the proposed wording for Paragraph 4.1 says:

When an lvalue-to-rvalue conversion occurs in an unevaluated operand or a subexpression thereof (Clause 5 [expr]) the value contained in the referenced object is not accessed. In all other cases, the result of the conversion is determined according to the following rules:

— If T is (possibly cv-qualified) std::nullptr_t, the result is a null pointer constant (4.10 [conv.ptr]).

— Otherwise, if the glvalue T has a class type, the conversion copy-initializes a temporary of type T from the glvalue and the result of the conversion is a prvalue for the temporary.

— Otherwise, if the object to which the glvalue refers contains an invalid pointer value (3.7.4.2 [basic.stc.dynamic.deallocation], 3.7.4.3 [basic.stc.dynamic.safety]), the behavior is implementation-defined.

— Otherwise, if T is a (possibly cv-qualified) unsigned character type (3.9.1 [basic.fundamental]), and the object to which the glvalue refers contains an indeterminate value (5.3.4 [expr.new], 8.5 [dcl.init], 12.6.2 [class.base.init]), and that object does not have automatic storage duration or the glvalue was the operand of a unary & operator or it was bound to a reference, the result is an unspecified value. [Footnote: The value may be different each time the lvalue-to-rvalue conversion is applied to the object. An unsigned char object with indeterminate value allocated to a register might trap. —end footnote]

— Otherwise, if the object to which the glvalue refers contains an indeterminate value, the behavior is undefined.

— Otherwise, if the glvalue has (possibly cv-qualified) type std::nullptr_t, the prvalue result is a null pointer constant (4.10 [conv.ptr]). Otherwise, the value contained in the object indicated by the glvalue is the prvalue result.

Breezy answered 18/2, 2013 at 11:55 Comment(16)
Hm, there's a lot of talk about "operators" in your post, but my question has nothing to do with operators...Matchboard
@KerrekSB: Yes, I'm aware of this. That's why I marked my answer as a "conjecture". My assumption is that in the same way that value category requirements were left unspecified for operators, they were left unspecified for initializers. And since the intended specification for operators is (EDIT: seems to be) that wherever a value is needed, a prvalue is expected unless specified otherwise, it makes sense IMO to make the same assumption for initializers. A purely formal answer to your question can't be given I'm afraid, because the Standard itself lacks a well-defined specification.Breezy
+1, clearly useful, even though I don't know whether the conjecture is correct.Codel
@jogojapan: Thank you. Neither do I, which is why I called it a conjecture of course ;-) However, IMHO it makes much more sense to assume it true than false.Breezy
Ok, deleted. Also, slightly related is defect report 616 and its related issues, but AFAICT it doesn't cover the OP's case.Procure
@JesseGood: Actually that defect report seems to show that the intended behavior is to give UB: "if the object to which the glvalue refers contains an indeterminate value, the behavior is undefined." Or am I too biased by my own viewpoint?Breezy
@AndyProwl: That part is true only if x is converted to a prvalue. It is still unclear whether that is the case (although I believe it isn't the case as it is not needed).Procure
@JesseGood: True, it requires the conversion. I do believe it is (intended to be) needed, but again, this is about opinions.Breezy
Consider volatile int x; volatile int y=x;. What happens if x happens to be a trap representation?Rossie
@tc: Not sure what you mean. In the C++11 Standard, the word "trap" appears just a few times, and always in an unrelated context. Do you think this should not be UB? If so, why?Breezy
@AndyProwl I'm not intricately familiar with C++11, but in C99, -INT_MAX-1 (two's complement) or negative zero (ones' complement or sign-magnitude) are allowed to be "trap" representations, as are integers with incorrect padding bits. It's unclear why int x=y; would be valid but int x;x=y; would be UB (and if so, are compilers on such platforms required to take steps to ensure that the former doesn't trap?). Other curiosities are int x=x=x; (UB?) or volatile int x=x; (what memory accesses are required?).Rossie
@tc.: OK, I'm not familiar with C99, so this will be hard :-) However, I am advocating that both int x = y and int x; x = y should lead to UB (the latter certainly does, which is why I believe the former should as well). In my view, int x=x=x; would be UB as well of course. It seems to me that you share this view, or don't you? If not, why so?Breezy
@AndyProwl In C99, an "indeterminate value" is either "unspecified" (any valid value) or a trap representation, which suggests that they are UB in general but an unspecified value on common architectures (because int has no trap representations). However, in C, x=x=x; is UB because it writes to x twice, which suggests that int x=x=x; should be too (though x is only "modified" once); I think C++ differs, but I don't remember if it's only in the presence of operator overloading.Rossie
@AndyProwl For the short/int example: The original code I suggested was short x; int y = x;, i.e. it converted from short to int, not vice versa. I think this is better, because it requires an implicit conversion, but it avoids potential overflow situations, which complicate the discussion.Codel
+1 because I finally agree that you were right after all this time!Procure
@JesseGood: Thank you, but after all I think it's Johannes who gave the clear-cut answer (mine is more a collection of evidence) :)Breezy
L
7

An implicit conversion sequence of an expression e to type T is defined as being equivalent to the following declaration, using t as the result of the conversion (modulo value category, which is determined by T); see 4p3 and 4p6:

T t = e;

The effect of any implicit conversion is the same as performing the corresponding declaration and initialization and then using the temporary variable as the result of the conversion.

In clause 4, the conversion of an expression to a type always yields expressions with a specific property. For example, conversion of 0 to int* yields a null pointer value, and not just one arbitrary pointer value. The value category too is a specific property of an expression, and its result is defined as follows:

The result is an lvalue if T is an lvalue reference type or an rvalue reference to function type (8.3.2), an xvalue if T is an rvalue reference to object type, and a prvalue otherwise.

Hence we know that in int t = e;, the result of the conversion sequence is a prvalue, because int is a non-reference type. So if we provide a glvalue, we are in obvious need of a conversion. 3.10p2 clarifies this further, leaving no doubt:

Whenever a glvalue appears in a context where a prvalue is expected, the glvalue is converted to a prvalue; see 4.1, 4.2, and 4.3.
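The value categories involved can be checked at compile time with `decltype`, which yields `T&` for an lvalue expression, `T&&` for an xvalue, and plain `T` for a prvalue. The sketch below is only an illustration of that rule: `g` and `widen` are hypothetical names, and the object is initialized so that nothing indeterminate is ever read.

```cpp
#include <cassert>
#include <type_traits>

int g = 0;  // hypothetical, initialized object used only for inspection

// The name g is an lvalue; a conversion to a non-reference type gives a
// prvalue; a conversion to an rvalue reference to object type gives an xvalue.
static_assert(std::is_same<decltype((g)), int&>::value,
              "g is an lvalue");
static_assert(std::is_same<decltype(static_cast<int>(g)), int>::value,
              "conversion to int yields a prvalue");
static_assert(std::is_same<decltype(static_cast<int&&>(g)), int&&>::value,
              "conversion to int&& yields an xvalue");

// Mirrors the `T t = e;` formulation with T = int and e a glvalue of type
// short: the value of e has to be obtained before it can be converted.
int widen(const short& e) {
    int t = e;
    return t;
}
```

With an initialized argument, for example `short s = 7;`, the call `widen(s)` returns 7. The argument in this answer is that the same value fetch happens when `e` is uninitialized, which is where the undefined behavior comes from.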

Lema answered 7/7, 2013 at 16:52 Comment(3)
(I'd love to give you a reward bounty, but the minimum bounty I can give is 300 -- am I that stingy or cheap? :-))Matchboard
Check out this proposal.Matchboard
@kerrek I already know that proposal. It is good that they are crafting clearer rules rather than using weak, casual English terms.Lema
R
-7

The behavior is not undefined. The variable is uninitialized and stays with whatever random value uninitialized values start up with. One example from clang's test suite:

int test7b(int y) {
  int x = x; // expected-note{{variable 'x' is declared here}}
  if (y)
    x = 1;
  // Warn with "may be uninitialized" here (not "is sometimes uninitialized"),
  // since the self-initialization is intended to suppress a -Wuninitialized
  // warning.
  return x; // expected-warning{{variable 'x' may be uninitialized when used here}}
}

This test, which you can find in clang/test/Sema/uninit-variables.c, covers this case explicitly.

Rhinelandpalatinate answered 26/2, 2013 at 23:16 Comment(2)
The behaviour is undefined according to the C++ standard. This means that compilers may do what they like, and your example shows what clang has chosen to do.Ruffian
The variable is uninitialized and stays with whatever random value uninitialized values start up with ... No, the compiler can do anything including optimizing the code away, see an example of clang doing so here.Superfamily
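If the comments above are right that the self-initialization gives the compiler free rein, one way to sidestep the question is to give the variable an explicit initial value. The following is a sketch under that assumption; `test7b_fixed` is a hypothetical rewrite of the quoted test function and is not part of clang's test suite.

```cpp
#include <cassert>

// Hypothetical variant of test7b: x gets a definite value up front, so
// every path reads an initialized object.
int test7b_fixed(int y) {
    int x = 0;   // explicit initializer instead of `int x = x;`
    if (y)
        x = 1;
    return x;    // well-defined regardless of y
}
```

Here `test7b_fixed(0)` returns 0 and any nonzero argument returns 1. The cost is that the compiler can no longer warn about a path that forgot to assign `x`.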
