Dangling references and undefined behavior

Assume a dangling reference x. Is it undefined behavior to just write

&x;

or even

x;

?

Entomophilous answered 6/2, 2013 at 13:41 Comment(16)
My best guess is that it's not UB. *x; definitely is.Hunterhunting
x is a reference, so *x... not really legal.Entomophilous
Jan Dvorak: question says dangling reference, not dangling pointer.Nerti
@Nerti in which case x; is illegal as per my guess, and &x is perfectly fine.Hunterhunting
Just out of curiosity, why do you ask?Adios
In reference to comments on https://mcmap.net/q/830914/-passing-a-reference-to-a-c-constructorLoaning
Jan Dvorak: how do you figure &x is fine? In particular if the type of the referenced object is a class which overloads operator&, it would definitely be undefined behaviour; nothing in the standard makes me think it's defined even if that's not the case.Nerti
@Nerti excellent point.Entomophilous
5.3.1 about the address-of operator says: "if the type of the expression is T, the result has type “pointer to T” and is a prvalue that is the address of the designated object (1.7) or a pointer to the designated function". Here, it seems to imply the object must exist. Or is it just my interpretation?Reside
Andy, I think your interpretation is sound. At least, it is not safe to take the alternative view.Nerti
@AndyProwl I'd definitely add that as an answer (at least partial) and go back to the original comment here - #14730656Entomophilous
@LuchianGrigore: The thing is I'm not sure. I edited my answer then and decided to keep it as deleted for the moment. 1.8 says "An object is a region of storage.". So "the designated object" could mean "the designated region of storage". I wonder why it is always so difficult to infer something so simple from the Standard.Reside
@AndyProwl Because it's written by lots of people with slightly different ideas about certain things. A bit like the Bible.Officialism
@sftrabbit: I understand it is written by people with different ideas, and that's good for the standardization process. But the Standard itself should embody just one unambiguous idea, and I believe it does. That's fine. The problem is how it expresses it. In order to figure out whether some simple proposition is or is not implied by the Standard you often need to deduce implicit corollaries and theorems as if it were Number Theory (think of all the const/thread-safe story, or the fact that std::strings shall get stored in a contiguous memory region, etc). This is unnecessary IMO.Reside
@AndyProwl Oh, I agree. I wasn't excusing it.Officialism
@sftrabbit: I know, I was just letting out some frustration :-)Reside

What makes the use of an invalid object (reference, pointer, whatever) undefined behaviour is lvalue-to-rvalue conversion (§4.1):

If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.

Assuming we haven't overloaded operator&, the unary & operator takes an lvalue as its operand, so no conversion occurs. Having just an identifier, as in x; also requires no conversion. You will only get undefined behaviour when the reference is used as an operand in an expression that expects that operand to be an rvalue - which is the case for most operators. The point is, doing &x doesn't actually require accessing the value of x. Lvalue-to-rvalue conversion occurs with those operators that need to access its value.
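To make the distinction concrete, here is a minimal sketch using a valid (non-dangling) reference, so that everything in it is well-defined. The names object, x, address_only and sum_of_values are made up for illustration. The static_asserts only inspect the types of the expressions; they do not prove anything about the dangling case.

```cpp
#include <cassert>
#include <type_traits>

int object = 7;
int& x = object;   // a valid reference, so every use below is well-defined

// decltype does not evaluate its operand; these checks only inspect types.
static_assert(std::is_same<decltype((x)), int&>::value,
              "the expression x is an lvalue");
static_assert(std::is_same<decltype(&x), int*>::value,
              "&x yields a pointer prvalue");
static_assert(std::is_same<decltype(x + x), int>::value,
              "x + x yields an int prvalue");

// The operand of the built-in & stays an lvalue; the stored value is not read.
int* address_only() { return &x; }

// Both operands of + undergo lvalue-to-rvalue conversion; the value is read.
int sum_of_values() { return x + x; }
```

Calling address_only() gives the address of object without reading it, whereas sum_of_values() has to read the stored int twice.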

I believe your code is well defined.

When operator& has been overloaded, the expression &x is transformed into a function call and does not obey the rules of the built-in operators - instead it follows the rules of a function call. For &x, the translation to function call results in either x.operator&() or operator&(x). In the first case, lvalue-to-rvalue conversion will occur on x when the class member access operator is used. In the second case, the argument of operator& will be copy-initialised with x (as in T arg = x), and the behaviour of this depends on the type of the argument. For example, in the case of the argument being an lvalue reference, there is no undefined behaviour because lvalue-to-rvalue conversion does not occur.

So if operator& is overloaded for the type of x, the code may or may not be well-defined, depending on how the operator& function is declared and called.
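A small sketch of that transformation, again on a live object so nothing here is undefined. Tracked and count_overload_calls are hypothetical names. It shows that &x becomes a call to the member operator&, while std::addressof (from <memory>) obtains the address without calling the overload.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class that overloads unary & and counts how often it is called.
struct Tracked {
    static int overload_calls;
    int payload;
    Tracked* operator&() { ++overload_calls; return this; }
};
int Tracked::overload_calls = 0;

// Returns how many times the overloaded operator& ran.
int count_overload_calls() {
    Tracked::overload_calls = 0;
    Tracked t{42};
    Tracked& x = t;                               // valid reference to a live object

    Tracked* via_operator  = &x;                  // rewritten as x.operator&()
    Tracked* via_addressof = std::addressof(x);   // bypasses the overload

    assert(via_operator == via_addressof);        // both name the same object
    return Tracked::overload_calls;
}
```

Only the first expression goes through the function-call rules, so the counter ends at 1. Whether the same call would be defined for a dangling x is exactly what the paragraphs above discuss; this sketch does not settle it.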

You could argue that the unary & operator relies on there being at least some valid region of storage that you have the address of:

Otherwise, if the type of the expression is T, the result has type "pointer to T" and is a prvalue that is the address of the designated object

And an object is defined as being a region of storage. After the object that is referred to is destroyed, that region of storage no longer exists.

I prefer to believe that it will only result in undefined behaviour if the invalid object is actually accessed. The reference still believes it's referring to some object and it can happily give the address of it even if it doesn't exist. However, this seems to be an ill-specified part of the standard.


Aside

As an example of undefined behaviour, consider x + x. Now we hit another ill-specified part of the standard. The value category of the operands of + is not specified. It is generally inferred from §5/8 that if it is not specified, then it expects a prvalue:

Whenever a glvalue expression appears as an operand of an operator that expects a prvalue for that operand, the lvalue-to-rvalue (4.1), array-to-pointer (4.2), or function-to-pointer (4.3) standard conversions are applied to convert the expression to a prvalue.

Now because x is an lvalue, the lvalue-to-rvalue conversion is required and we get undefined behaviour. This makes sense because addition requires accessing the value of x so it can work out the result.

Officialism answered 6/2, 2013 at 13:49 Comment(8)
The &-operator can be overloaded, however.Nerti
The second part of the answer (which you just added) isn't relevant.Entomophilous
@Nerti I was hoping that was a safe assumption for this question!Officialism
@LuchianGrigore I thought it'd be useful for other readers of the question.Officialism
@sftrabbit also, if you don't overload the default operator&, you still have the default one. How are they different?Entomophilous
An lvalue is an expression referring to an object, and the result of & is the address of the object. But in this case, there is no object. I'm not so sure what the Standard really says.Deluna
@LuchianGrigore The default operator& is described by the unary expression semantics in §5.3.1. When operator& is overloaded, it is transformed into a function call and instead follows the rules of function calls.Officialism
In C++, objects and storage are separate concepts. Objects must exist in storage, however storage can exist with no objects in it. Objects may be destroyed without releasing the storage (this is how std::vector is implemented, for example, it obtains storage and creates/moves/destroys objects within the storage as needed)Snapp

Supposing that x was initialized with a valid object, which was then destroyed, §3.8/6 applies:

Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object may be used but only in limited ways. For an object under construction or destruction, see 12.7. Otherwise, such a glvalue refers to allocated storage (3.7.4.2), and using the properties of the glvalue that do not depend on its value is well-defined. The program has undefined behavior if:

— an lvalue-to-rvalue conversion (4.1) is applied to such a glvalue,

— the glvalue is used to access a non-static data member or call a non-static member function of the object, or

— the glvalue is bound to a reference to a virtual base class (8.5.3), or

— the glvalue is used as the operand of a dynamic_cast (5.2.7) or as the operand of typeid.

So, simply taking the address is well-defined, and (referring to the neighboring paragraphs) can even be productively used to create a new object in place of the old one.

As for not taking the address and just writing x, that really does absolutely nothing, and it is a proper subexpression of &x. So it's also OK.
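A minimal sketch of the situation this paragraph covers, assuming the storage outlives the object. Widget and reuse_storage_through_reference are hypothetical names. The object is created with placement new in a local buffer and destroyed explicitly, and the reference is then used only to take the address and to build a new object there.

```cpp
#include <cassert>
#include <new>

// Hypothetical type with a user-provided (non-trivial) destructor, so that the
// explicit destructor call below clearly ends the object's lifetime.
struct Widget {
    int value;
    explicit Widget(int v) : value(v) {}
    ~Widget() {}
};

// Returns true if the address taken through the reference matches the buffer
// and a new object could be constructed at that address.
bool reuse_storage_through_reference() {
    alignas(Widget) unsigned char storage[sizeof(Widget)];

    Widget* first = new (storage) Widget(1);
    Widget& x = *first;

    first->~Widget();        // lifetime ends; storage is neither reused nor released

    Widget* addr = &x;       // no lvalue-to-rvalue conversion, no member access
    bool same_address =
        static_cast<void*>(addr) == static_cast<void*>(storage);

    Widget* second = new (addr) Widget(2);   // new object in place of the old one
    bool ok = same_address && second->value == 2;

    second->~Widget();
    return ok;
}
```

The sketch never reads x.value between the destructor call and the second placement new, since that would be one of the uses the quoted list makes undefined.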

Deluna answered 6/2, 2013 at 14:19 Comment(4)
Which means this only applies "before the storage which the object occupied is reused or released".Vargueno
@Angew Yes. Suppose I could trace further, but my general instinct is that this kind of thing is tempting fate.Deluna
This answer covers cases where the storage still exists, but what about cases where it doesn't? (E.g. function returns reference to local variable - the storage is released for automatic variables when the function returns)Snapp
@Snapp That would require a memory model specification which C++ still lacks as far as I know. The reasoning of UB "between the cracks" of other answers here still holds.Deluna

First off, very interesting question.

I would say it is undefined behaviour, assuming "dangling reference" means "referred-to object's lifetime has ended and the storage the object occupied has been reused or released." I base my reasoning on the following standard rulings:

3.8 §3:

The properties ascribed to objects throughout this International Standard apply for a given object only during its lifetime. [ Note: In particular, before the lifetime of an object starts and after its lifetime ends there are significant restrictions on the use of the object, as described below ...]

All the cases "as described below" refer to

Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released

1.3.24: undefined behavior

behavior for which this International Standard imposes no requirements [ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. ...]

I apply the following train of thought to the above quotes:

  1. If the standard doesn't describe behaviour for a situation, the behaviour is undefined.
  2. The standard only describes behaviour for objects within their lifetime, and a few special cases near the start/end of their lifetime. None of these apply to our dangling reference.
  3. Therefore, using the dangling reference in any way has no behaviour prescribed by the standard, hence the behaviour is undefined.
Vargueno answered 6/2, 2013 at 14:19 Comment(0)
