Where exactly does C++ standard say dereferencing an uninitialized pointer is undefined behavior?

So far I can't work out how to deduce that the following code has undefined behavior:

int* ptr;
*ptr = 0;

First of all, there's 5.3.1/1, which states that unary * performs indirection, converting T* to T. But this doesn't say anything about UB.

Then there's the often-quoted 3.7.3.2/4, which says that applying a deallocation function to a non-null pointer renders the pointer invalid and that any later use of the invalid pointer is UB. But the code above involves no deallocation.

How can UB be deduced in the code above?

Ablaze answered 26/11, 2010 at 14:1 Comment(4)
My guess is that it comes from the C Standard, 6.5.3.2/4. - Detruncate
What does the standard say about initialisation and declaration of pointers? As far as I'm aware, the declaration doesn't initialise the pointer, so it could hold anything, and assigning a value to where it points could do anything. I could be wrong ;-) - Atchley
Is it not undefined behaviour to read from any uninitialised variable, pointer or not? Consider that you may be writing to the pointed-to address, but you're reading from the pointer in the process. - Tinytinya
I recently asked a similar question and got a pretty good answer: stackoverflow.com/questions/43533262/… - Edson

Section 4.1 looks like a candidate (emphasis mine):

An lvalue (3.10) of a non-function, non-array type T can be converted to an rvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the lvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior. If T is a non-class type, the type of the rvalue is the cv-unqualified version of T. Otherwise, the type of the rvalue is T.

I'm sure that just searching the spec for "uninitial" will turn up more candidates.
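A minimal sketch of the question's assignment made well-defined by initializing ptr first, annotated with where the lvalue-to-rvalue conversion on ptr happens. The function name is hypothetical and only serves to make the example self-contained.

```cpp
#include <cassert>

// Hypothetical function for illustration: the question's `*ptr = 0;`
// with ptr initialized first.
int write_through_initialized_pointer() {
    int x = 42;
    int* ptr = &x;      // ptr now holds a determinate value: the address of x

    // Evaluating *ptr starts by reading the stored value of ptr (the
    // lvalue-to-rvalue conversion on the operand ptr). That read is fine
    // here; with `int* ptr;` it is the read that the quoted 4.1 wording
    // makes undefined.
    assert(ptr == &x);  // this comparison also reads ptr
    *ptr = 0;           // *ptr designates x, so this assigns to x
    return x;
}
```

Calling write_through_initialized_pointer() returns 0, since the store went through ptr to x.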

Tot answered 26/11, 2010 at 14:18 Comment(6)
I learned from Johannes/litb the other day that there's a bit of a defect in the spec here: open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#240. So you need to read that paragraph with some sympathy for the fact that when it says "uninitialized", the standard really should be clearer about uninitialized vs. indeterminate values. In this case the object certainly is uninitialized, though, so your quotation covers it. - Glance
A badly-worded standards document?! But that's unpossible! ;) - Tot
@SteveJessop I don't see how it applies in this case: how can there be an lvalue-to-rvalue conversion when you have *ptr = 0;? The result of * is an lvalue, and = requires the left-hand operand to be a modifiable lvalue. - Archiphoneme
@ShafikYaghmour: ptr is an lvalue. There is an lvalue-to-rvalue conversion applied to ptr in order to evaluate *ptr. - Glance
@SteveJessop hmmm, ptr is an identifier, and the result of that expression is an lvalue if it is a variable, so I still don't see it, per 5.1.1 p8. - Archiphoneme
@ShafikYaghmour: In *ptr the unary * operator has a single operand, ptr, which is an lvalue expression that requires conversion. This is no different from writing a + b: there is an lvalue-to-rvalue conversion on each operand of + (a and b). - Glance

I found the answer to this question in an unexpected corner of the C++ draft standard: section 24.2 Iterator requirements, specifically section 24.2.1 In general, paragraphs 5 and 10, which respectively say (emphasis mine):

[...][ Example: After the declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular value of a pointer. —end example ] [...] Dereferenceable values are always non-singular.

and:

An invalid iterator is an iterator that may be singular. [footnote 268]

and footnote 268 says:

This definition applies to pointers, since pointers are iterators. The effect of dereferencing an iterator that has been invalidated is undefined.

There does seem to be some controversy over whether a null pointer is singular, and it looks like the term singular value needs to be defined properly in a more general manner.

The intent of singular seems to be summed up well in defect report 278, "What does iterator validity mean?", in the rationale section, which says:

Why do we say "may be singular", instead of "is singular"? That's because a valid iterator is one that is known to be nonsingular. Invalidating an iterator means changing it in such a way that it's no longer known to be nonsingular. An example: inserting an element into the middle of a vector is correctly said to invalidate all iterators pointing into the vector. That doesn't necessarily mean they all become singular.

So invalidation and lack of initialization may produce a singular value, but since we cannot prove such values are nonsingular, we must assume they are singular.
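To make the "may be singular" idea concrete, here is a sketch using the vector example from the DR rationale. The function name is hypothetical. After the insertion, the code does not reuse the old iterator; it takes the fresh one that insert() returns, which is known to be nonsingular.

```cpp
#include <cassert>
#include <vector>

// Hypothetical function for illustration. Inserting into a vector invalidates
// iterators at or after the insertion point (all of them if it reallocates),
// so the old iterator is not reused; insert() returns a valid iterator.
int insert_then_read_next() {
    std::vector<int> v{1, 2, 4};
    auto pos = v.begin() + 2;   // valid, non-singular: refers to the 4
    pos = v.insert(pos, 3);     // v is now {1, 2, 3, 4}; pos refers to the 3
    assert(*pos == 3);          // dereferencing the returned iterator is fine
    return *(pos + 1);          // the element after the inserted one: 4
}
```

The same reasoning carries over to the uninitialized pointer: nothing establishes that its value is nonsingular, so it must be treated as if it were.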

Update

An alternative, common-sense approach is to look at draft standard section 5.3.1 Unary operators, paragraph 1, which says (emphasis mine):

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.[...]

and then at section 3.10 Lvalues and rvalues, paragraph 1, which says (emphasis mine):

An lvalue (so called, historically, because lvalues could appear on the left-hand side of an assignment expression) designates a function or an object. [...]

but an uninitialized ptr will not, except by chance, point to a valid object, so there is no object for the lvalue *ptr to designate.
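A small sketch of the case where the lvalue *ptr really does designate an object, which is exactly what an uninitialized ptr cannot guarantee. The function name is hypothetical.

```cpp
#include <cassert>

// Hypothetical function for illustration: when ptr holds the address of a
// live object, the lvalue *ptr designates that object.
bool deref_designates_object() {
    int x = 7;
    int* ptr = &x;
    int& ref = *ptr;      // binds to the object that *ptr designates, i.e. x
    ref = 8;              // so writing through ref writes to x
    assert(&ref == &x);
    return &*ptr == &x && x == 8;
}
```

Here deref_designates_object() returns true; with `int* ptr;` there is no x for *ptr to refer to.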

Archiphoneme answered 16/12, 2013 at 15:10 Comment(0)

The OP's question is nonsense. There is no requirement that the Standard say certain behaviours are undefined, and indeed I would argue that all such wording should be removed from the Standard, because it confuses people and makes the Standard more verbose than necessary.

The Standard defines certain behaviour. The question is, does it specify any behaviour in this case? If it does not, the behaviour is undefined whether or not it says so explicitly.

In fact the specification that some things are undefined is left in the Standard primarily as a debugging aid for the Standards writers, the idea being to generate a contradiction if there is a requirement in one place which conflicts with an explicit statement of undefined behaviour in another: that's a way to prove a defect in the Standard. Without the explicit statement of undefined behaviour, the other clause prescribing behaviour would be normative and unchallenged.

Pejorative answered 13/12, 2010 at 18:1 Comment(1)
The Standard defines certain behaviours, but it should also define certain behaviours as undefined, and to some degree and most of the time it has done so, especially when such behaviours are in moral gray areas. The question is valid. - Edson

Evaluating an uninitialized pointer causes undefined behaviour. Since dereferencing the pointer first requires evaluating it, this implies that dereferencing also causes undefined behaviour.

This was true in both C++11 and C++14, although the wording changed.

In C++14 it is fully covered by [dcl.init]/12:

When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced.

If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:

where the "following cases" are particular operations on unsigned char.
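As a sketch of which objects the quoted C++14 rule covers: it applies to automatic and dynamic storage duration, while a pointer with static storage duration is zero-initialized and a pointer value-initialized with {} is null, so both of those can be read safely (though neither is dereferenceable). The function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical functions illustrating the storage-duration rule quoted above.
int* static_pointer_value() {
    static int* sp;     // static storage duration: zero-initialized, so null
    return sp;          // reading it is fine: the value is determinate
}

int* value_initialized_pointer() {
    int* ap{};          // value-initialization: ap is a null pointer
    // int* bad;        // automatic, no initializer: indeterminate value;
    // return bad;      // producing that value would be undefined behavior
    assert(ap == nullptr);
    return ap;
}
```

Both functions return nullptr. The commented-out lines are the question's situation and are deliberately not compiled into the example.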


In C++11, [conv.lval/2] covered this under the lvalue-to-rvalue conversion procedure (i.e. retrieving the pointer value from the storage area denoted by ptr):

A glvalue of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.

The bolded part ("or if the object is uninitialized") was removed for C++14 and replaced with the extra text in [dcl.init]/12.

Across answered 8/6, 2015 at 1:11 Comment(0)

I'm not going to pretend I know a lot about this, but some compilers would initialize the pointer to NULL, and dereferencing a NULL pointer is UB.

Also, considering that an uninitialized pointer could point to anything (including NULL), you could conclude that dereferencing it is UB.

A note in section 8.3.2 [dcl.ref] says:

[Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bitfield. ]

—ISO/IEC 14882:1998(E), the ISO C++ standard, in section 8.3.2 [dcl.ref]

I think I should have written this as a comment instead; I'm not really that sure.
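A minimal sketch of the usual defensive pattern, using a hypothetical helper named store_if_not_null. It can reject a null pointer before dereferencing, but it cannot detect an uninitialized one: the comparison itself would read the indeterminate value.

```cpp
#include <cassert>

// Hypothetical helper: writes through p only when p is non-null.
// The null check is meaningful only if p holds a determinate value; for an
// uninitialized pointer, p == nullptr already reads an indeterminate value,
// so no check can make `int* ptr; *ptr = 0;` safe.
bool store_if_not_null(int* p, int value) {
    if (p == nullptr) {
        return false;    // dereferencing a null pointer is UB, so refuse
    }
    *p = value;
    assert(*p == value); // assumes the caller passed the address of a live int
    return true;
}
```

For example, with `int x = 0;`, store_if_not_null(&x, 5) returns true and sets x to 5, while store_if_not_null(nullptr, 7) returns false without writing anything.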

Jellied answered 26/11, 2010 at 14:13 Comment(6)
Notes are not binding - they are for information only: stackoverflow.com/questions/4274763/… - Ablaze
@sharptooth: they aren't binding if they contradict normative text, but unless there's a defect in the standard, anything they say about the language is true. I don't know of anywhere in the standard that says you can dereference a null pointer :-) Notes should be redundant with other information present elsewhere in the standard. But the proof that they're true might require a lot of flipping through sections, deduction, etc. So a programmer can opt to rely on their truth, whereas an implementer might need to know exactly why they're true and hence might need the normative text. - Glance
Dereferencing an uninitialised pointer is defined. It points to the memory address (this is not defined in the program) contained in the pointer variable. Dereferencing a NULL pointer (where NULL is defined as 0, not nullptr or __nullptr) will point to memory address 0. There are platforms where it is necessary to access memory address 0, some embedded devices for example. - Skirting
@T33C, I think you might be right there. I think you can't randomly obtain the same "null" the note is talking about (not sure if what I said makes sense). There is still the issue of the random value being outside the memory address range. - Jellied
The note is referring to a null reference. Nowhere in the standard does it say that dereferencing null causes undefined behaviour. Only this note says that obtaining a reference by dereferencing a null pointer is undefined, but it is using (or creating) such a reference that is undefined. Dereferencing a pointer with the value 0 (NULL) is defined behaviour and sometimes necessary to access memory location 0x0000 on the hardware. I have needed to do this on a PIC (embedded device). - Skirting
@T33C, the note first mentions a null reference, then, by way of explanation, notes that dereferencing a null pointer is undefined. But we shouldn't get side-tracked, since the question is about an uninitialized pointer. - Parrotfish

To dereference the pointer, you need to read from the pointer variable (not talking about the object it points to). Reading from an uninitialized variable is undefined behaviour.

What you do with the pointer's value after you have read it no longer matters at that point, be it writing to (as in your example) or reading from the object it points to.

Eupatorium answered 26/11, 2010 at 14:41 Comment(0)

Even if the normal storage of something in memory would have no "room" for any trap bits or trap representations, implementations are not required to store automatic variables the same way as static-duration variables except when there is a possibility that user code might hold a pointer to them somewhere. This behavior is most visible with integer types. On a typical 32-bit system, given the code:

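#include <stdint.h>

uint16_t foo(void);    /* defined elsewhere; opaque to the compiler here */
uint16_t bar(void);
uint16_t blah(uint32_t q)
{
  uint16_t a;          /* deliberately left uninitialized */
  if (q & 1) a=foo();
  if (q & 2) a=bar();
  return a;            /* indeterminate if neither bit 0 nor bit 1 of q is set */
}
unsigned short test(void)
{
  return blah(65540);  /* 65540 == 0x10004: bits 0 and 1 are both clear */
}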

it would not be particularly surprising for test to yield 65540, even though that value is outside the representable range of uint16_t, a type which has no trap representations. If a local variable of type uint16_t holds an indeterminate value, there is no requirement that reading it yield a value within the range of uint16_t. Since unexpected behaviors can result from using even unsigned integers in this fashion, there's no reason to expect that pointers couldn't behave even worse.
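For comparison, a sketch of the same function with a given a default value, so every path returns a determinate value. This is written in C++ with <cstdint>; foo() and bar() are hypothetical stubs returning 1 and 2, blah_fixed is a hypothetical name, and the default of 0 is an assumption, since the answer does not say what the right fallback would be.

```cpp
#include <cassert>   // assert, used when checking the examples below
#include <cstdint>

// Hypothetical stand-ins for the opaque foo()/bar() in the answer above.
std::uint16_t foo() { return 1; }
std::uint16_t bar() { return 2; }

// Same control flow as blah(), but `a` is initialized, so the path where
// neither bit 0 nor bit 1 of q is set returns a determinate value.
std::uint16_t blah_fixed(std::uint32_t q) {
    std::uint16_t a = 0;     // assumption: 0 is an acceptable default
    if (q & 1) a = foo();
    if (q & 2) a = bar();
    return a;
}
```

With these stubs, blah_fixed(65540) returns 0, blah_fixed(1) returns 1, and blah_fixed(3) returns 2 (bar() runs last).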

Aerification answered 8/6, 2015 at 16:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.