When is it valid to access a pointer to a "dead" object?

Asked 10/6, 2013 at 13:17 Answered 21/11, 2023 at 22:21

Solved c pointers language-lawyer undefined-behavior

First, to clarify, I am not talking about dereferencing invalid pointers!

Consider the following two examples.

Example 1

typedef struct { int *p; } T;

T a = { malloc(sizeof(int)) };
free(a.p);  // a.p is now indeterminate?
T b = a;    // Access through a non-character type?

Example 2

void foo(int *p) {}

int *p = malloc(sizeof(int));
free(p);   // p is now indeterminate?
foo(p);    // Access through a non-character type?

Question

Do either of the above examples invoke undefined behaviour?

Context

This question is posed in response to this discussion. The suggestion was that, for example, pointer arguments may be passed to a function via x86 segment registers, which could cause a hardware exception.

From the C99 standard, we learn the following (emphasis mine):

[3.17] indeterminate value - either an unspecified value or a trap representation

and then:

[6.2.4 p2] The value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime.

and then:

[6.2.6.1 p5] Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

Taking all of this together, what restrictions do we have on accessing pointers to "dead" objects?

Addendum

Whilst I've quoted the C99 standard above, I'd be interested to know if the behaviour differs in any of the C++ standards.

Inshrine answered 10/6, 2013 at 13:17 Comment(21)

You cited the Standard in an excellent manner - from those words, it's clear to me that using an invalid pointer in any way, even without dereferencing it, invokes undefined behavior. – Arran 10/6, 2013 at 13:20

I don't see where this should come from. As long as you only pass the pointer around, nothing happens. Of course it is obvious that it doesn't make sense, because you cannot use this pointer anyway, but passing it around is virtually the same as having an uninitialized pointer. – Caddaric 10/6, 2013 at 13:29

@Devolus: Yes, that was my intuition too. But the standard seems relatively unambiguous. And AProgrammer made a good point (in the linked discussion), that if segment registers get involved, this really could lead to an HW exception. – Inshrine 10/6, 2013 at 13:31

@Devolus, what we're trying to understand is: "is passing it around safe?" – Spermato 10/6, 2013 at 13:31

free does not modify its argument. The pointer passed to free still points to the same location afterwards. The call to free simply informs the standard library that the object is no longer 'in use' and the storage at that location can be re-used. This is not the same as the object 'reaching the end of its lifetime', which occurs for objects on the stack. – Varix 10/6, 2013 at 13:40

@willj: That's correct. But nevertheless, the standard tells us that the pointer is now indeterminate. – Inshrine 10/6, 2013 at 13:42

C++ recently made this implementation-defined, see DR 1438, because it won't actually trap on all systems – Doting 10/6, 2013 at 13:45

The pointer is indeterminate if the object has reached the end of its lifetime.. where does it say that 'free' causes an object to 'reach the end of its lifetime'? As I can roll my own implementation of malloc and free, I guess that an implementation is not permitted to give them special treatment. – Varix 10/6, 2013 at 13:45

"Rolling your own" malloc and free invokes undefined behavior already. 7.1.3: "If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved identifier as a macro name, the behavior is undefined." – Fayth 10/6, 2013 at 13:46

@Oli: ah, then I stand corrected ;) – Varix 10/6, 2013 at 13:49

@R..: I meant that I can roll my own customMalloc() and customFree() - in which case object lifetime would be unaffected. – Varix 10/6, 2013 at 13:51

@willj, it's not about modifying that value. Most probably the pointer still has the same value. However, if that value gets copied somewhere, it may pass through a special pointer register (e.g. segment register in x86) where the hardware could cause a trap due to the pointer being invalid. – Spermato 10/6, 2013 at 14:3

@Oli Charlesworth I think you are reading things "between the lines" a bit. The standard says that a pointer is indeterminate if the object pointed at reaches the end of its lifetime. But this is cited from 6.2.4, the chapter about storage duration. One may argue that the cited text only refers to a pointer to an object that has reached the end of its scope, since that chapter starts by stating "Allocated storage is described in 7.22.3". In other words, allocated storage is a special case where 6.2.4 doesn't necessarily apply. – Anallise 10/6, 2013 at 14:20

But unfortunately, there's no useful information in 7.22.3 regarding the topic, or what happens with a pointer when you pass it to free() - whether it is formally turning indeterminate or not. – Anallise 10/6, 2013 at 14:21

@Lundin: Hmm, that's not how I interpret it. I don't see allocated storage as a special case, it's simply described in a separate section for convenience. However, if your interpretation is correct, we could simply rewrite both my examples to use pointers to automatic objects that have died... – Inshrine 10/6, 2013 at 14:26

@OliCharlesworth It is far from obvious how to interpret it. After a second reading of C11 6.2.4 I found that the chapter defines the lifetime for static and automatic objects (and for thread storage in C11), but not for "allocated" ones. Yet in C11 7.22.3, there is a sentence stating: The lifetime of an allocated object extends from the allocation until the deallocation. That line seems to go well together with the text you cited from 6.2.4. – Anallise 10/6, 2013 at 14:34

@Lundin: It's when the object has reached the end of its lifetime, not (necessarily) the end of its scope. (Scope is a region of program text over which an identifier is visible.) – Lasseter 10/6, 2013 at 17:39

@OliverCharlesworth can I suggest changing this to a C question? Since C and C++ are considerably different in this area, this question would get confusing if C++ answers were added. There could be a different thread made for the C++ version. (The existing C++ answer that has been posted actually doesn't answer the question at all.) – Faraday 7/6, 2015 at 14:57

@MattMcNabb: Sure, if you like. The C++ part of the question was only ever added as an addendum... – Inshrine 7/6, 2015 at 14:58

@MattMcNabb Jonathan Wakely already mentioned DR 1438. Non-dereference use of invalid pointers: "The current Standard says that any use of an invalid pointer value produces undefined behavior (3.7.4.2 [basic.stc.dynamic.deallocation] paragraph 4). This includes not only dereferencing the pointer but even just fetching its value." Nothing to add here. – Eleph 7/6, 2015 at 15:4

@Eleph C++ doesn't clearly define what an invalid pointer is; the amount of discussion generated on this question suggests that it is not so simple – Faraday 7/6, 2015 at 15:12

Example 2 is invalid. The analysis in your question is correct.

Example 1 is valid. A structure type never holds a trap representation, even if one of its members does. This means that structure assignment, on a system where trap representations would cause problems, must be implemented as a bytewise copy, rather than a member-by-member copy.

6.2.6 Representations of types

6.2.6.1 General

6 [...] The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.
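The distinction drawn in this answer can be sketched in code. The following is only an illustration under stated assumptions: the helper name `copy_struct_with_dead_member` is hypothetical, and the byte comparison at the end reflects what mainstream implementations do rather than anything the quoted paragraph promises about the bits of an indeterminate member.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int *p; } T;

/* Hypothetical helper, not from the original post.
   Returns 1 if the copy has the same object representation as the source. */
int copy_struct_with_dead_member(void)
{
    T a = { malloc(sizeof(int)) };
    assert(a.p != NULL);    /* keep the sketch simple: allocation failure is fatal */
    free(a.p);              /* a.p is indeterminate from here on */

    T b = a;                /* whole-struct copy; a.p is never read on its own */

    /* memcmp inspects both objects as arrays of unsigned char */
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Note that the sketch never evaluates `a.p` or `b.p` after the `free`; doing so would be the situation of Example 2.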

Drop answered 10/6, 2013 at 13:40 Comment(16)

Ah, that's interesting. I hadn't noticed that clause. Thanks! – Inshrine 10/6, 2013 at 13:47

Since the issue isn't trap representations but indeterminate values, I don't think the issue is resolved by the cited text. Per J.2 (albeit non-normative), UB results if "The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.8, 6.8)." However, perhaps in this case it is the value of the member, not the value of the structure, that is indeterminate, in which case the value of the object with indeterminate value is not used. – Fayth 11/6, 2013 at 1:17

@R. J.2 is outdated. The normative text (of C99, anyway) only disallows reading objects that hold trap representations. If they are indeterminate but cannot hold trap representations, reading is allowed. This is important for, for example, unsigned char too. – Drop 11/6, 2013 at 6:49

@R.. There's DR 338 that is supposed to tighten the rules somewhat again, but I don't see it in a draft of C11 (perhaps it was included after the last public draft), so I'm not sure how that affects my answer here. – Drop 11/6, 2013 at 6:56

@hvd: That seems more like how a standard should be written, though I wish writers would further specify that the existence of trap representations or things that behave like them must be implementation-defined, though the consequences need not be. – Flooded 28/4, 2015 at 19:25

@Flooded Any implementation where malloc can succeed for a size of at least 2 must have trap representations: add one byte to malloc's result, and you get a pointer that is not allowed to compare equal to any pointer value that was valid just before malloc was called. Because of that, before malloc was called, that representation was a trap representation. – Drop 28/4, 2015 at 20:20

@hvd: The uses I have seen for the term "trap representation" imply a value which, when read as an rvalue, will disrupt normal program flow in some fashion which would hopefully be recognizable as a trap, but whose particulars are beyond the scope of the C standard. Basically, what I would like to see would be for the Standard to say that an implementation should have to specify under what cases the statement p=q; (given unaliased variables p and q of the same type (any type)) might do anything other than make p hold a value which is at least as well defined as what's in q. – Flooded 28/4, 2015 at 20:32

@Flooded The standard has a very specific definition for a trap representation: it's a representation that doesn't represent a value. C99 6.2.6.1p5: "Certain object representations need not represent a value of the object type. [...] Such a representation is called a trap representation." You mean something else by it. Anyway, as of C11, reading indeterminate values is mostly undefined again, even if the type has no trap representations, so it wouldn't get you much. – Drop 28/4, 2015 at 20:41

@hvd: My point is that there are a lot of cases where it would be acceptable for code which is given invalid data to interpret it as arbitrary gibberish, and in some such cases would also be acceptable for the program to trap in recognizable fashion, but where adherence to the laws of causality is required in any case. The reason that reading a trap representation was defined as Undefined Behavior, rather than merely yielding an unspecified value was to allow for the possibility that such accesses might disrupt program flow in ways outside the scope of the Standard. – Flooded 28/4, 2015 at 20:49

@Flooded Analyzability may be of interest for that (but not for your earlier comments). As of C11, an implementation can define __STDC_ANALYZABLE__ to indicate that the effects of undefined behaviour are limited, except for critical undefined behaviour. And reading trap representations is not critical undefined behaviour: if __STDC_ANALYZABLE__ is defined, it may cause the program to abort, but it may not completely corrupt the execution of the program. – Drop 28/4, 2015 at 20:51

@hvd: Thanks a million for that; I wonder why I've not seen it mentioned anywhere before? If code can safely use a constraint handler to longjmp back to sanity, that's a major help to many optimizations. IMHO, having a program require analyzability would seem like it could in many cases enable much more useful optimizations than would be enabled by letting compilers go crazy. Being able to specify non-trapping could help in a few more cases, e.g. uint32_t x,y,z; ... x=y*z; could fail on systems where int is 33-64 bits, but disabling traps would make sane implementations "just work". – Flooded 28/4, 2015 at 21:6

@Flooded I don't know if any implementations support it, and that may be why I've only rarely seen it mentioned either. As for x=y*z;, I've suggested x=1U*y*z; in the past if an implementation is found where uint32_t exists and promotes to a signed type. Yeah, it's ugly, and it really shouldn't be necessary, but if you want to support common compilers like GCC (known to optimise aggressively), you will end up needing something like that anyway. – Drop 28/4, 2015 at 21:12
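The `1U*y*z` idiom from the comment above can be written out as a small function. This is a sketch only; the name `mul32` is hypothetical. Multiplying by `1U` first forces the arithmetic into `unsigned int` (or a wider unsigned type), so the product wraps instead of overflowing a signed `int` on an implementation where `uint32_t` promotes to `int`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 32-bit multiplication that wraps modulo 2^32
   regardless of how uint32_t is promoted. */
uint32_t mul32(uint32_t y, uint32_t z)
{
    /* 1U * y has an unsigned type, so the whole product is computed
       in unsigned arithmetic; the return conversion reduces it to 32 bits. */
    return 1U * y * z;
}
```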

@hvd: Too bad people are working harder to break analyzability than support it. A couple abilities analyzability still doesn't seem to provide, but most implementations could in practice provide if trapping were bypassed would be (1) determine whether realloc has moved an allocation (in general, comparisons between live and dead pointers cannot be expected to be meaningful, but in this case it should) (2) given two pointers which have not been modified since they pointed to the same live object, report what the displacement was (in units of char*) when the object was alive. – Flooded 28/4, 2015 at 21:33

@hvd: The above operations should not access unowned memory, and while ideally all operations which would produce an invalid pointer without special "permission" would be trapped, neither operation produces any kind of pointer. As such, even though they involve dead pointers, it should be possible for any platform to perform them safely by, at worst, using memcpy to copy the pointers to a suitably-sized char[], shuffling any bits as required to yield an integer that can be used for the comparison or subtraction (a library macro could exploit UB to do such things faster, though). – Flooded 28/4, 2015 at 21:46

@R..GitHubSTOPHELPINGICE: The language about structures not being trap representations dates back to C89, where the term "Indeterminate value" was defined as "Either a valid value or a trap representation"; the only means by which use of an indeterminate value could invoke UB was if the value in question might happen to be a trap representation--something that was specified as impossible for structures. C99 deliberately added the possibility that non-addressable objects holding indeterminate values of scalar types might not behave as values in range of their types, but nothing in the... – Flooded 22/11, 2023 at 16:47

...rationale suggests any intention of changing the behavior of structures beyond perhaps allowing for the possibility that copying a structure where some members are Indeterminate, may leave those members of the copy Indeterminate under the expanded definition of the term. – Flooded 22/11, 2023 at 16:56

My interpretation is that while only non-character types can have trap representations, any type can have indeterminate value, and that accessing an object with indeterminate value in any way invokes undefined behavior. The most infamous example might be OpenSSL's invalid use of uninitialized objects as a random seed.

So, the answer to your question would be: never.

By the way, an interesting consequence of not just the pointed-to object but the pointer itself being indeterminate after free or realloc is that this idiom invokes undefined behavior:

void *tmp = realloc(ptr, newsize);
if (tmp != ptr) {
    /* ... */
}
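One commonly suggested way to ask "did the block move?" without reading the old pointer after a successful `realloc` is to convert it to an integer beforehand. The sketch below assumes the implementation provides `uintptr_t` (it is an optional type) and that `newsize` is non-zero; the helper name `grow` and its parameters are hypothetical. Whether equal integers imply equal addresses is implementation-defined, so this is not a guarantee from the standard.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: resizes *pp to newsize (assumed non-zero) and
   reports through *moved whether the address changed.
   Returns 0 on success, -1 on failure. */
int grow(void **pp, size_t newsize, int *moved)
{
    assert(pp != NULL && moved != NULL);

    uintptr_t before = (uintptr_t)*pp;    /* converted while *pp is still valid */
    void *tmp = realloc(*pp, newsize);
    if (tmp == NULL)
        return -1;                        /* *pp is unchanged and still valid */

    *moved = ((uintptr_t)tmp != before);  /* integer comparison only */
    *pp = tmp;                            /* overwrite the old pointer, never read it */
    return 0;
}
```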

Fayth answered 10/6, 2013 at 13:33 Comment(7)

Re "accessing an object ..."; there is a footnote in the standard which I didn't quote above: "Thus, an automatic variable can be initialized to a trap representation without causing undefined behavior, but the value of the variable cannot be used until a proper value is stored in it." It sounds like writing to such an object is acceptable. – Inshrine 10/6, 2013 at 13:36

@OliCharlesworth, of course it is. Otherwise how can you do something like: free(x); x = NULL;? – Spermato 10/6, 2013 at 13:37

@Shahbaz: Indeed! I'm just having trouble parsing the standard in such a way that it allows this kind of thing ;) – Inshrine 10/6, 2013 at 13:37

@OliCharlesworth, I think the part that says: If the stored value of an object has such a representation and is read by an lvalue expression..., shows that it can be written to, but not read from. – Spermato 10/6, 2013 at 14:6
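The write-but-not-read point in the comments above corresponds to the familiar `free(x); x = NULL;` pattern. A minimal sketch, with the helper name `release` being hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: frees *pp and replaces the dead pointer with NULL. */
void release(int **pp)
{
    assert(pp != NULL);
    free(*pp);     /* last read of the old value, while it is still valid */
    *pp = NULL;    /* a write: stores a proper value without reading the dead one */
}
```

After the call, the caller's pointer holds a valid null pointer value and can be read and compared again.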

void *tmp = realloc(ptr, newsize); << if realloc does fail, then tmp is valid (NULL) and ptr remains valid as well. This is not UB when tmp==NULL. – Indign 10/6, 2013 at 15:13

@jimmcnamara: Of course. But it's UB in the success case, which was the point. – Fayth 10/6, 2013 at 20:18

The Standard explicitly guarantees that structures will never have trap representations. I would be hard-pressed to identify any case where that would be meaningful if copying a struct whose value was at least partially indeterminate would have any effect beyond producing a copy whose value might likewise be partially indeterminate. – Flooded 3/10, 2017 at 19:47

Saying that the pointer value becomes indeterminate, even if nothing disturbs the bits representing it, is likely an effort to accommodate the "as-if" rule. If there is some sequence of actions whose behavior might be observably affected by a useful optimizing transform, the as-if rule requires that at least one action within that sequence be characterized as invoking Undefined Behavior that would justify any observable quirks stemming from the optimization.

Consider the following function:

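#include <stdint.h>
#include <stdlib.h>

void doSomething(int);   /* defined elsewhere */

void test(int *p1, uint64_t ofs)
{
  int *p2 = malloc(sizeof (int));
  if ((uintptr_t)p1 == (uintptr_t)p2+ofs)
  {
    *p2 = 1;
    *p1 = 2;             /* may alias *p2 if p1 is a dead pointer to reused storage */
    doSomething(*p2);
  }
  free(p2);
}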

In most cases where the function might be invoked, replacing the call to doSomething(*p2) with doSomething(2) would improve performance without affecting behavior except in scenarios where p1 is a pointer to a dead region of storage whose address happens to coincide with the address of the new region returned from malloc(). Treating p1 as becoming indeterminate when the storage identified thereby would become eligible for reuse by malloc() would allow a compiler to ignore the possibility that the address might be found to match the address of some future allocation.

Flooded answered 21/11, 2023 at 22:21 Comment(0)

C++ discussion

Short answer: In C++, there is no such thing as "reading" a class instance; you can only "read" a non-class object, and this is done by an lvalue-to-rvalue conversion.

Detailed answer:

typedef struct { int *p; } T;

T designates an unnamed class. For the sake of the discussion let's name this class T:

struct T {
    int *p; 
};

Because you did not declare a copy constructor, the compiler implicitly declares one, so the class definition reads:

struct T {
    int *p; 
    T (const T&);
};

So we have:

T a;
T b = a;    // Access through a non-character type?

Yes, indeed; this is initialization by copy constructor, so the copy constructor definition will be generated by the compiler; the definition is equivalent to

inline T::T (const T& rhs) 
    : p(rhs.p) {
}

So you are accessing the value as a pointer, not a bunch of bytes.

If the pointer value is invalid (not initialized, freed), the behavior is not defined.

Eleph answered 16/6, 2013 at 4:23 Comment(7)

Actually an lvalue to rvalue conversion can be done for class lvalues too. The context is when passing a class lvalue through the ellipsis in a function call. – Levenson 16/6, 2013 at 9:33

@JohannesSchaub-litb Yes you can. [conv.lval] "Otherwise, if the glvalue has a class type, the conversion copy-initializes a temporary of type T from the glvalue and the result of the conversion is a prvalue for the temporary" So this conversion is defined in terms of the ctor, and we go back to accessing each member one by one, with an lvalue-to-rvalue conversion for each one. – Eleph 16/6, 2013 at 10:35

that is correct. At least as far as nonunion class objects are concerned. Unions are copied "bitwise". – Levenson 16/6, 2013 at 11:20

This all has nothing to do with the question except for the last sentence ... which you give no justification for. – Faraday 7/6, 2015 at 14:30

@MattMcNabb Huh? This has everything to do with the question... I don't know what you are trying to say. – Eleph 7/6, 2015 at 15:1

The examples in the question are about using a pointer after the space it points to has been freed. In your code you copy an uninitialized pointer, which is different. Also, all the stuff about the class is irrelevant, you could equally well have written int *a; int *b = a; – Faraday 7/6, 2015 at 15:5

"all the stuff about the class is irrelevant" all the stuff about the class relates to "Example 1" in the question! – Eleph 15/6, 2015 at 23:8
