When is it valid to access a pointer to a "dead" object?

First, to clarify, I am not talking about dereferencing invalid pointers!

Consider the following two examples.

Example 1

typedef struct { int *p; } T;

T a = { malloc(sizeof(int)) };
free(a.p);  // a.p is now indeterminate?
T b = a;    // Access through a non-character type?

Example 2

void foo(int *p) {}

int *p = malloc(sizeof(int));
free(p);   // p is now indeterminate?
foo(p);    // Access through a non-character type?

Question

Do either of the above examples invoke undefined behaviour?

Context

This question is posed in response to this discussion. The suggestion was that, for example, pointer arguments may be passed to a function via x86 segment registers, which could cause a hardware exception.

From the C99 standard, we learn the following (emphasis mine):

[3.17] indeterminate value - either an unspecified value or a trap representation

and then:

[6.2.4 p2] The value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime.

and then:

[6.2.6.1 p5] Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

Taking all of this together, what restrictions do we have on accessing pointers to "dead" objects?

Addendum

Whilst I've quoted the C99 standard above, I'd be interested to know if the behaviour differs in any of the C++ standards.

Inshrine answered 10/6, 2013 at 13:17 Comment(21)
You cited the Standard in an excellent manner - from those words, it's clear to me that using an invalid pointer in any way, even without dereferencing it, invokes undefined behavior.Arran
I don't see where this would come from. As long as you only pass the pointer around, nothing happens. Of course it is obvious that it doesn't make sense, because you cannot use this pointer anyway, but passing it around is virtually the same as having an uninitialized pointer.Caddaric
@Devolus: Yes, that was my intuition too. But the standard seems relatively unambiguous. And AProgrammer made a good point (in the linked discussion), that if segment registers get involved, this really could lead to an HW exception.Inshrine
@Devolus, what we're trying to understand is: "is passing it around safe?"Spermato
free does not modify its argument. The pointer passed to free still points to the same location afterwards. The call to free simply informs the standard library that the object is no longer 'in use' and the storage at that location can be re-used. This is not the same as the object 'reaching the end of its lifetime', which occurs for objects on the stack.Varix
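Since free receives a copy of the pointer, it cannot change the bytes stored in the caller's variable. A minimal sketch of that observation, assuming a hosted C implementation; pointer_bits_survive_free is a hypothetical helper. It only inspects the pointer's object representation through memcpy/memcmp (character-type access) and never reads the freed value as a pointer. On typical implementations it returns 1, but this shows what free does to the bits, not what the standard promises about the indeterminate value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the bytes of a pointer object are the
   same after free() as before it. After free(), the pointer is never used
   as a pointer value; only its bytes are copied. */
int pointer_bits_survive_free(void)
{
    int *p = malloc(sizeof *p);
    assert(p != NULL);                 /* sketch: no recovery on failure */

    unsigned char before[sizeof p];
    unsigned char after[sizeof p];

    memcpy(before, &p, sizeof p);      /* snapshot while p is still valid */
    free(p);                           /* free() gets a copy of p's value */
    memcpy(after, &p, sizeof p);       /* read bytes, not the pointer value */

    return memcmp(before, after, sizeof p) == 0;
}
```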
@willj: That's correct. But nevertheless, the standard tells us that the pointer is now indeterminate.Inshrine
C++ recently made this implementation-defined, see DR 1438, because it won't actually trap on all systemsDoting
The pointer is indeterminate if the object has reached the end of its lifetime.. where does it say that 'free' causes an object to 'reach the end of its lifetime'? As I can roll my own implementation of malloc and free, I guess that an implementation is not permitted to give them special treatment.Varix
"Rolling your own" malloc and free invokes undefined behavior already. 7.1.3: "If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved identifier as a macro name, the behavior is undefined."Fayth
@Oli: ah, then I stand corrected ;)Varix
@R..: I meant that I can roll my own customMalloc() and customFree() - in which case object lifetime would be unaffected.Varix
@willj, it's not about modifying that value. Most probably the pointer still has the same value. However, if that value gets copied somewhere, it may pass through a special pointer register (e.g. segment register in x86) where the hardware could cause a trap due to the pointer being invalid.Spermato
@Oli Charlesworth I think you are reading things "between the lines" a bit. The standard tells that a pointer is indeterminate if the object pointed at reaches the end of its life time. But this is cited from 6.2.4, the chapter about storage duration. One may argue and say that the cited text only refers to a pointer to an object that has reached the end of its scope, since that chapter starts by stating "Allocated storage is described in 7.22.3". In other words, allocated storage is a special case where 6.2.4 doesn't necessarily apply.Anallise
But unfortunately, there's no useful information in 7.22.3 regarding the topic, or what happens with a pointer when you pass it to free() - whether it is formally turning indeterminate or not.Anallise
@Lundin: Hmm, that's not how I interpret it. I don't see allocated storage as a special case, it's simply described in a separate section for convenience. However, if your interpretation is correct, we could simply rewrite both my examples to use pointers to automatic objects that have died...Inshrine
@OliCharlesworth It far from obvious how to interpret it. After a second reading of C11 6.2.4 I found that the chapter defines the lifetime for static and automatic objects (and for thread storage in C11), but not for "allocated" ones. Yet in C11 7.22.3, there is a sentence stating: The lifetime of an allocated object extends from the allocation until the deallocation. That line seems to go well together with the text you cited from 6.2.4.Anallise
@Lundin: It's when the object has reached the end of its lifetime, not (necessarily) the end of its scope. (Scope is a region of program text over which an identifier is visible.)Lasseter
@OliverCharlesworth can I suggest changing this to a C question? Since C and C++ are considerably different in this area, this question would get confusing if C++ answers were added. There could be a different thread made for the C++ version. (The existing C++ answer that has been posted actually doesn't answer the question at all.)Faraday
@MattMcNabb: Sure, if you like. The C++ part of the question was only ever added as an addendum...Inshrine
@MattMcNabb Jonathan Wakely already mentioned DR 1438. Non-dereference use of invalid pointers: "The current Standard says that any use of an invalid pointer value produces undefined behavior (3.7.4.2 [basic.stc.dynamic.deallocation] paragraph 4). This includes not only dereferencing the pointer but even just fetching its value." Nothing to add here.Eleph
@Eleph C++ doesn't clearly define what an invalid pointer is; the amount of discussion generated on this question suggests that it is not so simpleFaraday

Example 2 is invalid. The analysis in your question is correct.

Example 1 is valid. A structure type never holds a trap representation, even if one of its members does. This means that structure assignment, on a system where trap representations would cause problems, must be implemented as a bytewise copy, rather than a member-by-member copy.

6.2.6 Representations of types

6.2.6.1 General

6 [...] The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.
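To illustrate the answer's point that struct assignment on a trap-sensitive system must copy bytes rather than members, here is a minimal sketch. copy_T_bytewise is a hypothetical helper, not how any particular compiler implements struct assignment; it copies the object representation through unsigned char, so the member p is never loaded as an int * value.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int *p; } T;

/* Hypothetical helper: copy a T one byte at a time (character-type
   access), so an indeterminate member is never read as a pointer. */
void copy_T_bytewise(T *dst, const T *src)
{
    assert(dst != NULL && src != NULL);
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    for (size_t i = 0; i < sizeof *dst; i++)
        d[i] = s[i];
}
```

With a valid member the copy is indistinguishable from b = a; the difference only matters when a.p holds a freed pointer, as in Example 1.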

Drop answered 10/6, 2013 at 13:40 Comment(16)
Ah, that's interesting. I hadn't noticed that clause. Thanks!Inshrine
Since the issue isn't trap representations but indeterminate values, I don't think the issue is resolved by the cited text. Per J.2 (albeit non-normative), UB results if "The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.8, 6.8)." However, perhaps in this case it is the value of the member, not the value of the structure, that is indeterminate, in which case the value of the object with indeterminate value is not used.Fayth
@R. J.2 is outdated. The normative text (of C99, anyway) only disallows reading objects that hold trap representations. If they are indeterminate but cannot hold trap representations, reading is allowed. This is important for, for example, unsigned char too.Drop
@R.. There's DR 338 that is supposed to tighten the rules somewhat again, but I don't see it in a draft of C11 (perhaps it was included after the last public draft), so I'm not sure how that affects my answer here.Drop
@hvd: That seems more like how a standard should be written, though I wish writers would further specify that the existence of trap representations or things that behave like them must be implementation-defined, though the consequences need not be.Flooded
@Flooded Any implementation where malloc can succeed for a size of at least 2 must have trap representations: add one byte to malloc's result, and you get a pointer that is not allowed to compare equal to any pointer value that was valid just before malloc was called. Because of that, before malloc was called, that representation was a trap representation.Drop
@hvd: The uses I have seen for the term "trap representation" imply a value which, when read as an rvalue, will disrupt normal program flow in some fashion which would hopefully be recognizable as a trap, but whose particulars are beyond the scope of the C standard. Basically, what I would like to see would be for the Standard to say that an implementation should have to specify under what cases the statement p=q; (given unaliased variables p and q of the same type (any type)) might do anything other than make p hold a value which is at least as well defined as what's in q.Flooded
@Flooded The standard has a very specific definition for a trap representation: it's a representation that doesn't represent a value. C99 6.2.6.1p5: "Certain object representations need not represent a value of the object type. [...] Such a representation is called a trap representation." You mean something else by it. Anyway, as of C11, reading indeterminate values is mostly undefined again, even if the type has no trap representations, so it wouldn't get you much.Drop
@hvd: My point is that there are a lot of cases where it would be acceptable for code which is given invalid data to interpret it as arbitrary gibberish, and in some such cases would also be acceptable for the program to trap in recognizable fashion, but where adherence to the laws of causality is required in any case. The reason that reading a trap representation was defined as Undefined Behavior, rather than merely yielding an unspecified value was to allow for the possibility that such accesses might disrupt program flow in ways outside the scope of the Standard.Flooded
@Flooded Analyzability may be of interest for that (but not for your earlier comments). As of C11, an implementation can define __STDC_ANALYZABLE__ to indicate that the effects of undefined behaviour are limited, except for critical undefined behaviour. And reading trap representations is not critical undefined behaviour: if __STDC_ANALYZABLE__ is defined, it may cause the program to abort, but it may not completely corrupt the execution of the program.Drop
@hvd: Thanks a million for that; I wonder why I've not seen it mentioned anywhere before? If code can safely use a constraint handler to longjmp back to sanity, that's a major help to many optimizations. IMHO, having a program require analyzability would seem like it could in many cases enable much more useful optimizations than would be enabled by letting compilers go crazy. Being able to specify non-trapping could help in a few more cases, e.g. uint32_t x,y,z; ... x=y*z; could fail on systems where int is 33-64 bits, but disabling traps would make sane implementations "just work".Flooded
@Flooded I don't know if any implementations support it, and that may be why I've only rarely seen it mentioned either. As for x=y*z;, I've suggested x=1U*y*z; in the past if an implementation is found where uint32_t exists and promotes to a signed type. Yeah, it's ugly, and it really shouldn't be necessary, but if you want to support common compilers like GCC (known to optimise aggressively), you will end up needing something like that anyway.Drop
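The 1U*y*z workaround can be packaged as a small function. A minimal sketch; mul_u32 is a hypothetical name. On an implementation where int is wider than 32 bits, uint32_t operands promote to signed int and y*z can overflow, which is undefined behaviour. Starting with 1U makes the usual arithmetic conversions turn every multiplication into unsigned arithmetic, which wraps instead of overflowing.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two uint32_t values with well-defined wraparound, even where
   uint32_t would otherwise promote to a wider signed int. */
uint32_t mul_u32(uint32_t y, uint32_t z)
{
    /* 1U is unsigned int, so 1U * y and then (1U * y) * z are unsigned
       multiplications; the result is reduced modulo 2^32 on return. */
    return 1U * y * z;
}
```

For example, mul_u32(0xFFFFFFFFu, 0xFFFFFFFFu) is 1 and mul_u32(65536u, 65536u) is 0 on every conforming implementation that provides uint32_t.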
@hvd: Too bad people are working harder to break analyzability than support it. A couple of abilities that analyzability still doesn't seem to provide, but which most implementations could in practice provide if trapping were bypassed, would be (1) determine whether realloc has moved an allocation (in general, comparisons between live and dead pointers cannot be expected to be meaningful, but in this case they should be) and (2) given two pointers which have not been modified since they pointed to the same live object, report what the displacement was (in units of char*) when the object was alive.Flooded
@hvd: The above operations should not access unowned memory, and while ideally all operations which would produce an invalid pointer without special "permission" would be trapped, neither operation produces any kind of pointer. As such, even though they involve dead pointers, it should be possible for any platform to perform them safely by, at worst, using memcpy to copy the pointers to a suitably-sized char[], shuffling any bits as required to yield an integer that can be used for the comparison or subtraction (a library macro could exploit UB to do such things faster, though).Flooded
@R..GitHubSTOPHELPINGICE: The language about structures not being trap representations dates back to C89, where the term "Indeterminate value" was defined as "Either a valid value or a trap representation"; the only means by which use of an indeterminate value could invoke UB was if the value in question might happen to be a trap representation--something that was specified as impossible for structures. C99 deliberately added the possibility that non-addressable objects holding indeterminate values of scalar types might not behave as values in range of their types, but nothing in the...Flooded
...rationale suggests any intention of changing the behavior of structures beyond perhaps allowing for the possibility that copying a structure where some members are Indeterminate, may leave those members of the copy Indeterminate under the expanded definition of the term.Flooded

My interpretation is that while only non-character types can have trap representations, any type can have indeterminate value, and that accessing an object with indeterminate value in any way invokes undefined behavior. The most infamous example might be OpenSSL's invalid use of uninitialized objects as a random seed.

So, the answer to your question would be: never.

By the way, an interesting consequence of not just the pointed-to object but the pointer itself being indeterminate after free or realloc is that this idiom invokes undefined behavior:

void *tmp = realloc(ptr, newsize);
if (tmp != ptr) {
    /* ... */
}
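One way to find out whether the block moved without touching ptr after a successful realloc is to convert the old address to an integer while it is still valid, and compare only integers afterwards. A minimal sketch under stated assumptions: realloc_report_move is a hypothetical helper; it assumes uintptr_t exists (it is optional in C99 and C11) and a flat address space where equal addresses convert to equal integers, which the standard does not guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: resize *pp to newsize bytes and set *moved to 1 if
   the block was relocated, 0 otherwise. The old pointer is never read
   after realloc() succeeds; only an integer captured beforehand is. */
void *realloc_report_move(void **pp, size_t newsize, int *moved)
{
    /* newsize > 0: realloc(p, 0) has implementation-defined behaviour */
    assert(pp != NULL && moved != NULL && newsize > 0);

    uintptr_t old_addr = (uintptr_t)*pp;   /* captured while still valid */
    void *tmp = realloc(*pp, newsize);
    if (tmp == NULL)
        return NULL;                       /* failure: *pp is still valid */

    *moved = ((uintptr_t)tmp != old_addr); /* integer comparison only */
    *pp = tmp;                             /* overwrite, never read, the old value */
    return tmp;
}
```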
Fayth answered 10/6, 2013 at 13:33 Comment(7)
Re "accessing an object ..."; there is a footnote in the standard which I didn't quote above: "Thus, an automatic variable can be initialized to a trap representation without causing undefined behavior, but the value of the variable cannot be used until a proper value is stored in it." It sounds like writing to such an object is acceptable.Inshrine
@OliCharlesworth, of course it is. Otherwise how can you do something like: free(x); x = NULL;?Spermato
@Shahbaz: Indeed! I'm just having trouble parsing the standard in such a way that it allows this kind of thing ;)Inshrine
@OliCharlesworth, I think the part that says: If the stored value of an object has such a representation and is read by an lvalue expression..., shows that it can be written to, but not read from.Spermato
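The free(x); x = NULL; pattern is exactly this: the only access to the indeterminate object is a store. A minimal sketch; free_and_null is a hypothetical helper, and foo is Example 2's function. Rewritten this way, Example 2 passes a null pointer to foo instead of a freed one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: free the object, then immediately overwrite the
   caller's pointer. Storing into an object whose value is indeterminate is
   allowed; only reading that value is the problem. */
void free_and_null(int **pp)
{
    assert(pp != NULL);
    free(*pp);     /* *pp is read here while it is still valid */
    *pp = NULL;    /* write only: the indeterminate value is never read */
}

/* Example 2's foo */
void foo(int *p) { (void)p; }
```

Usage: int *p = malloc(sizeof(int)); free_and_null(&p); foo(p); passes a null pointer, which is a determinate value.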
void *tmp = realloc(ptr, newsize); << if realloc does fail, then tmp is valid (NULL) and ptr remains valid as well. This is not UB when tmp==NULL.Indign
@jimmcnamara: Of course. But it's UB in the success case, which was the point.Fayth
The Standard explicitly guarantees that structures will never have trap representations. I would be hard-pressed to identify any case where that would be meaningful if copying a struct whose value was at least partially indeterminate would have any effect beyond producing a copy whose value might likewise be partially indeterminate.Flooded

Saying that the pointer value becomes indeterminate, even if nothing disturbs the bits representing it, is likely an effort to accommodate the "as-if" rule. If there is some sequence of actions whose behavior might be observably affected by a useful optimizing transform, the as-if rule requires that at least one action within that sequence be characterized as invoking Undefined Behavior that would justify any observable quirks stemming from the optimization.

Consider the following function:

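#include <stdint.h>
#include <stdlib.h>

void doSomething(int);   /* some external function, defined elsewhere */

void test(int *p1, uint64_t ofs)
{
  int *p2 = malloc(sizeof (int));
  if (p2 == NULL)
    return;
  if ((uintptr_t)p1 == (uintptr_t)p2+ofs)
  {
    *p2 = 1;
    *p1 = 2;
    doSomething(*p2);
  }
  free(p2);
}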

In most cases where the function might be invoked, replacing the call to doSomething(*p2) with doSomething(1) (the value just stored through p2) would improve performance without affecting behavior, except in scenarios where p1 is a pointer to a dead region of storage whose address happens to coincide with the address of the new region returned from malloc(). Treating p1 as becoming indeterminate when the storage it identifies becomes eligible for reuse by malloc() allows a compiler to ignore the possibility that the address might be found to match the address of some future allocation.

Flooded answered 21/11, 2023 at 22:21 Comment(0)

C++ discussion

Short answer: In C++, there is no such thing as "reading" a class instance; you can only "read" a non-class object, and this is done by an lvalue-to-rvalue conversion.

Detailed answer:

typedef struct { int *p; } T;

T designates an unnamed class. For the sake of the discussion let's name this class T:

struct T {
    int *p; 
};

Because you did not declare a copy constructor, the compiler implicitly declares one, so the class definition reads:

struct T {
    int *p; 
    T (const T&);
};

So we have:

T a;
T b = a;    // Access through a non-character type?

Yes, indeed; this is initialization by the copy constructor, so the compiler will generate the copy constructor's definition, which is equivalent to

inline T::T (const T& rhs) 
    : p(rhs.p) {
}

So you are accessing the value as a pointer, not a bunch of bytes.

If the pointer value is invalid (not initialized, freed), the behavior is not defined.

Eleph answered 16/6, 2013 at 4:23 Comment(7)
Actually an lvalue to rvalue conversion can be done for class lvalues too. The context is when passing a class lvalue through the ellipsis in a function call.Levenson
@JohannesSchaub-litb Yes you can. [conv.lval] "Otherwise, if the glvalue has a class type, the conversion copy-initializes a temporary of type T from the glvalue and the result of the conversion is a prvalue for the temporary." So this conversion is defined in terms of the ctor, and we go back to accessing each member one by one, with an lvalue-to-rvalue conversion for each one.Eleph
that is correct. At least as far as nonunion class objects are concerned. Unions are copied "bitwise".Levenson
This all has nothing to do with the question except for the last sentence ... which you give no justification for.Faraday
@MattMcNabb Huh? This has everything to do with the question... I don't know what you are trying to say.Eleph
The examples in the question are about using a pointer after the space it points to has been freed . In your code you copy an uninitialized pointer, which is different. Also, all the stuff about the class is irrelevant, you could equally well have written int *a; int *b = a;Faraday
"all the stuff about the class is irrelevant" all the stuff about the class relates to "Example 1" in the question!Eleph
