Understand C++ pointer lifetime / zombie pointers

3

5

After watching CppCon's "Will Your Code Survive the Attack of the Zombie Pointers?" I'm a bit confused about pointer lifetime and need some clarification.

First some basic understanding. Please correct me if any comments are wrong:

int* p = new int(1);
int* q = p;
// 1) p and q are valid and one can do *p, *q
delete q;
// 2) both are invalid and cannot be dereferenced. Also the value of both is unspecified
q = new int(42); // assume it returns the same memory
// 3) Both pointers are valid again and can be dereferenced

I'm puzzled by 2). Obviously they cannot be dereferenced, but why can't their values be used (e.g. to compare one against another pointer, even an unrelated and valid one)? This is stated at around 25:38 in the video. I can't find anything about this on cppreference, which is where I got 3) from.

Note: The assumption that the same memory is returned cannot be generalized, as it may or may not happen. For this example it should be taken for granted that new "randomly" returned the same memory, as is the case in the video example and (maybe?) required for the code below to break.
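For anyone who wants to reproduce the "same memory" assumption deterministically instead of relying on luck, here is a minimal sketch, assuming C++17. The one-slot cache lastFreed and the helper reusesSameAddress are hypothetical names made up for illustration, not taken from the talk. The addresses are compared as integers captured while each pointer was still valid, so no stale pointer value is ever read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

struct Node {
    int value;
    Node* next = nullptr;
    explicit Node(int v) : value(v) {}

    // Hypothetical one-slot cache: the most recently freed block, if any.
    static inline void* lastFreed = nullptr;

    static void* operator new(std::size_t size) {
        // The cache only ever holds blocks of exactly this size.
        assert(size == sizeof(Node));
        if (lastFreed != nullptr) {
            void* block = lastFreed;  // hand back the block freed last
            lastFreed = nullptr;
            return block;
        }
        return ::operator new(size);
    }

    static void operator delete(void* block) noexcept {
        ::operator delete(lastFreed);  // release the older cached block (no-op for null)
        lastFreed = block;             // keep the newest block for the next allocation
    }
};

// Returns true if the second allocation landed at the first one's address.
bool reusesSameAddress() {
    Node* a = new Node(1);
    const auto before = reinterpret_cast<std::uintptr_t>(a);  // taken while a is valid
    delete a;
    a = nullptr;  // the stale value is overwritten, never read

    Node* b = new Node(2);
    const auto after = reinterpret_cast<std::uintptr_t>(b);
    const bool same = (before == after);
    delete b;
    return same;
}
```

With this allocation function the second new Node(...) always receives the block released by the preceding delete, which is the situation the examples below assume.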

The multi-threaded example code from the LIFO list can be simulated in a single thread as:

Node* top = ... //NodeA allocated before;
Node* newnodeC = new Node(v); 
newnodeC->next = top;
delete top; top = nullptr;
// newnodeC->next is a zombie pointer
Node* newnodeD = new Node(u); // assume same memory as NodeA is returned
top = newnodeD;
if(top == newnodeC->next) // true
  top = newnodeC;
// Now top->next is (still) a zombie pointer

This should be valid, unless Node contains nonstatic const members or references according to the rules under

If a new object is created at the address that was occupied by another object, then all pointers, references, and the name of the original object will automatically refer to the new object and, once the lifetime of the new object begins, can be used to manipulate the new object, but only if the following conditions are satisfied [which they are]

So why is this a zombie pointer and supposedly UB?

Could the (single-threaded) condensed code above be fixed (in case there are const members) by adding newnodeC->next = std::launder(newnodeC->next), as suggested by

If the conditions listed above are not met, a valid pointer to the new object may still be obtained by applying the pointer optimization barrier std::launder

I'd expect this to fix the "zombie pointer", and compilers to not emit instructions for the assignment but simply treat it as an optimization barrier (e.g. when the function is inlined and a const member is accessed again).

So in summary: I haven't heard of "zombie pointers" before. Am I correct that any pointer to a destroyed/deleted object cannot be used (for reading [the pointer's value] and dereferencing [read of the pointee]) unless the pointer is reassigned or the memory is reallocated with the same object type recreated there (and without const/reference members)? Can't this be fixed by C++17 std::launder already (barring multi-threaded issues)?

Also: At 3) in the first code would if(p==q) even be generally valid? Because from my understanding of the (second part of the) video it is not valid to read p.

Edit: As an example of where I'm pretty sure UB happens: again assume that, by pure chance, the same memory is returned by the new:

// Global
struct Node{
  const int value;
};
Node* globalPtr = nullptr;
// In some func
Node* ptr = new Node{42};
globalPtr = ptr;
const int value = ptr->value;
foo(value);
// Possibly on another thread (if multi-threaded assume proper synchronisation so that this scenario happens)
delete globalPtr;
globalPtr = new Node{1337}; // Assume same memory returned
// First thread again (and maybe on serial code too)
if(ptr == globalPtr)
  foo(ptr->value);
else
  foo(globalPtr->value);

According to the video, after delete globalPtr the ptr is also a "zombie pointer" and cannot be used (aka "would be UB"). A sufficiently optimizing compiler can make use of this, assume the pointee was never freed (especially when the delete/new happens in another function/thread/...) and optimize foo(ptr->value) to foo(42).

Note also the mentioned Defect Report 260:

Where a pointer value becomes indeterminate because the object pointed to has reached the end of its lifetime, all objects whose effective type is a pointer and that point to the same object acquire an indeterminate value. Thus p at point X, and p, q, and r at point Z, can all change their value.

I think this is the definitive explanation: After delete globalPtr the value of ptr is also indeterminate. But how can this align with

If a new object is created at the address that was occupied by another object, then all pointers [...] of the original object will automatically refer to the new object and[...] can be used to manipulate the new object

Biogenesis answered 16/10, 2019 at 13:17 Comment(13)
"assume it returns the same memory" - You could add: "3) if(p==q) Both pointers are valid again and can be dereferenced, right?"Fluctuate
The assumption in the video example was, that the same pointer value is returned aka the same memory so the if succeeds. Can easily happen for allocators that simply look if they can serve a request with recently freed memory chunks. On if(p==q): The point is: Can I do this? The object pointed to by p was deleted, so p is a "zombie pointer". I really recommend (at least) the first part of the video. Quite entertaining :)Biogenesis
(3) is wrong. It makes q dereferencable, but not p.Burschenschaft
Why? A "new object is created at [that] address[...] so all pointers [...] of the original object will automatically refer to the new object", see link and quote above.Biogenesis
@Biogenesis Because new returns the address of the newly created object, which is not guaranteed to be the same as the previous deleted one. So after your second new, p and q cannot be considered to point at the same location in memory. See my answer for more explanation.Pointsman
@HolyBlackCat, I believe that your claim that "3 is wrong" is not true because Flamefire asked us to imagine that, by fluke, the allocation that is assigned to q happens by chance to return the same virtual address that is already stored in p. This can certainly happen. If it does, the address stored in p will therefore become a valid virtual memory address again (will point to that newly allocated object) which holds the value 42, and so will q.Mckie
When you say cannot be dereferenced, it means that you might get the old value (if nothing has happened to that memory yet), or you might get an unexpected value if the memory has been reallocated for another purpose, or you might cause an access violation if the paging mechanism has unmapped the physical memory associated with that virtual address. You can certainly dereference the pointer, though at great peril: one of the three outcomes I described will occur. Writing to this memory is very dangerous and will likely corrupt the data that resides there and cause unexpected behaviour.Mckie
@Mckie I admit I didn't notice the "assume it returns the same memory" part when posting that, but I think my claim still holds (not necessarily in practice, but at least formally). When memory is deallocated, values of all pointers pointing to it become invalid ( #5002555 ). I doubt they can ever become valid again, unless you assign new values to them.Burschenschaft
So, if(std::memcmp(&p, &q, sizeof(void*)) == 0) would be a safe version of if(p==q)? Still unclear if p is valid if that condition is true.Fluctuate
@TedLyngmo as asked, I've undeleted my answer.Pointsman
@Pointsman Aha, ok... In the first comment I suggested that OP should make it clearer. I had a feeling it could be misinterpreted.Fluctuate
// assume it returns the same memory How are you gonna define what does «returns the same memory» mean? How are you gonna check it?Minnesinger
By running a debugger and inspecting the return value / value of the pointer which is a defined value at this point. Or by interpreting the code as "portable assembly". Or by using a custom allocation function which literally returns the same memory (e.g. by reusing the last deleted memory chunk). I didn't want to discuss how or why, hence I used "assume" and I think to most it was clear what that meant.Biogenesis
4

A deleted pointer has an invalid pointer value, not an unspecified value.

As of C++17 the behaviour of invalid pointer values is defined in [basic.stc]/4:

Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.

So, trying to compare two invalid pointers has implementation-defined behaviour. There is a footnote clarifying that this behaviour could include a runtime fault.

In your first snippet p has an invalid pointer value after the delete; anything you might do to q has no bearing on this. Invalid pointer values cannot magically become valid again. There is no way (within the confines of Standard C++) to determine whether a new allocation is at the "same location" as a previous allocation.
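To illustrate, here is a minimal sketch of the first snippet rewritten so that the stale value of p is never read. The function name reassignBeforeUse is made up for this example; the comments mark which operations fall under the paragraph quoted above.

```cpp
// Returns true when both pointers refer to the replacement object.
bool reassignBeforeUse() {
    int* p = new int(1);
    int* q = p;       // p and q hold the same valid pointer value
    delete q;         // from here on, p and q both hold invalid pointer values

    // Deliberately not done here:
    //   *p      -> indirection through an invalid pointer value: undefined
    //   p == q  -> "any other use": implementation-defined

    q = new int(42);  // q receives a fresh, valid pointer value
    p = q;            // overwriting p is fine; its old value is never read

    const bool ok = (p == q) && (*p == 42);  // both operands are valid now
    delete q;
    return ok;
}
```

The point of the sketch is that p only becomes usable again because it is assigned a new value, not because the allocator happened to reuse the block.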

std::launder is no help; again you use an invalid pointer value and therefore trigger implementation-defined behaviour.
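For contrast, here is a sketch of the situation std::launder is meant for, as far as I understand it, assuming C++17. The storage is never deallocated; a new object is created in it with placement new, so the old pointer still holds the address of live storage. The function name replaceInPlace is hypothetical; Node mirrors the struct from the question.

```cpp
#include <new>

struct Node {
    const int value;
};

// Replaces a Node in its own storage and reads the new object's member.
int replaceInPlace() {
    Node* ptr = new Node{42};

    // The storage is reused without being released, so ptr never acquires
    // an invalid pointer value; only the old object's lifetime ends.
    new (ptr) Node{1337};

    // Because of the const member, reach the new object through std::launder.
    const int v = std::launder(ptr)->value;

    delete std::launder(ptr);
    return v;
}
```

This differs from the delete/new case in the question: there the storage itself is released, which is what makes every copy of the pointer invalid.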

Perhaps you could consult the documentation for your implementation and see what it defines this behaviour as.

In your question you mention C DR 260, however that is irrelevant. C is a different language to C++. In C++, deleted pointers have invalid pointer value, not indeterminate value.

Belleslettres answered 24/10, 2019 at 2:39 Comment(4)
Ok, so the correct wording is "invalid pointer value", not "unspecified value". However, as any use is either UB or implementation-defined, this difference does not really matter. (Or is it specified anywhere which value an "invalid pointer value" is?) "There is no way to determine whether a new allocation is at the 'same location'": Ouch, this is insane. Now an operation on a copy of an object (a pointer) has side effects on another copy of the object/value, possibly making it unusable. Try to explain this to a beginner/intermediate programmer. This is like `a = b; b -= c; // a may be invalid`Biogenesis
@Biogenesis "invalid pointer value" is the "which value". The difference between UB and implementation-defined behaviour is significant. The mysteries of C++ can be deep although perhaps not in this case; if I copy a file handle and close it, is it mysterious that the original handle is no longer useful? E.g. that original handle might match a new handle we get from opening a different file.Belleslettres
Imagine a system where the hardware automatically checks any pointer register against the MMU's list of active allocationsBelleslettres
It is easy to understand that an invalid handle/pointer cannot be dereferenced, but that its value cannot be used anymore is hard to grasp. The file handle analogy is great though. I cannot come up with a use case where you'd want to hold on to a file handle that might be closed by someone else even though it could be reopened and point to (another or the same) file. "any pointer register against the MMU's list of active allocations": well, the allocation might be active again. It just contains something else.Biogenesis
1

I would disagree with this:

// 3) Both pointers are valid again and can be dereferenced

In fact, you got fooled because at the second new, the program usually reallocates the same memory block, as it just became available, but one cannot rely on this (it is not guaranteed that the same memory block will be reused).

For example, if you follow the good practice of setting deleted pointers to nullptr, the following program:

#include <iostream>

int main()
{
    int * p = new int(1);
    int * q = p;

    std::cout << (p == q) << std::endl;

    delete q;
    q = nullptr;
    p = nullptr;

    q = new int(42);

    std::cout << (p == q) << std::endl;

    delete q;

    return 0;
}

would result in:

1
0

Pointsman answered 16/10, 2019 at 13:38 Comment(3)
It's not clarified in the question, but in the comments it was made clear that the assumption is that new returns the same pointer in both cases.Deprecate
it is not guaranteed that the same memory block will be reused, right. It is an example which assumes that in this situation the same memory block is used, as is in the video. I'll update the question to make this clear.Biogenesis
@MaxLanghof Ah ok, my bad then, I indeed did not understand.Pointsman
1

Starting with the contrivance described in the question, an already allocated int* p and:

int* q = p;
q = new int(42); // assume it returns the same memory

We now have a scenario that's identical to this:

int* p = new int(42);
int* q = p;

This meets the precondition that we can assume p and q point to the same memory location. That being the case, the fact that this location was allocated, then deleted, then allocated again doesn't matter. Nothing that happened before this point matters because we are assuming the two pointers are in a state identical to the one just described.

With regard to "the value of both is unspecified" in step #2, I would say the value of q is unspecified at that point because it was passed to delete, but the value of p is unchanged.

The behavior of delete here is actually not undefined under C++14, it's implementation defined; a bit from some documentation of delete:

Any use of a pointer that became invalid in this manner, even copying the pointer value into another variable, is undefined behavior. (until C++14)

Indirection through a pointer that became invalid in this manner and passing it to a deallocation function (double-delete) is undefined behavior. Any other use is implementation-defined. (since C++14)

https://en.cppreference.com/w/cpp/memory/new/operator_delete

So, to answer what I think is your question: in that circumstance, no, neither one is a zombie pointer.

So why is this a zombie pointer and supposedly UB?

It's not, so whatever has led you to that conclusion is a misunderstanding or misinformation.

Martz answered 16/10, 2019 at 14:22 Comment(5)
I think you assume too much that they are "relying" on any feature. The question is merely "what if it happens to return the same virtual address". This was articulated as "assume it returns the same memory" - meaning "for the purposes of this discussion, please consider what happens in this case". The poster recognizes explicitly that this doesn't mean we can generally assume that the same virtual address will be returned. It's like saying "flip a coin and assume it's heads". We're asked to explore the heads case, not to assume that the coin will always turn up heads.Mckie
Right, that (the heads case) is the second half of the answer here, which asserts it doesn't matter how two such pointers came to be because they are identical to something very normative.Martz
> It's not, so what ever has lead you to that conclusion is a misunderstanding or misinformation. This stems from the linked video, e.g. first part. Allocation happens to return the same pointer so a copy of the pointer that was gotten before will still be used and is called a "zombie pointer" because its pointee was deleted. I think this IS in fact UB for the case of constant members of the pointee: The compiler may "cache" the value of the const member, but a newly allocated instance might have another value there. I'll put that into the questionBiogenesis
Your first snippet does not make sense as p is not defined. Also you talk about p2 but do not define that anywhere.Belleslettres
"Starting with a contrivance..." and p and q are from the question as originally (and currently) written. The contrivance from the question is that "the compiler did it". It doesn't matter whether they got that way according to clear laws of the language (as in the second case above) or through happenstance as asserted in the question. They are two pointers of the same type that point to the same memory location. I suppose then referring to them as p and p2 is a bit confusing, I'll change that (thanks).Martz
