How can deleting a void pointer do anything other than invoke the global delete operator?
M

5

6

The C++ standard very clearly and explicitly states that using delete or delete[] on a void-pointer is undefined behavior, as quoted in this answer:

This implies that an object cannot be deleted using a pointer of type void* because there are no objects of type void.

However, as I understand it, delete and delete[] do just two things:

  • Call the appropriate destructor(s)
  • Invoke the appropriate operator delete function, typically the global one

There is a single-argument operator delete (as well as operator delete[]), and that single argument is void* ptr.

So, when the compiler encounters a delete-expression with a void* operand, it of course could maliciously do some completely unrelated operation, or simply output no code for that expression. Better yet, it could emit a diagnostic message and refuse to compile, though the versions of MSVS, Clang, and GCC I've tested don't do this. (The latter two emit a warning with -Wall; MSVS with /W3 does not.)

But there's really only one sensible way to deal with each of the above steps in the delete operation:

  • void* specifies no destructor, so no destructors are invoked.
  • void is not a type and therefore cannot have a specific corresponding operator delete, so the global operator delete (or the [] version) must be invoked. Since the argument to the function is void*, no type conversion is necessary, and the operator function must behave correctly.

So, can common compiler implementations (which, presumably, are not malicious, or else we could not even trust them to adhere to the standard anyway) be relied on to follow the above steps (freeing memory without invoking destructors) when encountering such delete expressions? If not, why not? If so, is it safe to use delete this way when the actual type of the data has no destructors (e.g. it's an array of primitives, like long[64])?

Can the global delete operator, void operator delete(void* ptr) (and the corresponding array version), be safely invoked directly for void* data (assuming, again, that no destructors ought to be called)?

Mohun answered 1/8, 2018 at 0:5 Comment(17)
I wouldn't take "this answer", which I wrote a long time ago, as normative.Beetlebrowed
@NeilButterworth Well, it does quote the standard, does it not? Are you implying that a more recent standard might have changed the status of this operation?Mohun
Yes, it's entirely possible. I don't think it has changed, but I no longer track the standard.Beetlebrowed
Sure, why not? The language specification does not impose any requirements (that’s what “undefined behavior” means), so go ahead and guess what your implementation might do. What could go wrong?Colossian
The standard says it is UB. UB can't happen in conforming code. The optimizer can take advantage of this to remove the entire code path that contains UB. See examples: en.cppreference.com/w/cpp/language/ub and blog.llvm.org/2011/05/what-every-c-programmer-should-know.htmlKirsti
@RichardCritten And you wouldn't consider that to be malicious compliance?Mohun
No. Writing an optimizer is hard. Writing an optimizer that does something sensible with undiagnosed invalid code (definition of UB) is impossible.Kirsti
@PeteBecker To counter snark with snark: was anything safe in C++ before 1998? More to the point, has there ever been, or do you think there ever will be, a perfectly conforming implementation of the standard? Less snarkily: I am not asking whether this is a good idea. But it's still worth knowing; in my case, I am working with legacy code that uses this antipattern, and need to know how urgent it is that we fix it.Mohun
The question seems to boil down to "Can I trust this particular compiler to know what I meant?" . And I'm not sure how anyone can really help with that.Albertinaalbertine
@M.M. I did not mention a particular compiler (except in my edit, which is tangential). And I don't think the question has anything to do with what's "meant" in a mind-reading sense. When I say I can't think of any non-malicious way for a compiler to generate code from such an expression that would be unsafe, I mean that literally.Mohun
fwiw, i worked with this anti-pattern (against my will) for several years using MSVC. On Windows CE devices I fought heap corruption issues the entire time, while the desktop client seemed to work fine. Of course, that won't tell you much since our CE OS was customized by raving lunatics and the general quality of the code was so-so. I ended up embedding checksums in the larger structures :( Awful.Bracket
@KyleStrand — “working with legacy code” makes this an entirely different question. The question is not whether this is theoretically sound; it’s how can you best ensure that your application will work correctly. You have two choices: fix it, or write a bunch of tests and cross your fingers for luck. The latter is scary, but may be necessary. Good luck!Colossian
I made a further comment in the chat.Corpora
@RichardCritten: Writing an optimizer that can handle code written in dialects that supplement the behaviors defined by the Standard with those which are commonly used in fields like embedded and systems programming isn't particularly difficult if one makes any bona fide effort whatsoever to do so.Comber
@M.M: No mind reading is required. If one part of the Standard or an implementation's documentation describes the behavior of a piece of code, but it falls in a general category of actions which another part of the Standard says is undefined, quality implementations targeting a particular platform and field should give precedence to the part that treats it as defined in cases where doing so is likely to be useful and practical for code targeting that platform and field.Comber
@Comber Thank you for that last comment--it really gets at the heart of why I asked this. It's not a logical inconsistency in the standard per se, but it really seems like (at least for arrays, where allocation-size metadata is required) the requirements that are imposed by the standard would make it trickier not to provide a reasonable behavior than to just do the deallocation.Mohun
@KyleStrand: The authors of C89 and all C or C++ standards since have regarded as equivalent actions whose behavior is actually defined, and actions whose behavior obviously should be defined but actually isn't. In all of them, for example, struct S {int i;} s; s.i=1; violates a runtime constraint since it uses an lvalue or glvalue of type int to access a struct S, but such behavior would be sufficiently obviously absurd that even the authors of gcc and clang would recognize it as stupid.Comber
S
3

A void* is a pointer to an object of unknown type. If you do not know the type of something, you cannot possibly know how that something is to be destroyed. So I would argue that, no, there is not "really only one sensible way to deal with such a delete operation". The only sensible way to deal with such a delete operation is to not deal with it, because there is simply no way you could possibly deal with it correctly.

Therefore, as the original answer you linked to said: deleting a void* is undefined behavior ([expr.delete] §2). The footnote mentioned in that answer remains essentially unchanged to this day. I'm honestly a bit astonished that this is simply specified as undefined behavior rather than making it ill-formed, since I cannot think of any situation in which this could not be detected at compile time.

Note that, starting with C++14, a new expression does not necessarily imply a call to an allocation function. And neither does a delete expression necessarily imply a call to a deallocation function. The compiler may call an allocation function to obtain storage for an object created with a new expression. In some cases, the compiler is allowed to omit such a call and use storage allocated in other ways. This, e.g., enables the compiler to sometimes pack multiple objects created with new into one allocation.

Is it safe to call the global deallocation function on a void* instead of using a delete expression? Only if the storage was allocated with the corresponding global allocation function. In general, you can't know that for sure unless you called the allocation function yourself. If you got your pointer from a new expression, you generally don't know if that pointer would even be a valid argument to a deallocation function, since it may not even point to storage obtained from calling an allocation function. Note that knowing which allocation function must've been used by a new expression is basically equivalent to knowing the dynamic type of whatever your void* points to. And if you knew that, you could also just static_cast<> to the actual type and delete it…
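The two pairings described above can be sketched as follows. This is only an illustration: the helper names (make_raw_buffer, free_raw_buffer, make_long_array, free_long_array, round_trip) are hypothetical and do not come from the question. The first pair calls the global allocation and deallocation functions directly; the second pair uses a new-expression and restores the real type with static_cast before the delete-expression.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Storage obtained directly from the global allocation function may be
// handed back directly to the matching global deallocation function.
void* make_raw_buffer(std::size_t bytes) { return ::operator new(bytes); }
void free_raw_buffer(void* p) { ::operator delete(p); }

// An array created by a new-expression travels as void*, but the real type
// is restored before the delete-expression, so the delete is well-defined.
void* make_long_array() { return new long[64](); }  // elements start at 0
void free_long_array(void* p) { delete[] static_cast<long*>(p); }

// Exercises both pairs and returns the value read back through the void*.
long round_trip() {
    void* raw = make_raw_buffer(256);
    free_raw_buffer(raw);

    void* arr = make_long_array();
    static_cast<long*>(arr)[0] = 42;
    long seen = static_cast<long*>(arr)[0];
    assert(seen == 42);
    free_long_array(arr);
    return seen;
}
```

Note that free_long_array only works because the caller knows the pointer came from new long[...]; that knowledge is exactly what a bare void* does not carry.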

Is it safe to deallocate the storage of an object with a trivial destructor without explicitly calling the destructor first? Based on [basic.life] §1.4, I would say yes. Note that, if that object is an array, you might still have to call the destructors of any array elements first, unless they are also trivial.
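A minimal sketch of that case, assuming a hypothetical type Sample and helper use_and_release (neither is from the question): the storage comes from a direct call to ::operator new, the object is created with placement new, and the storage is released without a destructor call. The static_assert documents the assumption that the destructor is trivial.

```cpp
#include <cassert>
#include <new>
#include <type_traits>

struct Sample {  // hypothetical type with a trivial destructor
    long values[4];
};

static_assert(std::is_trivially_destructible<Sample>::value,
              "this sketch skips the destructor, so it must be trivial");

// Creates a Sample in raw storage, reads it, and releases the storage
// without ever calling a destructor.
long use_and_release() {
    void* storage = ::operator new(sizeof(Sample));
    Sample* s = ::new (storage) Sample{{1, 2, 3, 4}};
    assert(static_cast<void*>(s) == storage);

    long sum = s->values[0] + s->values[1] + s->values[2] + s->values[3];

    ::operator delete(storage);  // releasing the storage ends the lifetime
    return sum;
}
```

Because the allocation function was called directly here, the pointer is known to be a valid argument for ::operator delete, which is not something a new-expression would guarantee.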

Can you rely on common compiler implementations to produce the behavior you deem reasonable? No. Having a formal definition of what exactly you can rely on is literally the whole point of having a standard in the first place. Assuming you have a standard-conforming implementation, you can rely on the guarantees the standard gives you. You can also rely on any additional guarantees the documentation of a particular compiler may give you, so long as you use that particular version of that particular compiler to compile your code. Beyond that, all bets are off…

Sylvia answered 1/8, 2018 at 1:41 Comment(14)
You might add to your first and second sentences, "From the point of view of the compiler" or the like, to avoid the impression you are misunderstanding the OP's stipulations. I think this is an excellent answer. We understand this is a subset of void *. The compiler does not.Bracket
"Calling no destructor would be just as good as calling any random destructor" is a stretch. I struggle to find any case where I'd rather call a random destructor over no destructor.Secularism
Second, when considering whether it's safe to call the global deallocation function, it's useful to keep in mind that while you don't really know, the compiler doesn't really know, either. I haven't thought it through, but my intuition is that this determination is uncomputable, and the optimization benefits of knowing are negligible, so it's extremely unlikely that the compiler will care. Of course, it's important that you don't screw up, but most environments are controlled enough that there aren't that many deallocation functions to choose from.Secularism
" a bit astonished that this is simply specified as undefined behavior rather than making it ill-formed" - a reasonable guess would be because Undefined Behavior does not require a diagnostic, and as this question points out there is a reasonable behavior.Gebelein
It would not be difficult for a compiler's new operator to include within the allocation information about what destructors, if any, the object has, and for delete to make use of that information regardless of the pointer type fed to it. I would not be at all surprised if some compilers actually do that. If some compilers support a construct in useful fashion but others don't, the Standard will usually allow compilers to support the behavior or not at their leisure, hopefully based upon what will benefit their customers.Comber
I like most of this answer, but I completely agree with @zneak's criticisms, and I still hold by my assertion in my question (and supercat's comment and answer) that the "reasonable" behavior would in no way be "magical".Mohun
The statement that calling no destructor would be "just as good as calling any random one" was meant to be hyperbole. I just wanted to emphasize that what has simply been declared as "the only reasonable behavior" is not necessarily all that reasonable once you stop to think about it. The compiler cannot know which destructor or which deallocation function to call. Not calling a destructor and calling a likely wrong deallocation function is not something I would agree to call a reasonable choice. But I can see why someone might take issue with that statement, so I removed those sentences.Sylvia
@Secularism As has already been pointed out by supercat, the compiler could simply store information about which destructor and which deallocation function to call for each allocation, so it doesn't have to be uncomputable. Doing so would, however, certainly go against the "you don't pay for what you don't use" philosophy. Concerning the "controlled environment", consider that C++ supports user-defined allocation/deallocation functions, so the set of deallocation functions is potentially arbitrarily large…Sylvia
@KyleStrand calling it "magical" was indeed unnecessary. I got carried away. I removed that bit…Sylvia
@MichaelKenzel, in this context, "uncomputable" means that the compiler cannot make that analysis, and that therefore it can't perform optimizations based on it. The compiler can defer verification work to the runtime, where every value is concretized, but if it did that, then we wouldn't have this problem in the first place: you'd always get the right destructor and the right deallocator.Secularism
Thanks. I think the answer is much better now.Mohun
....though I would quibble with how much of a problem it would be in practice to just assume that the allocated memory came from the default allocator. As far as I know, having multiple allocators in a program is quite niche.Mohun
@Secularism ok, I misunderstood that, I thought you meant uncomputable in general. Just for the compiler, I would also think it's uncomputable.Sylvia
@MichaelKenzel: The cost need not be very great. A single pointer attached to the allocation would easily suffice, and in some memory-management implementations the cost could be zero for types without destructors.Comber
M
1

If you want to invoke the deallocation function, then just call the deallocation function.

This is good:

void* p = ::operator new(size);

::operator delete(p);  // only requires that p was returned by ::operator new()

This is not:

void* p = new long(42);

delete p;  // forbidden: static and dynamic type of *p do not match, and static type is not polymorphic

But note, this also is not safe:

void* p = new long[42];

::operator delete(p); // p was not obtained from allocator ::operator new()
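One way to observe why the array case can go wrong is to compare the address the allocation function returned with the address the new-expression yields. The sketch below uses a hypothetical class Tracked with class-specific operator new[] and operator delete[] that record those addresses; whether there is any offset, and how large it is, depends on the implementation, so no particular value should be assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

struct Tracked {
    static std::uintptr_t last_allocated;    // address operator new[] returned
    static std::uintptr_t last_deallocated;  // address operator delete[] received

    int value = 0;
    ~Tracked() {}  // non-trivial destructor: an element count may be stored

    static void* operator new[](std::size_t bytes) {
        void* p = ::operator new[](bytes);
        last_allocated = reinterpret_cast<std::uintptr_t>(p);
        return p;
    }
    static void operator delete[](void* p) noexcept {
        last_deallocated = reinterpret_cast<std::uintptr_t>(p);
        ::operator delete[](p);
    }
};

std::uintptr_t Tracked::last_allocated = 0;
std::uintptr_t Tracked::last_deallocated = 0;

// Returns the distance in bytes between the start of the allocation and the
// first array element (implementation-specific, possibly zero).
std::size_t array_overhead() {
    Tracked* elems = new Tracked[8];
    std::uintptr_t first = reinterpret_cast<std::uintptr_t>(elems);
    delete[] elems;  // the delete-expression adjusts back to the allocation
    assert(Tracked::last_deallocated == Tracked::last_allocated);
    return static_cast<std::size_t>(first - Tracked::last_allocated);
}
```

If array_overhead() is nonzero on a given implementation, passing the element pointer straight to a deallocation function would hand it an address the allocator never returned.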
Mervinmerwin answered 1/8, 2018 at 0:31 Comment(5)
Why is the last bit of code not safe? Also, I'd really like a concrete explanation or example of how undesirable behavior could actually be triggered in practice.Mohun
@KyleStrand - Cell-based allocation could trigger undesirable behavior pretty easily if you decided not to bother storing the allocation size. (You lookup the cell size based on the object size, and that's how you know what allocation table to go to). It's a little bit of a stretch, for sure..but I've seen some fairly bizarre behavior from memory allocation routines. (Granted, none in commercial compilers)Bracket
@KyleStrand: Array new can put metadata (Standardese: supplemental information) at the beginning of the allocation, in front of the content. Then the operator delete[](void*) call needs the address of the allocation, and passing the address of the content, which is different, will fail.Mervinmerwin
@KyleStrand: Even when not dealing with an array, there is potentially trouble, because the Standard allows new long(42) to invoke either of two allocators, with or without an extra alignment argument, and the deallocator is required to match.Mervinmerwin
Here's the quote from the Standard which is important: " When a delete-expression is executed, the selected deallocation function shall be called with the address of the most-derived object in the delete object case, or the address of the object suitably adjusted for the array allocation overhead (8.3.4) in the delete array case, as its first argument."Mervinmerwin
C
1

While the Standard would allow an implementation to use the type passed to delete to decide how to clean up the object in question, it does not require that implementations do so. The Standard would also allow an alternative (and arguably superior) approach based on having the memory-allocating new store cleanup information in the space immediately preceding the returned address, and having delete implemented as a call to something like:

typedef void(*__cleanup_function)(void*);
void __delete(void*p)
{
  ((__cleanup_function*)p)[-1](p);  // call the cleanup function stored just before the object
}

In most cases, the cost of implementing new/delete in such fashion would be relatively trivial, and the approach would offer some semantic benefit. The only significant downside of such an approach is that implementations that document the inner workings of their new/delete, and whose existing design can't support a type-agnostic delete, would have to break any code that relies upon those documented inner workings.
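The same idea can be tried out at library level, without touching the built-in new and delete. The following is only a sketch under stated assumptions, not how any compiler implements these operators: erased_header, erased_new, erased_delete, destroy_and_free, and Counted are hypothetical names, and over-aligned types are deliberately not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

using cleanup_function = void (*)(void*);

// Header stored in front of every object. It is padded to max_align_t so
// the object that follows it is aligned for any ordinary type.
struct alignas(std::max_align_t) erased_header {
    cleanup_function cleanup;
};

// Destroys the T at `object`, then frees the block that starts at the header.
template <typename T>
void destroy_and_free(void* object) {
    static_cast<T*>(object)->~T();
    ::operator delete(static_cast<char*>(object) - sizeof(erased_header));
}

// Allocates header + T in one block and returns a pointer to the T.
template <typename T, typename... Args>
void* erased_new(Args&&... args) {
    static_assert(alignof(T) <= alignof(std::max_align_t),
                  "over-aligned types are not handled in this sketch");
    char* block = static_cast<char*>(
        ::operator new(sizeof(erased_header) + sizeof(T)));
    void* object = block + sizeof(erased_header);
    try {
        ::new (object) T(std::forward<Args>(args)...);
    } catch (...) {
        ::operator delete(block);  // construction failed: release the block
        throw;
    }
    ::new (block) erased_header{&destroy_and_free<T>};
    return object;
}

// Type-agnostic delete: looks up the cleanup function stored by erased_new.
void erased_delete(void* object) {
    if (!object) return;
    auto* header = reinterpret_cast<erased_header*>(
        static_cast<char*>(object) - sizeof(erased_header));
    assert(header->cleanup != nullptr);
    header->cleanup(object);
}

struct Counted {  // hypothetical type used to observe destructor calls
    static int destroyed;
    ~Counted() { ++destroyed; }
};
int Counted::destroyed = 0;
```

A caller would write void* p = erased_new<Counted>(); and later erased_delete(p);, and the destructor of Counted runs even though only a void* is available. The price is one padded header per allocation, which is the kind of overhead the comments on the earlier answer object to.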

Note that if passing a void* to delete were a constraint violation, that would forbid implementations from providing a type-agnostic delete even if they would be easily capable of doing so, and even if some code written for them relies upon such ability. The fact that code relies upon such an ability would make it portable only to implementations that can provide it, of course, but allowing implementations to support such abilities if they choose to do so is more useful than making it a constraint violation.

Personally, I would have liked to see the Standard offer implementations two specific choices:

  1. Allow passing a void* to delete and delete the object using whatever type had been passed to new, and define a macro indicating support for such a construct.

  2. Issue a diagnostic if a void* is passed to delete, and define a macro indicating it does not support such a construct.

Programmers whose implementations supported type-agnostic delete could then decide whether the benefit they could receive from such feature would justify the portability limitations imposed by using it, and implementers could decide whether the benefits of supporting a wider range of programs would be sufficient to justify the small cost of supporting the feature.

Comber answered 1/8, 2018 at 17:26 Comment(4)
I think storing cleanup information in an allocation is, in general, not possible for a conforming implementation. [expr.new] §11 and §12 are, unfortunately, quite specific when it comes to the size of allocations made by a new expression. Arrays are basically the only exception where the compiler is allowed to request additional storage to what would be needed just to hold the created objects.Sylvia
@MichaelKenzel: If a user-supplied allocation function has not been installed, an implementation would so far as I can tell be free to allocate things as it sees fit (arguably it would be free to do so even when such a function has been called, though quality implementations should typically call a user-provided function in preference to requesting their own heap allocation). It would seem, though, that the intent of the Standard is to forbid what had previously been a useful approach.Comber
Note that these guarantees concerning the size of allocations made by new were already present in the C++03 standard, so that's not really a new addition. I'm not aware of any implementations that would actually have been doing anything like this "previously". If there were, I would argue that they could only have been doing so in violation of the standard. An implementation could always simply keep its own datastructures like, e.g., a map to track cleanup info for all active allocations. That would introduce a quite significant overhead of course…Sylvia
@MichaelKenzel: Many implementations of C++ predated the first published Standard. Implementing delete by using information stored by new for all types would not have been difficult, and pre-standard implementations did quite a few interesting things that have since fallen by the wayside.Comber
M
0

void* specifies no destructor, so no destructors are invoked.

That is most likely one of the reasons it's not permitted. Deallocating the memory that backs a class instance without calling the destructor for said class is just all around a really really bad idea.

Suppose, for example, the class contains a std::map that has a few hundred thousand elements in it. That represents a significant amount of memory. Doing what you're proposing would leak all of that memory.

Mildamilde answered 1/8, 2018 at 0:33 Comment(2)
My question explicitly specifies that I'm only interested in the case where no destructors would be involved even in the correct delete expression. This means no non-POD classes.Mohun
Note, though, that you're correct; this is indeed the stated rationale for that footnote in the standard (and for the GCC and Clang warnings).Mohun
G
0

A void doesn't have a size, so the compiler has no way of knowing how much memory to deallocate.

How should the compiler handle the following?

struct s
{
    int arr[100];
};

void* p1 = new int;
void* p2 = new s;
delete p1;
delete p2;
Garden answered 1/8, 2018 at 2:46 Comment(3)
As I noted in my question, the deallocator function (operator delete) takes void*, so the size data is stored in memory at runtime rather than inferred from the type system.Mohun
@KyleStrand then why does the standard require both delete and delete[]? Surely if there's runtime information recorded then the difference is redundant.Garden
I suppose I can imagine an implementation relying on type information to delete single items, but I'm not sure how it would handle a pointer to a base class, which can legally be used to delete an instance of a derived class.Mohun
