Why doesn't GCC optimize out deletion of null pointers in C++?

Asked 15/8, 2017 at 9:1 Answered 15/8, 2017 at 14:4

c++gcc null compiler-optimization delete-operator

Consider a simple program:

int main() {
  int* ptr = nullptr;
  delete ptr;
}

With GCC (7.2), there is a call instruction regarding to operator delete in the resulting program. With Clang and Intel compilers, there are no such instructions, the null pointer deletion is completely optimized out (-O2 in all cases). You can test here: https://godbolt.org/g/JmdoJi.

I wonder whether such an optimization can be somehow turned on with GCC? (My broader motivation stems from a problem of custom swap vs std::swap for movable types, where deletion of null pointers can represent a performance penalty in the second case; see https://mcmap.net/q/88980/-move-semantics-custom-swap-function-obsolete for details.)

UPDATE

To clarify my motivation for the question: If I use just delete ptr; without if (ptr) guard in a move assignment operator and a destructor of some class, then std::swap with objects of that class yields 3 call instructions with GCC. This might be a considerable performance penalty, e.g., when sorting an array of such objects.

Moreover, I can write if (ptr) delete ptr; everywhere, but wonder, whether this cannot be a performance penalty as well, since delete expression needs to check ptr as well. But, here, I guess, compilers will generate a single check only.

Also, I really like the possibility to call delete without the guard and it was a surprise for me, that it could yield different (performance) outcomes.

UPDATE

I just did a simple benchmark, namely sorting objects, which invoke delete in their move assignment operator and destructor. The source is here: https://godbolt.org/g/7zGUvo

Running times of std::sort measured with GCC 7.1 and -O2 flag on Xeon E2680v3:

There is a bug in the linked code, it compares pointers, not pointed values. Corrected results are as follows:

without if guard: ~~17.6 [s]~~ 40.8 [s],
with if guard: ~~10.6 [s]~~ 31.5 [s],
with if guard and custom swap: ~~10.4 [s]~~ 31.3 [s].

These results were absolutely consistent across many runs with minimal deviation. The performance difference between first two cases is significant and I wouldn't say that this is some "exceedingly rare corner case" like code.

Melonie answered 15/8, 2017 at 9:1 Comment(10)

I believe, creating optimizer-dependent code is a very-very bad idea. There is no guarantee that this behavior will not change in future versions of Clang/Intel compilers. – Wirework 15/8, 2017 at 9:8

@DmitriyKalugin-Balashov It's not about optimizer-dependent code. It's about optimizer-dependent program performance. There are many optimizations that are "standard" and coders count on them on a daily basis, such as copy elision or inlining. – Melonie 15/8, 2017 at 9:12

Have you checked what code gets generated if you add explicit if before delete? – Placeeda 15/8, 2017 at 9:22

Could it be related to possibility to overload operator delete? – Handicraft 15/8, 2017 at 9:26

@DanielLangr technically copy elision, RVO or inlining are optimizations that aren't. They may happen with no optimization flags being active. Especially copy elision. – Revert 15/8, 2017 at 9:29

Another possibly interesting thing to test, change ptr to be constexpr, and see if gcc does the optimization then. – Placeeda 15/8, 2017 at 9:31

@Placeeda Then, it's optimized out as well with GCC. It works also in my custom swap vs std::swap case, when use if both in destructor and move assignment operator. I just liked the possibility not to write such a guard. – Melonie 15/8, 2017 at 9:39

Compiler writers have budgets too. We aren't obliged to optimise out every possible inane case. Only the ones that actually occur in meaningful code. – Rocray 15/8, 2017 at 10:4

@EJP I think that especially this can happen in meaningful code. Not the sample code above, of course. Have you looked at the discussion about swap vs std::swap? – Melonie 15/8, 2017 at 10:8

@cmaster That sounds a bit like a troll post. It doesn't even have anything to do with templates, but instead with RAII, which is considered a best practice in modern C++. What you are saying is that one has to decide between abstractions and performance, which is a false dichotomy to a large extent despite the grain of truth that if you want to have both, you can't take either to extremes. However, OP's question in my opinon is not extreme, but is reasonable and practical. It's a simple optimization that may help the performance of some RAII classes with straightforward implementation. – Gown 15/8, 2017 at 13:38

According to C++14 [expr.delete]/7:

If the value of the operand of the delete-expression is not a null pointer value, then:

[ ...omitted... ]

Otherwise, it is unspecified whether the deallocation function will be called.

So both compilers do comply with the standard, because it's unspecified whether operator delete is called for deletion of a null pointer.

Note that the godbolt online compiler just compiles the source file without linking. So the compiler at that stage must allow for the possibility that operator delete will be replaced by another source file.

As already speculated in another answer -- gcc may be angling for consistent behaviour in the case of a replacement operator delete; this implementation would mean that someone can overload that function for debug purposes and break on all invocations of the delete expression, even when it happened to be deleting a null pointer.

UPDATED: Removed speculation that this might not be a practical issue, since OP provided benchmarks showing that it in fact is.

Aisne answered 15/8, 2017 at 10:43 Comment(10)

Nice explanation, thanks. However, If I use delete ptr; instead of if (ptr) delete ptr; in move assignment operator and destructor of my class, this will, actually, result in unnecessary invocation of 3 call instructions with GCC when invoking std::swap with objects of that class. And, i.e., many many unnecessary call instructions in, e.g., within sorting. Such a case, I guess, can be determined by static analysis. – Melonie 15/8, 2017 at 11:33

My point is that then, it seems that providing custom swap function is reasonable even for movable classes. Moreover, std::sort implementation typically does not invoke custom swap only, but use move semantics directly as well (libstdc++, libc++, Microsoft). – Melonie 15/8, 2017 at 11:34

That's a good point. Will do some benchmark, if I have time, just out of curiosity. – Melonie 15/8, 2017 at 13:5

I updated the question to include benchmark results. The performance impact is quite big IMO. – Melonie 15/8, 2017 at 13:52

@DanielLangr OK, interesting measurement. Perhaps the gcc developers would be interested to see this example. Also do you get the same behaviour with -O3 ? – Aisne 15/8, 2017 at 14:9

The very same. (Note that there was a bug in my code, it compared pointers not pointed values. However, corrected code gives same slow-down behavior.) I also tried to employ unique_ptr for holding stored integers — the sorting time was 32.3 [s] then. – Melonie 15/8, 2017 at 14:33

"The pessimization of your test case is not really an issue in practice, since there is no reason to delete an known null pointer in the first place. In reality, delete will be called on some pointer variable whose state can't be determined by static analysis, in which case an entirely appropriate implementation (especially for pointer to non-class type) would be to call the deallocation function and let it test whether or not the pointer is null." - this might not be true after code inlining for example. That said it can be WAR as if (!(__builtin_constant_p(ptr) && ptr == NULL)) delete ptr. – Cloven 15/8, 2017 at 19:27

But it's at best angling, as types with virtual dtors would make it impossible, dtor+deallocate have to be bundled then. – Loath 15/8, 2017 at 22:4

@Aisne I have noticed GCC developers: gcc.gnu.org/ml/gcc/2017-08/msg00170.html (hope, this is the right way). – Melonie 16/8, 2017 at 11:36

The speculation about consistency is wrong, the real reason is lack of time / manpower. gcc removes free(NULL), and it will eventually remove delete when someone teaches it that that function is magic (not some arbitrary function). For the same reason, gcc removes free(malloc(42)) but not delete new int, etc. – Dissimilitude 17/8, 2017 at 11:40

Standard actually states when allocation and deallocation functions shall be called and where they not. This clause (@ n4296)

The library provides default definitions for the global allocation and deallocation functions. Some global allocation and deallocation functions are replaceable (18.6.1). A C++ program shall provide at most one definition of a replaceable allocation or deallocation function. Any such function definition replaces the default version provided in the library (17.6.4.6). The following allocation and deallocation functions (18.6) are implicitly declared in global scope in each translation unit of a program.

probably would be main reason why those function calls aren't omitted arbitrary. If they were, the replacement of their implementation of library would cause incoherent function of compiled program.

In the first alternative (delete object), the value of the operand of delete may be a null pointer value, a pointer to a non-array object created by a previous new-expression, or a pointer to a subobject (1.8) representing a base class of such an object (Clause 10). If not, the behavior is undefined.

If the argument given to a deallocation function in the standard library is a pointer that is not the null pointer value (4.10), the deallocation function shall deallocate the storage referenced by the pointer, rendering invalid all pointers referring to any part of the deallocated storage. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.

...

If the value of the operand of the delete-expression is not a null pointer value, then

If the allocation call for the new-expression for the object to be deleted was not omitted and the allocation was not extended (5.3.4), the delete-expression shall call a deallocation function (3.7.4.2). The value returned from the allocation call of the new-expression shall be passed as the first argument to the deallocation function.

Otherwise, if the allocation was extended or was provided by extending the allocation of another newexpression, and the delete-expression for every other pointer value produced by a new-expression that had storage provided by the extended new-expression has been evaluated, the delete-expression shall call a deallocation function. The value returned from the allocation call of the extended new-expression shall be passed as the first argument to the deallocation function.

Otherwise, the delete-expression will not call a deallocation function

Otherwise, it is unspecified whether the deallocation function will be called.

Standard states what should be done if pointer is NOT null. Implying that delete in that case is noop, but to what end, is not specified.

Revert answered 15/8, 2017 at 9:41 Comment(2)

Saying what should be done if pointer is NOT null does not imply anything about when the pointer IS null. – Uranium 15/8, 2017 at 10:46

@RustyX it states that null argument is defined behavior and it states what shall done if it is not null. Nothing to be done if it is null then, but it's that vague in form, and was vague since C++0x. At least it isn't strewn around document like the definition of what happens when null pointer is dereferenced. – Revert 15/8, 2017 at 11:0

It's a QOI issue. clang does indeed elide the test:

https://godbolt.org/g/nBSykD

main:                                   # @main
        xor     eax, eax
        ret

Appendage answered 15/8, 2017 at 10:8 Comment(8)

one would expect that it could collapse whole function to noop, even excluding the call? XD – Revert 15/8, 2017 at 10:11

@Swift it has. xor eax,eax is so that main() returns 0, as required by the standard. – Appendage 15/8, 2017 at 10:12

Ah, it's main(), I missed that bit, pardon me. – Revert 15/8, 2017 at 10:16

The micro benchmark is so small that the compiler can see that no one can change the pointer. I haven't observed that in real code. – Sensualist 15/8, 2017 at 20:4

How is this answering the question? It sounds like you're just repeating a piece of background information that's stated in the question. – Mceachern 26/1, 2018 at 12:22

@DonHatch looking at the edit history of the question, it seems that the question has changed substantially since I posted this answer. – Appendage 27/1, 2018 at 16:48

@RichardHodges are you sure? I don't see how your answer answers anything that was asked in any of the 10 revisions. The fact that clang does indeed elide the test was stated in the very first revision, and that never changed. What is the question you were answering? – Mceachern 30/1, 2018 at 0:42

@DonHatch (from memory), the original question seemed to be about whether there was a standard way to get gcc/libc++ to elide the check. My view was an implied no since the standard makes no mention of it. Therefore it's a QOI issue rather than a standard one. "It's a QOI issue" on its own did not seem to me to be a fulfilling answer so I qualified it somewhat. Since then, I am happy to see that much better answers have been posted. – Appendage 30/1, 2018 at 6:43

It's always safe (for correctness) to let your program call operator delete with a nullptr.

For performance, it's very rare that having the compiler-generated asm actually do an extra test and conditional branch to skip a call to operator delete will be a win. (You can help gcc optimize away compile-time nullptr deletion without adding a runtime check, though; see below).

First of all, larger code-size outside of a real hot-spot increases pressure on the L1I cache, and the even smaller decoded-uop cache on x86 CPUs that have one (Intel SnB-family, AMD Ryzen).

Second, extra conditional branches use up entries in the branch-prediction caches (BTB = Branch Target Buffer and so on). Depending on the CPU, even a branch that's never taken may worsen predictions for other branches if it aliases them in the BTB. (On others, such a branch never gets an entry in the BTB, to save entries for branches where the default static prediction of fall-through is accurate.) See https://xania.org/201602/bpu-part-one.

If nullptr is rare in a given code path, then on average checking & branch to avoid the call ends up with your program spending more time on the check than the check saves.

If profiling shows you have a hot-spot that includes a delete, and instrumentation / logging shows that it often actually calls delete with a nullptr, then it's worth trying
if (ptr) delete ptr; instead of just delete ptr;

Branch prediction might have better luck in that one call site than for the branch inside operator delete, especially if there's any correlation with other nearby branches. (Apparently modern BPUs don't just look at each branch in isolation.) This is on top of saving the unconditional call into the library function (plus another jmp from the PLT stub, from dynamic linking overhead on Unix/Linux).

If you are checking for null for any other reason, then it could make sense to put the delete inside the non-null branch of your code.

You can avoid delete calls in cases where gcc can prove (after inlining) that a pointer is null, but without doing a runtime check if not:

static inline bool 
is_compiletime_null(const void *ptr) {
#ifdef   __GNUC__
    // __builtin_constant_p(ptr) is false even for nullptr,
    // but the checking the result of booleanizing works.
    return __builtin_constant_p(!ptr) && !ptr;
#else
    return false;
#endif
}

It will always return false with clang because it evaluates __builtin_constant_p before inlining. But since clang already skips delete calls when it can prove a pointer is null, you don't need it.

This might actually help in std::move cases, and you can safely use it anywhere with (in theory) no performance downside. I always compiles to if(true) or if(false), so it's very different from if(ptr), which is likely to result in a runtime branch because the compiler probably can't prove the pointer is non-null in most cases either. (A dereference might, though, because a null deref would be UB, and modern compilers optimized based on the assumption that the code doesn't contain any UB).

You could make this a macro to avoid bloating non-optimized builds (and so it would "work" without having to inline first). You can use a GNU C statement-expression to avoid double-evaluating the macro arg (see examples for GNU C min() and max()). For the fallback for compilers without GNU extensions, you could write ((ptr), false) or something to evaluate the arg once for side effects while producing a false result.

Demonstration: asm from gcc6.3 -O3 on the Godbolt compiler explorer

void foo(int *ptr) {
    if (!is_compiletime_null(ptr))
        delete ptr;
}

    # compiles to a tailcall of operator delete
    jmp     operator delete(void*)


void bar() {
    foo(nullptr);
}

    # optimizes out the delete
    rep ret

It compiles correctly with MSVC (also on the compiler explorer link), but with the test always returning false, bar() is:

    # MSVC doesn't support GNU C extensions, and doesn't skip nullptr deletes itself
    mov      edx, 4
    xor      ecx, ecx
    jmp      ??3@YAXPEAX_K@Z      ; operator delete

Interesting to note that MSVC's operator delete takes the object size as a function arg (mov edx, 4), but gcc/Linux/libstdc++ code just passes the pointer.

Related: I found this blog post, using C11 (not C++11) _Generic to try to portably do something like __builtin_constant_p null-pointer checks inside static initializers.

Everick answered 15/8, 2017 at 12:54 Comment(0)

I think, the compiler has no knowledge about "delete", especially that "delete null" is a NOOP.

You may write it explicit, so the compiler does not need to imply knowledge about delete.

WARNING: I do not recommend this as general implementation. The following example should show, how you could "convince" a limited compiler to remove code anyway in that very special and limited program

int main() {
 int* ptr = nullptr;

 if (ptr != nullptr) {
    delete ptr;
 }
}

Where I remember right, there is a way to replace "delete" with an own function. And in the case an optimization by the compiler would went wrong.

@RichardHodges: Why should it be an de-optimization when one give the compiler the hint to remove a call?

delete null is in general a NOOP (no operation). However, since it is possible to replace or overwrite delete there is no guarantee for all cases.

So it is up to the compiler to know and to decide whether to use the knowledge that delete null could always removed. There are good arguments for both choises

However, the compiler is always allowed to remove dead code, this "if (false) {...}" or "if (nullptr != nullptr) {...}"

So a compiler will remove dead code and then when using explicit checking, it looks like

int main() {
 int* ptr = nullptr;

 // dead code    if (ptr != nullptr) {
 //        delete ptr;
 //     }
}

Please tell me, where is there a de-optimization?

I call my proposal a defensive style of coding, but not a de-optimization

If someone may argue, that now the non-nullptr will causes two-times checking on nullptr, I have to reply

Sorry, this wasn't the original question
if the compiler knows about delete, especially that delete null is a noop, than the compiler could remove the outer if either. However, I would not expect compilers to be so specific

@Peter Cordes: I agree guarding with an if is not an general optimization rule. However, general optimization was NOT the question of the opener. The question was why some compiler do not elimate the delete in a very short, non-sense program. I showed a way to make the compiler to eliminate it anyway.

If a situation happen like in that short program, probably something other is wrong. In general I would try to avoid new/delete (malloc/free) as the calls are rather expensive. If possible I prefer to use the stack (auto).

When I take a look at the meanwhile documented real case, I would say, class X is designed wrong, causing poor performance and too much memory. (https://godbolt.org/g/7zGUvo)

Instead of

class X {
  int* i_;
  public:
  ...

in would design

class X {
  int i;
  bool valid;
  public:
  ...

or more earlier, I would ask of the sense of sorting empty/invalid items. In the end I would like to get rid of "valid", too.

Congratulation answered 15/8, 2017 at 9:26 Comment(9)

it's operator delete[] and operator delete – Revert 15/8, 2017 at 9:31

delete explicitly checks for nullptr. This code is a de-optimisation. – Appendage 15/8, 2017 at 10:11

@RichardHodges I guess it might work around the optimization issue at hand here , i.e. avoiding a function call even in the null pointer case – Aisne 15/8, 2017 at 10:32

@Aisne I wouldn't call it a bug. More like a QOI issue. – Appendage 15/8, 2017 at 10:34

@stefan: It's a de-optimization to use if(ptr) delete ptr; because usually the compiler doesn't know whether the pointer is null or not. In those cases, it has to emit instructions that actually check, because that code requires that delete not be called with a null pointer. See my answer. It's hard to imagine any real case where that block of code would only ever run with a compile-time-constant null pointer, so your first example isn't useful. – Everick 15/8, 2017 at 17:47

@Peter Cordes: I explain more in the answer – Congratulation 15/8, 2017 at 19:22

@stefanbachert Sure, the class X is nonsense, it is just a benchmark code. But, the behavior would be the very same for any class that has an owning member pointer (such as a string class holding pointer to characters, a vector class holding pointer to its elements, etc.). The question is not about meaning of class X, its about generic cases where move assignment operators and destructors need to call delete, which then is in std::swap called 3 times on a null pointer. – Melonie 15/8, 2017 at 19:43

@DanielLangr: in that case, letting the compiler optimize it away when it can prove (after inlining) that a pointer is already nullptr sounds useful. Try the code with __builtin_constant_p(!ptr) in my answer, and see if it improves gcc's -O3 asm output in a case like you're talking about. – Everick 16/8, 2017 at 4:56

@PeterCordes I will if I have some time, you may try as well :). But I prefer the if (ptr) delete ptr; solution, since it works with GCC and is portable. – Melonie 16/8, 2017 at 6:18

First of all, I'll just agree with some previous answerers in that it's not a bug, and GCC may do as it pleases here. That said, I was wondering whether this means that some common and simple RAII code may be slower on GCC than Clang because a straightforward optimization is not done.

So I wrote a small test case for RAII:

struct A
{
    explicit A() : ptr(nullptr) {}
    A(A &&from)
        : ptr(from.ptr)
    {
        from.ptr = nullptr;
    }

    A &operator =(A &&from)
    {
        if ( &from != this )
        {
            delete ptr;
            ptr = from.ptr;
            from.ptr = nullptr;
        }
        return *this;
    }

    int *ptr;
};

A a1;

A getA2();

void setA1()
{
    a1 = getA2();
}

As you may see here, GCC does elide the second call to delete in setA1 (for the moved-from temporary that was created in the call to getA2). The first call is necessary for program correctness because a1 or a1.ptr may have been previously assigned to.

Obviously I would prefer more "rhyme and reason" – why is the optimization done sometimes but not always – but I'm not willing to sprinkle redundant if ( ptr != nullptr ) checks all over my RAII code just yet.

Gown answered 15/8, 2017 at 14:4 Comment(2)

What you're seeing the the effect of RVO. – Appendage 16/8, 2017 at 5:39

"I'm not willing to sprinkle redundant...", if the piece of code is likely to do null deletes often, and if object is of such nature that that there are likely to be large collections of them, then it just might be non-premature optimization to add the check, because a check is so cheap compared to a function call... The damage done in the worst case is basically immeasurably small, while benefit is potentially large looking at latest question edit (benchmarks). – Placeeda 16/8, 2017 at 8:4

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags