When can garbage collection be faster than manual memory management? [closed]

3

Under what circumstances is garbage collection more efficient than manual memory management? (Here manual could mean using malloc and free as in C, or the cleaner RAII and smart pointer techniques popularized by C++)

I like how garbage collection can remove some accidental complexity from writing software, but I was even more pleased at how RAII and smart pointers can eliminate that complexity while also working on resources other than memory, being deterministic, providing performance guarantees, and being more efficient overall. So I thought I could safely ignore garbage collection. However, I've noticed people saying that garbage collection can be faster than the tight resource management used in C++, for example when there is plenty of extra memory available.

So when exactly can garbage collection outperform manual memory management? I like RAII and smart pointers, but would be happy to accept garbage collection as another tool if it is faster.
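To make the comparison concrete, this is the kind of deterministic cleanup I mean (a minimal sketch; `write_line` and the `FileHandle` alias are my own illustrative names, not from any particular codebase):

```cpp
#include <cstdio>
#include <memory>

// RAII applied to a non-memory resource: the FILE* is closed at a
// deterministic point (the closing brace), with no collector involved.
using FileHandle = std::unique_ptr<std::FILE, int (*)(std::FILE*)>;

int write_line(const char* path, const char* text) {
    FileHandle f(std::fopen(path, "w"), &std::fclose);
    if (!f) return -1;
    return std::fputs(text, f.get());
}   // f's deleter runs here, closing the file immediately
```

The same pattern with `std::unique_ptr<T>` covers plain memory, which is why I lump smart pointers in with "manual" management above.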

Luciferase answered 8/8, 2011 at 1:14 Comment(7)
One time when garbage collection can be faster is when nothing ever needs to be freed. If your program doesn't use a lot of memory resources, the garbage collector may never need to kick in. And the allocator in a GC system may be as simple as a move of a pointer.Impermissible
GC performance is highly dependent on the algo chosen. Same goes for 'manual' memory managers. We're comparing apples to oranges at best.Phippen
@Michael Burr: In that case, manual management might be equally cheap. You're essentially describing the case in which deallocations are batched, and the program terminates before one entire batch of allocations has been deallocated. That's possible with both GC and malloc/free implementations.Laurasia
I honestly think you're asking the wrong question. The better question would be, "Does a GC meet or exceed the performance budget of my application and what advantages does a GC confer?"Stenophagous
@MSalters: that's true if all the allocation/deallocation code is under your control, but if you're using library code (whether your own or someone else's) or using RAII, you usually lose control of the deallocation policies to some extent. Anyway, I put it in a comment because I don't think it's really an important performance consideration. It may come into play in some theoretic way in smaller programs/utilities, but I don't think that in those situations it's going to be any real performance 'win'.Impermissible
@Michael Burr: If nothing needs to be freed, then you can basically do the same with a memory pool using placement new. In which case, it'll be the same as a pointer bump, and you won't even get GC interrupting to scan stack and trace reachables. GC is slower in this case.Phippen
Unfortunately this question is way too open-ended for SO to deal with properly. You're not specifying which type of C++ flavor, runtime, or platform you're comparing with, which means there are plenty of ways to implement "manual" memory management. The current answers are also wildly disagreeing with each other. This question invites discussion, opinion, and is phrased in such a way that it can hardly be answered by facts.Eolic
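The pointer-bump/placement-new idea from the comments above can be sketched as follows (a toy arena of my own, assuming trivially destructible objects; no real allocator is this naive):

```cpp
#include <cstddef>
#include <new>

// Toy bump allocator: each allocation is an aligned pointer increment,
// and everything is released in one shot by the destructor -- the
// "nothing ever needs to be freed individually" case discussed above.
class Arena {
    char*       base_;
    std::size_t used_ = 0;
    std::size_t cap_;
public:
    explicit Arena(std::size_t cap) : base_(new char[cap]), cap_(cap) {}
    ~Arena() { delete[] base_; }          // one bulk release
    void* alloc(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        n = (n + a - 1) & ~(a - 1);       // round up to max alignment
        if (used_ + n > cap_) return nullptr;
        void* p = base_ + used_;
        used_ += n;
        return p;                         // just a pointer bump
    }
};
```

Objects are constructed with placement new, e.g. `new (arena.alloc(sizeof(T))) T(...)`; since no destructors are tracked, this only suits trivially destructible types.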
14

GC advantages:

  • a GC allocates simply by incrementing a pointer; heap allocators must take counter-measures to avoid heap fragmentation
  • a GC improves CPU cache locality, a big deal on modern processors
  • a GC requires no extra code to release memory, so the odds of leaks are low
  • a generational GC can collect garbage concurrently, making memory management close to free on a multi-core CPU.

GC disadvantages:

  • difficult to make efficient in a language that treats pointers as first class types
  • uses more virtual memory space due to collection latency
  • leaky abstraction for operating system resources other than memory
  • can cause pauses in program operation in some cases while garbage is being collected.

For perf per unit of effort it is a slam dunk: a GC beats a heap allocator easily, without any tuning effort. The Chinese Dictionary programming contest between Rico Mariani and Raymond Chen is often quoted; an overview is here. Raymond's C++ program eventually won, but only after he rewrote it several times and gave up on the standard C++ library.

Monoceros answered 8/8, 2011 at 2:15 Comment(6)
GC can't simply bump a pointer to allocate memory unless a Mark-Sweep-Compact algorithm is used (e.g. the MS .NET GC). In non-C++/CLR (i.e. traditional C++), you won't be able to do the Compact part.Phippen
Also, note that Generational GC is an orthogonal concept to concurrent GC.Phippen
GC still needs to release memory to the OS when a page is no longer in use. Similarly, a heap memory manager can also retain this unused page for future use.Phippen
GC doesn't improve cache locality vs a decent heap memory manager (e.g. the low-fragmentation Win32 heap manager). However, it increases data locality so that fewer page faults occur, hence increasing performance.Phippen
Your conclusion is misleading at best. Very little of the Mariani/Chen comparison had anything to do with memory management at all. The little that did was specifically because C++ at that time didn't have move constructors (which it now does). In the end, it's doubtful that anybody who knew what they were doing would ever write code similar to Raymond Chen's initial version (or most intermediate steps) except as a straw man to later knock over. Finally, the code in question is for a sufficiently oddball task that it's doubtful whether it ever really meant anything to anybody.Capacitor
Well, this answer gets hated on about as much as I expected. You can arbitrarily dismiss Chen as a "bad programmer" but that just completely misses the point. He eventually did make a version that beat Mariani's version. The point is that it took him multiple versions and weeks to get there. Managed code is a productivity enhancer, that's all. As far as C++ getting move semantics after being stuck-in-the-mud for many years, maybe you ought to consider thanking strong competition for that :)Monoceros
12

Never, and I can prove it.

First, let's assume we're using the best algorithms in either case. Using a sub-optimal algorithm can prove anything.

Secondly, let's assume the best GC algorithm uses times T0...Tn, where Ti is the time needed to decide whether memory allocation i should be freed at a given moment. The total is then Σ(Ti).

Now, an equivalent manual memory management algorithm exists that uses the same GC algorithm, but only considers memory blocks which have been manually marked to be freed. Therefore, the running time is Σ(δiTi) (with δi=0 when block i wasn't freed, and =1 when it was).

Obviously, Σ(δiTi) ≤ Σ(Ti): there's a manual algorithm that's strictly no worse than the GC algorithm.
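As a toy instance of the inequality (my own numbers, not part of the proof): with three blocks of which only block 1 is manually marked to be freed,

```latex
\sum_i \delta_i T_i = 0 \cdot T_0 + 1 \cdot T_1 + 0 \cdot T_2 = T_1
\;\le\; T_0 + T_1 + T_2 = \sum_i T_i
```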

How does that contrast with other answers?

  • "With RAII, you have to deallocate every time you allocate." - No, that's a mistaken assumption. We should compare either batched or non-batched operations. GC would be far worse if you'd also do a GC run every time something went out of scope.
  • "Improved locality" - well, unless you discount the fact that non-GC languages can use the stack far more often, and that stack has an excellent locality of reference.
  • "low odds for leaks" - that's entirely true, but we were discussing runtime performance. (BTW: good to point out that it's low but non-zero odds).
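The stack-locality point can be illustrated directly (a sketch of my own; a GC'd language would typically heap-allocate each Point):

```cpp
#include <array>

// The whole array lives on the stack: contiguous storage, no allocator
// call, and release is just the stack pointer moving at scope exit.
struct Point { double x, y; };

double sum_x() {
    std::array<Point, 64> pts{};              // stack-allocated, zeroed
    for (auto& p : pts) p.x = 1.0;
    double s = 0.0;
    for (const auto& p : pts) s += p.x;       // sequential, cache-friendly
    return s;
}
```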
Laurasia answered 8/8, 2011 at 8:16 Comment(14)
It still is GC, but with manual help. And it loses the performance guarantees of manual memory management. But for throughput it probably is the best combination.Surveillance
Those "performance guarantees" of manual memory management are similarly dependent on the tradeoffs made behind the scenes. E.g. you can trivially show that the efficiency of free(ptr) can be improved by batching N deallocations, if only for the improved locality of reference. As for "GC with manual help", well, no, that's just an artifact of my math. It was not necessary to prove that the Ti time bounds per (de)allocation can be improved, so I didn't prove it.Laurasia
You're only addressing deallocation (free) with your maths. How's that enough proof for GC being inferior on the whole?Phippen
@Zach Saw: For the equivalence proof, you'd have to use the same allocation mechanism in both cases. So the allocation was (perhaps not explicit enough) addressed too.Laurasia
@MSalters: Perhaps you'd like to explicitly address it then? Particularly on the book keeping part of manual memory managers?Phippen
@MSalters: "only considers memory blocks which have been manually marked to be freed" - you aren't accounting for the time it takes to do this marking in the first place, time which doesn't have to be spent at all in a GC language. Wouldn't that need to be addressed for the proof to work?Mountain
@antinome: That's typically done right when the block has been used for the last time, meaning that it's in L1 cache. The write-back is then pretty much free. Furthermore, note that I've assumed the GC needs no time to determine whether block i needs to freed either. I strongly believe the zero-time assumption for both cases is not biased against the GC case.Laurasia
@MSalters: "assume we're using the best algorithms in either case" (what is "best" anyway?), you can't "use the same allocation mechanism in both cases" - just doesn't make sense. GC allocation time is a pointer-bump with Mark-Sweep-Compact. Manual memory management requires more complicated allocation strategies.Phippen
@MSalters: The math you presented here is completely flawed, because the assumptions were incorrect to begin with.Phippen
@MSalters, consider a program that allocates 100 objects, then shortly afterwards discards 90 of them and keeps the remaining 10. Your manual memory allocator costs 100*Am + 90*Fm (where Am and Fm are the costs of the manual allocation and manual free operations). A traditional generational mark-sweep collector will cost 100*Ac + 10*Mc + 10*Cc (where Ac is the cost of a garbage-collected allocation, Mc is the cost of the collector's marking algorithm to identify "live" objects, and Cc is the cost of the collector's copying algorithm, which will likely include the cost of an allocation into the next zone).Somehow
This is clever, but -1 because this answer is actually proving that "garbage collection combined with manual freeing" can be superior to garbage collection by itself, which is very different from normal manual memory management, and is not representative of real-world, practical manual memory management. If there exists an algorithm which is more efficient than normal manual memory management, but the only way I can beat it is to re-implement that same garbage collection algorithm's logic on top of manual memory management, the result is garbage collection with the internals showing.Lallage
Though you could make an argument that explicit garbage collection on top of manual memory management is the best way to implement garbage collection, because of the benefits of explicit, caller-driven abstractions over implicit and invisible ones. And that true manual memory management allows for the implementation of different, swappable explicit abstractions for garbage collection, that code could choose to use or not as needed. In this sense, manual memory management is a superior "foundation" feature, and can allow access to the fastest possible algorithm for the specific situation.Lallage
Actually, it has been proven that in some cases GC is faster than manual memory allocation. Simply put, manual memory management costs two operations per memory segment: alloc/dealloc. But a GC can wipe out entire parts of memory in one operation. When only a small part of memory is live, GC is insanely fast.Contrabandist
@jeffythedragonslayer it would be nice to unaccept an answer that is demonstrably falseContrabandist
2

There's one particular scenario I know of in which a GC pointer is much faster than a smart pointer (a reference-counted pointer), when both are implemented in traditional C++ (i.e. not C++/CLR, since we won't have the luxury of compacting memory after Mark-Sweep, and we're trying to compare apples to apples here as much as we can) and the GC uses the same underlying heap memory manager. It is when your object assignment time significantly overwhelms object creation time.

Examples include sorting, array reversal etc.

See here for more info on my test with a GC framework I implemented in traditional C++ vs reference-counted pointers. The outcome of the array-reversal test was that GcString was 16.5 times faster than the ref-counted String.

This could be attributed to the painfully slow bus lock semantics in a reference counted pointer (unless you're targeting purely single threaded apps, locks are required for thread safety). From my experience of implementing a high performance precise GC in traditional C++, I can say that there are more opportunities for optimizations with a GC vs 'manual' (as per OP's definition of manual) memory management (i.e. smart pointers).
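The per-copy cost referred to here is the atomic read-modify-write on the shared count; a minimal sketch of that hot path (the names are mine, not from any particular smart-pointer implementation):

```cpp
#include <atomic>

// Thread-safe reference counting: every copy of the smart pointer does an
// atomic increment, every destruction an atomic decrement. On x86 these
// compile to LOCK-prefixed instructions -- the "bus lock" cost a tracing
// GC pointer does not pay on assignment.
struct ControlBlock {
    std::atomic<long> refs{1};
};

inline void acquire(ControlBlock* cb) {
    cb->refs.fetch_add(1, std::memory_order_relaxed);
}

inline bool release(ControlBlock* cb) {       // true => last reference gone
    return cb->refs.fetch_sub(1, std::memory_order_acq_rel) == 1;
}
```

A GC'd pointer assignment is a plain store, while a reference-counted assignment pays two such atomic operations (one acquire on the new target, one release on the old), which is where the gap in assignment-heavy workloads like sorting and reversal comes from.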

Phippen answered 8/8, 2011 at 6:0 Comment(5)
You should use atomic operations for the reference count, not locks.Aceae
@GMan: Atomic operations won't lock unless you prefix lock to the mnemonic. Or were you suggesting using implicit lock opcodes such as xchg? Either way, they're locks.Phippen
'Lock' is usually defined to mean an OS lock, while atomic operations might be considered a "CPU lock". In any case, category wasn't the point, it was implementation. An OS lock is overkill (the only appropriate one, if it were necessary, would be a spin lock).Aceae
@GMan: Oh I now see what you're trying to suggest. You were thinking the ref-counted pointer was using a critical section? LOL...Phippen
Like I said, 'lock' is usually defined to mean 'OS lock', while you would use the phase 'atomic operation' to mean a 'CPU "lock"'.Aceae

© 2022 - 2024 — McMap. All rights reserved.