How does a C++ library implementation allocate memory but not free it when the program exits?
The code is fairly simple:

#include <vector>
int main() {
    std::vector<int> v;
}

Then I build and run it with Valgrind on Linux:

g++ test.cc && valgrind ./a.out
==8511== Memcheck, a memory error detector
...
==8511== HEAP SUMMARY:
==8511==     in use at exit: 72,704 bytes in 1 blocks
==8511==   total heap usage: 1 allocs, 0 frees, 72,704 bytes allocated
==8511==
==8511== LEAK SUMMARY:
==8511==    definitely lost: 0 bytes in 0 blocks
==8511==    indirectly lost: 0 bytes in 0 blocks
==8511==      possibly lost: 0 bytes in 0 blocks
==8511==    still reachable: 72,704 bytes in 1 blocks
==8511==         suppressed: 0 bytes in 0 blocks
...
==8511== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Here, there is no memory leak, even though there is 1 alloc and 0 frees. The answer to this question quotes this paragraph from Valgrind's FAQ as an explanation:

Many implementations of the C++ standard libraries use their own memory pool allocators. Memory for quite a number of destructed objects is not immediately freed and given back to the OS, but kept in the pool(s) for later re-use.

My main question is:

How does the C++ library implementation achieve that? Does it keep around a separate process in the background that handles all allocation requests from its standard templates, so that when the program exits (a.out here), the memory is not immediately given back to the OS? If so, when will it give back, and how can I check the process indeed exists? If not, what is the "magic" behind the scene?

Another question:

There are 72,704 bytes (71 KiB) allocated. Why this number?

Thanks:)

Teillo answered 7/8, 2017 at 0:37 Comment(4)
Possible duplicate of Valgrind shows std::vector<> times of alloc is more than free, but no memory leakMonitory
Additionally, there is 72 KiB allocated, not 71 bytes.Monitory
@Monitory it is not a duplicate. That question was mine as well but from a different angle.Teillo
@Monitory oh the bytes.. sorry for the typo. I just corrected it.Teillo

How does the C++ library implementation achieve that?

It doesn't. The Valgrind information is outdated; I don't think any modern C++ implementation does that.

Does it keep around a separate process in the background that handles all allocation requests from its standard templates, so that when the program exits (a.out here), the memory is not immediately given back to the OS?

No, you've misunderstood. The valgrind docs aren't talking about keeping memory around that outlives the process. It's just talking about keeping memory pools within the process so that memory allocated and then deallocated by the process is kept in a pool and reused (by the same process!) later, instead of calling free immediately. But nobody does that for std::allocator nowadays, because std::allocator needs to be general purpose and perform reasonably well in all scenarios, and a good malloc implementation should meet those needs anyway. It's also fairly easy for users to override the default system malloc with an alternative like tcmalloc or jemalloc, so if std::allocator just forwards to malloc then it gets all the benefits of that replacement malloc.

If so, when will it give back, and how can I check the process indeed exists? If not, what is the "magic" behind the scene?

All memory in a process is returned to the OS when the process exits. There is no magic.

But the allocation you're seeing has nothing to do with this anyway.

There is 71 KB allocated. Why this number?

The 72,704 bytes you're seeing are allocated by the C++ runtime for its "emergency exception-handling pool". This pool is used to be able to allocate exception objects (such as bad_alloc exceptions) even when malloc can no longer allocate anything. We pre-allocate at startup, so if malloc runs out of memory we can still throw bad_alloc exceptions.

The specific number comes from this code:

       // Allocate the arena - we could add a GLIBCXX_EH_ARENA_SIZE environment
       // to make this tunable.
       arena_size = (EMERGENCY_OBJ_SIZE * EMERGENCY_OBJ_COUNT
                     + EMERGENCY_OBJ_COUNT * sizeof (__cxa_dependent_exception));
       arena = (char *)malloc (arena_size);

See https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/libsupc%2B%2B/eh_alloc.cc;h=005c28dbb1146c28715ac69f013ae41e3492f992;hb=HEAD#l117

Newer versions of valgrind know about this emergency EH pool and call a special function to free it right before the process exits, so that you don't see "in use at exit: 72,704 bytes in 1 blocks". This was done because too many people fail to understand that memory still in use (and still reachable) is not a leak, and people kept complaining about it. So now valgrind frees it, just to stop people complaining. When not running under valgrind the pool doesn't get freed, because doing so is unnecessary (the OS will reclaim it when the process exits anyway).

Normalie answered 5/4, 2019 at 12:57 Comment(4)
I don't agree that malloc can do a better job of allocation and de-allocation compared to pooled allocators or other tailor-made allocators. Of course malloc is the most general and usually the more appropriate solution but that generality comes at a cost. At a minimum it is a function call, malloc doesn't receive the size of the deallocated block and can't take advantage of higher level knowledge like "these nodes will probably all be destroyed at once". All of those mean that custom allocators can be significantly faster. I rarely use them though, malloc is good enough for most stuff.Rout
Good catch on the actual cause of the issue!Rout
@Rout I agree that custom allocators can be faster, but std::allocator has to be almost as general purpose as malloc. I'll edit the answer to clarify that I'm only talking about why std::allocator and new don't do their own pooling.Normalie
Additionally, even if pooling allocators can be better in theory, unless a clever std::allocator is actually measurably better than malloc in practice, it might be better to just forward directly to malloc (and invest time in improving malloc instead). That's why libstdc++ switched away from trying to beat malloc. A decent malloc takes a lot of work to beat.Normalie

First, you aren't testing anything with that unused vector. Compilers are smart, and both gcc and clang at -O2 compile the code above to an empty main() (other than a single xor eax, eax to set the return value). See the assembly here. Also, the default constructor for most vector implementations (including gcc's and clang's) won't even allocate anything - it waits until the first element is added before taking the expensive step of allocating.

To get a more concrete result, allocate a BIG vector (so you can distinguish it from the noise) and pass it to a function defined in another translation unit (a separate .cpp file), like this:

#include <vector>

void sink(std::vector<int>& v);

int main() {
    std::vector<int> v(12345678);
    sink(v);
}

Now when you check the assembly, you see it is actually doing something.

So the ~72,000 bytes you are seeing reported by Valgrind has nothing to do with your std::vector<int> v and you'd probably see the same figure with a completely empty main.

Still the idea of the question and the quoted documentation stands apart from that issue and I'll answer it below.

All memory is generally freed back to the OS when the program exits, and it is the OS that enforces this, not the standard library. The OS simply cleans up all resources used by the process, including any unshared memory allocations. When Valgrind refers to "in use at exit" it is talking about the moment just before this OS cleanup occurs, since that's what you want to know to see if you are forgetting to free anything.

You don't need any separate process to handle this. It is implemented by having Valgrind track malloc and free calls, and perhaps some other standard allocation routines.

The passage you quoted from the FAQ about many standard libraries using "their own memory pool allocators" refers to the idea that a standard library may add another caching allocation layer on top: it calls one of the known allocation routines such as malloc or operator new when memory is first needed, but when the memory is deallocated it saves it in some internal list rather than calling the corresponding deallocation routine (such as free or delete).

On subsequent allocations it will use blocks from its internal lists in preference to going back to the standard routines (only when a list is exhausted does it call them again). This freed-but-cached memory is invisible to Valgrind, which considers it still "in use" by the application.

Because of the somewhat unhelpful definition of std::allocator in old versions of C++, this approach wasn't heavily used, and I don't agree that "many" standard libraries use this type of pool allocator by default - at least today: I am not aware of any of the major standard library implementations that still does this, although some did in the past. However, the allocator is a template parameter of each container class, so end users can perform this customization themselves, especially since the allocator interface was improved in newer standards.

Big wins in practice for such pooled allocators are (a) using thread-local, fixed size allocations for a container as all contained objects are the same size and (b) allowing the allocator to free everything in one operation when the container is destroyed rather than freeing element by element.

The documentation you quoted is a bit confusing because it talks about (not) returning memory to the OS - but it should really say "returning it to the standard allocation routines". Valgrind does not need memory to be returned to the OS to see it as freed - it hooks all the standard routines and knows when you have freed at that level. The standard routines themselves heavily "cache" allocated memory as described above (this is common, unlike caching in the allocator layer, which is uncommon), so if Valgrind required memory to be returned to the OS it would be quite useless at reporting "allocated memory at exit".

Rout answered 7/8, 2017 at 0:42 Comment(12)
There might also be some globals allocated.Monitory
@ara Indeed, I very much doubt that the runtime is allocating 72,000 bytes just to satisfy a vector creation that needs no allocation at all (or a very small one, depending on what the default constructor does) - and I certainly hope it isn't.Rout
But what is a "standard allocation routine"? Is it an additional layer between the OS and my program? If it is, is it a part of the kernel process? If it is, it should be within the OS (because kernel is in the OS), not between the OS and my program. If it is not part of the kernel process, and not part of my program (because my program just terminates), who is maintaining it? I feel lost..Teillo
@user8385554 - standard allocation routines are pretty much malloc and free and a few variants of those. Precisely in this context it's just the things that Valgrind knows about and "hooks" into - it knows whenever you got memory from such a routine and when you free it. It is not part of the kernel/OS - it is part of the userland standard library, something like libc on Unix-like platforms or the various msvcrt libraries when using Visual Studio, etc. These are the primary ways C and C++ processes request memory (most of the C++ allocation routines boil down to malloc).Rout
@user8385554 - of course, malloc will internally need to call OS routines from time to time to reserve chunks of memory that it can hand out to satisfy this request, e.g., using sbrk or mmap on Unix-like platforms or HeapAlloc on Windows, but it usually does this for "large" chunks since it is relatively slow to make a kernel call, and then divides it up into the smaller chunks the application wants. Valgrind doesn't directly support hooking all of these OS-level calls, see here for example.Rout
@Rout So, from the OS's point of view, it has some memory taken away but never returned back by the requester voluntarily. The OS either has to get the memory back after the program terminates (most modern OSes do), or let the memory leak away. Is my understanding correct?Teillo
@user8385554 - more or less correct, except for the "leak away" part. The memory isn't leaking per-se, it just is only available to that process, not to all processes, but it is still reused within the process. If you repeatedly allocate and free a 1 MB buffer, 1000 times, it will only request about 1 MB from the OS and then re-use it internally (that's the job of malloc), but yes, even if you end with a free the OS usually doesn't get back the memory you allocate.Rout
That's really a whole separate question though, so if you want to ask "why don't memory allocators actively return freed memory to the OS" go ahead and ask it, link it here, and I'll take a shot. One thing to note is that they do sometimes return this memory, and with some allocators you can ask them to do so if you want, so it's not hopeless.Rout
@Rout ok i's asking it - question thanksTeillo
The std::vector default constructor is noexcept, so it's not just gcc and clang that don't allocate until something is inserted, no implementation does.Normalie
Also, the G++ and Clang standard libraries do not do any caching of memory in std::allocator, all memory allocations/deallocations go directly to new/delete which go directly to malloc/free.Normalie
@JonathanWakely - correct and I didn't mean to imply otherwise. I think some standard libraries did try to use caching allocators above malloc in the past, IIRC older g++ versions did - but I guess they have since abandoned that approach, maybe because whatever performance quirk they were trying to solve was solved by malloc.Rout

I think you misunderstood this. The memory is given back to the OS when the app terminates, but it is not given back to the OS merely because an object is destroyed.

Brookner answered 7/8, 2017 at 0:46 Comment(3)
If the memory is not given back when the program terminates, why isn't there a memory leak?Teillo
@user8385554 the operating system knows how to clean up all the resources belonging to the entire process. It is not considered a leak at program termination.Scaramouch
It may or may not be a "leak" - it depends on the semantics of the application - for example, the application may not bother or even be able to easily clean up long-lived singleton or global objects, and may these have a "fixed" footprint, so you might not consider it a leak. On the other hand, you could also be unexpectedly accumulating memory which would be a true leak.Rout
