Reducing the heap size of a C++ program after large calculation

Consider an MPI application based on two steps, which we shall call load and globalReduce. For simplicity the software is described this way, but there is a lot more going on, so it is not purely a Map/Reduce problem.

During the load step, the ranks in each given node are queued so that one and only one rank at a time has full access to all the memory of the node. The reason for this design is that during the load stage a set of large IO blocks is read, and they all need to be loaded in memory before a local reduction can take place. We shall call the result of this local reduction myRankVector. Once myRankVector is obtained, the IO blocks are released. The variable myRankVector itself uses little memory, so while the node may be using all its memory during the variable's creation, after completion the rank only needs 2-3 GB to hold myRankVector.
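To make the queueing concrete, here is a stripped-down sketch of the scheme (the real code does far more; loadAndReduce() and the tag value are just placeholders for illustration):

```cpp
#include <mpi.h>
#include <vector>

// Placeholder for the real work: read the large IO blocks, reduce them
// locally, free the blocks, and return the small result vector.
std::vector<double> loadAndReduce() {
    return std::vector<double>(1000, 0.0);
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    // Communicator containing only the ranks that share this node.
    MPI_Comm nodeComm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &nodeComm);
    int nodeRank, nodeSize;
    MPI_Comm_rank(nodeComm, &nodeRank);
    MPI_Comm_size(nodeComm, &nodeSize);

    const int TOKEN_TAG = 42;
    int token = 0;

    // Wait until the previous rank on this node has finished its load step.
    if (nodeRank > 0)
        MPI_Recv(&token, 1, MPI_INT, nodeRank - 1, TOKEN_TAG, nodeComm,
                 MPI_STATUS_IGNORE);

    // Only one rank per node is inside this call at any time, so it may
    // use essentially all of the node's memory for the IO blocks.
    std::vector<double> myRankVector = loadAndReduce();

    // Hand the node over to the next rank in the queue.
    if (nodeRank + 1 < nodeSize)
        MPI_Send(&token, 1, MPI_INT, nodeRank + 1, TOKEN_TAG, nodeComm);

    // ... globalReduce stage over MPI_COMM_WORLD using myRankVector ...

    MPI_Comm_free(&nodeComm);
    MPI_Finalize();
}
```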

During the globalReduce stage in the node, it is expected that all ranks in the node have already obtained their corresponding contribution (myRankVector) to the global reduction.

So here is my problem: while I have made sure that there are absolutely no memory leaks (I program using shared pointers, I double-checked with Valgrind, etc.), I am positive that the heap remains expanded even after all the destructors have released the IO blocks. When the next rank in the queue comes to do its job, it starts asking for lots of memory just as the previous rank did, and of course the program gets killed by the Linux OOM killer: "Out of memory: Kill process xxx (xxxxxxxx) score xxxx or sacrifice child". It is clear why this happens: the second rank in the queue wants to use all the memory, yet the first rank still holds on to a large heap.

So, having set the context of this question: is there a way in C++ to manually reduce the heap size and truly release memory that is no longer being used?

Thanks.

Lentigo answered 31/12, 2015 at 1:4 Comment(6)
May not be helpful, but you could fork/exec a child program to do the big calculation; its heap would then be "truly freed" when it exited.Geof
We'd need to see the code. The question is why the second rank doesn't reuse the freed memory.Olympic
Why not have a single process on each node that in a loop over all ranks will: 1) get the rank vector, 2) launch a separate thread, pinned to a different core, with access to the rank vector? Then all the major memory usage is in the same process, solving your issue while still using parallelism.Presage
May need to check those shared pointers. Are you sure the resources are released after the job is done? Do you need to call sp.reset()?Eglantine
@gxy, the shared pointers are all destroyed properly; I ran Valgrind on the program, and all unit tests for the individual classes also run with Valgrind as part of the regular testing procedures.Lentigo
@dylan-kirkby, your suggestion is good, but in an MPI context forking is notoriously inefficient, as it messes up the hundreds of channels for inter-process communication among all MPI ranks.Lentigo

Heaps are implemented using mmap on Linux, and you would need to use your own heap, which you can dispose of and munmap completely.

The munmap would then actually return that space to the operating system.
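A minimal sketch of the idea, assuming Linux/POSIX mmap and with made-up sizes (the real code would carve the individual IO blocks out of the arena):

```cpp
#include <sys/mman.h>
#include <cstddef>

int main() {
    // Give the big IO blocks their own mmap'd arena instead of the
    // process heap, so the pages can be handed back to the kernel.
    const std::size_t arenaSize = 64UL * 1024 * 1024 * 1024;  // made-up size
    void* arena = mmap(0, arenaSize, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (arena == MAP_FAILED) return 1;

    // ... read the IO blocks into 'arena' and build myRankVector
    //     (which lives in the ordinary heap, only 2-3 GB) ...

    // Returns every page of the arena to the OS immediately; the next
    // rank in the queue can then claim that memory.
    munmap(arena, arenaSize);
}
```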

Look at the code in Boost.Pool for an implementation that would allow you to manage the underlying heaps independently.

In my experience, it is very difficult to manage std containers with custom allocators, as the allocator is tied to the container's type rather than to a particular instance.
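For illustration, here is an untested sketch of driving boost::pool through a custom mmap-backed UserAllocator (the allocator struct below is my own, not part of Boost), so that purge_memory() hands the pages straight back to the kernel:

```cpp
#include <boost/pool/pool.hpp>
#include <sys/mman.h>
#include <cstddef>
#include <cstring>

struct mmap_user_allocator {
    typedef std::size_t    size_type;
    typedef std::ptrdiff_t difference_type;

    static char* malloc(const size_type bytes) {
        // boost::pool only hands the pointer back to free(), so stash the
        // mapped length in a 16-byte header in front of the block.
        const size_type total = bytes + 16;
        void* p = ::mmap(0, total, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return 0;
        std::memcpy(p, &total, sizeof total);
        return static_cast<char*>(p) + 16;
    }

    static void free(char* const block) {
        char* base = block - 16;
        size_type total;
        std::memcpy(&total, base, sizeof total);
        ::munmap(base, total);   // pages go straight back to the OS
    }
};

int main() {
    boost::pool<mmap_user_allocator> ioPool(1 << 20);  // 1 MiB chunks for the IO blocks
    for (int i = 0; i < 100; ++i)
        ioPool.malloc();         // allocate chunks during the load stage

    ioPool.purge_memory();       // munmaps every block: the memory really leaves the process
}
```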

Saurel answered 31/12, 2015 at 12:16 Comment(0)

So, having set the context of this question: is there a way in C++ to manually reduce the heap size and truly release memory that is no longer being used?

That's operating system dependent, but most probably not possible.

Most operating systems keep the memory a process has allocated assigned to that process until the process is completely done and killed.

Servomotor answered 31/12, 2015 at 1:8 Comment(0)

Could shared memory solve your problem (even if you do not want to share this memory)? You can allocate a block of shared memory in your "load" phase and detach it after myRankVector is calculated.

(see shmget, shmat, shmdt, and shmctl(..., IPC_RMID, ...))
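Roughly like this (a sketch with error handling mostly omitted; the size is made up and the segment size is subject to the system's SHMMAX limit):

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>
#include <cstddef>

int main() {
    const std::size_t blockSize = 32UL * 1024 * 1024 * 1024;  // space for the IO blocks (made up)

    // Create a private segment, attach it, and mark it for removal right
    // away so it disappears as soon as it is detached.
    int shmId = shmget(IPC_PRIVATE, blockSize, IPC_CREAT | 0600);
    if (shmId == -1) return 1;
    void* ioBlocks = shmat(shmId, 0, 0);
    shmctl(shmId, IPC_RMID, 0);
    if (ioBlocks == (void*)-1) return 1;

    // ... load the IO blocks into 'ioBlocks' and compute myRankVector ...

    // Detaching releases the memory for the next rank in the queue.
    shmdt(ioBlocks);
}
```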

Kristin answered 31/12, 2015 at 16:21 Comment(0)
