Consider an MPI application built around two steps, which I shall call load and globalReduce. For simplicity the software is described this way, but there is a lot more going on, so it is not purely a Map/Reduce problem.
During the load step, the ranks on each given node are queued so that exactly one rank at a time has full access to all of the node's memory. This design arises from the fact that the load stage reads a set of large IO blocks, and all of them must be resident in memory before a local reduction can take place. I shall call the result of this local reduction myRankVector. Once myRankVector is obtained, the IO blocks are released. The variable myRankVector itself uses little memory: while the node may be using all of its memory during the variable's creation, the rank only needs 2-3 GB afterwards to hold myRankVector.
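To make the structure concrete, here is a minimal sketch of the load step; IOBlock, readBlock and reduceBlocks are hypothetical stand-ins for the real code, which does considerably more:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct IOBlock {
    std::vector<char> data;  // each real block is large, on the order of GBs
};

using RankVector = std::vector<double>;  // ~2-3 GB once built

// Hypothetical stand-ins for the real IO and reduction code.
std::shared_ptr<IOBlock> readBlock(int blockId) {
    auto block = std::make_shared<IOBlock>();
    block->data.assign(std::size_t{1} << 20, static_cast<char>(blockId));  // placeholder size
    return block;
}

RankVector reduceBlocks(const std::vector<std::shared_ptr<IOBlock>>& blocks) {
    RankVector result(1024, 0.0);  // placeholder local reduction
    for (const auto& b : blocks)
        for (std::size_t i = 0; i < result.size(); ++i)
            result[i] += b->data[i % b->data.size()];
    return result;
}

RankVector load(const std::vector<int>& myBlockIds) {
    std::vector<std::shared_ptr<IOBlock>> blocks;
    blocks.reserve(myBlockIds.size());
    for (int id : myBlockIds)
        blocks.push_back(readBlock(id));  // together these can fill the node's memory

    RankVector myRankVector = reduceBlocks(blocks);  // local reduction

    blocks.clear();     // shared_ptr counts drop to zero; all IO blocks are freed
    return myRankVector;  // only ~2-3 GB remain logically in use
}
```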
During the globalReduce stage on the node, it is expected that all ranks on the node have loaded their corresponding myRankVector.
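A sketch of this stage, using MPI_Reduce merely as a stand-in for whatever the real global reduction does:

```cpp
#include <mpi.h>
#include <vector>

// Each rank contributes its myRankVector; rank 0 receives the reduced result.
void globalReduce(const std::vector<double>& myRankVector) {
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<double> result;
    if (rank == 0)
        result.resize(myRankVector.size());

    MPI_Reduce(myRankVector.data(),
               rank == 0 ? result.data() : nullptr,
               static_cast<int>(myRankVector.size()),
               MPI_DOUBLE, MPI_SUM, /*root=*/0, MPI_COMM_WORLD);
}
```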
So here is my problem: while I have ensured that there are absolutely no memory leaks (I use shared pointers, I have double-checked with Valgrind, etc.), I am positive that the heap remains expanded even after all the destructors have released the IO blocks. When the next rank in the queue comes to do its job, it starts asking for lots of memory just as the previous rank did, and of course the program gets killed by the Linux OOM killer with "Out of memory: Kill process xxx (xxxxxxxx) score xxxx or sacrifice child". It is clear why this happens: the second rank in the queue wants to use all the memory, while the first rank still holds a large heap.
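The effect can be reproduced outside my application. Below is a minimal Linux/glibc sketch (the sizes are placeholders, and the exact behaviour depends on the allocator): the small allocation made after the bulk sits near the top of the heap, so once the bulk is freed the heap cannot shrink past it and the RSS stays close to the peak.

```cpp
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Print the resident set size of this process (Linux-specific).
void printRss(const char* label) {
    std::ifstream status("/proc/self/status");
    for (std::string line; std::getline(status, line); )
        if (line.rfind("VmRSS:", 0) == 0)
            std::cout << label << ' ' << line << '\n';
}

int main() {
    printRss("start:");

    // Medium-sized chunks (below glibc's mmap threshold) land on the brk heap.
    std::vector<std::vector<char>> chunks(1024);
    for (auto& c : chunks)
        c.assign(64 * 1024, 'x');  // ~64 MB in total

    auto* pin = new char[64];      // small allocation near the top of the heap
    printRss("allocated:");

    chunks.clear();                // destructors free the bulk of the memory
    chunks.shrink_to_fit();
    printRss("after free:");       // RSS often remains close to the peak

    delete[] pin;
}
```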
So, after setting the context of this question: is there a way in C++ to manually reduce the heap size and truly release memory that is no longer being used?
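To make the ask concrete, the kind of mechanism I am imagining is something like glibc's malloc_trim, though it is non-portable and I do not know whether it actually covers my case, which is precisely the question:

```cpp
#include <malloc.h>  // glibc-specific

// Ask glibc to return as much free heap memory to the kernel as possible
// (pad = 0). Returns 1 if any memory was released, 0 otherwise.
void releaseUnusedHeap() {
    malloc_trim(0);
}
```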
Thanks.