I have a utility that spawns multiple workers using the Python multiprocessing module, and I'd like to be able to track their memory usage via the excellent memory_profiler utility, which does everything I want, particularly sampling memory usage over time and plotting the final result (I'm not concerned with line-by-line memory profiling for this question).
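For reference, the workflow I'm using is the standard two-step mprof cycle (nothing here is specific to my script):

$ mprof run python myscript.py
$ mprof plot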
To set up this question, I have created a simpler version of the script, with a worker function that allocates memory over time, similar to the example given in the memory_profiler library. The worker is as follows:
import time

X6 = 10 ** 6
X7 = 10 ** 7

def worker(num, wait, amt=X6):
    """
    A function that allocates memory over time.
    """
    frame = []

    for idx in range(num):
        # grow the list by `amt` integers, then hold it for `wait` seconds
        frame.extend([1] * amt)
        time.sleep(wait)

    del frame
Given a sequential workload of 4 workers as follows:
if __name__ == '__main__':
    worker(5, 5, X6)
    worker(5, 2, X7)
    worker(5, 5, X6)
    worker(5, 2, X7)
Running the mprof executable to profile my script takes 70 seconds, with each worker running one after the other. Running the script as follows:

$ mprof run python myscript.py

produces the following memory usage graph:
Having these workers run in parallel with multiprocessing means that the script will finish in the time taken by the slowest worker (25 seconds). That script is as follows:
import multiprocessing as mp

# worker, X6, and X7 are defined as above

if __name__ == '__main__':
    pool = mp.Pool(processes=4)
    tasks = [
        pool.apply_async(worker, args) for args in
        [(5, 5, X6), (5, 2, X7), (5, 5, X6), (5, 2, X7)]
    ]
    results = [p.get() for p in tasks]
Memory profiler does indeed work, or at least there are no errors when using mprof, but the results are a bit strange:

A quick look at Activity Monitor shows that there are in fact 6 Python processes: one for mprof, one for python myscript.py, and then one for each worker subprocess. It appears that mprof is only measuring the memory usage of the python myscript.py process.
The memory_profiler library is highly customizable, and I'm pretty confident that I should be able to capture the memory of each process and possibly write it out to separate log files by using the library itself. I'm just not sure where to begin or how to approach that level of customization.
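As a starting point, here is an untested sketch of what I have in mind, using the library's memory_usage function (which can execute a callable and sample its memory at a fixed interval); the profiled_worker wrapper and the worker_<pid>.log naming are just hypothetical conventions of mine:

import os
from memory_profiler import memory_usage

# worker and X6 as defined above

def profiled_worker(num, wait, amt=X6):
    # memory_usage executes worker(num, wait, amt) and samples its memory
    # every `interval` seconds; timestamps=True yields (MiB, unix-timestamp)
    # pairs instead of bare readings
    samples = memory_usage(
        (worker, (num, wait, amt)),
        interval=0.1,
        timestamps=True,
    )

    # write this worker's samples to its own log file, keyed by PID
    with open('worker_{}.log'.format(os.getpid()), 'w') as f:
        for mem, ts in samples:
            f.write('{} {}\n'.format(ts, mem))

The pool would then submit profiled_worker instead of worker, so that each subprocess records its own trace.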
EDIT
After reading through the mprof script, I did discover the -C flag, which sums up the memory usage of all child (forked) processes (invocation shown below). This leads to a (much improved) graph as follows:
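For completeness, the invocation referenced above is simply:

$ mprof run -C python myscript.py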
But what I'm looking for is the memory usage of each individual subprocess over time, so that I can plot all workers (and the master) on the same graph. My idea is to have each subprocess write its memory_usage samples to a different log file, which I can then visualize.
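For the visualization step, something like this matplotlib sketch is what I'm imagining; it assumes the hypothetical worker_<pid>.log files from the sketch above, with one "timestamp memory" pair per line:

import glob
import matplotlib.pyplot as plt

# read each worker's log back into (timestamps, readings) lists
series = {}
for path in sorted(glob.glob('worker_*.log')):
    times, mems = [], []
    with open(path) as f:
        for line in f:
            ts, mem = line.split()
            times.append(float(ts))
            mems.append(float(mem))
    series[path] = (times, mems)

# shift every trace to a common zero so the workers share one time axis
start = min(times[0] for times, _ in series.values())
for path, (times, mems) in series.items():
    plt.plot([t - start for t in times], mems, label=path)

plt.xlabel('time (seconds)')
plt.ylabel('memory used (MiB)')
plt.legend()
plt.show()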