Joblib Parallel doesn't terminate processes
I run the code in parallel in the following fashion:

grouped_data = Parallel(n_jobs=14)(delayed(function)(group) for group in grouped_data)

After the computation is done, I can see in a system monitor that all the spawned processes are still alive and consuming memory:

[Screenshot: system monitor listing the idle joblib worker processes]

None of these processes are killed until the main process terminates, which effectively leaks memory. If I do the same with multiprocessing.Pool in the following way:

pool = Pool(14)
pool.map(apply_wrapper, np.array_split(groups, 14))
pool.close()
pool.join()

Then I see that all the spawned processes are terminated in the end and no memory is leaked. However, I need joblib and its loky backend, since it can serialize some local functions.

How can I forcefully kill processes spawned by joblib.Parallel and release memory? My environment is the following: Python 3.8, Ubuntu Linux.

Israelite answered 11/5, 2021 at 22:50 Comment(6)
As a temporary solution I've explicitly iterated over all subprocesses of the main process and killed them. It is not a good solution, since in the general case I could kill processes unrelated to joblib.Parallel. Also, loky prints some scary logs to stdout and I haven't found a way to suppress them.Israelite
Any updates on this issue? I've the same problem.Synergy
@PetrasPurlys yes, I've dug a bit deeper into the joblib code and it seems that keeping the processes alive is an intentional design of the loky backend - they form something like an implicit pool and await another invocation. You can adopt my temporary solution (see above) and enhance it by tracking the PIDs of the spawned processes. However, I think the correct way is to ensure that everything is cleaned up within each process before completion; that way it won't consume memory after completion.Israelite
Also, as I understand it, loky allows some memory to be shared among all the worker processes and the main process. Thus you can easily leak memory, for example by opening a pyplot figure via the global pyplot state inside a worker and not closing the figure afterwards. I think the same can happen with database connections, HTTP sessions and so on.Israelite
Same problem here. Can you share some code to track and kill the processes spawned by joblib? I've found this issue, but it doesn't seem to do anything: github.com/joblib/joblib/issues/945Crofoot
@Crofoot I've written an answer below summarizing everything. Hope it helpsIsraelite
What I can wrap up after investigating this myself:

  1. joblib.Parallel is not obliged to terminate processes after a successful single invocation.
  2. The loky backend doesn't physically terminate its workers; this is an intentional design, explained by the authors: Loky Code Line
  3. If you want to explicitly release the workers, you can use my snippet:
    import psutil
    from joblib import Parallel, delayed

    # Snapshot existing children first, so only the workers spawned by this
    # Parallel call are terminated
    current_process = psutil.Process()
    subproc_before = {p.pid for p in current_process.children(recursive=True)}
    grouped_data = Parallel(n_jobs=14)(delayed(function)(group) for group in grouped_data)
    subproc_after = {p.pid for p in current_process.children(recursive=True)}
    for subproc in subproc_after - subproc_before:
        print('Killing process with pid {}'.format(subproc))
        psutil.Process(subproc).terminate()
  4. The code above is not thread/process safe. If you have another source of spawning subprocesses, you should block its execution while this runs.
  5. Everything is valid for joblib version 1.0.1.
Israelite answered 6/6, 2021 at 15:15 Comment(1)
I tried various other workarounds and unfortunately nothing else works - Loky is indeed as badly designed as described. Also note that processes might be created lazily, so don't take the subproc_after snapshot until all the work has completed.Luncheonette
So, taking into account point 2 of Иван Судос's answer, would it be judicious to create a new class that wraps LokyBackend and overloads its terminate() function? e.g.,

from joblib._parallel_backends import LokyBackend


class MyLokyWrapper(LokyBackend):

    def terminate(self):
        if self._workers is not None:
            # with kill_workers=True, joblib terminates "brutally" the
            # remaining workers and their descendants using SIGKILL
            self._workers.terminate(kill_workers=False)
            self._workers = None

        self.reset_batch_stats()
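
If this works, the wrapper could be plugged in via joblib's public registration API. A minimal sketch (the backend name `loky_release` is arbitrary, the wrapper is repeated here so the snippet runs standalone, and `LokyBackend` lives in a private module that may move between joblib versions):

```python
from joblib import Parallel, delayed, parallel_backend, register_parallel_backend
from joblib._parallel_backends import LokyBackend  # private module, may change


class MyLokyWrapper(LokyBackend):
    """LokyBackend variant that releases its workers on terminate()."""

    def terminate(self):
        if self._workers is not None:
            # kill_workers=False lets in-flight tasks finish instead of SIGKILL
            self._workers.terminate(kill_workers=False)
            self._workers = None
        self.reset_batch_stats()


# Register under a custom name and select it for a block of code
register_parallel_backend('loky_release', MyLokyWrapper)

with parallel_backend('loky_release'):
    squares = Parallel(n_jobs=2)(delayed(pow)(i, 2) for i in range(4))
print(squares)  # [0, 1, 4, 9]
```

Whether this is safe across joblib releases is an open question, since `_workers` and `terminate` are internal details rather than a stable API.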
Stairway answered 5/9, 2022 at 5:5 Comment(0)

With joblib and the Loky backend you can do this:

from joblib.externals.loky import get_reusable_executor
get_reusable_executor().shutdown(wait=True)

Or you could wait 5 minutes and the pool shuts down by itself.
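
Putting it together, a minimal end-to-end sketch (assuming joblib is installed; the lambdas are serializable here because loky uses cloudpickle):

```python
from joblib import Parallel, delayed
from joblib.externals.loky import get_reusable_executor

# First call spawns the reusable loky worker pool
doubled = Parallel(n_jobs=2)(delayed(lambda x: 2 * x)(i) for i in range(4))
print(doubled)  # [0, 2, 4, 6]

# Release the worker processes (and the memory they hold) right away
get_reusable_executor().shutdown(wait=True)

# A later Parallel call transparently spawns a fresh pool
tripled = Parallel(n_jobs=2)(delayed(lambda x: 3 * x)(i) for i in range(4))
print(tripled)  # [0, 3, 6, 9]
```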

Detrital answered 18/9, 2023 at 21:30 Comment(0)