I run the code in parallel in the following fashion:
from joblib import Parallel, delayed
grouped_data = Parallel(n_jobs=14)(delayed(function)(group) for group in grouped_data)
After the computation is done, I can see in a system monitor that all the spawned processes are still alive and still holding memory.
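The same thing can be confirmed programmatically; a minimal check along these lines (assuming psutil is installed; this is just a diagnostic, not part of the workload) shows the workers as live children of the main process:

import os
import psutil

# List OS-level children of the current process after Parallel has returned;
# the worker processes still show up here along with their resident memory.
for child in psutil.Process(os.getpid()).children(recursive=True):
    print(child.pid, child.name(), child.memory_info().rss)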
All these processes are not killed until the main process terminates, which leads to a memory leak. If I do the same with multiprocessing.Pool in the following way:
import numpy as np
from multiprocessing import Pool

pool = Pool(14)
pool.map(apply_wrapper, np.array_split(groups, 14))
pool.close()
pool.join()
then I see that all the spawned processes are terminated at the end and no memory is leaked. However, I need joblib and its loky backend, since it can serialize some local functions that plain pickle (and therefore multiprocessing.Pool) rejects.
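To illustrate the kind of function I mean (a contrived stand-in, not my real workload): a closure like the one below makes multiprocessing.Pool raise a pickling error, while loky handles it via cloudpickle.

from joblib import Parallel, delayed

def make_worker(scale):
    # worker is a local (nested) function: standard pickle cannot
    # serialize it, so multiprocessing.Pool fails on it, while loky
    # serializes it via cloudpickle
    def worker(group):
        return group * scale
    return worker

results = Parallel(n_jobs=14)(delayed(make_worker(2))(g) for g in grouped_data)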
How can I forcefully kill the processes spawned by joblib.Parallel and release their memory? My environment is Python 3.8 on Ubuntu Linux.
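For reference, this is the kind of explicit cleanup I am hoping exists, sketched against loky's reusable executor (I am not sure this is a supported or stable way to do it, since get_reusable_executor lives under joblib.externals):

from joblib.externals.loky import get_reusable_executor

# Try to shut down the reusable executor that joblib's loky backend
# keeps alive between Parallel calls; shutdown() is the standard
# concurrent.futures executor method.
get_reusable_executor().shutdown(wait=True)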