multiprocessing -> pathos.multiprocessing and windows
I'm currently using the standard multiprocessing module in Python to generate a bunch of processes that will run indefinitely. I'm not particularly concerned with performance; each process simply watches for a different change on the filesystem and takes the appropriate action when a file is modified.

Currently, I have a solution that works, for my needs, in Linux. I have a dictionary of functions and arguments that looks like:

 job_dict['func1'] = {'target': func1, 'args': (args,)}

For each, I create a process:

 import multiprocessing
 jobs = {}
 for k in job_dict.keys():
     jobs[k] = multiprocessing.Process(target=job_dict[k]['target'],
                                       args=job_dict[k]['args'])
     jobs[k].start()

With this, I can keep track of each one that is running, and, if necessary, restart a job that crashes for any reason.
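
Roughly, the monitoring loop looks like this (a simplified sketch rather than my exact code, continuing from the snippet above; the five-second poll interval is arbitrary):

 import time

 while True:
     for k in jobs.keys():
         if not jobs[k].is_alive():  # the process exited or crashed
             jobs[k] = multiprocessing.Process(target=job_dict[k]['target'],
                                               args=job_dict[k]['args'])
             jobs[k].start()
     time.sleep(5)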

This does not work in Windows. Many of the functions I'm using are wrappers built with various functools utilities, and I get errors about the functions not being serializable (see What can multiprocessing and dill do together?). I have not figured out why I get this error in Windows but not in Linux.
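
As a minimal illustration of the kind of failure (make_watcher here is a made-up stand-in, not one of my real wrappers): a function defined inside another function can't be serialized by the standard pickle, while dill handles it:

 import pickle
 import dill

 def make_watcher(path):
     def watcher():
         print("watching", path)  # closes over path
     return watcher

 w = make_watcher('/tmp/somefile')

 try:
     pickle.dumps(w)  # standard pickle can't serialize a closure
 except Exception as e:
     print("pickle failed:", e)

 print("dill ok:", len(dill.dumps(w)), "bytes")  # dill serializes it by value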

If I import dill before starting my processes in Windows, I do not get the serialization error. However, the processes do not actually do anything. I cannot figure out why.

I then switched to the multiprocessing implementation in pathos, but did not find an analog to the simple Process class within the standard multiprocessing module. I was able to generate threads for each job using pathos.pools.ThreadPool. This is not the intended use for map, I'm sure, but it started all the threads, and they ran in Windows:

import pathos
tp = pathos.pools.ThreadPool()
for k in job_dict.keys():
    tp.uimap(job_dict[k]['target'], job_dict[k]['args'])

However, now I'm not sure how to monitor whether a thread is still active, which I'm looking for so that I can restart threads that crash for some reason or another. Any suggestions?

Manton answered 30/7, 2015 at 20:2

I'm the pathos and dill author. The Process class is buried deep within pathos at pathos.helpers.mp.process.Process, where mp itself is the actual fork of the multiprocessing library. Everything in multiprocessing should be accessible from there.
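
For example, a minimal sketch (assuming a current pathos, where mp is the multiprocess fork and also exposes Process at its top level; the watch function is just a placeholder):

from pathos.helpers import mp  # mp is the fork of multiprocessing

def watch(name):
    print("watching", name)

if __name__ == '__main__':  # the guard matters on Windows, which spawns
    p = mp.Process(target=watch, args=('func1',))  # a dill-aware Process
    p.start()
    p.join()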

Another thing to know about pathos is that it keeps the pool alive for you until you remove it from the held state. This helps reduce overhead in creating "new" pools. To remove a pool, you do:

>>> import pathos
>>> # create
>>> p = pathos.pools.ProcessPool()
>>> # remove
>>> p.clear()

There's no such mechanism for a Process, however.

For multiprocessing, Windows is different from Linux and Macintosh because Windows doesn't have a proper fork like Linux does. Linux can share objects across processes, while on Windows there is no sharing: a fully independent new process is created, so the serialization has to be better for the object to pass across to the other process, just as if you were sending the object to another computer. On Linux, you'd have to do this to get the same behavior:

def check(obj, *args, **kwds):
    """check pickling of an object across another process"""
    import subprocess
    import sys
    import dill
    fail = True
    try:
        _x = dill.dumps(obj, *args, **kwds)
        fail = False
    finally:
        if fail:
            print("DUMP FAILED")
    # round-trip the pickle in a fresh interpreter, as a new process would
    msg = "import dill; print(dill.loads(%s))" % repr(_x)
    print("SUCCESS" if not subprocess.call([sys.executable, "-c", msg]) else "LOAD FAILED")
Gracchus answered 30/7, 2015 at 20:29
Thank you. I had read you describe pathos as a fork of the multiprocessing library, and had looked for it in pathos, but had not noticed it tucked inside helpers. I also appreciate the explanation of why multiprocessing behaves differently in Windows and Linux. I replaced my multiprocessing.Process calls with ones from pathos, but got the same behavior as when I imported dill before the call with the standard multiprocessing. I'm going to keep playing with it, but may also re-evaluate the way I'm approaching this problem more generally. – Manton
The reason you got the same behavior when using multiprocessing and first importing dill is that dill can override the pure-Python pickle, but multiprocessing uses the C pickle… so it has to be forked to use dill. – Gracchus
…and maybe I have the fork buried too deeply. However, it is available as a standalone package called multiprocess. I'll have to think about whether it makes sense to bubble it up in pathos a little bit more or not. – Gracchus
