Multiprocessing Large Objects Using Pathos in Python

I am trying to make use of my computer's multiple CPUs. However, the BeautifulSoup object that my function returns as part of an SQLAlchemy object cannot be pickled with pickle or cPickle, so I am using pathos, a fork of the multiprocessing package that uses dill so that it can pickle any Python object. I tested dill on the object that I could not pickle and it worked, so I thought my problem would be solved. However, when I use pathos' pool.map I have the same problem as before, namely that the function completes but the result is never returned. I confirmed this with results = pool.amap(myfunc, myarglist), which completes, and results.get(), which does not. Unfortunately, I cannot post the HTML for the page (it is not publicly available), and I have been unable to produce a reproducible example of the problem. This answer includes a function for troubleshooting multiprocessing of large objects, but it uses Queue, which does not seem to be exposed by pathos on its own (presumably it is only used under the hood by pool.map). I am using the 0.2a1.dev version of pathos (with dependencies installed with pip prior to compiling from source) on Python 2.7. Here is the traceback for the keyboard interrupt:

Process PoolWorker-2:
Process PoolWorker-7:
Traceback (most recent call last):
Process PoolWorker-8:Process PoolWorker-6:Process PoolWorker-3:Process PoolWorker-5:Process PoolWorker-4:Traceback (most recent call last):

  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 227, in _bootstrap



Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 227, in _bootstrap
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 227, in _bootstrap
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 227, in _bootstrap
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 227, in _bootstrap
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 227, in _bootstrap
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 85, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/pool.py", line 59, in worker
    self.run()
    self.run()
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 85, in run
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 85, in run
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 85, in run
    self._target(*self._args, **self._kwargs)
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/pool.py", line 54, in worker
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/pool.py", line 54, in worker
    self._target(*self._args, **self._kwargs)
    self.run()
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/pool.py", line 54, in worker
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 85, in run
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 85, in run
    put((job, i, result))
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/queue.py", line 339, in put
    self._target(*self._args, **self._kwargs)
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/pool.py", line 54, in worker
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/pool.py", line 54, in worker
    for job, i, func, args, kwds in iter(inqueue.get, None):
    for job, i, func, args, kwds in iter(inqueue.get, None):
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/queue.py", line 325, in get
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/queue.py", line 325, in get
    wacquire()
KeyboardInterrupt
    for job, i, func, args, kwds in iter(inqueue.get, None):
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/queue.py", line 325, in get
    racquire()
    racquire()
    for job, i, func, args, kwds in iter(inqueue.get, None):
    for job, i, func, args, kwds in iter(inqueue.get, None):
KeyboardInterrupt
KeyboardInterrupt
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/queue.py", line 325, in get
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/queue.py", line 325, in get
    racquire()
KeyboardInterrupt
    racquire()
    racquire()
KeyboardInterrupt
KeyboardInterrupt

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 227, in _bootstrap
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 85, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/pool.py", line 54, in worker
    for job, i, func, args, kwds in iter(inqueue.get, None):
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/queue.py", line 327, in get
    return recv()
KeyboardInterrupt
Raynold answered 7/7, 2014 at 20:50 Comment(9)
I'd suggest updating to the most recent pathos from GitHub, though I'm unsure whether that will help. Also, are you using pathos.multiprocessing.Pool or ProcessingPool? Pool uses dill instead of pickle, but doesn't have the rest of the augmentations that ProcessingPool has. If your function is calling Queue, as you have indicated elsewhere, you may be out of luck. You could possibly use shared memory in multiprocessing with ctypes. I don't know; it's hard to say without seeing your code. There is an option in dill that provides compression, but it's turned off at the moment... - Nilgai
0.2a1.dev is not the most recent version? I installed from the GitHub source this morning. My function does not call Queue, only multiprocessing. I was using pathos.multiprocessing.ProcessingPool, which is what the pathos documentation uses. Does that not use dill? At any rate, I just tried pathos.multiprocessing.Pool and got the same result. - Raynold
Whoops. I didn't see your version info in the question. Sorry. Yes, that uses dill; both do. You are using it as intended, it seems. Sorry for my confusion. It looks like the size of the pickle causes an issue, as your trace says. dill and pathos have some compression options that I could try, given an example. There's also shared memory, as I mentioned. - Nilgai
Where is that indicated in the trace? Other than shared memory, is there a workaround within the pathos multiprocessing package to pickle large objects differently? The biggest issue for me is that, because the script simply hangs, I cannot figure out how to catch this as an error so that my script crashes instead of hanging. - Raynold
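On catching the hang as an error: one option is to pass a timeout to get() on the asynchronous result, so that a result which never arrives raises an exception instead of blocking forever. Below is a minimal sketch using the standard library's ThreadPool so that it runs anywhere. The names slow_task and run_with_timeout are hypothetical, and whether the object returned by pathos' amap accepts the same timeout argument in 0.2a1.dev is an assumption that would need checking.

```python
import time
from multiprocessing import TimeoutError
from multiprocessing.pool import ThreadPool


def slow_task(x):
    # Hypothetical stand-in for a worker whose result arrives too late.
    time.sleep(0.5)
    return x * 2


def run_with_timeout(func, args, timeout):
    """Return the mapped results, or None if they do not arrive in `timeout` seconds."""
    pool = ThreadPool(2)
    try:
        async_result = pool.map_async(func, args)
        try:
            # get() raises multiprocessing.TimeoutError rather than blocking forever.
            return async_result.get(timeout=timeout)
        except TimeoutError:
            return None
    finally:
        # Always release the pool, whether or not the results arrived.
        pool.terminate()
```

With a process pool, the timeout branch is the place to log the offending arguments and exit, which turns a silent hang into a visible failure.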
I just tested dill on one of the objects that crashes, and it turns out it does not work. When I call dill.dumps(myobject), it hangs. - Raynold
I have compression turned off in dill, but it is exposed in another package. It's hard to tell whether it's compression, size, or something else without seeing a sample. Would it be possible to post or send the code? - Nilgai
I still have never seen dill just hang on a dump. There are several methods to try in dill.detect that give you information on what is happening. If you still can't post a reduced example of your code for whatever reason, you could at least try some of the dill.detect methods, and maybe find some clue to what the error is. - Nilgai
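In the same spirit as the dill.detect suggestion, a rough way to narrow things down is to try serializing each attribute of the object separately and record which ones fail. The sketch below is not dill.detect itself. It uses the standard pickle module by default, and Page and find_unpicklable_attrs are hypothetical names. Assuming dill is installed, dill.dumps could be passed as the dumps argument instead.

```python
import pickle


def find_unpicklable_attrs(obj, dumps=pickle.dumps):
    """Return {attribute name: error text} for attributes that fail to serialize."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            dumps(value)
        except Exception as exc:  # pickling failures come in several exception types
            failures[name] = "%s: %s" % (type(exc).__name__, exc)
    return failures


class Page(object):
    # Hypothetical stand-in for the SQLAlchemy object in the question.
    def __init__(self):
        self.url = "http://example.com"
        self.parser = lambda text: text  # a lambda cannot be pickled by the stdlib


print(find_unpicklable_attrs(Page()))
```

This only reports attributes that raise. If a dump hangs rather than fails, calling the function on one attribute at a time at least shows where it stops.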
I fixed it by getting rid of the attributes of the object that could not be pickled using cPickle and just using the main multiprocessing package. Sorry I cannot help you reproduce the error to debug the package, but if you've never seen the issue, it cannot be affecting that many people. https://mcmap.net/q/1021304/-beautifulsoup-object-will-not-pickle-causes-interpreter-to-silently-crash - Raynold
If you got rid of the unpicklable attributes of your object ahead of time, then there is no need for dill or pathos.multiprocessing; cPickle would simply work. Sorry I couldn't be of more help. - Nilgai
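For readers who want to apply the same fix, one common way to drop unpicklable attributes is to define __getstate__ and __setstate__ on the class, so the standard pickler skips them automatically. This is a minimal sketch and not the asker's actual code: Page, html, and soup are hypothetical names, and a threading.Lock stands in for the attribute that will not pickle.

```python
import pickle
import threading


class Page(object):
    """Hypothetical record carrying one attribute that pickle cannot handle."""

    def __init__(self, html):
        self.html = html              # plain string, pickles fine
        self.soup = threading.Lock()  # stand-in for the unpicklable parsed tree

    def __getstate__(self):
        # Copy the instance dict and leave out the attribute that cannot be pickled.
        state = self.__dict__.copy()
        state.pop("soup", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.soup = None  # can be rebuilt from self.html when needed


restored = pickle.loads(pickle.dumps(Page("<p>hi</p>")))
print(restored.html, restored.soup)
```

Since the raw HTML string survives the round trip, the parsed tree can be rebuilt in the parent process after the result comes back from the worker.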
