How can python subprocess.Popen see select.poll and then later not? (select 'module' object has no attribute 'poll')

I'm using the (awesome) mrjob library from Yelp to run my Python programs on Amazon's Elastic MapReduce. It depends on subprocess from the Python standard library. On my Mac running Python 2.7.2, everything works as expected.

However, when I switched to using the exact same code on Ubuntu LTS 11.04 also with python2.7.2, I encountered something strange:

mrjob loads the job, and then attempts to communicate with its child processes using subprocess and generates this error:

      File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.1-py2.7.egg/mrjob/emr.py", line 1212, in _build_steps
        steps = self._get_steps()
      File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.1-py2.7.egg/mrjob/runner.py", line 1003, in _get_steps
        stdout, stderr = steps_proc.communicate()
      File "/usr/lib/python2.7/subprocess.py", line 754, in communicate
        return self._communicate(input)
      File "/usr/lib/python2.7/subprocess.py", line 1302, in _communicate
        stdout, stderr = self._communicate_with_poll(input)
      File "/usr/lib/python2.7/subprocess.py", line 1332, in _communicate_with_poll
        poller = select.poll()
    AttributeError: 'module' object has no attribute 'poll'

This appears to be a problem with subprocess and not mrjob.

I dug into /usr/lib/python2.7/subprocess.py and found that during import it runs:

    if mswindows:
        ... snip ...
    else:
        import select
        _has_poll = hasattr(select, 'poll')

By editing that, I verified that it really does set _has_poll==True. And this is correct; easily verified on the command line.
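
For reference, the command-line check can be as small as the sketch below. Printing where the module was loaded from is an addition of mine, not something from the original post; it helps rule out a stray `select.py` shadowing the real module.

```python
import select

# Whether poll() exists is decided when the select module is compiled.
has_poll = hasattr(select, 'poll')

# A select module compiled into the interpreter has no __file__, so fall back to a label.
origin = getattr(select, '__file__', '(built-in)')

print('select.poll available: %s' % has_poll)
print('select loaded from: %s' % origin)
```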

However, by the time execution reaches Popen._communicate_with_poll, the select module has somehow changed! This is what dir(select) prints right before it attempts to call select.poll():

    ['EPOLLERR', 'EPOLLET', 'EPOLLHUP', 'EPOLLIN', 'EPOLLMSG', 
    'EPOLLONESHOT', 'EPOLLOUT', 'EPOLLPRI', 'EPOLLRDBAND', 
    'EPOLLRDNORM', 'EPOLLWRBAND', 'EPOLLWRNORM', 'PIPE_BUF', 
    'POLLERR', 'POLLHUP', 'POLLIN', 'POLLMSG', 'POLLNVAL', 
    'POLLOUT', 'POLLPRI', 'POLLRDBAND', 'POLLRDNORM',
    'POLLWRBAND', 'POLLWRNORM', '__doc__', '__name__', 
    '__package__', 'error', 'select']

No attribute called 'poll'!?!? How did it go away?

So, I hardcoded _has_poll=False, and then mrjob happily continues with its work and runs my job in AWS EMR, with subprocess using _communicate_with_select... and I'm stuck with a hand-modified standard library...
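
One way to get the same effect without editing the file on disk is to reset the flag at runtime from your own code. The sketch below assumes the Python 2.7 layout quoted above, where subprocess keeps a module-level `_has_poll`; the helper name `sync_has_poll` is made up for illustration. It still pokes at a private attribute, so treat it as a stopgap rather than a fix.

```python
import select
import subprocess


def sync_has_poll():
    """Re-evaluate subprocess's cached poll flag against the live select module.

    Assumes subprocess exposes a module-level `_has_poll`, as in the
    Python 2.7 source quoted in the question. Where that flag does not
    exist, nothing is changed and None is returned.
    """
    if not hasattr(subprocess, '_has_poll'):
        return None
    subprocess._has_poll = hasattr(select, 'poll')
    return subprocess._has_poll


# Call this after anything that may have altered select,
# and before the first Popen.communicate().
result = sync_has_poll()
```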

Any advice? :-)

Rus answered 31/1, 2012 at 21:53 Comment(4)
I would add some more tracing to find exactly where the select module loses the poll attribute, which by the way seems extremely shady. Are you sure there are no other versions of Python installed? Can you check the exact directory this select module is coming from? – Taw
That really is strange: it's not just poll() which has gone but epoll() too, which should also be there on an Ubuntu system (from Python 2.6 onwards). Also, since the select module is written in C and the existence of poll() is determined at compile time, I think something must be executing del select.poll or similar (though I can't possibly imagine why). Seems outlandish, but you could grep for that just in case? – Instructions
Make sure it is not the name shadowing trap. – Summers
I recently ran into a similar problem, and googling came up with this question and also this bug report: github.com/gevent/gevent/issues/446. I'm not an experienced Python programmer, so I don't fully understand what is being said there, but it looks like the gevent module patches poll out of select. – Anther

I had a similar problem, and it turns out that gevent's monkey patching replaces select.select with gevent.select.select and removes select.poll (as it is a blocking call). However, for some reason gevent doesn't patch subprocess by default, and subprocess uses select.poll.
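
To see why the error only shows up later, here is a small stand-alone simulation of the sequence of events. It does not use gevent or the real select module: `fake_select`, `cached_has_poll` and `communicate` are made-up stand-ins that mirror what subprocess does at import time and what a monkey patch does afterwards.

```python
import types

# Hypothetical stand-in for the C `select` module.
fake_select = types.ModuleType('fake_select')
fake_select.select = lambda r, w, x, timeout=None: (r, w, x)
fake_select.poll = lambda: 'poller'

# Step 1: at import time, the flag is computed once and cached
# (this mirrors `_has_poll = hasattr(select, 'poll')`).
cached_has_poll = hasattr(fake_select, 'poll')

# Step 2: some time later, a monkey patch removes the blocking call.
del fake_select.poll


# Step 3: a caller that trusts the stale flag.
def communicate():
    if cached_has_poll:
        return fake_select.poll()
    return 'fallback to select()'


try:
    communicate()
    outcome = 'ok'
except AttributeError:
    outcome = 'AttributeError'

print(outcome)
```

The flag still says poll exists, the attribute is gone, and the call fails in the same way as the traceback in the question.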

An easy fix is to replace subprocess with gevent.subprocess:

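    import gevent.monkey
    # Ask gevent to patch subprocess along with everything else.
    gevent.monkey.patch_all(subprocess=True)

    import sys
    import gevent.subprocess
    # Make any later `import subprocess` resolve to gevent's version.
    sys.modules['subprocess'] = gevent.subprocess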

If you do this before importing the mrjob library, it should work fine.

Encrata answered 10/9, 2014 at 20:24 Comment(0)

Sorry for writing a full answer instead of a comment, otherwise I'd lose code indentation.

I cannot help you directly, since the cause seems closely tied to your code, but I can help you track it down. Relying on the fact that entries in sys.modules can be arbitrary objects, try something like this:

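    class FakeModule(dict):
        """Stand-in for a module that raises when someone deletes `poll`."""

        def __init__(self, origmodule):
            self._origmodule = origmodule
            self.__all__ = dir(origmodule)

        def __getattr__(self, attr):
            # Only called when normal lookup fails: delegate to the real module.
            return getattr(self._origmodule, attr)

        def __delattr__(self, attr):
            if attr == "poll":
                # Python 2 raise syntax, matching the Python 2.7 setup in the question.
                raise RuntimeError, "Trying to delete poll!"
            self._origmodule.__delattr__(attr)


    def replaceSelect():
        import sys
        import select
        fakeselect = FakeModule(select)

        # Later imports of select now get the wrapper instead of the real module.
        sys.modules["select"] = fakeselect

    replaceSelect()

    import select
    del select.poll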

and you'll get an output like:

    Traceback (most recent call last):
      File "domy.py", line 27, in <module>
        del select.poll
      File "domy.py", line 14, in __delattr__
        raise RuntimeError, "Trying to delete poll!"
    RuntimeError: Trying to delete poll!

By calling replaceSelect() in your code you should be able to get a traceback of where somebody is deleting poll(), so you can understand why.

I hope my FakeModule implementation is good enough, otherwise you might need to modify it.

Hopfinger answered 25/1, 2013 at 18:25 Comment(1)
Reading this felt like a good day at Hogwarts, thanks a lot!! – Burny