I'm using the (awesome) mrjob library from Yelp to run my Python programs on Amazon's Elastic MapReduce. It depends on subprocess from the Python standard library. On my Mac running Python 2.7.2, everything works as expected.
However, when I switched to running the exact same code on Ubuntu LTS 11.04, also with Python 2.7.2, I encountered something strange:
mrjob loads the job, then attempts to communicate with its child processes using subprocess, and generates this error:
      File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.1-py2.7.egg/mrjob/emr.py", line 1212, in _build_steps
        steps = self._get_steps()
      File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.1-py2.7.egg/mrjob/runner.py", line 1003, in _get_steps
        stdout, stderr = steps_proc.communicate()
      File "/usr/lib/python2.7/subprocess.py", line 754, in communicate
        return self._communicate(input)
      File "/usr/lib/python2.7/subprocess.py", line 1302, in _communicate
        stdout, stderr = self._communicate_with_poll(input)
      File "/usr/lib/python2.7/subprocess.py", line 1332, in _communicate_with_poll
        poller = select.poll()
    AttributeError: 'module' object has no attribute 'poll'
This appears to be a problem with subprocess and not mrjob.
I dug into /usr/lib/python2.7/subprocess.py and found that during import it runs:
    if mswindows:
        ... snip ...
    else:
        import select
        _has_poll = hasattr(select, 'poll')
By editing that, I verified that it really does set _has_poll to True at import time. And this is correct; it is easily verified on the command line.
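For reference, the command-line check can be sketched roughly as below. The helper name `poll_report` is made up for illustration; it only reports what the select module looks like in a fresh interpreter, so it says nothing about what happens later inside the mrjob process.

```python
import select
import sys


def poll_report(module=select):
    """Return a few facts about the select module as currently loaded.

    poll_report is a hypothetical helper name, not part of any library.
    """
    return {
        "has_poll": hasattr(module, "poll"),
        "has_epoll": hasattr(module, "epoll"),
        # Built-in modules have no __file__, so fall back to a label.
        "file": getattr(module, "__file__", "(built-in)"),
    }


print(sys.version)
print(poll_report())
```

Running the same function from inside the failing process, right before the call to communicate(), should show whether the two views of select disagree.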
However, by the time execution reaches Popen._communicate_with_poll, the select module has somehow changed! The following was generated by printing dir(select) right before it attempts to use select.poll():
['EPOLLERR', 'EPOLLET', 'EPOLLHUP', 'EPOLLIN', 'EPOLLMSG', 'EPOLLONESHOT', 'EPOLLOUT', 'EPOLLPRI', 'EPOLLRDBAND', 'EPOLLRDNORM', 'EPOLLWRBAND', 'EPOLLWRNORM', 'PIPE_BUF', 'POLLERR', 'POLLHUP', 'POLLIN', 'POLLMSG', 'POLLNVAL', 'POLLOUT', 'POLLPRI', 'POLLRDBAND', 'POLLRDNORM', 'POLLWRBAND', 'POLLWRNORM', '__doc__', '__name__', '__package__', 'error', 'select']
No attribute called 'poll'!? How did it go away?
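One way to narrow this down is to check, at the point of failure, which expected names are missing and whether any library that is known to patch standard-library modules has been imported. This is only a diagnostic sketch: the helper names are made up, and the entries in SUSPECTS are a guess on my part, not something the traceback confirms.

```python
import sys
import types

# Names a stock Linux build of the select module normally exposes.
EXPECTED = ("select", "poll", "epoll")

# ASSUMPTION: libraries that can monkey-patch stdlib modules.
# Nothing in the traceback proves that either one is involved here.
SUSPECTS = ("gevent", "eventlet")


def missing_attributes(module, expected=EXPECTED):
    """List the expected attribute names that the module no longer has."""
    return [name for name in expected if not hasattr(module, name)]


def loaded_suspects(suspects=SUSPECTS, modules=None):
    """List the suspect packages that are present in sys.modules."""
    modules = sys.modules if modules is None else modules
    return [name for name in suspects if name in modules]


if __name__ == "__main__":
    import select

    print("missing from select: %r" % missing_attributes(select))
    print("suspects loaded: %r" % loaded_suspects())
```

If `missing_attributes(select)` is empty at the top of the script but not just before `steps_proc.communicate()`, then something imported in between modified the module.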
So, I hardcoded _has_poll = False, and then mrjob happily continues with its work and runs my job in AWS EMR, with subprocess using _communicate_with_select... and I'm stuck with a hand-modified standard library...
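A less invasive variant of the same workaround is to flip the flag from your own code instead of editing subprocess.py on disk. This is a sketch, not a fix for the root cause: `_has_poll` is a private, undocumented module attribute of the Python 2.7 subprocess module, so the helper below (a made-up name) checks that it exists before touching it and does nothing on interpreters that lack it.

```python
import subprocess
import sys


def disable_poll_in_subprocess():
    """Ask subprocess to use select() instead of poll() in communicate().

    Relies on the private flag subprocess._has_poll (present in Python 2.7).
    Returns True if the flag existed and was cleared, False otherwise.
    """
    if hasattr(subprocess, "_has_poll"):
        subprocess._has_poll = False
        return True
    return False


# Call this once, before any code that ends up in Popen.communicate().
disable_poll_in_subprocess()

# Quick smoke test: two pipes force the poll/select code path on 2.7.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('ok')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = proc.communicate()
print(out.strip())
```

The call has to run in the same process that launches the job, before mrjob reaches communicate().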
Any advice? :-)
It's not just poll() which has gone but epoll() too, which should also be there on an Ubuntu system (from Python 2.6 onwards). Also, since the select module is written in C and the existence of poll() is determined at compile time, I think something must be executing del select.poll or similar (though I can't possibly imagine why). Seems outlandish, but you could possibly grep for that just in case?
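The grep suggested in the comment can also be done from Python, roughly as follows. The function name and the regular expression are illustrative guesses at what "del select.poll or similar" might look like in source; code that removes the attribute some other way (for example through delattr or a helper function) would not be matched.

```python
import io
import os
import re

# ASSUMPTION: the removal is written literally as "del select.poll"
# or "del select.epoll". Adjust the pattern to widen the search.
DEFAULT_PATTERN = r"\bdel\s+select\.e?poll\b"


def find_matches(root, pattern=DEFAULT_PATTERN):
    """Return (path, line_number, line) for each matching line in .py files."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            try:
                with io.open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, 1):
                        if regex.search(line):
                            hits.append((path, number, line.rstrip()))
            except (IOError, OSError):
                # Unreadable file: skip it rather than abort the scan.
                continue
    return hits


if __name__ == "__main__":
    # Directory taken from the traceback in the question.
    for hit in find_matches("/usr/local/lib/python2.7/dist-packages"):
        print("%s:%d: %s" % hit)
```

Packages installed as zipped eggs are not searched by this sketch, since it only walks plain .py files on disk.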