Running a job using hadoop streaming and mrjob: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
Asked Answered
H

5

13

Hey I'm fairly new to the world of Big Data. I came across this tutorial on http://musicmachinery.com/2011/09/04/how-to-process-a-million-songs-in-20-minutes/

It describes in detail of how to run MapReduce job using mrjob both locally and on Elastic Map Reduce.

Well I'm trying to run this on my own Hadoop cluser. I ran the job using the following command.

python density.py tiny.dat -r hadoop --hadoop-bin /usr/bin/hadoop > outputmusic

And this is what I get:

HADOOP: Running job: job_1369345811890_0245
HADOOP: Job job_1369345811890_0245 running in uber mode : false
HADOOP:  map 0% reduce 0%
HADOOP: Task Id : attempt_1369345811890_0245_m_000000_0, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000001_0, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000000_1, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Container killed by the ApplicationMaster.
HADOOP:
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000001_1, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000000_2, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000001_2, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP:  map 100% reduce 0%
HADOOP: Job job_1369345811890_0245 failed with state FAILED due to: Task failed task_1369345811890_0245_m_000001
HADOOP: Job failed as tasks failed. failedMaps:1 failedReduces:0
HADOOP:
HADOOP: Counters: 6
HADOOP:         Job Counters
HADOOP:                 Failed map tasks=7
HADOOP:                 Launched map tasks=8
HADOOP:                 Other local map tasks=6
HADOOP:                 Data-local map tasks=2
HADOOP:                 Total time spent by all maps in occupied slots (ms)=32379
HADOOP:                 Total time spent by all reduces in occupied slots (ms)=0
HADOOP: Job not Successful!
HADOOP: Streaming Command Failed!
STDOUT: packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.0-cdh4.2.1.jar] /tmp/streamjob3272348678857116023.jar tmpDir=null
Traceback (most recent call last):
  File "density.py", line 34, in <module>
    MRDensity.run()
  File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/job.py", line 344, in run
    mr_job.run_job()
  File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/job.py", line 381, in run_job
    runner.run()
  File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/runner.py", line 316, in run
    self._run()
  File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/hadoop.py", line 175, in _run
    self._run_job_in_hadoop()
  File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/hadoop.py", line 325, in _run_job_in_hadoop
    raise CalledProcessError(step_proc.returncode, streaming_args)
subprocess.CalledProcessError: Command '['/usr/bin/hadoop', 'jar', '/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.2.1.jar', '-cmdenv', 'PYTHONPATH=mrjob.tar.gz', '-input', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/input', '-output', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/output', '-cacheFile', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/files/density.py#density.py', '-cacheArchive', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/files/mrjob.tar.gz#mrjob.tar.gz', '-mapper', 'python density.py --step-num=0 --mapper --protocol json --output-protocol json --input-protocol raw_value', '-jobconf', 'mapred.reduce.tasks=0']' returned non-zero exit status 1

Note: As suggested in some other forums I've included

#! /usr/bin/python

at the beginning of both my python files density.py and track.py. It seems to have worked for most people but I still continue getting the above exceprions.

Edit: I included the definition of one of the functions being used in the original density.py which was definied in another file track.py in density.py itself. The job ran succesfully. But it would really be helpful if someone knows why this is happening.

Hsining answered 11/6, 2013 at 5:50 Comment(0)
T
20

Error code 1 is a generic error for Hadoop Streaming. You can get this error code for two main reasons:

  • Your Mapper and Reducer scripts are not executable (include the #!/usr/bin/python at the beginning of the script).

  • Your Python program is simply written wrong - you could have a syntax error or logical bug.

Unfortunately, error code 1 does not give you any details to see exactly what is wrong with your Python program.

I was stuck with error code 1 for a while myself, and the way I figured it out was to simply run my Mapper script as a standalone python program: python mapper.py

After doing this, I got a regular Python error that told me I was simply giving a function the wrong type of argument. I fixed my syntax error, and everything worked after that. So if possible, I'd run your Mapper or Reducer script as a standalone Python program to see if that gives you any insight on the reasoning for your error.

Tremolo answered 23/7, 2014 at 14:28 Comment(2)
I am also getting error while running python script on cluster hadoop streaming, can you look my question #29791937 I stuck because of error. Can you suggest whats going wrongCambria
For my case, I could not attach external variables like --mapper "python exec.py argv"... It seems I cannot pass external variables...Pensile
D
5

I got the same error, sub-process failed with code 1

[cloudera@quickstart ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -input /user/cloudera/input -output /user/cloudera/output_join -mapper /home/cloudera/join1_mapper.py -reducer /home/cloudera/join1_reducer.py
  1. This is primarily because of a hadoop unable to access your input files, or may be you have something in your input which is more than required, or something missing. So, be very very careful with the input directory and files you have in them. I would say, only place exactly required input files in the input directory for the assignment and remove rest of them.

  2. Also make sure your mapper and reducer files are executable. chmod +x mapper.py and chmod +x reducer.py

  3. Run the mapper of reducer python file using cat using only mapper: cat join2_gen*.txt | ./mapper.py | sort using reducer: cat join2_gen*.txt | ./mapper.py | sort | ./reducer.py The reason for running them using cat is because If your input files have any error you can remove them before you run on Hadoop cluster. Sometimes map/reduce jobs cannot find the python errors!!

Dwyer answered 27/11, 2016 at 20:39 Comment(0)
H
1

I faced the same problem when running , my mapper and reducer scripts were not executable.

Adding #! /usr/bin/python at the top of my files fixed the issue.

Homely answered 13/3, 2017 at 7:54 Comment(0)
W
0

Another reason, such as you have an error in your shell script to run the mapper.py and reducer.py. Here is my suggestions:

Firstly you should try to run you mapper.py and reducer.py in the local environment.

Next you could try to track your mapreduce job on your url printed in the stdout log, like this:16:01:56 INFO mapreduce.Job: The url to track the job: http://xxxxxx:8088/proxy/application_xxx/" which has detailed error information. Hope this help!

Whereabouts answered 21/11, 2017 at 8:12 Comment(0)
D
0

Although Andy's answer supported me also to remove the error. Just wanted to add one small thing that worked for me.

My system did not recognise python and instead know only python3. Hence, #! /usr/bin/python might not work; BUT #! /usr/bin/python3 might work for you.

Drusilladrusus answered 29/12, 2022 at 7:54 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.