Python Hadoop streaming on Windows: script is not a valid Win32 application

I have a problem running Python MapReduce scripts on Hadoop with the Hadoop streaming jar.

My setup: Windows 10 64-bit, Python 3.6 (IDE: Spyder 3.2.6), Hadoop 2.3.0, and jdk1.8.0_161.

The job works when my MapReduce code is written in Java, but I want to use Python so that I can apply libraries such as TensorFlow or other machine learning libraries to my data.

Installing Hadoop 2.3.0:

  1. hadoop-env: export JAVA_HOME=C:\Java\jdk1.8.0_161
  2. I created data -> dfs in the hadoop folder
  3. Environment user variables:

    Hadoop_Home = D:\hadoop
    Java_Home = C:\Java\jdk1.8.0_161
    M2_HOME = C:\apache-maven-3.5.2\apache-maven-3.5.2-bin\Maven-3.5.2
    Platform = x64

  4. System variables: I edited Path to include:

    D:\hadoop\bin
    C:\java\jdk1.8.0_161\bin
    C:\ProgramData\Anaconda3

My MapReduce Python code:

D:\digit\wordcount-mapper.py

#!/usr/bin/python3
import sys
for line in sys.stdin:    
    line = line.strip()    # remove leading and trailing whitespace
    words = line.split()   # split the line into words
    for word in words:   
        print( '%s\t%s' % (word, 1))

D:\digit\wordcount-reducer.py

#!/usr/bin/python3
import sys
current_word = None
current_count = 0
for line in sys.stdin:
    line = line.strip()
    try:
        # each input line is "word<TAB>count"; skip anything malformed
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        continue
    if current_word == word:
        current_count += count
    else:
        # input arrives sorted by key, so a new word means the previous one is done
        if current_word is not None:
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word
# emit the last word, if there was any input
if current_word is not None:
    print('%s\t%s' % (current_word, current_count))
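Before submitting to Hadoop, it can help to check the mapper and reducer logic without a cluster. The following is a minimal sketch, not part of the original scripts: it reimplements the same word-count logic as plain functions and simulates the map, sort, and reduce steps in one process. The function names and the sample input are made up for illustration.

```python
from itertools import groupby


def map_words(lines):
    """Yield (word, 1) for every whitespace-separated word, like the mapper."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(pairs):
    """Sum the counts per word; expects pairs sorted by word, like the reducer."""
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort -> reduce in a single process."""
    return list(reduce_counts(sorted(map_words(lines))))


if __name__ == "__main__":
    sample = ["hello hadoop", "hello python"]  # hypothetical input lines
    for word, total in run_local(sample):
        print("%s\t%s" % (word, total))
```

If this gives the expected counts, a failure on the cluster is more likely to come from how the scripts are launched than from the counting logic.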

I run the following in a command prompt opened as administrator:

D:\hadoop\bin> hadoop namenode -format
D:\hadoop\sbin>start-dfs.cmd
D:\hadoop\sbin>start-yarn.cmd

I checked localhost:8088/ and http://localhost:50070, and everything looks fine.

Then I enter:

D:\hadoop\sbin>hadoop fs -mkdir -p /input
D:\hadoop\sbin>hadoop fs -copyFromLocal D:\digit\mahsa.txt /input
D:\hadoop\sbin>D:\hadoop\bin\hadoop jar D:\hadoop\share\hadoop\tools\lib\hadoop-streaming-2.3.0.jar -file D:\digit\wordcount-mapper.py -mapper D:\digit\wordcount-mapper.py -file D:\digit\wordcount-reducer.py -reducer D:\digit\wordcount-reducer.py -input /input/mahsa.txt/ -output /output/

I get this error:

18/02/21 21:49:24 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [D:\digit\wordcount-mapper.py, D:\digit\wordcount-reducer.py, /D:/tmp/hadoop-Mahsa/hadoop-unjar7054071292684552905/] [] C:\Users\Mahsa\AppData\Local\Temp\streamjob2327207111481875361.jar tmpDir=null
18/02/21 21:49:25 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/02/21 21:49:25 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/02/21 21:49:28 INFO mapred.FileInputFormat: Total input paths to process : 1
18/02/21 21:49:28 INFO mapreduce.JobSubmitter: number of splits:2
18/02/21 21:49:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1519235874088_0003
18/02/21 21:49:29 INFO impl.YarnClientImpl: Submitted application application_1519235874088_0003
18/02/21 21:49:29 INFO mapreduce.Job: The url to track the job: http://Mahsa:8088/proxy/application_1519235874088_0003/
18/02/21 21:49:29 INFO mapreduce.Job: Running job: job_1519235874088_0003
18/02/21 21:49:42 INFO mapreduce.Job: Job job_1519235874088_0003 running in uber mode : false
18/02/21 21:49:42 INFO mapreduce.Job:  map 0% reduce 0%
18/02/21 21:49:52 INFO mapreduce.Job: Task Id : attempt_1519235874088_0003_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
        at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
        ... 22 more
Caused by: java.io.IOException: Cannot run program "D:\tmp\hadoop-Mahsa\nm-local-dir\usercache\Mahsa\appcache\application_1519235874088_0003\container_1519235874088_0003_01_000003\.\wordcount-mapper.py": CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
        ... 23 more
Caused by: java.io.IOException: CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessImpl.create(Native Method)
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
        at java.lang.ProcessImpl.start(ProcessImpl.java:137)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 24 more

18/02/21 21:49:52 INFO mapreduce.Job: Task Id : attempt_1519235874088_0003_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
        at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
        ... 22 more
Caused by: java.io.IOException: Cannot run program "D:\tmp\hadoop-Mahsa\nm-local-dir\usercache\Mahsa\appcache\application_1519235874088_0003\container_1519235874088_0003_01_000002\.\wordcount-mapper.py": CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
        ... 23 more
Caused by: java.io.IOException: CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessImpl.create(Native Method)
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
        at java.lang.ProcessImpl.start(ProcessImpl.java:137)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 24 more

18/02/21 21:50:02 INFO mapreduce.Job: Task Id : attempt_1519235874088_0003_m_000001_1, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
        at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
        ... 22 more
Caused by: java.io.IOException: Cannot run program "D:\tmp\hadoop-Mahsa\nm-local-dir\usercache\Mahsa\appcache\application_1519235874088_0003\container_1519235874088_0003_01_000004\.\wordcount-mapper.py": CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
        ... 23 more
Caused by: java.io.IOException: CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessImpl.create(Native Method)
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
        at java.lang.ProcessImpl.start(ProcessImpl.java:137)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 24 more

18/02/21 21:50:03 INFO mapreduce.Job: Task Id : attempt_1519235874088_0003_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
        at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
        ... 22 more
Caused by: java.io.IOException: Cannot run program "D:\tmp\hadoop-Mahsa\nm-local-dir\usercache\Mahsa\appcache\application_1519235874088_0003\container_1519235874088_0003_01_000005\.\wordcount-mapper.py": CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
        ... 23 more
Caused by: java.io.IOException: CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessImpl.create(Native Method)
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
        at java.lang.ProcessImpl.start(ProcessImpl.java:137)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 24 more

18/02/21 21:50:13 INFO mapreduce.Job: Task Id : attempt_1519235874088_0003_m_000001_2, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
        at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
        ... 22 more
Caused by: java.io.IOException: Cannot run program "D:\tmp\hadoop-Mahsa\nm-local-dir\usercache\Mahsa\appcache\application_1519235874088_0003\container_1519235874088_0003_01_000007\.\wordcount-mapper.py": CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
        ... 23 more
Caused by: java.io.IOException: CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessImpl.create(Native Method)
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
        at java.lang.ProcessImpl.start(ProcessImpl.java:137)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 24 more

18/02/21 21:50:14 INFO mapreduce.Job: Task Id : attempt_1519235874088_0003_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
        at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
        ... 22 more
Caused by: java.io.IOException: Cannot run program "D:\tmp\hadoop-Mahsa\nm-local-dir\usercache\Mahsa\appcache\application_1519235874088_0003\container_1519235874088_0003_01_000008\.\wordcount-mapper.py": CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
        ... 23 more
Caused by: java.io.IOException: CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessImpl.create(Native Method)
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
        at java.lang.ProcessImpl.start(ProcessImpl.java:137)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 24 more

18/02/21 21:50:24 INFO mapreduce.Job:  map 100% reduce 100%
18/02/21 21:50:34 INFO mapreduce.Job: Job job_1519235874088_0003 failed with state FAILED due to: Task failed task_1519235874088_0003_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0

18/02/21 21:50:34 INFO mapreduce.Job: Counters: 13
        Job Counters
                Failed map tasks=7
                Killed map tasks=1
                Launched map tasks=8
                Other local map tasks=6
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=66573
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=66573
                Total vcore-seconds taken by all map tasks=66573
                Total megabyte-seconds taken by all map tasks=68170752
        Map-Reduce Framework
                CPU time spent (ms)=0
                Physical memory (bytes) snapshot=0
                Virtual memory (bytes) snapshot=0
18/02/21 21:50:34 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!

I really do not know what the problem is, and it has already cost me a lot of time. Thank you in advance for any help or ideas.

Fragrant answered 21/2, 2018 at 22:2 Comment(10)
You are running an old version of Hadoop. You might want to try upgrading to at least 2.7. Not sure if it'll help, though. - Melessa
I'm pretty sure the shebang only works on UNIX/Linux, so the first line of "D:\digit\wordcount-reducer.py" being "#!/usr/bin/python3" will not invoke Python. - Baler
Thank you @cricket_007, I am going to test it on Hadoop 2.7.2 and 2.8.2 and will share my result here. Another question: can I use the same configuration for each version? - Fragrant
Thank you @Baler for the answer. Yes, my problem is calling Python. I ran (import sys, print(sys.executable)) in Python to arrive at "#!/usr/bin/python3". So the problem is invoking Python from the Windows cmd via Hadoop. - Fragrant
Based on #16217765, I would try: -mapper "python mapper.py" -reducer "python reduce.py". - Baler
I think you are right, @Baler. On Linux, for example, the Python script itself needs to be executable. On Windows, the Python executable needs to be called explicitly. - Melessa
@Baler, did you mean that I should remove "#!/usr/bin/python3" and call python directly in my command? When I do that I get another error saying that two words (python and the mapper file) are undefined. I will try it on a newer version of Hadoop; otherwise I will have to run it on Linux. - Fragrant
Since it's Windows, maybe it should be "python3.exe mapper.py" or whatever you use to run mapper.py from your command prompt. - Baler
Thank you @Baler, I removed "#!/usr/bin/python3" and just added "python mapper.py" in the Windows cmd. - Fragrant
Thank you @cricket_007, I used the newer version 2.7.2 with the same conf XML and it worked. - Fragrant
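The point made in these comments can be tried outside Hadoop. Windows error 193 (ERROR_BAD_EXE_FORMAT) means the file handed to CreateProcess is not an executable image, which is the case for a .py file; the shebang line plays no role there. The sketch below is only an illustration under that assumption: it writes a throwaway mapper (the file name and contents are made up) and launches it as "interpreter plus script path", feeding it text on stdin the way streaming would.

```python
import os
import subprocess
import sys
import tempfile

# A tiny stand-in mapper, written to disk so the example is self-contained.
MAPPER_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    for word in line.split():\n"
    "        print('%s\\t%s' % (word, 1))\n"
)


def run_with_interpreter(script_path, text):
    """Run a script as `<python> script.py`, feeding `text` on stdin."""
    result = subprocess.run(
        [sys.executable, script_path],  # interpreter first, script second
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def demo():
    """Write the stand-in mapper to a temp directory and run it on one line."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mapper_demo.py")  # hypothetical file name
        with open(path, "w") as handle:
            handle.write(MAPPER_SOURCE)
        return run_with_interpreter(path, "a b a\n")


if __name__ == "__main__":
    print(demo())
```

Passing only the script path as the program to start is what the failing job effectively did; naming the interpreter first is the same idea as -mapper "python mapper.py".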

Solution:

  1. I used Hadoop version 2.7.2 with almost the same configuration in the *.xml files.
  2. I removed #!/usr/bin/python3 from the top of my Python scripts.

I changed my command to:

D:\hadoop\bin\hadoop jar D:\hadoop\share\hadoop\tools\lib\hadoop-streaming-2.7.2.jar -file /in/wordcount-mapper.py -mapper "python wordcount-mapper.py" -file /in/wordcount-reducer.py -reducer "python wordcount-reducer.py" -input /in/mahsa.txt -output /output

With these changes the job completed, and I could read the result with:

hadoop fs -cat /output/part-00000
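If the output is needed back in Python, the lines printed by the command above can be parsed directly. This is a small sketch that assumes each line has the "word<TAB>count" shape the reducer prints; the function name and sample lines are made up for illustration.

```python
def parse_counts(lines):
    """Turn 'word<TAB>count' lines into a dict, skipping lines without a tab."""
    counts = {}
    for line in lines:
        word, sep, count = line.rstrip("\n").partition("\t")
        if not sep:
            continue  # not a key/value line
        counts[word] = counts.get(word, 0) + int(count)
    return counts


if __name__ == "__main__":
    sample = ["hadoop\t1\n", "hello\t2\n"]  # hypothetical output lines
    print(parse_counts(sample))
```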
Fragrant answered 24/2, 2018 at 18:56 Comment(1)
Wondering where you put the .py files, because /in/* sounds odd given the previous explanations. - Neils

Sticking with Python 3.x, you can run:

hadoop jar C:/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar -input /word_count_in_python/data.txt -output /word_count_in_python/output -mapper "python C:/hadoop-Documents/mapper123.py" -reducer "python C:/hadoop-Documents/reducer123.py"

Just put the word python in front of the script path in your -mapper and -reducer arguments.
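For anyone scripting the submission, here is a hedged sketch of assembling that command in Python. It uses only the options that appear in the command above (-input, -output, -mapper, -reducer); the helper name and its parameters are made up, and it prints the command line rather than running it.

```python
import subprocess


def build_streaming_command(jar, input_path, output_path, mapper, reducer,
                            interpreter="python"):
    """Return the argument list for a streaming job that names the interpreter."""
    return [
        "hadoop", "jar", jar,
        "-input", input_path,
        "-output", output_path,
        # The interpreter goes in front of each script path.
        "-mapper", "%s %s" % (interpreter, mapper),
        "-reducer", "%s %s" % (interpreter, reducer),
    ]


if __name__ == "__main__":
    command = build_streaming_command(
        "C:/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar",
        "/word_count_in_python/data.txt",
        "/word_count_in_python/output",
        "C:/hadoop-Documents/mapper123.py",
        "C:/hadoop-Documents/reducer123.py",
    )
    # list2cmdline quotes the arguments that contain spaces, Windows-style.
    print(subprocess.list2cmdline(command))
```

Keeping the interpreter as a parameter makes it easy to switch to a full path to python.exe if python is not on the Path of the account that runs the tasks.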

Isolate answered 10/9, 2023 at 18:10 Comment(0)
