Hadoop WordCount example stuck at map 100% reduce 0%
[hadoop-1.0.2] → hadoop jar hadoop-examples-1.0.2.jar wordcount /user/abhinav/input     /user/abhinav/output
Warning: $HADOOP_HOME is deprecated.

hdfs://localhost:54310/user/abhinav/input
12/04/15 15:52:31 INFO input.FileInputFormat: Total input paths to process : 1
12/04/15 15:52:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for     your platform... using builtin-java classes where applicable
12/04/15 15:52:31 WARN snappy.LoadSnappy: Snappy native library not loaded
12/04/15 15:52:31 INFO mapred.JobClient: Running job: job_201204151241_0010
12/04/15 15:52:32 INFO mapred.JobClient:  map 0% reduce 0%
12/04/15 15:52:46 INFO mapred.JobClient:  map 100% reduce 0%

I've set up Hadoop on a single node using this guide (http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/#run-the-mapreduce-job), and I'm trying to run one of the provided examples, but the job gets stuck at map 100% reduce 0%. What could be causing this?

Hokanson answered 15/4, 2012 at 19:58 Comment(2)
If you followed Michael's tutorial, I think you installed Hadoop in /usr/local/hadoop. In that directory, find tasktracker.log and the other log files. Check whether there are any errors and post them here. - Backspace
This could help: #32511780 - Floss

First of all, open up your job tracker and look at the number of free reducer slots and the other running jobs: is there another job running that consumes all the free reducer slots as soon as they become available?

Once you've proved to yourself that there are some free reducer slots available to run a reducer for your job, locate your job in the job tracker web UI and click on it to open it up. You should now be able to see the number of completed mappers - make sure there are no mappers still running. The % complete in the console sometimes lies: you can have a mapper that reports 100% while it is still committing and is having trouble finalizing its output.

Once you're satisfied that all your mappers have finished, look at the number of running reducers - does it show 0? If not, and some are shown as running, click on the number of running reducers to bring up the running reducers page, then click through on an instance until you get the option to view the logs for that reducer. You'll want to view all the logs for the reducer (not just the first / last 100k). This should tell you what your reducer is actually doing - most probably trying to copy the map results over to the reducer node. I imagine this is where your problem is - either network or disk space - but either way, Hadoop should eventually fail the reducer instance and reschedule it to run on another node.
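
If you prefer the command line to the web UI, the same information is available from the job client. A minimal sketch for Hadoop 1.x (the job ID below is simply the one from the question):

hadoop job -list                              # running jobs and their states
hadoop job -status job_201204151241_0010      # map/reduce completion percentages and counters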

Lozada answered 15/4, 2012 at 23:24 Comment(2)
Great tips. I followed these directions and figured out that my problem was Windows Firewall not letting the files be transferred from the nodes where the mappers had run to the node where the reducer was trying to run (even though it was all the same physical machine). - Masoretic
Thanks for describing your cause. I had the same 100% map / 0% reduce problem and found it was because I was on a SecureVPN connection. - Hypercatalectic

There could be many reasons causing this issue; the most plausible one would be a bug in your mapper (an exception, an infinite loop, ...).

To debug:

  • Log onto localhost:50030 (the JobTracker web UI); you should see a list of your jobs. Locate the job that failed (your ID is job_201204151241_0010) and look at the trace (don't forget to click on "All", or else you won't see the full log).
  • Look at your logs on disk; they should be under /usr/lib/hadoop/logs or something similar (you'll have to refer to your configuration to find out). Grep for error messages, e.g. cat /path/to/logs/*.log | grep ERROR, and see if this returns anything.

If nothing comes out, I advise you to add logging messages to your mapper so you can manually trace what happens at each step (assuming this runs in pseudo-distributed mode).
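
For example, here is a minimal sketch of the standard WordCount mapper with a couple of log statements added (the class name LoggingTokenizerMapper is my own; the Log/LogFactory calls are the Commons Logging API that Hadoop itself uses, so the messages end up in the task attempt's syslog, visible from the "All" log view in the JobTracker UI):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LoggingTokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final Log LOG = LogFactory.getLog(LoggingTokenizerMapper.class);

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // log each record so you can see exactly where the mapper stops making progress
        LOG.info("mapping line: " + value);
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
        // confirms this map() call returned normally
        LOG.info("finished line");
    }
}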

Let me know how that goes.

EDIT: As Chris noted, the reducer is at 0%, so the problem actually lies before the reduce step.

Scotty answered 15/4, 2012 at 22:5 Comment(1)
If the reducer is at 0%, it has yet to enter the reduce phase (it's still waiting to copy data over), so this is not a bug in the user's reducer implementation. - Lozada

I also encountered this issue on a host running SUSE 11. As Chris notes above, the issue is with the mapper. To solve it, I edited the /etc/hosts file and removed the host's real IP address entry. For example, in /etc/hosts:

Ip.address.of.your.host      hostname

Change to

127.0.0.1                    hostname

Once I made the change above and restarted, I was able to run the wordcount program.

Withstand answered 10/9, 2012 at 12:23 Comment(0)

I'm seeing the same issue running a pseudo-cluster on Mac OS X 10.7.4. It happens after the machine wakes up from sleep mode. It looks like the mapper's IP address gets redefined on wake-up:

syslog:2012-09-14 16:52:06,542 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201209141640_0003_r_000000_0 copy failed: attempt_201209141640_0003_m_000000_0 from 172.19.131.144
syslog:2012-09-14 16:52:06,546 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201209141640_0003_r_000000_0: Failed fetch #1 from attempt_201209141640_0003_m_000000_0

So, after waking up from sleep mode, restarting Hadoop via stop-all.sh and start-all.sh fixes this issue for me.
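
For reference, a minimal sketch of that restart (the install path is the one from Michael Noll's guide and may well differ on a Mac; adjust to wherever your Hadoop lives):

cd /usr/local/hadoop        # assumed install location
bin/stop-all.sh             # stops the JobTracker, TaskTracker, NameNode and DataNode daemons
bin/start-all.sh            # starts them again so they bind to the current IP address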

Ninepins answered 15/9, 2012 at 0:15 Comment(0)

I may have found another reason for the "map 100% reduce 0%" issue.

My map task generates a huge amount of records and I'm running hadoop in a pseudo-clustered environment.

I inspected the map task log, and it appears that the time between map 100% and the beginning of reduce is spent merging intermediate segments:

2013-07-27 03:09:55,302 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 334
2013-07-27 03:10:15,166 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 325
2013-07-27 03:10:35,603 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 316
...
2013-07-27 03:26:18,738 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 28
2013-07-27 03:29:50,458 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 19
2013-07-27 03:33:48,368 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 4424592099 bytes

This procedure can take a long time depending on the size and number of segments and on the read/write speed of the disk.

Aside from the log, you can tell this is happening by checking the disk usage of the machine, which will be constantly high, since lots of data is being merged into new files. I could even see that the segments are removed after being merged, because the disk usage fluctuates, increasing during a merge and decreasing during the deletion.
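
If the merge phase is the bottleneck, one setting that may be worth experimenting with (general Hadoop tuning on my part, not something I verified for this exact job) is io.sort.factor in mapred-site.xml, which controls how many spill segments are merged per pass; its default of 10 matches the "Merging 10 intermediate segments" lines above:

<!-- mapred-site.xml: merge more segments per pass so fewer merge rounds are needed -->
<property>
  <name>io.sort.factor</name>
  <!-- default is 10; raise cautiously, each open segment costs memory and file handles -->
  <value>50</value>
</property>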

Agamogenesis answered 27/7, 2013 at 17:29 Comment(0)

I also encountered this issue, but I edited the /etc/hosts file differently. Keep the existing line:

Ip.address.of.your.host      hostname   

and just add one line below it, as follows:

127.0.1.1     hostname

Note that it is 127.0.1.1 (rather than 127.0.0.1), or you will run into problems later, such as "Connect to host some_hostname port 22: Connection timed out".
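
So the resulting /etc/hosts would look roughly like this (the 192.168.1.5 address and the hostname mybox are placeholders for your own values):

127.0.0.1       localhost
192.168.1.5     mybox        # the host's real address, left in place
127.0.1.1       mybox        # the extra line added below it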

Mammary answered 28/12, 2013 at 13:30 Comment(0)

If you're using Linux and running single-node Hadoop: edit /etc/hosts and change your IP address entries to this format:

your-ip-address    master
your-ip-address    slave

Then go to the /hadoop/conf directory, open masters and put localhost in it (remove all other addresses!), and open slaves and put localhost in it (remove all other addresses!).

Now run your program again; it should work correctly.
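
A minimal sketch of those edits from the shell (the 192.168.1.10 address and the /usr/local/hadoop/conf path are assumptions; substitute your own IP and conf directory):

sudo sh -c 'echo "192.168.1.10 master" >> /etc/hosts'    # assumed IP address
sudo sh -c 'echo "192.168.1.10 slave"  >> /etc/hosts'
echo localhost > /usr/local/hadoop/conf/masters          # only localhost, nothing else
echo localhost > /usr/local/hadoop/conf/slaves           # only localhost, nothing else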

Exploration answered 22/4, 2014 at 16:49 Comment(0)

I was having a similar issue (not the same one). My tasks got stuck at 100% map and 16% reduce. I faced this problem for quite a few hours (with different programs: grep, wordcount, etc.) until I bumped into this thread and looked at Chris's answer, which suggests a good way to debug and pinpoint the issue you are facing. (Apparently I don't have the reputation to vote up his answer, hence this post.)

After looking at the job tracker web UI and navigating to the exact task attempt's log file (I didn't know this log existed), I found that my JobTracker was unable to resolve the hostname of a datanode. I added the (ip, hostname) pair to my hosts file, and the stuck task came back alive and finished successfully.
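
In other words, an entry like this in /etc/hosts on the JobTracker machine (both the address and the hostname are hypothetical placeholders for the datanode that failed to resolve):

192.168.1.21    datanode1    # hypothetical (ip, hostname) pair for the unresolved datanode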

Anhydride answered 18/5, 2013 at 18:17 Comment(0)

I faced a similar issue: there was no room for the reduce task, so I had to free up disk space. The best thing is to look at the jobtracker log at 50030/logs/hadoop-hadoop-jobtracker-localhost.localdomain.log. The log message was: "WARN org.apache.hadoop.mapred.JobInProgress: No room for reduce task. Node tracker_localhost.localdomain:localhost.localdomain/127.0.0.1:57829 has 778543104 bytes free; but we expect reduce input to take 1160706716"
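
To check whether you are in the same situation, compare the free space on the volume holding mapred.local.dir with the expected reduce input size from the warning. A sketch, assuming the /app/hadoop/tmp directory used as hadoop.tmp.dir in Michael Noll's guide (your path may differ):

df -h /app/hadoop/tmp/mapred/local    # free space where map output and reduce input are staged
du -sh /app/hadoop/tmp/*              # see what is taking up the space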

Still answered 19/6, 2013 at 21:52 Comment(0)

I use Hadoop 2.7.3 and followed the tutorial at https://www.tutorialspoint.com/hadoop/hadoop_enviornment_setup.htm, but the version there is different. Some other tutorials didn't configure yarn-site.xml and mapred-site.xml, so I deleted the properties in those two files, and then it worked! I'm a newbie, so I don't know the internal details; maybe the version I'm using doesn't need the default configuration changed?
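
For what it's worth, the property whose removal most likely changed the behaviour (an educated guess, not something confirmed in this answer) is mapreduce.framework.name in mapred-site.xml: with it set to yarn, jobs are submitted to YARN; with it removed, the default value "local" makes them run with the in-process local job runner.

<!-- mapred-site.xml: remove or omit this property to fall back to the local job runner -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>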

Agley answered 28/5 at 7:33 Comment(0)
