Difference between PIG local and mapreduce mode
What is the actual difference between running Pig scripts locally and in mapreduce mode? I understand mapreduce mode is when you run on a cluster that has HDFS installed. Does this mean local mode does not need HDFS, and so MapReduce jobs don't even get triggered? What is the difference, and when do you use one over the other?

Wavawave answered 26/7, 2012 at 12:33 Comment(0)

Local mode builds a simulated MapReduce job that runs against a local file on disk. In theory it is equivalent to MapReduce, but it's not a "real" MR job. From a user's perspective, you shouldn't be able to tell the difference.

Local mode is great for development.

Individualize answered 26/7, 2012 at 16:30 Comment(1)
One thing to note is that there is no support for counters in local mode, but that is due to Hadoop Map/Reduce rather than Pig. - Orms

Local mode: All scripts are run on a single machine without requiring Hadoop MapReduce and HDFS. This can be useful for developing and testing Pig logic. If you're using a small set of data to develop or test your code, then local mode could be faster than going through the MapReduce infrastructure.

Local mode doesn't require a Hadoop cluster. When you run in local mode, the Pig program runs in the context of a local Java Virtual Machine, and data access is via the local file system of a single machine. Under the hood, local mode is a local simulation of MapReduce using Hadoop's LocalJobRunner class.

MapReduce mode (also known as Hadoop mode): Pig is executed on the Hadoop cluster. In this case, the Pig script gets converted into a series of MapReduce jobs that are then run on the Hadoop cluster.
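The mode is selected with the `-x` flag of the `pig` launcher. Here is a minimal sketch, assuming `pig` is on your PATH; the file names `sample.txt` and `wordcount.pig` and the sample data are made up for illustration.

```shell
# Create a tiny local input file (hypothetical sample data).
printf 'a b a\nb c\n' > sample.txt

# A small Pig Latin word count; the script name is illustrative.
cat > wordcount.pig <<'EOF'
lines   = LOAD 'sample.txt' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group, COUNT(words);
DUMP counts;
EOF

# Run only if Pig is installed; -x selects the execution mode.
if command -v pig >/dev/null 2>&1; then
  # Local mode: single JVM, 'sample.txt' is read from the local file system.
  pig -x local wordcount.pig
  # Cluster run (commented out): the same relative path would resolve against HDFS.
  # pig -x mapreduce wordcount.pig
fi
```

The script itself is identical in both modes; only the launcher flag and where the `LOAD` path resolves change.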

If you have a terabyte of data to process and you want to develop a program interactively, you may soon find things slowing down considerably, and your storage needs may start to grow. Local mode allows you to work with a subset of your data in a more interactive manner so that you can figure out the logic (and work out the bugs) of your Pig program.

After you have things set up as you want them and your operations are running smoothly, you can then run the script against the full data set using MapReduce mode.
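One way to follow this workflow without editing the script between runs is Pig's parameter substitution (`-param name=value` on the command line, `$name` in the script). A rough sketch, again assuming `pig` is on your PATH; `subset.txt`, `count.pig` and the HDFS path `/data/full` are hypothetical.

```shell
# A small local subset of the data (hypothetical contents).
printf 'x\ny\nx\n' > subset.txt

# The quoted heredoc keeps the shell from expanding the Pig parameter.
cat > count.pig <<'EOF'
-- The input path is supplied at launch time via -param.
recs    = LOAD '$input' AS (key:chararray);
grouped = GROUP recs BY key;
counts  = FOREACH grouped GENERATE group, COUNT(recs);
DUMP counts;
EOF

if command -v pig >/dev/null 2>&1; then
  # Develop and debug against the local subset.
  pig -x local -param input=subset.txt count.pig
  # Later, the same script against the full data set on the cluster (hypothetical path):
  # pig -x mapreduce -param input=/data/full count.pig
fi
```

For a real full-size run you would probably replace `DUMP` with a `STORE ... INTO` statement, since dumping a large result to the console is rarely what you want.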

Principium answered 23/3, 2015 at 20:55 Comment(0)
