Tez Execution Engine vs. MapReduce Execution Engine in Hive

Asked 13/1, 2017 at 9:13 Answered 7/1, 2020 at 14:4

amazon-web-services hive mapreduce tez bigdata

What is the difference between the Tez engine and the MapReduce engine in Hive, and for which kinds of processing (e.g. joins, aggregations) is each engine the better choice?

Minier asked 13/1, 2017 at 9:13

Tez uses a DAG (directed acyclic graph) architecture. A typical MapReduce job has the following steps:

Read data from file --> first disk access
Run mappers
Write map output --> second disk access
Run shuffle and sort --> read map output, third disk access
Write shuffle and sort output --> write sorted data for reducers, fourth disk access
Run reducers, which read the sorted data --> fifth disk access
Write reducer output --> sixth disk access
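The step list above can be tallied in a few lines. This is a hypothetical cost model, not Hive or Hadoop code: it assumes each MapReduce job costs exactly the six disk accesses listed, and that a multi-stage query runs as a chain of such jobs, each reading the previous job's output back from disk. The "pipelined" figure simply encodes the one-read, one-write claim made later in this answer.

```python
from typing import List, Tuple

# Hypothetical cost model: the steps of one MapReduce job as listed above.
# The boolean marks whether the step touches disk.
MR_JOB_STEPS: List[Tuple[str, bool]] = [
    ("read data from file", True),
    ("run mappers", False),
    ("write map output", True),
    ("shuffle and sort reads map output", True),
    ("write sorted data for reducers", True),
    ("reducers read sorted data", True),
    ("write reducer output", True),
]


def mr_disk_accesses(num_jobs: int) -> int:
    """Disk accesses for a chain of num_jobs MapReduce jobs.

    Each job starts by reading the previous job's output from disk,
    so the total scales linearly with the number of chained jobs.
    """
    per_job = sum(1 for _, touches_disk in MR_JOB_STEPS if touches_disk)
    return per_job * num_jobs


def pipelined_disk_accesses(num_stages: int) -> int:
    """Idealized pipeline: one read and one write in total, with every
    intermediate stage assumed to stay in memory."""
    return 2 if num_stages > 0 else 0


if __name__ == "__main__":
    for jobs in (1, 2, 3):
        print(jobs, mr_disk_accesses(jobs), pipelined_disk_accesses(jobs))
    # 1 6 2
    # 2 12 2
    # 3 18 2
```

Under this model, a query that runs as two chained jobs doubles the MapReduce cost while the idealized pipeline stays at two. Real Tez DAGs may still write shuffle data to local disk between stages, so treat these numbers as illustrative only.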

Tez works very similarly to Spark (Tez was created by Hortonworks; Spark originated separately at UC Berkeley's AMPLab and actually predates Tez):

Build the execution plan first, without reading any data from disk yet.
Once a result is actually needed (similar to actions in Spark), read the data from disk, perform all the steps, and produce the output.

Ideally, this means only one read of the input and one write of the final output.

The efficiency gain comes from not going to disk multiple times: intermediate results are passed between stages in memory where possible, instead of being written out to disk between jobs.
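The plan-first, execute-on-demand idea described above can be modeled in plain Python. This is a hypothetical toy: `LazyPipeline`, `map`, `filter` and `run` are illustrative names, not the Tez or Spark API, and disk I/O is simulated with two counters.

```python
from typing import Callable, Iterable, List

Step = Callable[[Iterable[int]], Iterable[int]]


class LazyPipeline:
    """Toy model of lazy, plan-first execution (not the Tez or Spark API).

    map() and filter() only record steps; run() is the "action" that reads
    the input once, streams it through every step in memory, and writes the
    result once. Disk I/O is simulated with two counters.
    """

    def __init__(self, read_source: Callable[[], Iterable[int]]) -> None:
        self._read_source = read_source
        self._steps: List[Step] = []
        self.disk_reads = 0
        self.disk_writes = 0

    def map(self, fn: Callable[[int], int]) -> "LazyPipeline":
        # Record the transformation; nothing is executed yet.
        self._steps.append(lambda rows: (fn(r) for r in rows))
        return self

    def filter(self, pred: Callable[[int], bool]) -> "LazyPipeline":
        # Record the filter; nothing is executed yet.
        self._steps.append(lambda rows: (r for r in rows if pred(r)))
        return self

    def run(self, sink: List[int]) -> List[int]:
        self.disk_reads += 1                # single read of the input
        rows: Iterable[int] = self._read_source()
        for step in self._steps:            # chain generators; data stays in memory
            rows = step(rows)
        sink.extend(rows)                   # single write of the final output
        self.disk_writes += 1
        return sink


if __name__ == "__main__":
    out: List[int] = []
    p = LazyPipeline(lambda: range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
    print(p.disk_reads)  # 0: only the plan exists so far
    p.run(out)
    print(out)  # [0, 4, 16, 36, 64]
    print(p.disk_reads, p.disk_writes)  # 1 1
```

However many `map` or `filter` calls are chained, the counters end at one read and one write, which is the property the answer attributes to Tez and Spark.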

Metapsychology answered 7/1, 2020 at 14:4

So if I run a Hive query that involves a join with the MR engine, will the steps you listed be doubled, while with Tez there is still only one pipeline? – Fiscal 9/7, 2022 at 20:33

Tez is a DAG-based system: it is aware of all the operations in a query, so it can optimize them as a whole before starting execution.
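One benefit of seeing the whole plan up front is that the planner can group operations into stages before anything runs. The sketch below is a hypothetical illustration of that idea, not Hive's or Tez's planner: `Op` and `plan_stages` are made-up names, and it assumes a simple linear chain of operations (a real DAG can have several inputs, such as both sides of a join). Consecutive operations that need no shuffle are fused into one stage, and a new stage starts only where data must be regrouped.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    name: str
    needs_shuffle: bool  # True for operations such as GROUP BY or JOIN


def plan_stages(ops: List[Op]) -> List[List[str]]:
    """Split a whole (linear) plan into stages before execution.

    Operations that need no shuffle are fused into the current stage;
    a shuffle-requiring operation closes the current stage and opens a new one.
    """
    stages: List[List[str]] = []
    current: List[str] = []
    for op in ops:
        if op.needs_shuffle and current:
            stages.append(current)
            current = []
        current.append(op.name)
    if current:
        stages.append(current)
    return stages


if __name__ == "__main__":
    plan = [
        Op("scan orders", False),
        Op("filter", False),
        Op("join customers", True),
        Op("project", False),
        Op("group by region", True),
    ]
    print(plan_stages(plan))
    # [['scan orders', 'filter'], ['join customers', 'project'], ['group by region']]
```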

The MapReduce model states that any computation can be expressed with two kinds of steps: a map step and a reduce step. One map/reduce pair performs one level of aggregation over the data, so complex computations typically require chaining several such pairs.
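To make the two-step model concrete, here is a single-process sketch of one map/reduce pair, followed by a second pair chained on the first one's output to get a second level of aggregation. The `map_reduce` helper is hypothetical and runs entirely in memory; in Hadoop each pair would be a separate job whose output is written to disk before the next job reads it.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Mapper = Callable[[Any], Iterable[Tuple[Any, Any]]]
Reducer = Callable[[Any, List[Any]], Any]


def map_reduce(records: Iterable[Any], mapper: Mapper, reducer: Reducer) -> Dict[Any, Any]:
    """One map/reduce pair, run in a single process (hypothetical helper)."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        # Map step: each record emits zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: collect all values that share a key.
            groups[key].append(value)
    # Reduce step: one result per key, i.e. one level of aggregation.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


if __name__ == "__main__":
    docs = ["a b a", "b c", "a d"]

    # Job 1: count occurrences of each word.
    word_counts = map_reduce(docs, lambda doc: [(w, 1) for w in doc.split()],
                             lambda _word, ones: sum(ones))
    print(word_counts)  # {'a': 3, 'b': 2, 'c': 1, 'd': 1}

    # Job 2, chained on job 1's output: group words by how often they occur.
    by_frequency = map_reduce(word_counts.items(), lambda kv: [(kv[1], kv[0])],
                              lambda _count, words: sorted(words))
    print(by_frequency)  # {1: ['c', 'd'], 2: ['b'], 3: ['a']}
```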

Tez runs on YARN alongside MapReduce rather than on top of it, and it can be thought of as an optimized MapReduce that does the same work in fewer, more compact steps.
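The idea of doing the same work in fewer, compacted steps can be illustrated by what happens to a chain of two jobs. As separate MapReduce jobs, the second job must start with its own map stage even if that map only passes records through; a DAG can instead let one reduce stage feed the next directly (the map-reduce-reduce shape often cited for Tez). This is a simplified, hypothetical sketch: `Job`, `mr_chain_stages` and `compact_to_dag` are illustrative names, and it assumes the only saving is dropping pass-through ("identity") map stages. It is not how Hive's Tez compiler is implemented.

```python
from typing import List, Tuple

Job = Tuple[str, str]  # hypothetical: (map function name, reduce function name)


def mr_chain_stages(jobs: List[Job]) -> List[str]:
    """Stages when each job runs as a separate MapReduce job: every job
    begins with a map stage, and each job's output goes to disk before
    the next job reads it."""
    stages: List[str] = []
    for map_fn, reduce_fn in jobs:
        stages += [f"map:{map_fn}", f"reduce:{reduce_fn}"]
    return stages


def compact_to_dag(jobs: List[Job]) -> List[str]:
    """Drop pass-through ("identity") map stages after the first job, so a
    reducer can feed the next reducer directly (map -> reduce -> reduce)."""
    stages: List[str] = []
    for i, (map_fn, reduce_fn) in enumerate(jobs):
        if i == 0 or map_fn != "identity":
            stages.append(f"map:{map_fn}")
        stages.append(f"reduce:{reduce_fn}")
    return stages


if __name__ == "__main__":
    jobs = [("parse", "sum_by_key"), ("identity", "sort_by_total")]
    print(mr_chain_stages(jobs))
    # ['map:parse', 'reduce:sum_by_key', 'map:identity', 'reduce:sort_by_total']
    print(compact_to_dag(jobs))
    # ['map:parse', 'reduce:sum_by_key', 'reduce:sort_by_total']
```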

Paapanen answered 13/1, 2017 at 13:34

Apache Tez is plug-in compatible with MapReduce but reduces the amount of disk access. Tez is generally better than MapReduce.

However, there are also systems that can outperform Hive on Tez, such as Spark SQL.

Berkeley answered 13/1, 2017 at 9:34

So for joins and aggregations, which one is better? I was using MapReduce and then switched to Tez, but found that Tez took more time than MapReduce for queries with multiple joins. – Minier 13/1, 2017 at 10:20
