What is the difference between HUE, YARN and OOZIE

I understand the concepts of HDFS and MapReduce and how it is important to move the processing logic to the data to increase efficiency. I was even able to run a couple of MapReduce jobs on my basic Hadoop cluster. Surrounding these concepts there are many different technologies, such as YARN, HUE and OOZIE, all of which seem to do the same thing (at least from a very high level): provide operational visibility and CRUD abilities for jobs (which can be MapReduce or something else).

Am I correct in making this assumption or is there a much more fundamental difference between them?

Thanks, Kay

Potts asked 21/1, 2016 at 21:22
+1 for starting with "I understand how it is important to move the processing logic to the data to increase efficiency". (Gas)

YARN: MapReduce is an API in which you implement your data processing logic. Once the code is compiled, you submit the job using the hadoop jar command. YARN is the framework that keeps track of the cluster's resources, launches the job on the cluster, executes it, and shows/logs its progress.
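To make the "you implement the processing logic, the framework runs it" split concrete, here is a minimal pure-Python sketch of the MapReduce programming model, using word count. It is not Hadoop's API: a real job implements Hadoop's Java Mapper and Reducer classes, is packaged into a jar and submitted with hadoop jar, and YARN then allocates the resources to run its tasks. The function names map_words, reduce_counts and run_local_job are hypothetical, and the "cluster" here is a single Python process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step (your logic): emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step (your logic): sum all counts emitted for one word."""
    return word, sum(counts)


def run_local_job(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical local driver: map, shuffle (group by key), then reduce.

    On a real cluster, YARN would allocate containers for the map and
    reduce tasks; here everything runs in one process for illustration.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            grouped[key].append(value)  # shuffle: collect values per key
    return dict(reduce_counts(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_local_job(["the quick fox", "the lazy dog"]))
    # prints {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

The point of the split is that only map_words and reduce_counts are job-specific; everything run_local_job does (distributing work, grouping by key, tracking progress) is what the framework and YARN take care of on a real cluster.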

OOZIE: Take a data integration example. You might have to get one data set from one database and another data set from a second database, then join and process the data and load the result into a cache or a third database. This involves two Sqoop jobs to pull the data from the source databases, a Hive/MapReduce job to join and process the data, and a final step to push the result into the cache/database. These jobs depend on each other: for example, the data can be processed only after it has been pulled from the source databases. Hence we need a workflow to execute the complete data integration process, and OOZIE facilitates that. It is a MapReduce-based workflow tool: the workflow itself is executed as one or more MapReduce jobs.
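The dependency structure a workflow tool like OOZIE manages can be sketched as a small directed acyclic graph. This is a minimal Python illustration of the idea, not Oozie itself: real Oozie workflows are defined in XML (workflow.xml) with action nodes and ok/error transitions. The action names below are hypothetical stand-ins for the two Sqoop imports, the Hive join and the export, and for simplicity the actions run one at a time (Oozie can also run independent branches in parallel using fork/join nodes).

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List, Set

# Hypothetical actions for the data-integration example; each maps to
# the set of actions that must finish before it may start.
DEPENDENCIES: Dict[str, Set[str]] = {
    "sqoop_import_db1": set(),
    "sqoop_import_db2": set(),
    "hive_join_and_process": {"sqoop_import_db1", "sqoop_import_db2"},
    "export_to_cache": {"hive_join_and_process"},
}


def run_workflow(deps: Dict[str, Set[str]],
                 run_action: Callable[[str], bool]) -> List[str]:
    """Run actions sequentially in dependency order; stop on first failure.

    Returns the names of the actions that completed successfully.
    """
    completed: List[str] = []
    for action in TopologicalSorter(deps).static_order():
        if not run_action(action):
            # Comparable to an error transition: downstream actions never run.
            break
        completed.append(action)
    return completed


if __name__ == "__main__":
    def fake_action(name: str) -> bool:
        print("running", name)  # a real action would launch a Sqoop/Hive job
        return True

    print(run_workflow(DEPENDENCIES, fake_action))
```

Because the processing step only becomes eligible after both imports succeed, a failed import automatically prevents the join and export from running, which is exactly the ordering guarantee the workflow is there to provide.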

HUE: There are many tools in Hadoop: HDFS (file system), Sqoop, Hive/Pig to process the data, Impala, HBase and many more. When running POCs, it can get tedious to connect to the cluster for each of them, and doing so also requires some Linux skills. To overcome those challenges, many of the Hadoop ecosystem tools are consolidated under one web-based umbrella called Hue.

Tuna answered 21/1, 2016 at 23:30
Thanks for your explanation. I see that OOZIE workflows are predominantly submitted through the command line interface, and the OOZIE UI doesn't seem to offer the ability to create or submit workflows. HUE, on the other hand, seems to have a much slicker interface and allows us to create and submit OOZIE workflows. Is my understanding correct? Also, which gives us more operational visibility into the system (which jobs/workflows are running, which have failed, who is hogging resources, etc.): OOZIE or HUE? (Potts)
