In my experience, Airflow is the best data pipeline tool available right now. It is best suited to managing complex, long-running workflows, and its UI and modularity are excellent.
Airflow
- + Python Code for DAGs
- + Has connectors for every major service/cloud provider
- + More versatile
- + Advanced metrics
- + Better UI and API
- + Capable of creating extremely complex workflows
- + Jinja Templating
- + Can be used as an Orchestrator for the Tensorflow Extended ecosystem
- = Can be parallelized
- = Native connections to HDFS, Hive, Pig, etc.
- = Graph as DAG
Oozie
- --- Java or XML for DAGs
- - Hard to build complex pipelines
- - Smaller, less active community
- - Worse web GUI
- - Java API
- = Can be parallelized
- = Native connections to HDFS, Hive, Pig, etc.
- = Graph as DAG
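To make the shared "=" points concrete (graph as DAG, can be parallelized), here is a minimal, tool-agnostic sketch using only the Python standard library. It is not Airflow's or Oozie's API. The task names and the `parallel_waves` helper are hypothetical, chosen only to show how a dependency graph determines which tasks a scheduler may run at the same time.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "clean_orders": {"extract"},
    "clean_users": {"extract"},
    "load": {"clean_orders", "clean_users"},
}


def parallel_waves(dag):
    """Group tasks into waves; tasks within one wave do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies finished.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock dependents
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(parallel_waves(pipeline), start=1):
        print(number, wave)
```

Here the two `clean_*` tasks land in the same wave, so a scheduler would be free to run them in parallel, while `load` has to wait for both. Both tools apply this idea; the difference is that in Airflow you declare the graph in Python, and in Oozie you declare it in XML.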
As you can see, Airflow is easier to use (especially in large, heterogeneous teams), more versatile, and more powerful than Oozie.
As I said: go with Airflow.
Article you may find interesting