Differences and Use Cases for Apache NiFi, Apache Airflow, and Apache Falcon? [closed]

2

10

I am trying to understand the differences between Apache NiFi, Apache Airflow, and Apache Falcon in the context of data pipeline management. Here is my use case:

  • Hadoop-based architecture: The data pipeline needs to integrate seamlessly with a Hadoop-based ecosystem.
  • Data movement and transformation: The solution should support robust data movement and transformation capabilities.
  • Scheduling and orchestration: Scheduling and orchestrating complex workflows is essential for my requirements.
  • Ease of use and maintenance: The solution should be relatively easy to use and maintain.

Can someone provide insights into the specific functionalities and use cases where each of these tools excels?

Deerhound answered 12/11, 2018 at 6:31 Comment(2)
Too broad, but I would use NiFi because I know it :)Artair
NiFi would be a good fit for this one, it ticks all the boxes.Kuhn
16

Apache NiFi is not a workflow manager in the way that Apache Airflow or Apache Oozie are. It is a dataflow tool: it routes and transforms data. It is not intended to schedule jobs; rather, it lets you collect data from multiple locations, define discrete steps to process that data, and route it to different destinations.

Apache Falcon is different again: it lets you define and manage HDFS datasets more easily. It is effectively data management within an HDFS cluster.

Based on your description, NiFi would be a useful fit for your requirements. It would be able to collect your XML file, process it in some manner, store the data in MySQL, and perform REST calls. It would also be easily configurable for new vendors, and it tolerates failures well. It performs most functions in parallel and can be scaled out into a NiFi cluster across multiple host machines. It was designed with performance and reliability in mind.

What I am unsure about is the ability to perform image processing. There are some processors (extract image metadata, resize image) but otherwise you would need to develop a new processor in Java - which is relatively easy. Or, if the image processing uses Python or some other scripting language, you can use one of the ExecuteScript processors.

'Scheduling jobs' using NiFi is not recommended.

Full disclosure: I am an Apache NiFi contributor.

Rumba answered 13/11, 2018 at 15:30 Comment(3)
Can I please know the reason why it is not recommended to schedule jobs using NiFi?Raddie
I should clarify: I don't believe dataflows within NiFi itself were generally intended to be scheduled. In other words, NiFi was designed for live data feeds. Using an external tool to trigger or schedule dataflows within NiFi via the API (treating NiFi like a batch data processor) was not an intended use case. Using NiFi to schedule external jobs is entirely possible, but I doubt it's easy to fully manage external systems (execute, check for success, and handle errors) using only NiFi processors. I'd wager there are better tools for that.Rumba
Thanks @Rumba for your detailed explanation. I have the same type of requirement and am also using NiFi.Sianna
5

I am using NiFi for a use case similar to the OP's. Regarding scheduling, I like how NiFi works with Kafka: I have some scripts scheduled with crontab that simply add a message to a Kafka topic. NiFi listens to that topic and then starts the orchestration for loading, transforming, fetching, indexing, storing, etc. You can also handle HTTP requests in NiFi to build "webhook receivers" that trigger a process from an external HTTP POST. Again, for simple deployments (the kind you plug and play on a single machine), a cron job nails the task.

For image processing, I have an OCR image reader in Python connected through an ExecuteScript processor, and facial recognition with OpenCV through an ExecuteCommand processor. NiFi's automatic back-pressure has solved many of the problems I ran into when running the Python script and the command on their own.
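For anyone who wants to try the cron-plus-Kafka trigger described in this answer, here is a minimal sketch of what such a script might look like. It is not the answerer's actual script. It assumes the kafka-python client, and the broker address, topic name, job name, and message fields are all hypothetical placeholders.

```python
import json
from datetime import datetime, timezone


def build_trigger_message(job_name, params=None):
    """Return the JSON-encoded bytes for one trigger event."""
    payload = {
        "job": job_name,
        "params": params or {},
        "requested_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def publish_trigger(producer, topic, job_name, params=None):
    """Send one trigger message, then flush the producer's buffer.

    `producer` is any object with send(topic, value=...) and flush(),
    which is the shape of kafka-python's KafkaProducer.
    """
    message = build_trigger_message(job_name, params)
    producer.send(topic, value=message)
    producer.flush()
    return message


def main():
    # Imported here so the helpers above work without kafka-python installed.
    from kafka import KafkaProducer  # assumption: kafka-python client

    # Hypothetical broker address, topic, and job name.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    publish_trigger(producer, "nifi-triggers", "nightly-load")
    producer.close()
```

A crontab entry would run a file that calls `main()`, and on the NiFi side a Kafka consumer processor (one of the ConsumeKafka family) subscribed to the same topic would pick up each message and start the flow. Keeping the message as a small JSON document lets the flow route on fields such as `job` rather than needing one topic per job.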

Uppermost answered 14/11, 2018 at 7:41 Comment(0)
