Apache NiFi for ETL

How effective is it to use Apache NiFi for an ETL process with HDFS as the source and Oracle DB as the destination? What are the limitations of Apache NiFi compared to other ETL tools such as Pentaho, DataStage, etc.?

Hirza answered 19/8, 2017 at 20:11 Comment(2)
How do you evaluate effectiveness? Does it cover the requirements? Oracle and HDFS connectors: yes. Is it extensible? Yes: scripting with Groovy/JS/Python, or building a custom processor. List all the limitations? To get a strict answer, list the features you are expecting. IMHO, I'd choose NiFi as a tool for ETL-like tasks. - Contemporize
This is a pretty broad question that is difficult to provide an objective answer to. You can read about use cases for NiFi on their blog. - Damsel

Main advantages of NiFi

The main advantages of NiFi:

  1. Intuitive GUI, which allows for easy inspection of the data
  2. Strong delivery guarantees
  3. Low latency; you can support both batch and streaming use cases
  4. It can handle any format: it is not limited to SQL tables but can also move log files, etc.
  5. Schema-aware, and can share schemas with solutions like Kafka, Flink, and Spark

Main limitation of NiFi

NiFi is really a tool for moving data around. You can enrich individual records, but it is typically described as doing 'EtL' with a small t. A typical thing you would not want to do in NiFi is join two dynamic data sources.
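To illustrate why enrichment fits and joins don't, here is a plain-Python sketch (not NiFi code; all names are hypothetical). Enriching a record only needs that record plus a small, static lookup table, roughly in the spirit of NiFi's LookupRecord processor. A join between two sources needs one side buffered as state until matching records from the other side arrive.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def enrich(records: Iterable[dict], country_names: Dict[str, str]) -> Iterator[dict]:
    """Record-at-a-time enrichment: each record is handled independently
    against a small, static lookup table (hypothetical field names)."""
    for rec in records:
        yield {**rec, "country_name": country_names.get(rec["country"], "unknown")}


def hash_join(orders: Iterable[dict], customers: Iterable[dict]) -> List[Tuple[dict, dict]]:
    """Joining two sources: the whole customers side is held in memory
    (state) so each order can be matched against it."""
    by_id: Dict[int, dict] = {}
    for c in customers:
        by_id[c["id"]] = c
    return [(o, by_id[o["customer_id"]]) for o in orders if o["customer_id"] in by_id]
```

If both sides keep changing ("dynamic"), the buffered state grows without bound or needs windowing and expiry rules, which is exactly the kind of work engines like Spark or Flink are built for.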

For joining tables, tools like Spark, Hive, or classical ETL alternatives are often used.

For joining streams, tools like Flink and Spark Streaming are often used.

Conclusion

NiFi is a great tool; you just need to make sure you use it for the right use case. Where needed, you can use other tools to complement it.


Extra strong full disclosure: I am an employee of Cloudera, the company that supports NiFi and other projects such as Spark and Flink. I have used other ETL tools before, but not to the same extent as NiFi.

Swagger answered 11/8, 2020 at 19:47 Comment(0)

I'm not sure about Sqoop, but I can explain the benefits of using Apache NiFi. In your case, the data in HDFS could be in any format (including unstructured). NiFi can process it and convert it to a format of your choice so that you can save it directly to any RDBMS. NiFi also handles back-pressure very effectively, which helps ensure lossless transmission.
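As a rough analogy for back-pressure (not NiFi's implementation): in NiFi, each connection between processors has thresholds (object count and data size), and once a threshold is reached the upstream processor stops being scheduled until the queue drains. The sketch below models that with a bounded `queue.Queue`: the parse stage blocks on `put()` when the queue is full, so nothing is dropped while the slower writer catches up. It also shows the "convert, then save to an RDBMS" idea, using an in-memory SQLite table as a stand-in for Oracle; the table, columns, and function names are assumptions.

```python
import csv
import queue
import sqlite3
import threading

_DONE = object()  # sentinel marking the end of the input


def parse_stage(lines, out_q):
    """Parse raw delimited text into (name, qty) tuples.

    out_q.put() blocks while the queue is full, so a slow writer
    throttles this stage instead of data being dropped.
    """
    try:
        for row in csv.reader(lines):
            out_q.put((row[0].strip(), int(row[1])))
    finally:
        out_q.put(_DONE)  # always signal the end, even if parsing fails


def write_stage(in_q, conn, batch_size=2):
    """Drain the queue and insert rows in small batches."""
    batch = []
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO orders (name, qty) VALUES (?, ?)", batch)
            conn.commit()
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany("INSERT INTO orders (name, qty) VALUES (?, ?)", batch)
        conn.commit()


def run_pipeline(lines, max_queued=3):
    conn = sqlite3.connect(":memory:")  # stand-in for the Oracle target
    conn.execute("CREATE TABLE orders (name TEXT, qty INTEGER)")
    q = queue.Queue(maxsize=max_queued)  # bounded queue ~ back-pressure threshold
    producer = threading.Thread(target=parse_stage, args=(lines, q))
    producer.start()
    write_stage(q, conn)  # writer runs in the thread that owns conn
    producer.join()
    return conn
```

For example, `run_pipeline(["apple,3", "pear,5"])` returns a connection whose `orders` table holds both rows, in input order.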

Sheela answered 22/2, 2019 at 6:26 Comment(0)

One of the critical features that NiFi provides that our competitors generally don't is the ability to stop jobs and examine the flow and downstream systems while it's running. For you, this means you can test the flow against a test HDFS folder and a test Oracle DB, let some data go through, pause the flow and poke around Oracle to make sure it's to your liking after a matter of seconds or minutes instead of waiting for a "job to complete." It makes the process extremely agile.

Hussy answered 3/2, 2020 at 14:43 Comment(0)

Actually, NiFi is a very good tool. You can easily manipulate processors, and you can migrate huge amounts of data in a short time.

But with destinations such as an RDBMS, there are always problems. I used to have a lot of problems with "non-killing" threads, so you have to be very careful about stopping processes and about the configuration of processors. Some processors, like QueryDatabaseTable, can consume huge amounts of memory and bring the server down.
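One way to keep memory bounded when pulling from a database is to fetch incrementally by a maximum-value column and materialize only a limited number of rows at a time. As far as I know, QueryDatabaseTable exposes related knobs (a "Maximum-value Columns" property for incremental state, plus "Fetch Size" and "Max Rows Per Flow File"), but check the documentation for your NiFi version. The sketch below is a plain-Python keyset-pagination version of the same idea, not how NiFi implements it; the `source` table and its columns are hypothetical, and SQLite stands in for the real database.

```python
import sqlite3
from typing import Iterator, List, Tuple


def fetch_incremental(conn: sqlite3.Connection, last_max_id: int,
                      page_size: int) -> Iterator[Tuple[List[tuple], int]]:
    """Yield (rows, new_max_id) pages of rows whose id is above last_max_id.

    Only page_size rows are held in memory at once, and the returned
    max id is the state to persist between runs.
    """
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM source WHERE id > ? ORDER BY id LIMIT ?",
            (last_max_id, page_size),
        ).fetchall()
        if not rows:
            return
        last_max_id = rows[-1][0]  # advance the stored maximum value
        yield rows, last_max_id
```

With 10 rows and `page_size=4`, this yields pages of 4, 4, and 2 rows; a later run started from the stored maximum (10) only sees rows inserted after that.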

Flatfooted answered 11/3, 2020 at 15:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.