How effective is to use Apache NIFI for the ETL process having source as HDFS & destination as Oracle DB. What are the limitations of Apache NIFI compared other ETL tools such as Pentaho,Datastage,etc..
Main advantage of NiFi
The main advantages of NiFi:
- Intuitive gui, which allows for easy inspection of the data
- Strong delivery guarantees
- Low latency, you can support both batch and streaming usecases
- It can handle any format, not only limited to SQL tables, but can also move log files etc.
- Schema aware, and can share schema with solutions like Kafka, Flink, Spark
Main limitation of NiFi
NiFi is really a tool for moving data around, you can do enrichments of individual records but it is typically mentioned to do 'EtL' with a small t. A typical thing that you would not want to do in NiFi is joining two dynamic data sources.
For joining tables, tools like Spark, Hive, or classical ETL alternatives are often used.
For joining streams, tools like Flink and Spark Streaming are often used.
Conclusion
NiFi is a great tool, you just need to make sure you use it for the right usecase. Where needed you can use other tools to complement it.
Extra strong full disclosure: I am an employee of Cloudera, the company that supports NiFi and other projects such as Spark and Flink. I have used other ETL tools before, but not to the same extent as NiFi.
Not sure about sqoop, I can explain the benifits of using Apache Nifi. In your case the data in HDFS could be of any format(Unstructured), Nifi has a capability to process and bring it to format of your choice so that you can directly save it to any RDBMS. Nifi handles back-pressure in vary effective way to have lossless transmission.
One of the critical features that NiFi provides that our competitors generally don't is the ability to stop jobs and examine the flow and downstream systems while it's running. For you, this means you can test the flow against a test HDFS folder and a test Oracle DB, let some data go through, pause the flow and poke around Oracle to make sure it's to your liking after a matter of seconds or minutes instead of waiting for a "job to complete." It makes the process extremely agile.
Actually Nifi is very good tool. You can easily manipulate processors. In short time you can migrate huge data.
But for destinations such as RDBMS, there are always problems. I used to have a lot of problems about "non-killing" threads, you have to be very careful about stopping processes and the configuration of processors. Some processors like QueryDatabasetable consumes huge memory and the server goes down.
© 2022 - 2024 — McMap. All rights reserved.
yes
. is it extendible? -yes
- scripting with groovy/js/python or building custom processor. list all the limitations? - as for me, to get strict answer list the features that you are expecting.. IHMO: i'll choose nifi as a tool for ETL-like tasks. – Contemporize