I'm looking into ETL tools (like Talend) and investigating whether Apache NiFi could be used. Could NiFi perform the following:
- Pick up two CSV files that are placed on local disk
- Join the CSVs on a common column
- Write the joined CSV to disk
I've tried setting up a job in NiFi, but couldn't see how to join two separate CSV files. Is this possible in Apache NiFi?
It looks like the QueryDNS processor could be used to enrich one CSV file with data from the other, but that seems over-complicated for this use case.
Here's an example of the input CSVs, which need to be joined on state_id:
Input files
customers.csv
id | name | address | state_id
---|------|--------------|---------
1 | John | 10 Blue Lane | 100
2 | Bob | 15 Green St. | 200
states.csv
state_id | state
---------|---------
100 | Alabama
200 | New York
Output file
output.csv
id | name | address | state
---|------|--------------|---------
1 | John | 10 Blue Lane | Alabama
2 | Bob | 15 Green St. | New York
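To pin down the intended transformation, here is a minimal plain-Python sketch that runs outside NiFi. It makes several assumptions. The files are comma-separated with a header row, since the tables above are only formatted for display. The join is an inner join, so customers with no matching `state_id` are dropped. `state_id` is replaced by `state` in the output. The function names `join_on_state_id` and `join_files` are hypothetical.

```python
import csv
import io


def join_on_state_id(customers_csv: str, states_csv: str) -> str:
    """Inner-join customer rows with state rows on state_id (assumed semantics).

    Returns CSV text with columns id, name, address, state; state_id is dropped.
    """
    # Build an in-memory lookup table state_id -> state from the smaller file.
    states = {
        row["state_id"]: row["state"]
        for row in csv.DictReader(io.StringIO(states_csv))
    }

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["id", "name", "address", "state"], lineterminator="\n"
    )
    writer.writeheader()

    for row in csv.DictReader(io.StringIO(customers_csv)):
        state = states.get(row["state_id"])
        if state is None:
            continue  # assumed inner join: skip customers with no matching state
        writer.writerow(
            {"id": row["id"], "name": row["name"],
             "address": row["address"], "state": state}
        )
    return out.getvalue()


def join_files(customers_path: str, states_path: str, output_path: str) -> None:
    """Hypothetical helper: read both CSVs from local disk, write the joined CSV."""
    with open(customers_path, newline="") as c, open(states_path, newline="") as s:
        result = join_on_state_id(c.read(), s.read())
    with open(output_path, "w", newline="") as f:
        f.write(result)
```

Calling `join_files("customers.csv", "states.csv", "output.csv")` on the example data should produce the output table shown above. The states file is loaded into a dictionary because it is the small side of the join. That is roughly the enrichment-style lookup that a NiFi flow would also need to perform.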