How to simply combine flow files in NiFi?

Let's say I have 100 flow files produced by one processor, each containing a different line. I want a single new flow file that contains all 100 lines. How can I do that?

I have tried the MergeContent processor, but it gives me the original 100 flow files back.
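To make the goal concrete, here is a toy Python model (not NiFi code) of joining 100 one-line contents into one. It assumes MergeContent's Binary Concatenation merge format, where a newline only appears between entries if one is configured as the demarcator (for example Delimiter Strategy = Text with a newline Demarcator); `concatenate` is a hypothetical name used only for illustration.

```python
def concatenate(contents, demarcator=b""):
    """Join flow file contents in order, inserting `demarcator` between entries.

    With the default empty demarcator this is plain byte concatenation, so
    lines that have no trailing newline run together.
    """
    return demarcator.join(contents)


# Hypothetical input: 100 flow files, one line each, no trailing newline
lines = [f"line {i}".encode() for i in range(100)]

no_delim = concatenate(lines)              # b"line 0line 1line 2..."
with_newline = concatenate(lines, b"\n")   # one line per original flow file
```

If the upstream step already ends every line with a newline, no demarcator is needed; otherwise the merged file is one long line.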

Current config:

[Screenshot: current MergeContent configuration]

Update:

I debugged the output of MergeContent. The first step, JOIN, looks OK: the data is 576.34 KB and contains 100 lines. But the second step, ATTRIBUTES_MODIFIED, seems to output only 1 line to the final result.

[Screenshot: provenance events for MergeContent]

Update:

This is my whole flow:

  1. Consume messages from Kafka one by one.
  2. Convert each Kafka message to a one-line string in its own flow file.
  3. Merge multiple flow files into one.
  4. Write the result with PutHDFS.

Now I'm stuck at step 3: the flow files still come through one by one instead of being merged. I don't care about the order or the attributes; I just need to limit the number of lines per merged file.
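Since only the count matters, count-based bin packing can be sketched as follows. This is a simplified Python model, not NiFi's implementation; in MergeContent the corresponding settings are Merge Strategy = Bin-Packing Algorithm, Minimum Number of Entries, Maximum Number of Entries, and Max Bin Age (so that a partially filled bin is not held forever). `pack_bins` and `bin_expired` are hypothetical names.

```python
def pack_bins(lines, max_entries, min_entries, bin_expired=False):
    """Split queued lines into bins of at most `max_entries`.

    Only the last chunk can be partial. It is released if it still meets
    `min_entries`, or if `bin_expired` is True (standing in for Max Bin Age);
    otherwise it is returned as leftover to wait for more data.
    """
    bins, leftover = [], []
    for start in range(0, len(lines), max_entries):
        chunk = lines[start:start + max_entries]
        if len(chunk) == max_entries or len(chunk) >= min_entries or bin_expired:
            bins.append(chunk)
        else:
            leftover = chunk
    return bins, leftover


# 250 queued lines, exactly 100 per merged file
bins, leftover = pack_bins(list(range(250)), max_entries=100, min_entries=100)
# Same input after the bin has aged out: the partial bin is released too
aged_bins, aged_leftover = pack_bins(
    list(range(250)), max_entries=100, min_entries=100, bin_expired=True
)
```

Setting the minimum and maximum to the same value gives fixed-size files, and the age limit bounds how long the last partial file can wait.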

Update:

I have tried setting the correlation attribute to ${kafka.topic}, since all the flow files come from the same Kafka topic, but they still do not merge:

[Screenshot: MergeContent configuration with the correlation attribute set]
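Why the correlation attribute is not expected to change anything here: in a simplified model, it only decides which bin a flow file lands in, and flow files with the same value share a bin. If every flow file carries the same `kafka.topic`, there is exactly one group, which behaves like no correlation at all, so the entry-count settings still decide when a merge happens. The topic name `my-topic` and the function `group_by_correlation` are hypothetical.

```python
from collections import defaultdict


def group_by_correlation(flowfiles, attribute_name):
    """Put flow files (modelled as attribute dicts) into bins keyed by one attribute."""
    bins = defaultdict(list)
    for attributes in flowfiles:
        bins[attributes.get(attribute_name)].append(attributes)
    return dict(bins)


# Hypothetical input: 100 flow files that all came from the same topic
flowfiles = [{"kafka.topic": "my-topic"} for _ in range(100)]
bins = group_by_correlation(flowfiles, "kafka.topic")
```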

Seleneselenious asked 26/5, 2019 at 14:26. Comments (8):
Is there something common to those files? Why aren't you using a correlation attribute? – Disorient
@Disorient They don't have anything in common. I just fetch them from different places and need to put them in one file. – Seleneselenious
Just limited by number? – Disorient
@Disorient Yeah, just limited by number. I have searched for two days but had no luck... – Seleneselenious
@Disorient This is my whole procedure: 1. Get messages from Kafka one by one. 2. Convert each Kafka message to a one-line string. 3. Merge multiple flow files into one. 4. PutHDFS. Now I'm stuck at step 3: I cannot get them merged. I don't care about the order or the attributes; I just need to limit the number. – Seleneselenious
@Disorient I have tried using the Kafka topic as the correlation attribute, but it still does not merge. – Seleneselenious
I tried GenerateFlowFile -> MergeContent -> LogAttribute and everything works as expected. Could you try stopping the merge processor (so that many files build up in the queue) and then starting it? Does it merge then? – Disorient
Did you try passing the correlation attribute as ${kafka.topic} (without the quotes)? Everything else looks fine to me; let me know if it works. Also, are all your flow files on the same node, or are they distributed across the cluster? – Merchantman
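The stop/start test above can be illustrated with a rough timing model. Assumption (a simplification, not NiFi source): on each run MergeContent bins whatever is queued, and once the queue is drained it releases any bin that already holds at least Minimum Number of Entries. With a minimum of 1 and messages trickling in from Kafka one at a time, every run then emits a "merge" of a single flow file; stopping the processor lets 100 pile up so one run sees them all. `simulate_merges` is a hypothetical name.

```python
def simulate_merges(arrivals_per_run, min_entries, max_entries):
    """Toy timing model of count-based merging (an assumption, not NiFi's code).

    arrivals_per_run[i] is how many flow files are queued when run i starts.
    Returns the number of lines in each merged output.
    """
    outputs, in_bin = [], 0
    for arrived in arrivals_per_run:
        for _ in range(arrived):
            in_bin += 1
            if in_bin == max_entries:  # full bin: released immediately
                outputs.append(in_bin)
                in_bin = 0
        if in_bin > 0 and in_bin >= min_entries:  # queue drained, minimum met
            outputs.append(in_bin)
            in_bin = 0
    return outputs


# Kafka trickle: one message per run, minimum 1, maximum 1000
trickle = simulate_merges([1] * 100, min_entries=1, max_entries=1000)
# Processor stopped until 100 flow files queued, then started once
backlog = simulate_merges([100], min_entries=1, max_entries=1000)
# Trickle again, but with minimum = maximum = 100
fixed = simulate_merges([1] * 100, min_entries=100, max_entries=100)
```

Under this model, raising the minimum (with a Max Bin Age as a safety valve) removes the dependence on timing.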

Are you using the original or the merged relationship from the MergeContent processor? The former gives you back the same 100 flow files in case you need to do additional processing; the latter gives you a single flow file with the contents of all the merged flow files. From your provenance listing, it looks like the merge event is happening successfully, so double-check which relationship you have connected downstream. If possible, please post a screenshot of your flow.
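To illustrate the point about relationships, a toy Python model (not the processor's code) of what one completed bin of 100 one-line inputs sends to each relationship: "merged" carries one flow file with all 100 lines, while "original" carries the 100 inputs unchanged. Connecting PutHDFS to "original" would reproduce the symptom in the question. `merge_content` is a hypothetical name.

```python
def merge_content(lines, demarcator="\n"):
    """Toy model of where one completed bin's data is routed."""
    return {
        # 'merged': a single new flow file holding every line
        "merged": [demarcator.join(lines)],
        # 'original': the input flow files, passed through one by one
        "original": list(lines),
    }


routed = merge_content([f"line {i}" for i in range(100)])
```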

Experimental answered 28/5, 2019 at 17:31
