I searched for previous questions and examples relevant to this problem but could not find the answers I'm looking for, so I figured I would submit a question myself.
ExecuteStreamCommand seems like the perfect processor for me, for the following reasons:
- I am able to execute any Python script and avoid Jython (in a similar fashion as ExecuteScript). Jython is not an option for me.
- I can take in FlowFiles. This is necessary as my script is made to consume the output of a previous processor. Furthermore I like the idea of keeping the data under "NiFi management".
- It writes an "execution status" which will be useful for routing.
In a nutshell, what I'm trying to do with ExecuteStreamCommand is:
- Ingest the output of a previous processor (a Scrapy spider that outputs a text file with JSON lines to be exact)
- Call a python script (e.g. `python3 my_script.py`)
- Load the FlowFile that was ingested in my python script.
- Select the content of the FlowFile.
- Operate on the content of the FlowFile within python.
- Output either an updated version of the original FlowFile or create a new one.
- Continue with my NiFi flow with the updated/new FlowFile.
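To make the steps above concrete, here is a minimal sketch of what I imagine `my_script.py` could look like. It rests on an assumption: that ExecuteStreamCommand pipes the incoming FlowFile's content to the script's stdin and uses whatever the script writes to stdout as the content of the outgoing FlowFile. The `process_record` transformation and the `processed` field are hypothetical placeholders, not part of my real script.

```python
#!/usr/bin/env python3
"""Hypothetical my_script.py: read JSON lines from stdin, write updated JSON lines to stdout."""
import json
import sys


def process_record(record):
    # Placeholder transformation (an assumption for illustration only):
    # flag each record as processed.
    record["processed"] = True
    return record


def process_stream(in_stream, out_stream):
    """Transform every non-empty JSON line and return the number of records written."""
    count = 0
    for line in in_stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = process_record(json.loads(line))
        out_stream.write(json.dumps(record) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    try:
        # Assumption: stdin carries the FlowFile content, stdout becomes the new content.
        process_stream(sys.stdin, sys.stdout)
    except (json.JSONDecodeError, OSError) as exc:
        # Diagnostics go to stderr so they do not end up in the output content.
        print(f"my_script.py failed: {exc}", file=sys.stderr)
        sys.exit(1)  # non-zero exit code to signal failure to the caller
```

If that assumption holds, I would expect to point the processor's Command Path at `python3` and pass the script path through Command Arguments, with the exit code showing up in the execution status mentioned earlier. Outside NiFi the same script can be tried with something like `cat spider_output.jl | python3 my_script.py`, where `spider_output.jl` is a made-up file name.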
To be clear, what I currently don't understand is:
- How to call the python script (from the ExecuteStreamCommand Processor)
- How to load up the FlowFile from within Python
- How to update or create a new FlowFile from within Python
- How to output the updated FlowFile from Python back to NiFi.
I have come across various examples for ExecuteScript, but unfortunately these don't translate directly to ExecuteStreamCommand.
Thank you in advance. Any advice is appreciated.