I have three different types of jobs running on data in HDFS.
Currently, these three jobs have to be run separately.
We now want to run the three jobs together, piping the output of each job directly into the next without writing the intermediate data to HDFS, in order to improve the architecture and overall performance.
Any suggestions are welcome for this scenario.
PS: Oozie does not fit our workflow. The Cascading framework is also ruled out because of scalability issues. Thanks.