I use fork/join in Oozie in order to run some sub-workflow actions in parallel. My workflow.xml looks like this:
<workflow-app name="myName" xmlns="uri:oozie:workflow:0.5">
    <start to="fork1"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <fork name="fork1">
        <path start="subworkflow1"/>
        <path start="subworkflow2"/>
    </fork>
    <join name="Completed" to="End"/>
    <action name="subworkflow1">
        <sub-workflow>
            <app-path>....</app-path>
            <propagate-configuration/>
            <configuration>
                <property>
                    <name>....</name>
                    <value>....</value>
                </property>
            </configuration>
        </sub-workflow>
        <ok to="Completed"/>
        <error to="Completed"/>
    </action>
    <action name="subworkflow2">
        <sub-workflow>
            <app-path>....</app-path>
            <propagate-configuration/>
            <configuration>
                <property>
                    <name>....</name>
                    <value>....</value>
                </property>
            </configuration>
        </sub-workflow>
        <ok to="Completed"/>
        <error to="Completed"/>
    </action>
    <end name="End"/>
</workflow-app>
When subworkflow1 is killed (it failed for some reason), it kills subworkflow2 as well. I want those two actions to run in parallel but to be independent of each other.
In my run, when subworkflow1 is killed, I see that subworkflow2 is also killed, yet my app succeeds (I checked it in the Oozie dashboard -> Workflows in Hue).
In this case I want subworkflow1 to end up KILLED, subworkflow2 to end up SUCCEEDED, and I don't really care what status the app as a whole reports.
- In my case, subworkflow1 takes longer than subworkflow2, so when I checked the app after it ended, I saw that although the dashboard says subworkflow1 and subworkflow2 were both killed and the app succeeded, what really happened is that subworkflow2 finished its work and was only killed afterwards (a path keeps 'running' until all the paths of the fork finish their run). So subworkflow2 completed its part and was then marked killed because subworkflow1 was killed.
What should I do to make each path get its own status and keep running even when another path in the same fork is killed?
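
For context, the workaround the comment below reacts to is to keep <error to="Completed"/> on both actions so a failing path doesn't drag its sibling down, point the join at a decision node instead of End, and let the decision route to Kill if either path errored. A minimal sketch, not tested code: the node name check-status and the exact EL condition are my assumptions, so adjust them to your action names:

    <join name="Completed" to="check-status"/>

    <decision name="check-status">
        <switch>
            <!-- If either sub-workflow exited through its <error> transition,
                 fail the whole app; otherwise finish normally. -->
            <case to="Kill">${wf:lastErrorNode() eq "subworkflow1" or wf:lastErrorNode() eq "subworkflow2"}</case>
            <default to="End"/>
        </switch>
    </decision>

This keeps the two fork paths independent at run time; the trade-off, described in the comment below, shows up when you later try to re-run the killed workflow.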
Note that re-running the job with -Doozie.wf.rerun.failnodes=true won't help, because the decision node will have already run in the first run and will be marked as succeeded, so the job status after re-running will still be KILLED: the decision node will not be re-evaluated, and any nodes which haven't yet been run after the decision node will not be executed. You'd have to use skip nodes when re-running to get around this, which is cumbersome when you have a lot of nodes to skip because you have to specify every single one. – Clinton
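
For completeness, skip nodes are supplied at re-run time through the oozie.wf.rerun.skip.nodes property, which is mutually exclusive with oozie.wf.rerun.failnodes. A sketch of the re-run configuration; the node list is illustrative and, as the comment says, has to name every node you want skipped:

    <!-- rerun-config.xml, used as: oozie job -rerun <job-id> -config rerun-config.xml
         (the file should also carry the original job properties the workflow needs) -->
    <configuration>
        <property>
            <!-- Comma-separated names of the nodes that should NOT run again. -->
            <name>oozie.wf.rerun.skip.nodes</name>
            <value>fork1,subworkflow2,Completed</value>
        </property>
    </configuration>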