Elastic Map Reduce: difference between CANCEL_AND_WAIT and CONTINUE?


Asked 7/3, 2013 at 21:19; answered 8/3, 2013 at 19:45

Tags: boto, elastic-map-reduce, amazon-emr

I just found that, with Amazon's Elastic MapReduce, I can give a step one of three ActionOnFailure values:

TERMINATE_JOB_FLOW
CANCEL_AND_WAIT
CONTINUE

TERMINATE_JOB_FLOW is the default and the obvious one: it shuts down the entire cluster when the step fails.

What is the difference between CANCEL_AND_WAIT and CONTINUE? It appears to me that both will keep the cluster running and simply move on to the next step when it is added.
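For reference, here is a minimal sketch of a step definition in the dictionary shape that boto3's EMR client expects for `add_job_flow_steps` and `run_job_flow`, with `ActionOnFailure` set per step. It assumes boto3 rather than the boto 2 library the asker used, and an EMR release recent enough to ship `command-runner.jar`; the helper name `make_step` and the example arguments are hypothetical.

```python
# ActionOnFailure values accepted by the EMR API for a step.
# TERMINATE_CLUSTER is the newer name for TERMINATE_JOB_FLOW.
ACTIONS_ON_FAILURE = {
    "TERMINATE_JOB_FLOW",
    "TERMINATE_CLUSTER",
    "CANCEL_AND_WAIT",
    "CONTINUE",
}


def make_step(name, args, action_on_failure):
    """Build one step config (hypothetical helper) for boto3's EMR client."""
    if action_on_failure not in ACTIONS_ON_FAILURE:
        raise ValueError(f"unknown ActionOnFailure: {action_on_failure}")
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        # command-runner.jar runs an arbitrary command on the master node
        # (assumes an EMR 4.x or later release).
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": list(args)},
    }
```

With a real cluster you would pass the result to `boto3.client("emr").add_job_flow_steps(JobFlowId=..., Steps=[...])`, filling in your own cluster ID.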

Asked by Gratia, 7/3, 2013 at 21:19

Say you have launched a cluster and added the following three steps to it:

Step1
Step2
Step3

Now, if Step1 has ActionOnFailure set to CANCEL_AND_WAIT and Step1 fails, all the remaining steps are cancelled and the cluster goes into the Waiting state. I guess that if you launch your cluster with the --stay-alive option, this is the default behaviour.

If Step1 has ActionOnFailure set to CONTINUE and Step1 fails, execution continues with Step2.

If Step1 has ActionOnFailure set to TERMINATE_JOB_FLOW and Step1 fails, the cluster is shut down, as you mentioned.
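The three cases above can be modelled as a small simulation. This is a simplified, hypothetical model of the behaviour the answer describes, not EMR's implementation: it assumes the cluster stays alive once it runs out of steps, and that a terminating failure leaves the remaining steps marked CANCELLED. The `Step` class, its `succeeds` flag and `run_steps` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    action_on_failure: str
    succeeds: bool  # hypothetical: whether this step's job succeeds


def run_steps(steps):
    """Walk the step queue; return (status per step, final cluster state).

    Simplified model: assumes the cluster is kept alive when no steps remain.
    """
    statuses = {}
    for i, step in enumerate(steps):
        if step.succeeds:
            statuses[step.name] = "COMPLETED"
            continue
        statuses[step.name] = "FAILED"
        if step.action_on_failure == "CONTINUE":
            continue  # just move on to the next queued step
        # CANCEL_AND_WAIT and TERMINATE_* both cancel everything still queued
        for later in steps[i + 1:]:
            statuses[later.name] = "CANCELLED"
        if step.action_on_failure == "CANCEL_AND_WAIT":
            return statuses, "WAITING"
        if step.action_on_failure in ("TERMINATE_JOB_FLOW", "TERMINATE_CLUSTER"):
            return statuses, "TERMINATED_WITH_ERRORS"
        raise ValueError(f"unknown ActionOnFailure: {step.action_on_failure}")
    return statuses, "WAITING"


if __name__ == "__main__":
    for action in ("TERMINATE_JOB_FLOW", "CANCEL_AND_WAIT", "CONTINUE"):
        queue = [
            Step("Step1", action, succeeds=False),
            Step("Step2", "CONTINUE", succeeds=True),
            Step("Step3", "CONTINUE", succeeds=True),
        ]
        print(action, run_steps(queue))
```

Under this model, CANCEL_AND_WAIT and CONTINUE only differ when more steps are already queued behind the failing one.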

Answered by Fernando, 8/3, 2013 at 19:45 (5 comments)

Thanks! That totally makes sense. It's the same for me then: in boto I only add a new step after the previous one completes, so CANCEL_AND_WAIT and CONTINUE are the same from my perspective. – Gratia 8/3, 2013 at 19:48
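The "add a step only after the previous one completes" pattern can be sketched with boto3's EMR `step_complete` waiter. This is a minimal sketch, assuming a boto3 EMR client (`boto3.client("emr")`) is passed in; `run_steps_one_at_a_time` is a hypothetical helper name, not part of boto or boto3. Because nothing is queued behind the running step, CANCEL_AND_WAIT has nothing to cancel, which is why it behaves like CONTINUE in this setup.

```python
def run_steps_one_at_a_time(client, cluster_id, steps):
    """Submit each step only after the previous one has finished.

    `client` is assumed to be a boto3 EMR client; `steps` are step config
    dicts (Name, ActionOnFailure, HadoopJarStep).
    """
    step_ids = []
    for step in steps:
        # Queue exactly one step, so nothing else is pending behind it.
        response = client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
        step_id = response["StepIds"][0]
        step_ids.append(step_id)
        # Poll until the step finishes; if it fails or is cancelled the
        # waiter is expected to raise botocore.exceptions.WaiterError,
        # which stops the loop before the next step is submitted.
        client.get_waiter("step_complete").wait(ClusterId=cluster_id, StepId=step_id)
    return step_ids
```

Note that a TERMINATE_JOB_FLOW step would still shut the cluster down on failure, so the ActionOnFailure choice only stops mattering between CANCEL_AND_WAIT and CONTINUE.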

I think that even with --stay-alive, TERMINATE_JOB_FLOW is the default option. I've launched several stay-alive clusters, and they all terminate when one of the steps fails. – Gratia 8/3, 2013 at 19:48

It doesn't happen that way for me: all the added steps go into 'CANCELLED' status and the cluster is in 'Waiting'. Perhaps there is something we are missing here. – Fernando 8/3, 2013 at 20:28

That is very interesting. Maybe boto makes TERMINATE_JOB_FLOW the default? In any case, thanks for your answer! – Gratia 8/3, 2013 at 21:37

The Instances.KeepJobFlowAliveWhenNoSteps configuration also plays a role here. If it is set to True, the cluster goes into WAITING after executing (ActionOnFailure=CONTINUE) or cancelling (ActionOnFailure=CANCEL_AND_WAIT) steps 2 and 3. If it is set to False, the cluster terminates after executing or cancelling steps 2 and 3. – Jobey 11/5, 2020 at 15:33
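The `KeepJobFlowAliveWhenNoSteps` setting mentioned in the last comment lives in the `Instances` section of the request to boto3's `run_job_flow`. This is a minimal sketch of those keyword arguments, assuming the default EMR IAM roles (`EMR_EC2_DefaultRole`, `EMR_DefaultRole`) already exist in the account; the helper name `cluster_request`, the release label, instance types and instance count are placeholder assumptions.

```python
def cluster_request(name, steps, keep_alive):
    """Keyword arguments for boto3's EMR run_job_flow (placeholder values)."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumption: substitute a release you use
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # placeholder instance types
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # True: the cluster waits once the step list is exhausted, whether
            # the steps ran (CONTINUE) or were cancelled (CANCEL_AND_WAIT).
            # False: the cluster terminates once no steps remain.
            "KeepJobFlowAliveWhenNoSteps": keep_alive,
        },
        "Steps": list(steps),  # step dicts, each with its own ActionOnFailure
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

Against a real account this would be sent as `boto3.client("emr").run_job_flow(**cluster_request("my-cluster", steps, keep_alive=True))`.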
