What exactly is a Cadence decision task?

Asked 14/7, 2020 at 21:15 Answered 19/9, 2020 at 2:29

Activity tasks are pretty easy to understand since it's executing an activity...but what is a decision task? Does the worker run through the workflow from beginning (using records of completed activities) until it hits the next "meaningful" thing it needs to do while making a "decision" on what needs to be done next?

Demur answered 14/7, 2020 at 21:15 Comment(0)

My Opinions

Ideally users don't need to understand it!

However, the decision/workflow task is a technical detail that leaks through the Cadence/Temporal API. Unfortunately, you won't be able to use Cadence/Temporal well if you don't fully understand it.

Fortunately, using iWF keeps you away from this leakage. iWF provides a nice abstraction on top of Cadence/Temporal while keeping the same power.

TL;DR

Decision is short for workflow decision.

A decision is a movement from one state to another in a workflow state machine. Essentially, your workflow code defines a state machine. This state machine must be a deterministic state machine for replay, so workflow code must be deterministic.

A decision task is a task that asks a worker to execute workflow code in order to generate decisions.

NOTE: in Temporal, a decision is called a "command", and the workflow decision task is called a "workflow task", which generates the "commands".

Example

Let's say we have this workflow code:

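public String sampleWorkflowMethod(...) {
  // First decision: schedule activityA, then block until its result is recorded
  var result = activityStubs.activityA(...);
  if (result.startsWith("x")) {
     // Next decision on this branch: start a timer
     Workflow.sleep(...);
  } else {
     // Next decision on this branch: schedule activityB
     result = activityStubs.activityB(...);
  }

  // Final decision: complete the workflow with the result
  return result;
}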

From Cadence/Temporal SDK's point of view, the code is a state machine.

Assume we have an execution in which the result of activityA is xyz, so the execution takes the sleep branch.

Then the workflow execution flow is like this graph.

Workflow code defines the state machine, and it's static.
Workflow execution decides how to move from one state to another at run time, based on the input, the results, and the code logic.
A decision is an abstraction internal to Cadence. When a workflow execution changes from one state to another, the decision is the result of that movement.
The abstraction basically defines what needs to be done when the execution moves from one state to another: schedule an activity, a timer, a child workflow, etc.
The decision needs to be deterministic: with the same input and results, the workflow code must make the same decision. Whether it schedules activityA or activityB must be the same every time.

Timeline in the example

What happens during the above workflow execution:

Cadence service schedules the very first decision task and dispatches it to a workflow worker.
The worker executes the first decision task and returns the decision to schedule activityA to Cadence service. The workflow then stays there, waiting.
As a result of scheduling activityA, Cadence service generates an activity task and dispatches it to an activity worker.
The activity worker executes the activity and returns the result xyz to Cadence service.
As a result of receiving the activity result, Cadence service schedules the second decision task and dispatches it to a workflow worker.
The workflow worker executes the second decision task and responds to Cadence service with the decision to schedule a timer.
On receiving the decision task response, Cadence service schedules a timer.
When the timer fires, Cadence service schedules the third decision task and dispatches it to a workflow worker again.
The workflow worker executes the third decision task and responds with the decision to complete the workflow execution successfully with result xyz.
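The timeline above can be sketched as a small, SDK-free simulation. This is only an illustration under simplifying assumptions: the class and method names (DecisionTimelineSketch, decide, run) and the decision strings are hypothetical and are not part of the Cadence API, there is a single in-process "service" loop, and activityB's result is a made-up placeholder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the timeline; not the Cadence SDK or service.
class DecisionTimelineSketch {

    // The workflow logic as a pure function of the results recorded so far.
    // results.get(0) is activityA's result; results.get(1) is the timer firing
    // or activityB's result, depending on the branch taken.
    static String decide(List<String> results) {
        if (results.isEmpty()) {
            return "ScheduleActivity(activityA)";
        }
        boolean sleepBranch = results.get(0).startsWith("x");
        if (results.size() == 1) {
            return sleepBranch ? "StartTimer" : "ScheduleActivity(activityB)";
        }
        return "CompleteWorkflow(" + (sleepBranch ? results.get(0) : results.get(1)) + ")";
    }

    // Plays the role of the service: schedules a decision task, records the
    // decision, carries it out, and repeats until the workflow completes.
    static List<String> run(String activityAResult) {
        List<String> history = new ArrayList<>();
        List<String> results = new ArrayList<>();
        history.add("WorkflowStarted");
        while (true) {
            history.add("DecisionTaskScheduled");
            history.add("DecisionTaskStarted");
            String decision = decide(results);
            history.add("DecisionTaskCompleted");
            if (decision.startsWith("CompleteWorkflow")) {
                history.add("WorkflowCompleted");
                return history;
            } else if (decision.equals("StartTimer")) {
                history.add("TimerStarted");
                history.add("TimerFired");
                results.add("fired");
            } else {
                history.add("ActivityTaskScheduled");
                history.add("ActivityTaskStarted");
                history.add("ActivityTaskCompleted");
                // "resultOfB" is a placeholder for whatever activityB would return.
                results.add(decision.contains("activityA") ? activityAResult : "resultOfB");
            }
        }
    }

    public static void main(String[] args) {
        run("xyz").forEach(System.out::println);
    }
}
```

With activityA returning xyz, run("xyz") goes through exactly three decision tasks: one that schedules activityA, one that starts the timer, and one that completes the workflow.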

Some more facts about decision

A workflow decision orchestrates the other entities, such as activities, child workflows, timers, etc.
A decision (workflow) task is how the worker communicates with Cadence service, telling it what to do next. For example, start/cancel some activities, or complete/fail/continueAsNew a workflow.
There is always at most one outstanding (running or pending) decision task for each workflow execution. It's impossible to start one while another has started but not yet finished.
The nature of the decision task leads to some non-determinism issues when writing Cadence workflows. For more details you can refer to the article.
On each decision task, the Cadence client SDK can start from the very beginning and "replay" the code, for example, executing activityA. However, replay mode won't generate the decision to schedule activityA again, because the client knows that activityA has already been scheduled.
However, a worker doesn't have to run the code from the very beginning. The Cadence SDK is smart enough to keep the state in memory and wake up later to continue from the previous state. This is called the "Workflow Sticky Cache", because a workflow is sticky to a worker host for a period.
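To make the replay idea concrete, here is a minimal sketch, assuming the example workflow from above. The names (ReplaySketch, runFromBeginning, newDecisions) and the decision strings are hypothetical; a real SDK tracks this with far more detail. The sketch re-runs the workflow logic from the beginning against the recorded results, skips the decisions that are already in the history, and fails if the code no longer makes the same decisions, which is what a non-determinism problem looks like.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of replay; not how any SDK is actually implemented.
class ReplaySketch {

    // Re-runs the example workflow logic from the beginning against the
    // recorded results and returns every decision the code makes, in order,
    // stopping where the code would block waiting for a result.
    static List<String> runFromBeginning(List<String> recordedResults) {
        List<String> decisions = new ArrayList<>();
        decisions.add("ScheduleActivity(activityA)");
        if (recordedResults.isEmpty()) {
            return decisions; // blocked waiting for activityA
        }
        String result = recordedResults.get(0);
        if (result.startsWith("x")) {
            decisions.add("StartTimer");
            if (recordedResults.size() < 2) {
                return decisions; // blocked waiting for the timer
            }
        } else {
            decisions.add("ScheduleActivity(activityB)");
            if (recordedResults.size() < 2) {
                return decisions; // blocked waiting for activityB
            }
            result = recordedResults.get(1);
        }
        decisions.add("CompleteWorkflow(" + result + ")");
        return decisions;
    }

    // Returns only the decisions that are not in the history yet. Decisions
    // that match the history are duplicates and are dropped; a mismatch means
    // the workflow code is not deterministic.
    static List<String> newDecisions(List<String> recordedDecisions, List<String> recordedResults) {
        List<String> all = runFromBeginning(recordedResults);
        for (int i = 0; i < recordedDecisions.size(); i++) {
            if (i >= all.size() || !all.get(i).equals(recordedDecisions.get(i))) {
                throw new IllegalStateException("non-deterministic workflow code at decision " + i);
            }
        }
        return new ArrayList<>(all.subList(recordedDecisions.size(), all.size()));
    }
}
```

For the second decision task of the example, the history already contains the decision to schedule activityA and the result xyz, so newDecisions returns only the timer decision.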

History events of the example:

1. WorkflowStarted
2. DecisionTaskScheduled
3. DecisionTaskStarted
4. DecisionTaskCompleted
5. ActivityTaskScheduled <this schedules activityA>
6. ActivityTaskStarted
7. ActivityTaskCompleted <this records the results of activityA>
8. DecisionTaskScheduled
9. DecisionTaskStarted
10. DecisionTaskCompleted
11. TimerStarted <this schedules the timer>
12. TimerFired
13. DecisionTaskScheduled
14. DecisionTaskStarted
15. DecisionTaskCompleted
16. WorkflowCompleted

Hiccup answered 19/9, 2020 at 2:29 Comment(0)

TL;DR: When a new external event is received, a workflow task is responsible for determining which commands to execute next.

Temporal/Cadence workflows are executed by an external worker, so the only way for the service to learn which steps a workflow has to take next is to ask it every time new information is available. The only way to dispatch such a request to a worker is to put a workflow task into a task queue. The workflow worker picks it up, gets the workflow out of its cache, and applies the new events to it. Once the new events are applied, the workflow code executes and produces a new set of commands. When the workflow code is blocked and cannot make any further progress, the workflow task is reported back to the service as completed. The list of commands to execute is included in the completion request.

Does the worker run through the workflow from beginning (using records of completed activities) until it hits the next "meaningful" thing it needs to do while making a "decision" on what needs to be done next?

This depends on whether the worker has the workflow object in its LRU cache. If the workflow is in the cache, no recovery is needed and only the new events are included in the workflow task. If the object is not cached, the whole event history is shipped and the worker has to execute the workflow code from the beginning to bring it to its current state. All commands produced while replaying past events are duplicates of previously produced commands and are ignored.
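The cached versus uncached cases can be sketched with a tiny LRU cache. This is a simplified illustration, not the real protocol: the class and method names (StickyCacheSketch, eventsToApply) are hypothetical, the cache stores only how many events each workflow has already applied rather than a live workflow object, and the decision about which events to send is collapsed into a single method.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a worker-side LRU cache; not SDK code.
class StickyCacheSketch {

    // Maps workflow ID to the number of history events already applied.
    private final LinkedHashMap<String, Integer> appliedEventCount;

    StickyCacheSketch(final int maxWorkflows) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.appliedEventCount = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > maxWorkflows; // evict the least recently used workflow
            }
        };
    }

    // Returns the events the worker has to apply for this workflow task:
    // only the new ones on a cache hit, the whole history on a cache miss.
    List<String> eventsToApply(String workflowId, List<String> fullHistory) {
        Integer applied = appliedEventCount.get(workflowId);
        int from = (applied == null) ? 0 : applied;
        appliedEventCount.put(workflowId, fullHistory.size());
        return new ArrayList<>(fullHistory.subList(from, fullHistory.size()));
    }
}
```

With a capacity of one workflow, a second task for the same workflow only needs the new events, while a task for a workflow that was evicted in the meantime gets the full history and has to replay from the beginning.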

The above means that multiple workflow tasks have to be executed during the lifetime of a workflow. For example, for a workflow that calls two activities in sequence:

a();
b();

The tasks will be executed for every state transition:

-> workflow task at the beginning: command is ScheduleActivity "a"
    a();
-> workflow task when "a" completes: command is ScheduleActivity "b"
    b();
-> workflow task when "b" completes: command is CompleteWorkflowExecution

In this answer, I used the terminology adopted by the temporal.io fork of Cadence. Here is how the Cadence concepts map to the Temporal ones:

decision task -> workflow task
decision -> command, but it can also mean workflow task in some contexts
task list -> task queue

Loire answered 15/7, 2020 at 15:16 Comment(1)

Thanks! that was very helpful. Appreciate the work you've done on this Maxim – Demur 21/7, 2020 at 16:20