Orchestration engines and frameworks?

I'm looking for an orchestration framework/engine/toolkit to replace or upgrade an existing piece of software, mainly because of scalability limitations. By orchestration I mean asynchronous and distributed execution of generic tasks and workflows.

More specifically the requirements are pretty much these:

  • Wrapping and execution of generic tasks, in Java if language dependent
  • API for tasks and workflows on-demand triggering
  • Scheduling would be nice as well
  • Support for distributed architecture & scalability (mainly for big numbers of small tasks)
  • Persistency and resilience
  • Advanced workflow configuration capabilities (do this, then these 3 tasks in parallel, then this, having priorities, dependencies...)
  • Monitoring and administration UI (or at least API)
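To make the workflow requirement concrete, here is a minimal single-process sketch of "do this, then these 3 tasks in parallel, then this" using only `java.util.concurrent`. It is not taken from any of the engines discussed here; `WorkflowSketch`, `runTask`, and the task names are hypothetical placeholders for the existing execution logic, and it has no persistence or distribution.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WorkflowSketch {

    // Hypothetical stand-in for an existing short task; it only tags its input.
    static String runTask(String name, String input) {
        return input + ">" + name;
    }

    // Runs: prepare -> (a, b, c in parallel) -> finish.
    static String runWorkflow(ExecutorService pool) {
        // Step 1: a single task that everything else depends on.
        CompletableFuture<String> first =
                CompletableFuture.supplyAsync(() -> runTask("prepare", "start"), pool);

        // Step 2: three tasks that each start once step 1 has completed.
        List<CompletableFuture<String>> parallel = Stream.of("a", "b", "c")
                .map(name -> first.thenApplyAsync(out -> runTask(name, out), pool))
                .collect(Collectors.toList());

        // Step 3: runs only after all three parallel tasks are done.
        CompletableFuture<String> last = CompletableFuture
                .allOf(parallel.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> parallel.stream()
                        .map(CompletableFuture::join) // already complete, does not block
                        .collect(Collectors.joining(" | ")))
                .thenApply(joined -> runTask("finish", joined));

        return last.join();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(runWorkflow(pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

Running `main` prints `start>prepare>a | start>prepare>b | start>prepare>c>finish`. What this sketch lacks (durable state, retries, multiple nodes, a UI) is exactly what I am hoping an orchestration engine would provide.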

The existing system is an old-fashioned monolithic service (in Java) that has most of that, including the execution logic itself, which should remain as untouched as possible.

Does anyone have experience with a similar problem? It seems to me it's supposed to be pretty common, would be strange if I have to implement it myself entirely. I found some questions here (like this and this) discussing the theory of orchestration and choreography systems, but not real examples of tools implementing it. Also I think we're not exactly talking about microservices - the tasks are not prolonged and heavy, they're just many, running in the background executing short jobs of many types. I wouldn't create a service for every job type.

I'm also not looking for cloud and container services at this point - to my understanding the deployment is a different issue.

The closest I got is the Netflix Conductor engine, which answers most of the requirements by running an orchestration server that manages tasks implemented in servlets (or any web services in any language - a plus). However it seems like it's built mainly for arranging heavy tasks in a workflow rather than running a huge number of small tasks, which makes me wonder what would be the overhead of invoking many small tasks in servlets for example.

Does anyone have experience or any input on the Conductor or other tools I could use? Or even my entire approach to the problem?

EDIT: I realize it's kind of a "research advice needed" so let's put it simply in 3 questions:

  1. Am I right to look for an orchestration solution for the requirements above?
  2. Does anyone have experience with the Netflix Conductor? Any feedback on it?
  3. Does it have good competitors?
Homologize answered 23/4, 2018 at 12:44 Comment(1)
Have you looked at Apache Camel? It is the de facto implementation of EIP (Enterprise Integration Patterns) and does all of the things in your list and more. Vadavaden
K
8

The main competitor of Netflix Conductor is Temporal Workflow. It scales better and is more developer-friendly by using code instead of JSON DSL to implement the orchestration logic.

It also works OK with fine-grained tasks by implementing specific optimizations (local activities) that allow batching multiple small tasks into a single database update.
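The batching idea can be illustrated without any engine at all. The sketch below is plain Java and is not the Temporal SDK or its API; `BatchedCompletionSketch`, `InMemoryStore`, and `runBatch` are hypothetical names. It only shows the general principle of running several small tasks in-process and recording their results with one write instead of one write per task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class BatchedCompletionSketch {

    // Hypothetical stand-in for a workflow state store; it records every write.
    static class InMemoryStore {
        final List<List<String>> writes = new ArrayList<>();

        void persist(List<String> results) {
            writes.add(new ArrayList<>(results));
        }
    }

    // Runs all tasks in-process, then persists their results in a single write.
    static List<String> runBatch(List<Supplier<String>> tasks, InMemoryStore store) {
        List<String> results = new ArrayList<>();
        for (Supplier<String> task : tasks) {
            results.add(task.get()); // no store round trip per task
        }
        store.persist(results);      // one update covers the whole batch
        return results;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        List<Supplier<String>> tasks =
                Arrays.asList(() -> "resize", () -> "tag", () -> "notify");
        runBatch(tasks, store);
        System.out.println(store.writes.size()); // prints 1
    }
}
```

The trade-off in a design like this is that a crash in the middle of a batch loses the results of the tasks already run in that batch, so the tasks should be cheap and safe to repeat.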

Temporal has been production-hardened for over five years at Uber, Coinbase, HashiCorp, Datadog, Stripe, and hundreds of other companies.

Katiekatina answered 23/10, 2019 at 4:49 Comment(3)
Just as an FYI, the answer given is from the creator of Temporal, not that this fact lessens its value. The only problem I see with Temporal is the limited SDK support, especially in the .NET ecosystem (e.g. C# language). I would choose Conductor in this case. Adalie
Temporal also needs too much hardware, is tied to Cassandra for high throughput, and does not guarantee exactly-once semantics, which are very important for monetary transactions, etc. In race conditions, the activities are responsible for ensuring unique transactions. Fideicommissary
Too much compared to what? I'm not aware of any other solution with comparable features that is more efficient. None of the microservice orchestration systems supports exactly-once semantics without two-phase commits, which don't scale. The semantics are good enough for Coinbase, for example: docs.temporal.io/blog/reliable-crypto-transactions-at-coinbase Katiekatina
R
6

Perhaps you are looking for something like Airflow https://airflow.apache.org/ ?

Wrapping and execution of generic tasks, in Java if language dependent

https://github.com/apache/incubator-airflow/tree/master/airflow/hooks https://github.com/apache/incubator-airflow/tree/master/airflow/contrib/operators

API for tasks and workflows on-demand triggering

https://airflow.apache.org/api.html (experimental)

Scheduling would be nice as well

think of cron on steroids - https://airflow.apache.org/scheduler.html

Support for distributed architecture & scalability (mainly for big numbers of small tasks)

scale with Dask or Celery nodes (see "Airflow + celery or dask. For what, when?")

Persistency and resilience

uses a Postgres DB & RabbitMQ. If your deployment architecture is stateless (e.g. repeatable containers & volumes with Docker), you should be in good shape with WAL replication. If you use Kubernetes or Consul, there are other ways to make the other components more resilient.

Advanced workflow configuration capabilities (do this, then these 3 tasks in parallel, then this, having priorities, dependencies...)

Airflow uses DAGs (directed acyclic graphs). The capabilities can be called fairly advanced. You also have parameter sharing using XComs if you really need that.

Monitoring and administration UI (or at least API)

Has one. It shows tasks & schedules and has a Gantt view. You can also see logs & run details easily, and manually trigger tasks directly from the UI.

Also look at Oozie & Azkaban.

did this help?

Remand answered 29/5, 2018 at 14:41 Comment(0)
C
2

You could take a look at unify-flowret, a lightweight Java orchestration engine I created as part of developing a new platform at American Express. If you think Netflix Conductor seems like a good fit for your problem, you should definitely take a look at unify-flowret, as Netflix Conductor was one of the options we evaluated before building unify-flowret.

Unify-flowret provides core orchestration functionality and depends upon the application to provide everything else. You define the workflow in a very simple JSON file using steps and routes. Then, in the application which wants to use flowret, you create certain implementations e.g. an implementation for persisting state to a database (this way it is possible to use any data store). Or an implementation to return an object to flowret on which flowret will invoke the step function. This way, rather than implementing all types of requirements within the orchestration engine, to keep things simple, most are deferred to the application.

Unify-flowret runs in an embedded mode and so is scalable horizontally. It resumes from where it left off. It is resilient in the face of crashes and will resume from the last recorded position. It provides for true technical parallel processing via definition in the workflow JSON. It provides an SLA framework that informs the application of the milestones to be set up in the future. It provides work management functionality in the form of work baskets. And many other features!

We have had great success in using it within American Express for really complex orchestration requirements.

You can check out unify-flowret at https://github.com/americanexpress/unify-flowret.

Christianize answered 12/10, 2020 at 4:29 Comment(3)
I have been interested in unify-flowret; however, I can't seem to find out whether it works in a distributed (microservices-based) architecture where there could be multiple nodes running for a service that implements the orchestration. Assuming that the database may be a single node or multi-node, will there be any conflicts in workflow orchestration in terms of unique identifiers for representing a case/journey/workflow, for example? Any other limitations? Is the application supposed to handle any such conflicts on its own, or does flowret handle that implicitly? Earl
I think it's more appropriate to say that unify-flowret is more of a state management and local orchestration tool than a distributed platform. This framework makes a great state machine, but it is not limited to external events to trigger itself. I just got started with unify-flowret, having spent only 15 minutes on it, and this is what I found. Debidebilitate
Hi - author of unify-flowret here. I just happened to see these comments now. SyedMK is right: unify-flowret is an embedded orchestrator, i.e. it does local orchestration on the pod. Whatever synchronization is required within unify-flowret, it handles. But if any synchronization is required on the app side, e.g. stopping orchestration from happening at the same time from multiple pods, then some kind of distributed lock has to be implemented by the application. The same applies to parallel processing, where app data needs to be protected with app-side synchronization. Hope this helps. Christianize
