I have an application that runs as a web service, which submits jobs to Spark on a user request. A job queue needs to be limited per user. I am planning to use Airflow as an orchestration framework to manage job queues but while it supports parallel DAG execution it's optimized for batch processing rather than real time. Is Airflow designed to handle ~200 DAG executions per second with multiple queues (one per user) or should I look for alternatives?
Using airflow for real time job orchestration
Asked Answered
Do you have data move from one task to another? Does time matter here since you mentioned real-time. With Airflow, workflows are expected to be mostly static or slowly changing. Mostly for ETL batch processing, you can speed up the airflow heartbeat, but would be good to have a POC with your use case to test out.
Below is from Airflow official document: https://airflow.apache.org/#beyond-the-horizon
Airflow is not a data streaming solution. Tasks do not move data from one to the other (though tasks can exchange metadata!). Airflow is not in the Spark Streaming or Storm space, it is more comparable to Oozie or Azkaban
© 2022 - 2024 — McMap. All rights reserved.