I am evaluating several open-source workflow schedulers for a DAG of jobs with heterogeneous RAM usage. The scheduler should not only cap the number of concurrently running threads, but also keep the total RAM of all concurrent tasks below the available memory.
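To make the requirement concrete, here is a rough sketch in plain Python of the admission check I have in mind. It is not Luigi or Airflow code; `Job`, `ram_gb`, and `admit` are hypothetical names used only for illustration, and the RAM figures are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    ram_gb: int  # hypothetical per-job RAM estimate, in GB


def admit(ready, running, max_threads, total_ram_gb):
    """Return the ready jobs that can start now without exceeding
    either the thread limit or the total RAM budget."""
    free_threads = max_threads - len(running)
    free_ram = total_ram_gb - sum(job.ram_gb for job in running)
    started = []
    for job in ready:
        if free_threads <= 0:
            break  # no thread left, regardless of RAM
        if job.ram_gb <= free_ram:
            started.append(job)
            free_threads -= 1
            free_ram -= job.ram_gb
    return started


if __name__ == "__main__":
    ready = [Job("align", 10), Job("sort", 6), Job("index", 2), Job("stats", 1)]
    running = [Job("fetch", 4)]
    picked = admit(ready, running, max_threads=4, total_ram_gb=16)
    print([job.name for job in picked])
```

With 16 GB in total and 4 GB already in use, only jobs whose combined RAM fits in the remaining 12 GB are started, even though threads are still free.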
In this Luigi Q&A, it was explained that
You can set how many of the resource is available in the config, and then how many of the resource the task consumes as a property on the task. This will then limit you to running n of that task at a time.
in config:
    [resources]
    api=1
in code for Task:
    resources = {"api": 1}
For Airflow, I haven't been able to find the same functionality in its docs. The best that seems possible is to specify a number of available slots in a resource pool, and to also specify that a task instance uses a single slot in a resource pool. However, it appears there is no way to specify that a task instance uses more than one slot in a pool.
Question: specifically for Airflow, how can I specify how much of a resource a task instance consumes?