Airflow schedule_interval and start_date to get it to always fire the next interval

How can I configure Airflow (MWAA) so that it will fire at the same time (6am PST) every day, regardless of when the DAG is deployed?

I have tried what makes sense to me:

  1. Set the schedule_interval to 0 6 * * *.
  2. Set the start_date to:

     import pendulum
     from datetime import datetime

     now = datetime.utcnow()
     now = now.replace(tzinfo=pendulum.timezone('America/Los_Angeles'))
     previous_five_am = now.replace(hour=5, minute=0, second=0, microsecond=0)
     start_date = previous_five_am

It seems to me that setting the start_date to 5am the previous day should make the DAG always fire at the next 6am, no matter what time I deploy the DAG or run an Airflow update.

Canicular answered 20/2, 2021 at 3:5 Comment(2)
I had a similar problem; check the answer on my question: #67213384 - Bison
To be honest, I haven't found a good solution. - Bison

Your confusion may come from expecting Airflow to schedule DAGs like a cron job, which it does not. The first DAG Run is created based on the minimum start_date for the tasks in your DAG. Subsequent DAG Runs are created sequentially by the scheduler process, based on your DAG's schedule_interval. Airflow schedules tasks at the END of the interval (see docs); you can view this answer for examples.
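To make the end-of-interval rule concrete, here is a minimal sketch using only the standard library. It is not Airflow's scheduler code: `first_run` is a hypothetical helper, and it assumes a fixed daily interval aligned to 06:00, with naive datetimes standing in for America/Los_Angeles wall-clock time.

```python
from datetime import datetime, timedelta


def first_run(start_date, hour=6):
    """Return (logical_date, fire_time) for the first daily run.

    Hypothetical helper: the first interval starts at the first
    `hour`:00 at or after start_date, and the run fires only when
    that interval ENDS, one day later.
    """
    interval_start = start_date.replace(hour=hour, minute=0, second=0, microsecond=0)
    if interval_start < start_date:
        # start_date is already past today's slot, so use tomorrow's.
        interval_start += timedelta(days=1)
    fire_time = interval_start + timedelta(days=1)
    return interval_start, fire_time


# A start_date of 5am on Feb 19 (local wall-clock time, assumed for illustration).
start = datetime(2021, 2, 19, 5, 0)
logical_date, fire_time = first_run(start)
print(logical_date.isoformat())  # 2021-02-19T06:00:00
print(fire_time.isoformat())     # 2021-02-20T06:00:00
```

Under these assumptions, the run labelled with 6am on Feb 19 does not actually start until 6am on Feb 20.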

As for your sample code: never set your start_date dynamically. It's a bad practice that can sometimes lead to the DAG never being executed, because now() keeps moving forward, so now() + interval may never be reached; see the Airflow FAQ.
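The problem with a moving start_date can be illustrated with a simplified model. This is a sketch, not Airflow's implementation: `is_due` is a hypothetical stand-in for the scheduler's check, and it assumes a run becomes due once start_date plus one full interval has passed.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(days=1)


def is_due(start_date, now):
    """Hypothetical check: the first run is due once one full interval has elapsed."""
    return now >= start_date + INTERVAL


static_start = datetime(2021, 2, 1, 6, 0)  # fixed date, chosen for illustration

# Simulate the DAG file being re-parsed once an hour for three days.
now = datetime(2021, 2, 19, 12, 0)
dynamic_fired = False
static_fired = False
for _ in range(72):
    dynamic_start = now  # a start_date of now() is re-evaluated on every parse
    dynamic_fired = dynamic_fired or is_due(dynamic_start, now)
    static_fired = static_fired or is_due(static_start, now)
    now += timedelta(hours=1)

print(dynamic_fired)  # False
print(static_fired)   # True
```

In this model the dynamic start_date moves forward as fast as the clock does, so the end of its first interval is always one day away.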

Archegonium answered 20/2, 2021 at 6:20 Comment(1)
In my case the start date is 5am yesterday, so that plus one interval (24 hours) should make the next execution date tomorrow at 6am. Why is that not the case? Also, if I set a static start date, it ends up far in the past, and Airflow then creates tasks and execution dates for all the intervals in between, which results in many failures (noise). Even if I set catchup=False to prevent backfill, too many intervals are created. How can I configure these two values so that only the next day's interval is scheduled? - Canicular
