What's the best way to handle having a different schedule interval for backfilling and ongoing running?
For backfilling I want to use a daily interval, but for ongoing running I want to use an hourly interval.
I can think of three approaches to this:
The easiest approach I see is to define two DAGs in the one .py file: dag_backfill, with a daily interval, a start date in the past and end date of datetime.now(), and dag_ongoing, with an hourly interval and start date of datetime.now(), that takes over when dag_backfill finishes. However, two DAGs in one file is discouraged here: "We do support more than one DAG definition per python file, but it is not recommended as we would like better isolation between DAGs from a fault and deployment perspective..."
Two .py files that import the same python functions that make up the pipeline. I worry about keeping the separate files consistent in this approach.
Only one DAG with an hourly interval that checks if the run date is more than 1 day in the past and, if so, only runs at midnight for those dates. I feel like that is inelegant though, as it would obscure the schedule the backfilling will run on, at least from the GUI homepage.
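To make the third option concrete, here is a minimal sketch of the check I have in mind, written in plain Python with no Airflow imports. The function name should_run and the cutoff parameter are my own placeholders, not anything Airflow provides. In a real DAG I assume this would sit behind something like a ShortCircuitOperator as the first task, so that skipped hours do no work.

```python
from datetime import datetime, timedelta


def should_run(run_date: datetime, now: datetime,
               cutoff: timedelta = timedelta(days=1)) -> bool:
    """Decide whether an hourly run should do any work.

    Hypothetical helper: runs within `cutoff` of `now` always proceed;
    older runs proceed only when stamped at midnight, which gives an
    effective daily schedule for the backfill period.
    """
    if now - run_date <= cutoff:
        return True  # ongoing period: every hourly run does work
    # backfill period: only the midnight run of each day does work
    return (run_date.hour, run_date.minute, run_date.second) == (0, 0, 0)


if __name__ == "__main__":
    now = datetime(2024, 1, 10, 12, 0)
    print(should_run(datetime(2024, 1, 10, 9, 0), now))  # recent hourly run
    print(should_run(datetime(2024, 1, 5, 0, 0), now))   # old run at midnight
    print(should_run(datetime(2024, 1, 5, 7, 0), now))   # old run, not midnight
```

This shows the problem I mentioned: the daily backfill cadence lives inside the function, while the DAG itself still advertises an hourly schedule.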
Is there a common pattern for this or known best practice?