What is the difference between backfill and catchup in Airflow?

2

6

I am trying to understand catchup and backfill in airflow. I understood what catchup is but I have not completely understood what backfill exactly is and how it is used.

I have read the documentation but couldn't find a good example that explains backfill.

Risibility answered 30/7, 2019 at 9:31 Comment(0)
13

According to the documentation, backfill and catchup are the same thing. If the catchup parameter is set to True in your DAG arguments, the Airflow scheduler will perform a backfill, i.e. it will create DAG runs for all the missing intervals between your start_date and the current date (or your end_date, if one is set).
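
To make the catchup behaviour concrete, here is a minimal pure-Python sketch of which intervals would be filled in. It is not Airflow code: the function names missed_intervals and runs_to_schedule are hypothetical, and it assumes a fixed daily schedule and ignores end_date, so treat it as a simplified model of the idea rather than the scheduler's actual logic.

```python
from datetime import datetime, timedelta


def missed_intervals(start_date, now, interval=timedelta(days=1)):
    """Return (interval_start, interval_end) pairs that have fully elapsed.

    Simplified model (an assumption, not Airflow's implementation):
    one run per completed interval between start_date and now.
    """
    runs = []
    current = start_date
    # A run only becomes due once its whole interval has elapsed.
    while current + interval <= now:
        runs.append((current, current + interval))
        current += interval
    return runs


def runs_to_schedule(start_date, now, catchup, interval=timedelta(days=1)):
    """With catchup enabled, return every missed interval;
    with catchup disabled, return only the most recent completed one."""
    runs = missed_intervals(start_date, now, interval)
    return runs if catchup else runs[-1:]
```

For example, with a start_date of 1 July 2019 and the DAG first enabled at noon on 4 July 2019, runs_to_schedule(..., catchup=True) yields three daily intervals, while catchup=False yields only the latest one (3 July to 4 July).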

Ascend answered 30/7, 2019 at 16:0 Comment(1)
The backfill CLI command can run for dates before the start date of the DAG, but catchup cannot, so I find there is a minor difference. (comment by Vertebra)
1

I noticed that the link in the previously accepted answer is broken, and when I checked the documentation I did not find that the two are the same thing, although they are very similar. The documentation describes catchup here: https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dag-run.html#catchup with backfill defined right below it.

Backfill is a manual operation that you run from the command line, while catchup is an attribute that you set directly on the DAG. I can't verify this in the documentation, but I believe backfill runs in parallel with your other DAGs, while catchup is executed by the scheduler as part of the normal DAG process.

Possibly a little pedantic, but adding backfill=False on a DAG won't work, and running

airflow dags catchup \
    --start-date START_DATE \
    --end-date END_DATE \
    dag_id

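won't work either. The names are not interchangeable: backfill is the CLI subcommand (airflow dags backfill) and catchup is the DAG argument. When I first read the current accepted answer, I personally thought that catchup was a deprecated keyword meaning the same thing as backfill, which is not the case.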

Holifield answered 16/4, 2023 at 21:5 Comment(0)
