Airflow backfill stops if any task fails

I am using the Airflow CLI's backfill command to manually run some backfill jobs.

 airflow backfill mydag -i -s 2018-01-11T16-00-00 -e 2018-01-31T23-00-00 --reset_dagruns --rerun_failed_tasks

The DAG's schedule interval is hourly and it has around 40 tasks, so a backfill job of this kind takes more than a day to finish. I need it to run without supervision. I noticed, however, that if even one task fails in any of the runs within the backfill interval, the entire backfill job stops with the following exception and I have to restart it manually.

    Traceback (most recent call last):
      File "/home/ubuntu/airflow/bin/airflow", line 4, in <module>
        __import__('pkg_resources').run_script('apache-airflow==1.10.0', 'airflow')
      File "/home/ubuntu/airflow/lib/python3.5/site-packages/pkg_resources/__init__.py", line 719, in run_script
        self.require(requires)[0].run_script(script_name, ns)
      File "/home/ubuntu/airflow/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1504, in run_script
        exec(code, namespace, namespace)
      File "/home/ubuntu/airflow/lib/python3.5/site-packages/apache_airflow-1.10.0-py3.5.egg/EGG-INFO/scripts/airflow", line 32, in <module>
        args.func(args)
      File "/home/ubuntu/airflow/lib/python3.5/site-packages/apache_airflow-1.10.0-py3.5.egg/airflow/utils/cli.py", line 74, in wrapper
        return f(*args, **kwargs)
      File "/home/ubuntu/airflow/lib/python3.5/site-packages/apache_airflow-1.10.0-py3.5.egg/airflow/bin/cli.py", line 217, in backfill
        rerun_failed_tasks=args.rerun_failed_tasks,
      File "/home/ubuntu/airflow/lib/python3.5/site-packages/apache_airflow-1.10.0-py3.5.egg/airflow/models.py", line 4105, in run
        job.run()
      File "/home/ubuntu/airflow/lib/python3.5/site-packages/apache_airflow-1.10.0-py3.5.egg/airflow/jobs.py", line 202, in run
        self._execute()
      File "/home/ubuntu/airflow/lib/python3.5/site-packages/apache_airflow-1.10.0-py3.5.egg/airflow/utils/db.py", line 74, in wrapper
        return func(*args, **kwargs)
      File "/home/ubuntu/airflow/lib/python3.5/site-packages/apache_airflow-1.10.0-py3.5.egg/airflow/jobs.py", line 2533, in _execute
    airflow.exceptions.AirflowException: Some task instances failed:
    {('mydag', 'a_task', datetime.datetime(2018, 1, 30, 17, 5, tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)))}

The task instances do not depend on their previous instances, therefore I don't mind if one or two tasks fail. I need the job to continue.

I could not find any option in the backfill documentation that would allow me to specify this behaviour.

Is there a way to achieve what I am looking for?

Mouldy answered 20/9, 2018 at 15:47 Comment(0)

Adding the --donot_pickle switch to the backfill command may help.

Comeaux answered 6/10, 2018 at 9:59 Comment(0)

I have experienced the same problem with the backfill command.

I tried the --donot_pickle option and setting depends_on_past to False, without success.

Possible workaround: set a start_date and catchup=True on the DAG, then unpause it in the web UI. This worked like a backfill.
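
Roughly what that workaround looks like in the DAG file. This is only a sketch: the dag_id, task_id and dates are placeholders taken from the question, and the DummyOperator import path is the Airflow 1.10 style.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    # Sketch: with catchup=True and a start_date in the past, unpausing the DAG
    # makes the scheduler create all the missed hourly runs itself, and a failed
    # task in one run does not stop the remaining runs from being scheduled.
    dag = DAG(
        dag_id='mydag',
        start_date=datetime(2018, 1, 11, 16, 0),
        schedule_interval='@hourly',
        catchup=True,
    )

    a_task = DummyOperator(task_id='a_task', dag=dag)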

I could not get the backfill CLI command to keep running once more than one DAG run was marked as failed.

Beverlybevers answered 27/5, 2019 at 8:55 Comment(0)

Try the --continue-on-failures flag.

From its description: if set, the backfill will keep going even if some of the tasks failed.

Example:

    nohup airflow dags backfill --continue-on-failures -s 2020-01-01 -e 2022-05-04 test_dag_name --reset-dagruns -y > backfill_logs/test_dag_name_backfill.txt &

Reciprocity answered 1/3, 2023 at 19:57 Comment(0)

If I understand your issue correctly, the behaviour you seek can be achieved by setting

'depends_on_past': False

among the DAG args.
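
For reference, a minimal sketch of where that setting goes (the dag_id, start date and schedule are placeholders, not taken from the original DAG):

    from datetime import datetime

    from airflow import DAG

    # Sketch: depends_on_past=False means a task instance can run even if the
    # same task failed in the previous DAG run.
    default_args = {
        'depends_on_past': False,
    }

    dag = DAG(
        dag_id='mydag',
        default_args=default_args,
        start_date=datetime(2018, 1, 11),
        schedule_interval='@hourly',
    )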

Source: https://airflow.incubator.apache.org/tutorial.html#backfill

Goodfellowship answered 22/9, 2018 at 17:55 Comment(0)

From what I understand, backfilling stops execution when one of the tasks it has queued fails.

A trick that worked for me is to load the queue with all the tasks I need to run, irrespective of failures. That is to say, I increase max_active_runs to a ridiculously large number so that all DAG runs get executed.

e.g. max_active_runs: 1000
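
A sketch of how this might be set; note that max_active_runs is passed to the DAG constructor itself (the dag_id, start date and schedule here are placeholders):

    from datetime import datetime

    from airflow import DAG

    # Sketch: a very high max_active_runs lets the backfill keep many DAG runs
    # active at once, so runs after a failed one are already queued and still
    # get executed.
    dag = DAG(
        dag_id='mydag',
        start_date=datetime(2018, 1, 11),
        schedule_interval='@hourly',
        max_active_runs=1000,
    )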

Check the Airflow documentation on the default arguments for a DAG.

Naif answered 20/7, 2020 at 8:39 Comment(0)
