Unable to run Airflow Tasks due to execution date and start date

Whenever I try to run a DAG, it will be in the running state but the tasks will not run. I have set my start date to datetime.today() and my schedule interval to "* * * * *". Manually triggering a run will start the DAG, but the task will not run due to:

The execution date is 2017-09-13T00:00:00 but this is before the task's start date 2017-09-13T16:20:30.363268.

I have tried various combinations of schedule intervals (such as a specific time each day), as well as waiting for the DAG to be triggered on schedule and triggering it manually. Nothing seems to work.
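
For reference, a minimal sketch of a DAG set up this way, which reproduces the error (the dag_id and operator are illustrative placeholders, not the asker's actual code):

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG(
    dag_id='example_dag',           # placeholder name
    start_date=datetime.today(),    # dynamic start date, re-evaluated on every parse
    schedule_interval='* * * * *',
)

task = DummyOperator(task_id='do_nothing', dag=dag)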

Almoner answered 13/9, 2017 at 20:23 Comment(3)
Try hardcoding the DAG's start date to something like datetime(2017, 9, 12), instead of datetime.today(). The FAQ has some more details about this.Blowsy
If you wish to trigger manually, you can disable scheduling by setting schedule_interval=None and trigger the DAG manually. If you want it to be scheduled, make sure the worker and scheduler are running. The settings below work fine for me (for a run every 2 minutes): start_date: datetime.utcnow()-timedelta(minutes=2), schedule_interval: timedelta(minutes=2)Dagan
Unfortunately I've tried all the above suggestions and nothing appears to work. @VinodVutpala using your start date and interval, I still get: The execution date is 2017-09-14T00:00:00 but this is before the task's start date 2017-09-14T13:16:33.998064. However now I also get: Task instance's dagrun did not exist: Unknown reason. I wonder if this is a worker issue.Almoner

First of all, start_date is a task attribute; but in general it is set in default_args and effectively used as a DAG attribute.

The message is clear: if a task's execution_date is before the task's start_date, it cannot be scheduled. You can set start_date to an earlier value:

import datetime

default_args = {
    'start_date': datetime.datetime(2019, 1, 1)  # hard coded date
}

or

import airflow

default_args = {
    'start_date': airflow.utils.dates.days_ago(7)  # 7 days ago
}

From Airflow Documentation

Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.

Let’s Repeat That: The scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.

So, when you schedule your DAG, any dag_run's execution_date will be earlier than its start time. For a daily schedule, the difference will be 24 hours.

We can say: start time = execution_date + schedule_interval
(here, "start time" is not start_date; it is just the start time of the dag run)
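
A minimal sketch of that relationship for a daily schedule (the dates are illustrative):

from datetime import datetime, timedelta

schedule_interval = timedelta(days=1)

# The run covering 2019-05-09 is stamped with that date...
execution_date = datetime(2019, 5, 9)

# ...but only starts once the period it covers has ended.
run_start_time = execution_date + schedule_interval
print(run_start_time)  # 2019-05-10 00:00:00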

Omeara answered 10/5, 2019 at 16:3 Comment(2)
"We can say start time = execution_date + schedule_interval" That would mean that start time > execution_date. Did you mean to have it the other way around? Surely start time is the first point on the timeline and all execution_dates come after it.Snowshoe
By "start time" I mean the start time of a specific dag run. Example: I have a daily scheduled DAG; there is a dag run that starts at 2020-11-23T00:00, and its execution_date will be 2020-11-22T00:00. Formula: 2020-11-23T00:00 = 2020-11-22T00:00 + 24h (schedule_interval). In other words: DAGs run periodically, execution_date is the start of the period, and the DAG starts to run at the end of the period, which is the "start time" here. (It is not the start_date in the code.)Omeara

Google sent me here; I had the same problem. I had defined the start_date as today:

'start_date': datetime.today()

The problem was solved when I used an older date (for example, 7 days ago):

from datetime import datetime, timedelta

seven_days_ago = datetime.combine(datetime.today() - timedelta(7),
                                  datetime.min.time())
args = {
    'owner': 'airflow',
    'start_date': seven_days_ago,
    'depends_on_past': False,
}
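
Continuing the sketch, the args dict is then passed to the DAG; the dag_id and schedule below are illustrative placeholders:

from airflow import DAG

dag = DAG(
    dag_id='my_dag',             # placeholder name
    default_args=args,           # picks up the static start_date above
    schedule_interval='@daily',
)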

I found this explanation in the Airflow Docs.

Raeraeann answered 7/6, 2018 at 15:7 Comment(0)

Try restarting the scheduler; that worked for me.

Rivas answered 1/12, 2017 at 12:58 Comment(1)
What is the explanation behind this issue?Raeraeann

As a best practice, use only a static value for start_date, never a dynamic one. A dynamic start_date is misleading and can cause failures when clearing out failed task instances and missing DAG runs. Additionally, if you change the start_date of your DAG, you should also change the DAG name. source
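
As a sketch of the difference (the variable names are illustrative):

from datetime import datetime

# Dynamic start_date (discouraged): re-evaluated on every DAG file parse
dynamic_args = {'start_date': datetime.now()}

# Static start_date (recommended): stable across parses and scheduler restarts
static_args = {'start_date': datetime(2023, 10, 1)}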

Adaadabel answered 29/10, 2023 at 7:55 Comment(0)
