How to Run a Simple Airflow DAG
I am totally new to Airflow. I would like to run a simple DAG at a specified date. I'm struggling to understand the difference between the start date, the execution date, and backfilling. Also, what is the command to run the DAG?

Here is what I've tried so far:

airflow run dag_1 task_1 2017-1-23

The first time I ran that command, the task executed correctly, but when I tried again it did not work.

Here is another command I ran:

airflow backfill dag_1 -s 2017-1-23 -e 2017-1-24

I don't know what to expect from this command. Will the DAG execute every day from the 23rd to the 24th?

Before running the two commands above, I did this:

airflow initdb
airflow scheduler 
airflow webserver -p 8085 --debug &

Here is my DAG:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2017, 1, 23, 12),
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    'dag_1', default_args=default_args, schedule_interval=timedelta(days=1))

t1 = BashOperator(
    task_id='create_clients',
    bash_command='Rscript /scripts/Cli.r',
    dag=dag)

t2 = BashOperator(
    task_id='create_operation',
    bash_command='Rscript Operation.r',
    retries=3,
    dag=dag)

t2.set_upstream(t1)

Screenshot: Tree View

UPDATE

airflow run dag_1 task_1 2017-1-23T10:34
Crandall answered 23/1, 2017 at 11:26 Comment(1)
If you've actually tried something, please edit your question to include a minimal reproducible example. – Lewes

If you run it once with

airflow run dag_1 task_1 2017-1-23

the run is saved, and running it again won't do anything. You can try to re-run it by forcing it:

airflow run --force=true dag_1 task_1 2017-1-23

The airflow backfill command will run any executions that would have run in the period between the start and end dates. What happens depends on the schedule you set on the DAG: if you set it to trigger every hour, it should run 24 times. It also won't re-execute previously executed runs.
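To see why the run count follows from the schedule interval, here is a sketch using plain datetime, not the Airflow scheduler (schedule_points is a hypothetical helper; whether the end boundary is included varies between Airflow versions, so this enumerates inclusively):

```python
from datetime import datetime, timedelta

def schedule_points(start, end, interval):
    """Enumerate the schedule points between start and end, inclusive."""
    points = []
    current = start
    while current <= end:
        points.append(current)
        current += interval
    return points

# A daily schedule between 2017-01-23 and 2017-01-24 covers two points...
daily = schedule_points(datetime(2017, 1, 23), datetime(2017, 1, 24),
                        timedelta(days=1))
print(len(daily))   # 2

# ...while an hourly schedule covers 25 points with an inclusive end
# (24 if the end boundary were excluded).
hourly = schedule_points(datetime(2017, 1, 23), datetime(2017, 1, 24),
                         timedelta(hours=1))
print(len(hourly))  # 25
```

A backfill then runs each of those points that has not already been executed.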

You can clear the task as if it NEVER ran:

airflow clear dag_1 -s 2017-1-23 -e 2017-1-24

Also check the CLI docs here: https://airflow.incubator.apache.org/cli.html

Harlotry answered 24/1, 2017 at 0:2 Comment(4)
Thanks for your explanation. I tried adding a time to the first run (see update). Why does the run execute immediately even if the specified time hasn't been reached? For example, my current time is 10:30 and I specify 10:34 in the run... it runs immediately. Is that normal behaviour? – Crandall
I believe (not 100% sure) that it runs the task as if it were that specified date, since you passed it in as an argument. So when it completes, the information saved about the run says it completed a run of that task at that time. – Harlotry
The -f option will do; no need to set =true. On Airflow 1.9: airflow run: error: argument -f/--force: ignored explicit argument 'true' – Adao
In the latest Airflow, airflow run has been replaced with airflow tasks run (see the docs). – Sero

difference between the start date, the execution date and backfilling

Backfilling is done to run a DAG explicitly: to test it, to run it manually, or to re-run a DAG that errored out. You do this using the CLI:

airflow backfill -s <<start_date>> <<dag>> 
#optionally provide -1 as start_date to run it immediately

start_date is, as the name suggests, the date from which the DAG definition is valid.

execution_date is the date-time for which the DAG is to be run. You provide this while testing individual tasks of the DAG, as below:

airflow test <<dag>> <<task>> <<exec_date>>

what is the command to run the dag

Backfill is the command to run a DAG explicitly. Otherwise, you just put the DAG file in the DAGs folder (the DagBag) and the scheduler will run it per the schedule defined in the DAG definition:

airflow backfill -s <<start_date>> <<dag>> 
#optionally provide -1 as start_date to run it immediately
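A related gotcha worth knowing: in Airflow 1.x, the run labelled with a given execution_date is actually triggered once its schedule interval has elapsed, i.e. around execution_date + schedule_interval. A sketch with plain datetime (actual_trigger_time is a hypothetical helper for illustration, not an Airflow API):

```python
from datetime import datetime, timedelta

def actual_trigger_time(execution_date, schedule_interval):
    """When Airflow 1.x actually starts the run labelled execution_date:
    at the end of that schedule interval."""
    return execution_date + schedule_interval

exec_date = datetime(2017, 1, 23, 12)   # the start_date from the question
interval = timedelta(days=1)            # the DAG's daily schedule

# The first run, labelled 2017-01-23 12:00, fires around 2017-01-24 12:00.
print(actual_trigger_time(exec_date, interval))
```

This is why a DAG with the question's start_date does not produce its first scheduled run until a full day later.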
Proficient answered 11/2, 2017 at 8:35 Comment(0)

For more recent versions of Airflow you should use airflow tasks run.

For example: airflow tasks run dag_1 task_1 2023-1-3

Selsyn answered 3/1, 2023 at 13:10 Comment(0)

I'm running Airflow in Docker, as described in the official tutorial. I also installed the airflow.sh script described at the end of that page.

What worked for me was the following:

  1. List the available DAGs (and their ids)

    ./airflow.sh dags list
    
  2. Run the DAG

    ./airflow.sh dags trigger my_dag --conf '{"manual_execution": true}'
    

    This outputs a nicely formatted MD table, and the run will show up in the DAG runs in the UI.

Decree answered 27/9, 2023 at 11:50 Comment(0)
