How to fix "DAG seems to be missing"?
I want to run a simple DAG, "test_update_bq", but when I open the web UI on localhost I see this: DAG "test_update_bq" seems to be missing. There are no errors when I run airflow initdb, and when I run airflow test test_update_bq update_table_sql 2015-06-01 it completes successfully and the table is updated in BigQuery. The DAG:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

default_args = {
    'owner': 'Anna',
    'depends_on_past': True,
    'start_date': datetime(2017, 6, 2),
    'email': ['[email protected]'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 5,
    'retry_delay': timedelta(minutes=5),
}

# Run every day at 21:00.
schedule_interval = "00 21 * * *"

# Define the DAG: set the ID, default args, schedule interval, and the
# directory searched for templated .sql files.
dag = DAG(
    'test_update_bq',
    default_args=default_args,
    schedule_interval=schedule_interval,
    template_searchpath=['/home/ubuntu/airflow/dags/sql_bq'],
)

# Single task: run update_bq.sql (resolved via template_searchpath) against
# BigQuery using standard SQL and the 'test' connection.
update_task = BigQueryOperator(
    dag=dag,
    task_id='update_table_sql',
    sql='update_bq.sql',
    use_legacy_sql=False,
    allow_large_results=True,
    bigquery_conn_id='test',
)


I would be grateful for any help.

Scheduler log (/logs/scheduler):

[2019-10-10 11:28:53,308] {logging_mixin.py:95} INFO - [2019-10-10 11:28:53,308] {dagbag.py:90} INFO - Filling up the DagBag from /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:53,333] {scheduler_job.py:1532} INFO - DAG(s) dict_keys(['test_update_bq']) retrieved from /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:53,383] {scheduler_job.py:152} INFO - Processing /home/ubuntu/airflow/dags/update_bq.py took 0.082 seconds
[2019-10-10 11:28:56,315] {logging_mixin.py:95} INFO - [2019-10-10 11:28:56,315] {settings.py:213} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=3600, pid=11761
[2019-10-10 11:28:56,318] {scheduler_job.py:146} INFO - Started process (PID=11761) to work on /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:56,324] {scheduler_job.py:1520} INFO - Processing file /home/ubuntu/airflow/dags/update_bq.py for tasks to queue
[2019-10-10 11:28:56,325] {logging_mixin.py:95} INFO - [2019-10-10 11:28:56,325] {dagbag.py:90} INFO - Filling up the DagBag from /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:56,350] {scheduler_job.py:1532} INFO - DAG(s) dict_keys(['test_update_bq']) retrieved from /home/ubuntu/airflow/dags/update_bq.py
[2019-10-10 11:28:56,399] {scheduler_job.py:152} INFO - Processing /home/ubuntu/airflow/dags/update_bq.py took 0.081 seconds
Trantrance asked 10/10, 2019 at 9:31

Comment: Restarting the airflow web server helped. - Trantrance

Restarting the Airflow webserver helped. I killed the gunicorn process on Ubuntu and then restarted the Airflow webserver.
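
For reference, a minimal sketch of that restart, assuming the webserver runs under gunicorn (Airflow's default) and should listen on the default port 8080:

# Kill the stale gunicorn workers serving the Airflow UI.
pkill gunicorn
# Start the webserver again as a daemon on port 8080.
airflow webserver -p 8080 -D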

Trantrance answered 1/11, 2019 at 15:16

Comment: Worked for me. See also https://mcmap.net/q/324499/-dag-seems-to-be-missing - Mirepoix

None of the other answers solved this issue for me.

However, after spending some time I found a way to see the exact problem.

In my case I ran Airflow (v2.4.0) via the Helm chart (v1.6.0) inside Kubernetes, which created multiple containers. I got a shell inside the running container and executed two commands from Airflow's CLI, and this helped me a lot to debug and understand the problem:

airflow dags report

airflow dags reserialize

In my case, the problem was that the database schema didn't match the Airflow version.
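
A hedged sketch of the same steps run from outside the pod; the pod name is a placeholder for whatever your Helm release created:

# Run the diagnostics inside the scheduler pod.
kubectl exec -it <airflow-scheduler-pod> -- airflow dags report
kubectl exec -it <airflow-scheduler-pod> -- airflow dags reserialize
# If the report points at a schema mismatch, migrating the metadata
# database to the installed Airflow version usually resolves it.
kubectl exec -it <airflow-scheduler-pod> -- airflow db upgrade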

Cofferdam answered 28/9, 2022 at 11:51

Comment: This is the real solution; it should be the accepted answer. - Weise

This error is usually due to an exception raised while Airflow parses the DAG file. The DAG gets registered in the metastore (and is therefore visible in the UI), but it was never successfully parsed. Take a look at the Airflow logs; you might find the exception causing this error.
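
A quick way to surface such an exception, assuming the DAG file path from the question: run the file directly with the Python interpreter, and any import-time or top-level error prints a normal traceback instead of being swallowed by the scheduler.

# Parse the DAG file by hand; a clean exit means it imports without errors.
python /home/ubuntu/airflow/dags/update_bq.py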

Flodden answered 10/10, 2019 at 11:00

Comment: Which logs do you mean, and where can I find them? If you mean airflow/logs, there is nothing there because the DAG was never run, and there is nothing in the logs visible in the UI either. I added the log output from running airflow scheduler above. - Trantrance

Comment: Yes, you've looked at the log I was thinking of, from airflow scheduler. Honestly, I expected a stack trace there. - Flodden

Comment: No error in my case. There seem to be a large number of ways a DAG can fail to get into the DagBag. - Prenomen

Comment: You can view logs from the Airflow UI by clicking on the task boxes (the green and red ones) and clicking View Log. - Nellie

In my case, I built a custom Timetable class that had a runtime error. I didn't find it until I ran airflow dags reserialize, so I suggest running that command as well to make sure there are no such errors in your code.

Callum answered 17/10, 2023 at 19:38
