Airflow "This DAG isnt available in the webserver DagBag object "

When I put a new DAG Python script in the dags folder, a new entry appears in the DAG UI, but it is not enabled automatically and it does not seem to be loaded properly either. I have to click the Refresh button a few times on the right side of the list and toggle the on/off button on the left side of the list before the DAG can be scheduled. This is a manual process: I have to trigger something even though the DAG script was already placed inside the dags folder.

Can anyone help me with this? Did I miss something? Or is this the correct behavior in Airflow?

By the way, as mentioned in the post title, before I go through this manual process there is an indicator next to the DAG title with the message "This DAG isn't available in the webserver DagBag object. It shows up in this list because the scheduler marked it as active in the metadata database".

Carlcarla answered 10/1, 2017 at 3:21 Comment(2)
Try restarting the airflow web server. If this doesn't help, try airflow backfill '<dag_id>' -s '<date>' -e '<date>' with the same date as start and end. This should run the workflow once and correct the UI issues. – Daglock
I think the other views rely on data created by the scheduler parsing the dag file. If you have more than a hundred dags this can take minutes. It totally depends on how much work you're doing in the immediate context of all the dag files as to how quickly they can be processed in a loop. – Left

It is not you, nor is it correct or expected behavior. It is a current 'bug' in Airflow: the web server caches the DagBag in a way that you cannot really use it as expected.

"Attempt removing DagBag caching for the web server" remains on the official TODO as part of the roadmap, indicating that this bug may not yet be fully resolved, but here are some suggestions on how to proceed:

only use builders in airflow v1.9+

Prior to airflow v1.9 this occurs when a dag is instantiated by a function which is imported into the file where the instantiation happens, that is, when a builder or factory pattern is used. Some reports of this issue on GitHub and JIRA led to a fix released in airflow v1.9.

If you are using an older version of airflow, don't use builder functions.
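
As an illustration, here is a minimal sketch of the builder pattern that triggered the issue before v1.9; the module and function names (dag_builder.py, build_dag) are hypothetical:

# dag_builder.py: a hypothetical helper module
from datetime import datetime
from airflow import DAG

def build_dag(dag_id, schedule):
    # Factory function that instantiates and returns a DAG.
    return DAG(dag_id=dag_id,
               start_date=datetime(2017, 1, 1),
               schedule_interval=schedule)

# my_dag.py: placed in the dags folder
from dag_builder import build_dag

# Pre-v1.9, a DAG instantiated through an imported builder like this
# could end up missing from the webserver's DagBag.
dag = build_dag("my_generated_dag", "@daily")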

airflow backfill to reload the cache

As Dmitri suggests, running airflow backfill '<dag_id>' -s '<date>' -e '<date>' with the same start and end date can sometimes help. Thereafter you may end up with the (non-)issue that Priyank points out, but that is expected behavior (state: paused or not), depending on the configuration of your installation.
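
For example, with a hypothetical dag id and an arbitrary date used for both bounds:

airflow backfill 'my_dag' -s '2017-05-12' -e '2017-05-12'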

Siusan answered 12/5, 2017 at 17:40 Comment(0)

Restarting the airflow webserver solved my issue.
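
For a local setup where the webserver runs in the foreground, this typically means stopping the process (Ctrl+C) and starting it again; the port below is just the default:

airflow webserver -p 8080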

Hospital answered 17/7, 2018 at 22:53 Comment(2)
This fixed it for me! – Fronniah
Ran into the same issue on airflow v1.9 when trying to import a customized plugin operator into a DAG; restarting the airflow webserver resolved the problem for me. – Hoof

This error can be misleading. If hitting the refresh button or restarting the airflow webserver doesn't fix the issue, check the DAG (Python script) for errors.

Running airflow list_dags displays the DAG errors (in addition to listing the dags); you can also try running/testing your dag as a normal Python script.
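
For example, assuming the default dags folder and a hypothetical file name; any import-time exception in the script is printed to the console:

airflow list_dags
python ~/airflow/dags/my_dag.py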

After fixing the error, this indicator should go away.

Wohlen answered 24/9, 2020 at 8:50 Comment(0)

The issue is that, by default, the DAG is put into the DagBag in the paused state, so that the scheduler is not overwhelmed with lots of backfill activity on start/restart.

To work around this, change the following setting in your airflow.cfg file:

# Are DAGs paused by default at creation 
dags_are_paused_at_creation = False

Hope this helps. Cheers!

Meliamelic answered 17/2, 2017 at 7:18 Comment(2)
Problem still happens with this setting. – Kwa
This is not the OP's problem. – Hospital

I have a theory about a possible cause of this issue in Google Composer. There is a section about DAG failures on the webserver in the troubleshooting documentation for Composer, which says:

Avoid running heavyweight computation at DAG parse time. Unlike the worker and scheduler nodes, whose machine types can be customized to have greater CPU and memory capacity, the webserver uses a fixed machine type, which can lead to DAG parsing failures if the parse-time computation is too heavyweight.

I was trying to load configuration from an external source (which actually took a negligible amount of time compared to the other operations needed to create the DAG, but still broke something, because the Airflow webserver in Composer runs on App Engine, which has strange behaviours).

I found the workaround in the discussion of this Google issue: create a separate DAG with a task which loads all the data needed and stores that data in an Airflow variable:

Variable.set("pipeline_config", config, serialize_json=True)

Then I could do

Variable.get("pipeline_config", deserialize_json=True)

and successfully generate the pipeline from that. An additional benefit is that I get logs from that task, which I wouldn't get from the web server because of this issue.
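
A minimal sketch of that pattern, assuming Airflow 1.x imports and a hypothetical load_config() standing in for the real external fetch:

# config_loader.py: a separate DAG whose only task fetches the external
# configuration at run time and stores it as an Airflow Variable,
# keeping the expensive work out of DAG parse time.
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python_operator import PythonOperator

def load_config():
    # Hypothetical stand-in for the real (slow) external lookup.
    return {"tables": ["a", "b"]}

def store_config():
    Variable.set("pipeline_config", load_config(), serialize_json=True)

with DAG(dag_id="config_loader",
         start_date=datetime(2018, 12, 1),
         schedule_interval="@daily") as dag:
    PythonOperator(task_id="store_config", python_callable=store_config)

# The pipeline-generating DAG file can then read the variable at parse
# time, which is cheap:
# config = Variable.get("pipeline_config", deserialize_json=True)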

Slime answered 13/12, 2018 at 12:44 Comment(2)
Thanks for answering for Composer specifically. The top answer (restart the airflow webserver) cannot work in this case. – Obola
The documentation for adding/updating/deleting dags is now here: cloud.google.com/composer/docs/how-to/using/managing-dags – Flavorful
