Google Cloud Composer (Apache Airflow) cannot access log files
I'm running a DAG in Google Cloud Composer (hosted Airflow) that runs fine in Airflow locally. All it does is print "Hello World". However, when I run it through Cloud Composer I receive the following error:

*** Log file does not exist: /home/airflow/gcs/logs/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log
*** Fetching from: http://airflow-worker-d775d7cdd-tmzj9:8793/log/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-d775d7cdd-tmzj9', port=8793): Max retries exceeded with url: /log/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8825920160>: Failed to establish a new connection: [Errno -2] Name or service not known',))

I've also tried making the DAG insert data into a database, and the task itself actually succeeds about 50% of the time. However, the UI always shows this error message, and none of my print statements or logs ever appear. Any help on why this might be happening would be much appreciated.
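
For context, the DAG is essentially just this (a minimal sketch; the dag_id and task_id come from the error above, while the start date and schedule are illustrative):

# Minimal sketch of the DAG, using Airflow 1.10-style imports
# (Composer was on the 1.10.x line at the time)
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def main_test():
    print("Hello World")

with DAG(
    dag_id="matts_custom_dag",
    start_date=datetime(2020, 4, 1),
    schedule_interval=None,
) as dag:
    hello = PythonOperator(task_id="main_test", python_callable=main_test)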

Tidwell asked 21/4, 2020 at 2:20 Comment(1)
Hi! I would like to ask you for more information. Are you using a self-managed Airflow web server? Which versions of Composer and Airflow are you running? It can happen that the logs take around 10 minutes to appear even though the tasks themselves run at normal speed. I recommend looking at the bucket for this environment and possibly deleting some old logs and unused files. You can also always check your logs in Stackdriver Logging. Let me know about the results. – Gama
4

We also faced the same issue, raised a support ticket with GCP, and got the following reply:

  1. The message is related to the latency of syncing logs from the Airflow workers to the web server; the sync takes at least a few minutes (depending on the number of objects and their size). The total log size does not seem large, but it is enough to noticeably slow down synchronization, hence we recommend cleaning up/archiving the logs.

  2. Basically, we recommend relying on Stackdriver logs instead, because of the latency inherent in the design of this sync.

I hope this will help you solve the problem.
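
For the cleanup recommended in point 1, something along these lines can work against the environment's log bucket (a sketch using the google-cloud-storage client; the bucket name and the 30-day cutoff are assumptions, not values from support):

# Sketch: delete task logs older than 30 days from the Composer log bucket.
# Bucket name and cutoff are illustrative; copy blobs to an archive bucket
# first if you need to keep the logs.
from datetime import datetime, timedelta, timezone
from google.cloud import storage

client = storage.Client()
cutoff = datetime.now(timezone.utc) - timedelta(days=30)

for blob in client.list_blobs("your-composer-bucket", prefix="logs/"):
    if blob.time_created < cutoff:
        blob.delete()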

Jar answered 22/4, 2020 at 8:30 Comment(4)
Amazing, I couldn't find any info on this issue, so I'm glad I'm not the only one having it. Is the best move here simply to clean up / archive the logs regularly, since GCP doesn't allow you to override the logging location? – Tidwell
I just deleted all the log files and still receive the same error message. Does this make sense to you? – Tidwell
In the older version of Composer we did not face this issue; we are only facing it in the latest version. The only option is using Stackdriver. – Jar
@Jar I am facing this issue with Composer version 1.17.2 and Airflow version 2.1.2. It does not show up every time, only in a few instances. Did setting the task retries to greater than 1 help? – Daysidayspring
4

I have the same problem after upgrading Google Composer from 1.10.3 to 1.10.6. I can see in my logs that Airflow is trying to fetch the logs from a bucket whose name ends with -tenant, while the bucket in my account ends with -bucket.

In the configuration, I can see something weird too.

## airflow.cfg
[core]
remote_base_log_folder = gs://us-east1-dada-airflow-xxxxx-bucket/logs

## while the running configuration says
core    remote_base_log_folder  gs://us-east1-dada-airflow-xxxxx-tenant/logs   env var
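
To confirm which value the workers actually see, you can print the effective setting from inside a task. A minimal sketch for the Airflow 1.10.x setup described here (note that in Airflow 2 this option moved to the [logging] section; the env-var override wins over airflow.cfg):

# Sketch: print the effective remote log folder,
# e.g. from a PythonOperator callable
from airflow.configuration import conf

def show_log_folder():
    print(conf.get("core", "remote_base_log_folder"))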

I wrote to Google support and they said the team is working on a fix.

EDIT: I've been accessing my logs with gsutil, replacing the bucket name suffix with -bucket:

gsutil cat gs://us-east1-dada-airflow-xxxxx-bucket/logs/...../5.logs
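
The same workaround can be done programmatically (a sketch with the google-cloud-storage client; the object path here is taken from the error in the question and is illustrative):

# Sketch: read a task log directly from the -bucket bucket
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("us-east1-dada-airflow-xxxxx-bucket")
blob = bucket.blob("logs/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log")
print(blob.download_as_text())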
Sixteenth answered 27/5, 2020 at 14:45 Comment(2)
Thanks for the additional context. – Tidwell
What you said is true, but what did you do to resolve this issue of reading from the remote log? – Spragens
1

I have faced the same situation on multiple occasions. As soon as a job finished, looking at the log in the Airflow web UI would give me the same error, but when I checked the same logs in the UI a minute or two later, I could see them properly. As the answers above say, it's a sync issue between the webserver and the worker node.

Dailey answered 27/5, 2020 at 15:42 Comment(1)
So how do you solve this sync issue? – Spragens
0

In general, the issue described here is sporadic.

In certain situations, what can help is setting default_task_retries to a value that allows a task to be retried at least once, as sketched below.
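
Retries can also be set per DAG through default_args instead of environment-wide; a minimal sketch (the values are illustrative):

# Sketch: per-DAG retries via default_args;
# pass this dict to the DAG constructor as default_args=default_args
from datetime import timedelta

default_args = {
    "retries": 2,                         # retry each task up to two times
    "retry_delay": timedelta(minutes=1),  # wait a minute between attempts
}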

Uniaxial answered 7/12, 2021 at 18:6 Comment(0)
-1

This issue has been resolved at least since Airflow version 1.10.10+composer.

Rajah answered 11/1, 2021 at 13:58 Comment(0)
