Cloud Composer tasks fail without reason or logs
I run Airflow in a managed Cloud Composer environment (version 1.9.0), which runs on a Kubernetes 1.10.9-gke.5 cluster.

All my DAGs run daily at 3:00 AM or 4:00 AM, but some mornings I find that a few tasks failed during the night for no apparent reason.

  • When checking the logs in the UI, I see nothing, and there is no log either in the log folder of the GCS bucket.

  • In the task instance details, it reads "Dependencies Blocking Task From Getting Scheduled", but the listed dependency is the dagrun itself.

  • Although the DAG is configured with 5 retries and an email on failure, it does not look as if any retry took place, and I never received a failure email.

  • I usually just clear the task instance, and it then runs successfully on the first try.

Has anyone encountered a similar problem?

Risk answered 21/1, 2019 at 9:42 Comment(1)
I edited my answer to add a Feature Request for sending emails even if the pod was evicted. – Lorielorien

Empty logs often mean the Airflow worker pod was evicted (i.e., it died before it could flush its logs to GCS), which is usually caused by an out-of-memory condition. If you look at the GKE cluster under Composer's hood (GKE > Workloads > "airflow-worker"), you will probably find that there is indeed an evicted pod.
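A quick way to confirm this is to dump the pod list from that cluster (e.g. `kubectl get pods -n <namespace> -o json > pods.json`) and look for pods whose status reason is "Evicted". A minimal sketch of that check, with an inlined sample shaped like real `kubectl` JSON output (pod names here are hypothetical):

```python
import json

def evicted_pods(pod_list):
    """Return names of pods that Kubernetes marked as evicted.

    Evicted pods end up in phase "Failed" with status.reason == "Evicted",
    and the message usually explains which resource ran out.
    """
    return [
        p["metadata"]["name"]
        for p in pod_list["items"]
        if p.get("status", {}).get("reason") == "Evicted"
    ]

# Hypothetical sample mirroring `kubectl get pods -o json` structure:
sample = {
    "items": [
        {"metadata": {"name": "airflow-worker-abc"},
         "status": {"phase": "Running"}},
        {"metadata": {"name": "airflow-worker-def"},
         "status": {"phase": "Failed", "reason": "Evicted",
                    "message": "The node was low on resource: memory."}},
    ]
}
print(evicted_pods(sample))  # → ['airflow-worker-def']
```

In a real investigation, `kubectl describe pod <name>` on the evicted pod shows the same reason and message fields.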

You will probably also see in "Task Instances" that the affected tasks have no start date, job id, or worker hostname assigned, which, together with the missing logs, confirms that the pod died.
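That signature (failed state, no start date, no worker hostname) can be checked programmatically. A sketch of the filter, using plain dicts as hypothetical stand-ins for rows from Airflow's task instance table (the field names mirror Airflow's TaskInstance columns):

```python
def likely_pod_deaths(task_instances):
    """Flag failed task instances that never actually started on a worker.

    A task that failed without a start_date or hostname was most likely
    scheduled onto a pod that died (e.g. was evicted) before running it.
    """
    return [
        ti["task_id"]
        for ti in task_instances
        if ti["state"] == "failed"
        and ti.get("start_date") is None
        and not ti.get("hostname")
    ]

# Hypothetical rows: the first matches the evicted-pod signature.
rows = [
    {"task_id": "load_data", "state": "failed",
     "start_date": None, "hostname": ""},
    {"task_id": "transform", "state": "failed",
     "start_date": "2019-01-21T03:00:00", "hostname": "airflow-worker-abc"},
]
print(likely_pod_deaths(rows))  # → ['load_data']
```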

Since this normally happens in highly parallelised DAGs, you can avoid it by reducing the worker concurrency or by using machines with more memory.
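In Composer, worker concurrency is lowered by overriding the `celery.worker_concurrency` Airflow config, which `gcloud composer environments update` expects in `section-key=value` form. A sketch that builds (but does not run) such a command; the environment name and location are hypothetical placeholders:

```python
def composer_update_cmd(env_name, location, concurrency):
    """Build the gcloud argv that lowers Celery worker concurrency.

    Composer encodes an Airflow config override as "<section>-<key>=<value>",
    so celery.worker_concurrency becomes "celery-worker_concurrency=N".
    """
    return [
        "gcloud", "composer", "environments", "update", env_name,
        "--location", location,
        "--update-airflow-configs",
        "celery-worker_concurrency={}".format(concurrency),
    ]

# Hypothetical environment name and location:
cmd = composer_update_cmd("my-env", "us-central1", 4)
print(" ".join(cmd))
```

You could pass the resulting list to `subprocess.run` once you have verified it; building it first makes the change easy to review.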

EDIT: I filed this Feature Request on your behalf to get emails in case of failure, even if the pod was evicted.

Alurta answered 21/1, 2019 at 13:3 Comment(0)
