Broken DAG: (...) No module named docker

My BigQuery connectors are all running, but I have some existing scripts in Docker containers that I would like to schedule on Cloud Composer instead of App Engine Flexible.

I have the script below, which seems to follow the examples I can find:

import datetime
from airflow import DAG
from airflow import models
from airflow.operators.docker_operator import DockerOperator

yesterday = datetime.datetime.combine(
    datetime.datetime.today() - datetime.timedelta(1),
    datetime.datetime.min.time())

default_args = {
    # Setting start date as yesterday starts the DAG immediately
    'start_date': yesterday,
    # If a task fails, retry it once after waiting at least 5 minutes
    'retries': 1,
    'retry_delay': datetime.timedelta(minutes=5),
}

schedule_interval = '45 09 * * *'

dag = DAG('xxx-merge', default_args=default_args, schedule_interval=schedule_interval)

hfan = DockerOperator(
    task_id='hfan',
    image='gcr.io/yyyyy/xxxx',
    dag=dag)  # attach the task to the DAG defined above

...but when I try to run it, the web UI tells me:

Broken DAG: [/home/airflow/gcs/dags/xxxx.py] No module named docker

Is it perhaps that Docker is not configured to work inside the Kubernetes cluster that Cloud Composer runs on? Or am I just missing something in the syntax?

Lucic answered 9/5, 2018 at 12:11 Comment(2)
Does this answer your question? Running docker operator from Google Cloud Composer – Prance
It's a couple of years since I asked this question :) These days I use KubernetesPodOperator instead. Installing Docker or any other extra configuration on Airflow didn't work out well. – Lucic

As explained in other answers, the Docker Python client is not preinstalled in Cloud Composer environments. To install it, add it as a PyPI dependency in your environment's configuration.

Caveat: by default, DockerOperator will try to talk to the Docker API at /var/run/docker.sock to manage containers. This socket is not mounted inside Composer Airflow worker pods, and manually configuring it to do so is not recommended. Use of DockerOperator is only recommended in Composer if configured to talk to Docker daemons running outside of your environments.
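
If you do have a Docker daemon running outside the environment (for example on a separate VM), DockerOperator can be pointed at it over TCP instead of the local socket. A minimal stand-alone sketch, assuming a hypothetical daemon address and reusing the image name from the question:

import datetime
from airflow import DAG
from airflow.operators.docker_operator import DockerOperator

dag = DAG('docker-remote-daemon-example',
          start_date=datetime.datetime(2020, 8, 1),
          schedule_interval=None)

run_remote = DockerOperator(
    task_id='run_remote',
    image='gcr.io/yyyyy/xxxx',          # image name copied from the question
    docker_url='tcp://10.0.0.5:2376',   # hypothetical address of an external Docker daemon
    api_version='auto',
    dag=dag,
)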

To avoid more brittle configuration or surprises from bypassing Kubernetes (since it is responsible for managing containers across the entire cluster), you should use the KubernetesPodOperator. If you are launching containers into a GKE cluster (or the Composer environment's cluster), then you can use GKEPodOperator, which has more specific GCP-related parameters.
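
For illustration, a minimal sketch of the question's task rewritten with KubernetesPodOperator, using the contrib import path from the Airflow 1.10 releases that Composer shipped at the time; the pod name and namespace are assumptions, and dag is the DAG object from the question:

from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

hfan = KubernetesPodOperator(
    task_id='hfan',
    name='hfan',                 # assumed pod name
    namespace='default',         # assumed namespace
    image='gcr.io/yyyyy/xxxx',   # same container image as in the question
    dag=dag,
)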

Peart answered 15/8, 2020 at 17:4 Comment(1)
Agreed, this is what I eventually ended up with. – Lucic

I got it resolved by installing docker-py==1.10.6 in the PyPI section of Composer.

However, getting DockerOperator to work properly requires a bit more effort, as the Composer workers do not have access to the Docker daemon. Head to the GCP console and perform the following steps (after getting cluster credentials); a sketch of the resulting task follows the steps.

  1. Export current deployment config to file

    kubectl get deployment airflow-worker -o yaml --export > airflow-worker-config.yaml

  2. Edit airflow-worker-config.yaml to mount docker.sock and the docker binary into the worker, and grant the airflow-worker deployment privileged access so it can run docker commands

  3. Apply deployment settings

    kubectl apply -f airflow-worker-config.yaml
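
Once the worker deployment has been patched this way, the task from the question should work largely unchanged, since DockerOperator talks to the local Docker socket by default. A hedged sketch, reusing the dag object and image from the question:

from airflow.operators.docker_operator import DockerOperator

hfan = DockerOperator(
    task_id='hfan',
    image='gcr.io/yyyyy/xxxx',
    docker_url='unix://var/run/docker.sock',  # the default; requires the socket mounted into the worker pod
    api_version='auto',
    dag=dag,
)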

Mcsweeney answered 8/6, 2018 at 2:35 Comment(1)
To new readers, this is not a recommended reconfiguration: https://mcmap.net/q/1153451/-broken-dag-no-module-named-docker. There is also no guarantee that patches to airflow-worker will persist if you update or upgrade your environment. Consider GKEPodOperator as the recommended solution for launching containers. – Peart

This means: wherever your Airflow instance is installed, the Python package named docker is missing.

If it were my personal machine, I could install the missing package with

pip install docker

EDIT

Within the source code of the Docker operator, https://airflow.incubator.apache.org/_modules/airflow/operators/docker_operator.html, there is an import statement:

from docker import Client, tls

So the new error, cannot import name Client, seems to be caused by a broken install or an incompatible version of the docker package.
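
To check which API the installed package actually exposes, a quick sketch (the entry points below are the ones documented by the docker package itself):

import docker

print(docker.__version__)

# docker-py < 2.0 exposed a low-level `Client` class:
#     from docker import Client
# docker >= 2.0 renamed it; the current entry points are:
client = docker.from_env()   # high-level client
api = docker.APIClient()     # low-level client, the successor of `Client`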

Ardatharde answered 9/5, 2018 at 12:13 Comment(9)
Ahh ok, so I should install that in the PyPI section! Will try it now, thanks! – Lucic
Hmm, well, close, but now after installing docker I get Broken DAG: [...] cannot import name Client - is this the same thing or a different issue? – Lucic
Great, thank you - I will try it with a new docker version and environment. – Lucic
Is it possible it should instead be pip install docker-py? (No, it's from 2016: pypi.org/project/docker-py) – Lucic
Have you seen this: #43386503 – Ardatharde
I have when I was installing Airflow on my own server, but this is via Google Cloud Composer, so I don't have server access - I do suspect that they may have to configure that though. – Lucic
Client was renamed to DockerClient in docker>=2.0.0, so I'm trying to install docker==1.10.6. – Lucic
This is still not working - the docker install fails when I try via its pip section; I'm waiting for some answer from Google on whether it is supported. – Lucic
All Python packages need to be installed into the virtual env which Airflow is using (even docker), not system-wide/globally, for Airflow to be able to access them. – Dioscuri

What solved the problem in my case was adding docker to the Airflow extras installed in my Dockerfile:

&& pip install pyasn1 \
&& pip install apache-airflow[crypto,docker,celery,postgres,hive,jdbc,mysql,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}]==${AIRFLOW_VERSION} \
&& pip install 'redis==3.2' \
Catlaina answered 27/12, 2020 at 2:17 Comment(0)
R
0

As noted in Ardatharde's answer, you need to have the PyPI package for docker installed in your Composer environment. The Cloud Composer documentation has instructions for installing PyPI packages in your environment at a particular package version.

Riplex answered 9/5, 2018 at 19:26 Comment(0)
