How to install packages in Airflow (docker-compose)?

The question is very similar to one that has already been asked. The only difference is that I run Airflow in Docker.

Step by step:

  1. Put docker-compose.yaml into the PyCharm project
  2. Put requirements.txt into the PyCharm project
  3. Run docker-compose up
  4. Run a DAG and receive a ModuleNotFoundError

I want to start Airflow using docker-compose with the dependencies from requirements.txt. These dependencies should be available to the PyCharm interpreter and during DAG execution.

Is there a solution that doesn't require rebuilding the image?

Gristede answered 8/6, 2021 at 12:38 Comment(0)

Is there a solution that doesn't require rebuilding the image?

Yes, there is now: as of October 2021 (Airflow 2.2.0), it is available as an environment variable:

_PIP_ADDITIONAL_REQUIREMENTS

It is used in the docker-compose.yaml file. That should do the trick without building a complete image, as some of the other answers explain (very well, actually :-).

See: https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml

Official documentation: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#environment-variables-supported-by-docker-compose
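
For a quick test, you can put the extra packages into a .env file next to docker-compose.yaml before starting the stack; the compose file picks the variable up from there (a minimal sketch; the package names below are only examples, use whatever your DAGs import):

# .env next to docker-compose.yaml; package names are only examples
echo "_PIP_ADDITIONAL_REQUIREMENTS=pandas requests" >> .env
docker-compose up

Note that the packages are installed again every time the containers start, which is why the documentation recommends this option only for quick checks.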

Vincenz answered 21/10, 2021 at 22:2 Comment(3)
I will mark this answer as correct for the question "Is there a solution that doesn't require rebuilding the image?". But in general, I now think that rebuilding the image is preferable: requirements.txt is also used by PyCharm to install the dependencies locally. – Gristede
This is only for development and tends to be problematic. Didn't work for me. – Catholicism
Indeed, the compose file states: "Use this option ONLY for quick checks [...] A better way is to build a custom image or extend the official image as described in https://airflow.apache.org/docs/docker-stack/build.html" – Malleus

Got the answer in the Airflow GitHub discussions. The only way right now to install extra Python packages is to build your own image. I will try to explain this solution in more detail.

Step 1. Put the Dockerfile, docker-compose.yaml, and requirements.txt files in the project directory

Step 2. Paste the code below into the Dockerfile:

FROM apache/airflow:2.1.0
COPY requirements.txt .
RUN pip install -r requirements.txt

Step 3. Paste into docker-compose.yaml the code that you can find in the official documentation, and replace the image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.0} line with build: .:

---
version: '3'
x-airflow-common:
  &airflow-common
  build: .
  # REPLACED # image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.0}
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"
  depends_on:
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

# ...

At this point your project directory should look like this:

airflow-project
├── docker-compose.yaml
├── Dockerfile
└── requirements.txt

Step 4. Run docker-compose up to start Airflow; docker-compose should build your image automatically from the Dockerfile. Run docker-compose build whenever you need to rebuild the image and update the dependencies.
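
To confirm that the packages from requirements.txt really ended up in the image, you can run a quick check inside one of the running containers (a sketch; some_module and some_package are placeholders for whatever your requirements.txt provides, and airflow-webserver is one of the service names from the official docker-compose.yaml):

# Replace some_module / some_package with entries from your requirements.txt
docker-compose exec airflow-webserver python -c "import some_module"
docker-compose exec airflow-webserver pip freeze | grep -i some_package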

Gristede answered 8/6, 2021 at 16:5 Comment(0)

Another alternative is to update your docker-compose.yml file and add a line like the following with all the commands you need:

  command: -c "pip3 install apache-airflow-providers-sftp  apache-airflow-providers-ssh --user"

Then bring the environment up again:

docker-compose up airflow-init
docker-compose up
Harveyharvie answered 31/8, 2021 at 23:2 Comment(0)

1. Create new Airflow docker image with installed Python requirements

Check which Airflow image your docker-compose.yaml is using and base your image on it; in my case it's apache/airflow:2.3.2. In the same folder where you have your docker-compose.yaml, create a Dockerfile with the following content:

FROM apache/airflow:2.3.2
COPY requirements.txt /requirements.txt
RUN pip install --user --upgrade pip
RUN pip install --no-cache-dir --user -r /requirements.txt

2. Build new Airflow image

In the same folder, run:

docker build . --tag pyrequire_airflow:2.3.2

3. Use new image in your docker-compose.yaml

Find the name of the Airflow image used in your docker-compose.yaml under AIRFLOW_IMAGE_NAME. Change:

image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.3.2}

To:

image: ${AIRFLOW_IMAGE_NAME:-pyrequire_airflow:2.3.2}
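
If you prefer not to edit docker-compose.yaml at all, note that the compose file reads AIRFLOW_IMAGE_NAME from the environment, so an alternative (a sketch of the same idea) is to set it in a .env file next to docker-compose.yaml:

# Alternative: override the image name via .env instead of editing docker-compose.yaml
echo "AIRFLOW_IMAGE_NAME=pyrequire_airflow:2.3.2" >> .env
docker-compose up -d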
Catholicism answered 25/6, 2022 at 16:4 Comment(0)

I don't know if this answer comes too late; anyway, I've managed to work around this problem by:

  • Defining a new volume in docker-compose that points to a directory where new modules will be deployed. In my case, in the x-airflow-common section of the compose file, under the volumes subsection, I added: - ${AIRFLOW_PROJ_DIR:-.}/python:/python_extended
  • Then, in ${AIRFLOW_PROJ_DIR:-.}/python, I can deploy new modules and make them importable via PYTHONPATH.
  • Finally, it is possible to define a task-specific .env file in ${AIRFLOW_PROJ_DIR:-.}/python and load it in the dag.py of that task.

So, my docker-compose.yml looks like:

version: '3.8'
x-airflow-common:
  &airflow-common
  # In order to add custom dependencies or upgrade provider packages you can use your extended image.
  # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
  # and uncomment the "build" line below, Then run `docker-compose build` to build the images.
  
  #image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.7.0}
  build: .
  
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    # For backward compatibility, with Airflow <2.3
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session'
    # yamllint disable rule:line-length
    # Use simple http server on scheduler for health checks
    # See https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/check-health.html#scheduler-health-check-server
    # yamllint enable rule:line-length
    AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: 'true'
    # WARNING: Use _PIP_ADDITIONAL_REQUIREMENTS option ONLY for a quick checks
    # for other purpose (development, test and especially production usage) build/extend Airflow image.
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
    PYTHONPATH: '/python_extended'
  volumes:
    - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
    - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
    - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
    - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins
    - ${AIRFLOW_PROJ_DIR:-.}/python:/python_extended
  user: "${AIRFLOW_UID:-50000}:0"
  depends_on:
    &airflow-common-depends-on
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

And my DAGs start with:

import sys, os, logging
sys.path.append(os.path.abspath("/python_extended"))
logging.info(os.environ['PYTHONPATH'])

...

from dotenv import load_dotenv

path = '/python_extended/whatever.env'
print(f'> Using .env file: {path}')
load_dotenv(path)

This way, while keeping the current Docker deployment of the image, you can add new modules simply by placing them in your local ${AIRFLOW_PROJ_DIR:-.}/python directory.
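
As a concrete example of that workflow, deploying a new module is just a copy into the mounted folder (a sketch; "mymodule" and the source path are hypothetical):

# "mymodule" and the source path are only examples
mkdir -p ./python
cp -r ~/projects/mymodule ./python/mymodule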

Hope this helps :/

Fibrinolysis answered 13/9, 2023 at 0:22 Comment(0)
