How to connect Airflow to MinIO S3

I'm trying to run Docker containers with Airflow and MinIO and connect Airflow tasks to buckets defined in MinIO. I'm using recent versions: Airflow 2.1.3 and the latest MinIO image.

How do I get the access key and secret key from MinIO for the connection, and how do I define the connection in Airflow?

I have tried multiple approaches and settings, but I keep getting: botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

I defined the connection through the UI as:

conn type: s3
host: locals3 (name of the service in docker-compose)
login: user (also minio_root_user)
password: password  (also minio_root_password)
port: 9000

Here is the task I use to test the connection (taken from another stackoverflow question):

# S3KeySensor comes from the Amazon provider package in Airflow 2.x
from airflow.providers.amazon.aws.sensors.s3_key import S3KeySensor

sensor = S3KeySensor(
    task_id='check_s3_for_file_in_s3',
    bucket_key='test',
    bucket_name='airflow-data',
    # aws_conn_id="aws_default",  # defaults to aws_default when omitted
    timeout=18 * 60 * 60,
    poke_interval=120,
    dag=dag)  # `dag` is defined elsewhere in the DAG file

Thank you.

EDIT: Docker-compose file:

version: '3.8'

# ====================================== AIRFLOW ENVIRONMENT VARIABLES =======================================
x-environment: &airflow_environment
  - AIRFLOW__API__AUTH_BACKEND=airflow.api.auth.backend.basic_auth
  - AIRFLOW__CORE__EXECUTOR=LocalExecutor
  - AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False
  - AIRFLOW__CORE__LOAD_EXAMPLES=False
  - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql://airflow:airflow@postgres:5432/airflow
  - AIRFLOW__CORE__STORE_DAG_CODE=True
  - AIRFLOW__CORE__STORE_SERIALIZED_DAGS=True
  - AIRFLOW__WEBSERVER__EXPOSE_CONFIG=True

x-airflow-image: &airflow_image apache/airflow:2.1.3-python3.8
# ====================================== /AIRFLOW ENVIRONMENT VARIABLES =======================================

services:
  postgres:
    image: postgres:13-alpine
    healthcheck:
      test: [ "CMD", "pg_isready", "-U", "airflow" ]
      interval: 5s
      retries: 5
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5432:5432"

  init:
    image: *airflow_image
    depends_on:
      - postgres
    environment: *airflow_environment
    entrypoint: /bin/bash
    command: -c 'airflow db init && airflow users create --username user --password password --firstname Marin --lastname Marin --role Admin --email [email protected]'

  webserver:
    image: *airflow_image
    restart: always
    depends_on:
      - postgres
    ports:
      - "8080:8080"
    volumes:
      - logs:/opt/airflow/logs
    environment: *airflow_environment
    command: webserver

  scheduler:
    build:
      context: docker
      args:
        AIRFLOW_BASE_IMAGE: *airflow_image
    #    image: *airflow_image
    restart: always
    depends_on:
      - postgres
    volumes:
      - logs:/opt/airflow/logs
      - ./dags:/opt/airflow/dags
    environment: *airflow_environment
    command: scheduler

  locals3:
    image: minio/minio
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      - MINIO_ROOT_USER=user
      - MINIO_ROOT_PASSWORD=password
    command: "server --console-address :9001 /data"
    volumes:
      - "locals3-data:/data"
    healthcheck:
      test: [ "CMD", "curl", "-f", "http://localhost:9000/minio/health/live" ]
      interval: 30s
      timeout: 20s
      retries: 3

  locals3_init:
    image: minio/mc
    depends_on:
      - locals3
    entrypoint: >
      /bin/sh -c "
      while ! /usr/bin/mc config host add locals3 http://locals3:9000 user password; do echo 'MinIO not up and running yet...' && sleep 1; done;
      echo 'Added mc host config.';
      /usr/bin/mc mb locals3/airflow-data;
      exit 0;
      "

volumes:
  logs:
  locals3-data:
Nihility answered 6/9, 2021 at 20:19 Comment(7)
Do the logs from minio provide any clues? – Snicker
What is the exact requirement here? – Bangka
What is your docker-compose file? Did you overwrite the aws_default connection in the Airflow Connections UI? – Mixie
@Mixie yes I did overwrite the default aws_default connection. – Nihility
@PrakashS I just want to test how to connect Airflow 2 to a MinIO S3 bucket. – Nihility
And how does your docker-compose file look? – Mixie
@Mixie added the docker-compose file to the question. – Nihility

I hope this helps someone. What I did was create a user in MinIO and log in to the MinIO console with that user. Then, through the console, I created a service account for that user, which generated an access key and a secret key.
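
The same steps can also be scripted with mc instead of the MinIO console. This is only a sketch: it assumes a reasonably recent mc release that ships the svcacct subcommand, and the user name and secret below are placeholders:

mc alias set locals3 http://locals3:9000 user password
mc admin user add locals3 airflow airflow_secret
mc admin user svcacct add locals3 airflow   # generates and prints an access key / secret key pair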

The part of the docker-compose file that I changed looks like this now:

  locals3_init:
    image: minio/mc
    depends_on:
      - locals3
    entrypoint: >
      /bin/sh -c "
      while ! /usr/bin/mc config host add locals3 http://locals3:9000 user password; do echo 'MinIO not up and running yet...' && sleep 1; done;
      echo 'Added mc host config.';
      /usr/bin/mc admin user add locals3 airflow airflow_secret;
      echo 'Added user airflow.';
      /usr/bin/mc admin policy set locals3 readwrite user=airflow;
      /usr/bin/mc mb locals3/data;
      /usr/bin/mc alias set locals3 http://locals3:9000 9RTK1ISXS13J85I4U6JS 4z+akfubnu+XZuoCXhqGwrtq+jgK2AYcrgGH5zsQ --api s3v4;
      exit 0;
      "

Perhaps some of the mc commands could be cleaned up.
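
As a quick sanity check (a sketch assuming the airflow / airflow_secret user created above and the data bucket), the new credentials can be tried out from the mc container:

mc alias set airflowtest http://locals3:9000 airflow airflow_secret --api s3v4
mc ls airflowtest/data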

Afterwards I added a connection in Airflow, putting the access key in the login field and the secret key in the password field. For the connection type I chose S3.

Now, adding the name of the container (locals3 in my case) in the host field and port in the port field DOES NOT WORK. I added both the host and port through the extras:

{
    "host": "http://locals3:9000"
}

Afterwards I was able to connect.
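
For reference, the same connection can also be created without the UI, for example with the Airflow CLI. This is only a sketch using the airflow / airflow_secret user created above, with the MinIO endpoint passed through the host key of the extra field:

airflow connections add aws_default \
    --conn-type s3 \
    --conn-login airflow \
    --conn-password airflow_secret \
    --conn-extra '{"host": "http://locals3:9000"}'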

I'm not sure whether the connection would work if I added a service account through the root user or simply used the root credentials, because I did not test this yet.

EDIT:

Tested with root user credentials and it works. So the problem seems to be in the way host and port were defined.

EDIT2:

Comparing the two connection strings:

  1. added host and port as extra value:

    s3://user:password@?host=http%3A%2F%2Flocals3%3A9000

  2. added host and port through fields:

    s3://user:password@locals3:9000

The only explanation I can find for why the first one works and the second doesn't is that the characters in the second are not URL-encoded.
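
The working form also maps directly onto Airflow's connection environment variables, for example (a sketch using the root credentials from the compose file; the endpoint sits URL-encoded inside the extra part of the URI):

export AIRFLOW_CONN_AWS_DEFAULT='s3://user:password@?host=http%3A%2F%2Flocals3%3A9000'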

Nihility answered 16/9, 2021 at 17:41 Comment(1)
I'm facing a similar issue here: #72612368. Any help is appreciated. – Mown
