Connecting Airflow and MinIO S3
I am using Docker Compose with Bitnami's Airflow image as well as MinIO. I can get Airflow to talk to AWS S3, but when I try to substitute MinIO I get this error:

File "/opt/bitnami/airflow/venv/lib/python3.8/site-packages/botocore/client.py", line 719, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

Here's the .env:

OBJECT_STORE=s3://xxxx:xxxxx@S3?host%3Dhttp%3A%2F%2Fminio1%3A9001

Here's the environment connection in compose:

AIRFLOW_CONN_AWS_S3=${OBJECT_STORE}

Here's the Airflow test dag:

from datetime import timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.utils.dates import days_ago

default_args = {
    'owner': 'airflow',
    'retries': 1,
    'retry_delay': timedelta(seconds=5),
    'provide_context': True
}

dag = DAG(
    dag_id='s3_test',
    tags=['ti'],
    default_args=default_args,
    start_date=days_ago(2),
    schedule_interval='0 * * * *',
    catchup=False
)

def func_test():
    s3 = S3Hook('aws_s3')
    obj = s3.get_key("file.csv", "mybucket")
    contents = obj.get()['Body'].read().decode('utf-8')
    print('contents', contents)

t1 = PythonOperator(
    task_id='test',
    python_callable=func_test, 
    dag=dag
) 

t1

I know the file exists in the bucket and the path is correct. I gave the MinIO user account full admin rights too. I'm not sure what is causing the 403.

Purple answered 14/6, 2022 at 5:22

Connection type S3 has now been officially removed from Airflow; use aws instead. See: https://github.com/apache/airflow/pull/25980

A working example can be found here: Airflow and MinIO connection with AWS

NOTE: if test_connection fails, it doesn't necessarily mean that the connection won't work!

The solution (all credit to Taragolis & hanleybrand): create a new connection, call it for example minio_s3, set its type to Amazon Web Services (aws), and fill in only the Extra field:

{
  "aws_access_key_id": "your MinIO username",
  "aws_secret_access_key": "your MinIO password",
  "endpoint_url": "http://localhost:9000",
  "region_name": "us-east-1"
}
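
If you prefer to keep the connection in docker-compose, as in the question, here is a minimal sketch of the same connection as an environment variable. It assumes Airflow 2.3+ (which accepts JSON connection definitions in AIRFLOW_CONN_* variables) and that the MinIO S3 API is reachable at http://minio1:9000 (minio1 is the compose service name from the question; 9000 is MinIO's default API port, while 9001 is normally the web console):

AIRFLOW_CONN_MINIO_S3='{"conn_type": "aws", "extra": {"aws_access_key_id": "your MinIO username", "aws_secret_access_key": "your MinIO password", "endpoint_url": "http://minio1:9000", "region_name": "us-east-1"}}'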

Please note: if you're running Airflow from a KinD cluster and MinIO in Docker on the same host, you need to use host.docker.internal instead of localhost.
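
With that connection in place, the task from the question only needs to point the hook at the new connection id. A sketch, assuming the connection is named minio_s3 and the bucket/key from the question exist:

from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def func_test():
    # point the hook at the MinIO connection instead of aws_s3
    s3 = S3Hook(aws_conn_id='minio_s3')
    obj = s3.get_key("file.csv", "mybucket")
    print(obj.get()['Body'].read().decode('utf-8'))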

Aigrette answered 24/1, 2023 at 21:47
