How to add a Minio connection to Airflow connections?

I am trying to add a running instance of MinIO to the Airflow connections. I thought it should be as easy as this setup in the GUI (never mind the exposed credentials, this is a blocked-off environment and they will be changed afterwards): [screenshot of the connection form]

Airflow as well as MinIO are running in Docker containers, both on the same Docker network. Pressing the test button results in the following error:

'ClientError' error occurred while testing connection: An error occurred (InvalidClientTokenId) when calling the GetCallerIdentity operation: The security token included in the request is invalid.

I am curious what I am missing. The idea was to set up this connection and then use a bucket for data-aware scheduling (i.e. I want to trigger a DAG as soon as someone uploads a file to the bucket).
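For context, here is a rough sketch of the kind of data-aware scheduling I have in mind (the connection, bucket URI and DAG ids below are only placeholders; as far as I understand, a Dataset only counts as updated when an Airflow task listing it in outlets succeeds, not when a file is uploaded from outside Airflow):

from datetime import datetime

from airflow import DAG
from airflow.datasets import Dataset
from airflow.operators.python import PythonOperator

# URI of the MinIO bucket/prefix to watch (placeholder)
minio_files = Dataset("s3://test-bucket/incoming/")

with DAG(
    dag_id="minio_producer",
    start_date=datetime(2022, 12, 1),
    schedule="@hourly",
    catchup=False,
):
    PythonOperator(
        task_id="upload_to_minio",
        python_callable=lambda: None,   # placeholder for the actual upload logic
        outlets=[minio_files],          # marks the Dataset as updated
    )

with DAG(
    dag_id="minio_consumer",
    start_date=datetime(2022, 12, 1),
    schedule=[minio_files],             # runs whenever the Dataset is updated
    catchup=False,
):
    PythonOperator(task_id="process_new_file", python_callable=lambda: None)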

Levitation answered 15/12, 2022 at 7:56 Comment(2)
I am also facing the error "'ClientError' error occurred while testing connection: An error occurred (InvalidClientTokenId) when calling the GetCallerIdentity operation: The security token included in the request is invalid." It is not clear to me how to fix this token-related issue or how to get this token, and neither of the answers below mentions it. So @Roland, did you find a solution to your token-related error? Could someone please help me with this? –G
For me the connection test shows the error message as you said, but I was able to use MinIO and all operations. The test is somehow wrong for MinIO, isn't it? Correct me if I am wrong. –Carolacarolan

I was also facing the problem that the endpoint URL refused the connection. What I did: MinIO is actually running in a Docker container, so we should give the Docker host URL in the Extra field:

{ "aws_access_key_id":"your_minio_access_key", "aws_secret_access_key": "your_minio_secret_key", "host": "http://host.docker.internal:9000" }


Reflux answered 18/12, 2022 at 11:15 Comment(1)
Not quite sure what you mean by "MinIO is actually running in a Docker container", but as mentioned in my question, both Airflow and MinIO are running in Docker containers on the same network; the address I have set is the container's internal address (minio:9000). Externally I expose both containers on different ports. –Levitation

I am also facing this error in Airflow 2.5.0. I've found a workaround using the boto3 library, which is already built in.

First, I created a connection with these parameters:

Connection Id: any label (Minio in my case)

Connection Type: Generic

Host: MinIO server address and port, including the scheme (it is passed to boto3 as endpoint_url below)

Login: MinIO access key

Password: MinIO secret key

And here's my code:

import boto3
from airflow.hooks.base import BaseHook

# Read the credentials from the connection created above
conn = BaseHook.get_connection('Minio')

s3 = boto3.resource(
    's3',
    endpoint_url=conn.host,             # must include the scheme, e.g. http://...
    aws_access_key_id=conn.login,
    aws_secret_access_key=conn.password,
)
s3client = s3.meta.client

# ... and then you can use boto3 methods for manipulating buckets and files,
# for example:

bucket = s3.Bucket('test-bucket')
# Iterates through all the objects, doing the pagination for you. Each obj
# is an ObjectSummary, so it doesn't contain the body. You'll need to call
# get() to get the whole body.
for obj in bucket.objects.all():
    key = obj.key
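    # For example (just a sketch): obj.get() performs the GetObject call and
    # returns the full body, which the ObjectSummary above does not carry.
    body = obj.get()['Body'].read()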
Marella answered 20/12, 2022 at 7:17 Comment(1)
Hi, thanks for taking the time to answer! :) However, the answer refers to using boto3, which could be used to gather bucket notifications, for example, but I am looking for a general solution in Airflow to trigger pipelines in a data-aware manner, without any explicit access code in Airflow. According to the documentation this should in theory be possible. –Levitation

I was also facing the same issue and was unable to test the connection after entering the details on the "create connection" page. It seems the connection works during a DAG run but fails for "Test connection" in the web UI. I found the same mentioned on the Airflow Amazon provider's wiki page:

Breaking changes Warning

In this version of the provider, the Amazon S3 connection (conn_type="s3") was removed, due to the fact that it was always an alias to the AWS connection conn_type="aws". In practice, the only impact is that you won't be able to test the connection in the web UI / API. In order to restore the ability to test the connection, you need to change the connection type from Amazon S3 (conn_type="s3") to Amazon Web Services (conn_type="aws") manually.
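If you prefer to make that change from code rather than the UI, a rough sketch could look like the following (the connection id, credentials and endpoint are placeholders, and this assumes the script runs somewhere the Airflow metadata database is reachable):

import json

from airflow import settings
from airflow.models import Connection

# Re-create the MinIO connection with conn_type="aws" so the web UI test works again
conn = Connection(
    conn_id="minio_aws",            # placeholder id
    conn_type="aws",
    extra=json.dumps(
        {
            "aws_access_key_id": "your_minio_access_key",
            "aws_secret_access_key": "your_minio_secret_key",
            "endpoint_url": "http://minio:9000",
        }
    ),
)

session = settings.Session()
session.add(conn)
session.commit()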

Phene answered 23/4, 2023 at 12:49 Comment(0)

It is late, but if anyone still has issues connecting:

You can run your DAG even when testing the connection fails. If you are on Mac or Windows and your service is on the default bridge network, then you should use the host.docker.internal address for host IP resolution, or get the container IP address (not advised, as it is dynamic).

Ensure both the AWS Access Key ID and AWS Secret Access Key fields are empty and only the Extra field is supplied.

This is an example of the extra field detail:

{ "aws_access_key_id": "xxxxxxx", "aws_secret_access_key": "xxxxxx", "endpoint_url": "http://host.docker.internal:9000" }

Note that endpoint_url can be replaced with host.

In addition, I assume your Airflow and MinIO are running on the default bridge network with -p 9000:9000, and are in the same docker-compose.yaml file or specified together like this: docker-compose -f apache_file.yaml -f minio_file.yaml up.

If they are on a user-defined bridge network, then you cannot use host.docker.internal; you will have to use the MinIO container name / app name, or define a static address. You may or may not publish the port; just exposing it is enough.

If they're on different user-defined bridge networks, then you are doing something more complex than I can explain here.

If MinIO is not in Docker, then you want to use the host IP address and port (e.g. 127.0.0.1:9000).

If you are running your setup in the cloud, say on EC2, then you need to use a user-defined bridge network.
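If in doubt, a quick way to see which endpoint is actually reachable from inside the Airflow container is a small check like this (just a sketch; the hostnames and port are placeholders for whatever applies to your setup):

import socket

candidates = [("host.docker.internal", 9000), ("minio", 9000), ("127.0.0.1", 9000)]
for host, port in candidates:
    try:
        socket.create_connection((host, port), timeout=2).close()
        print(f"reachable:     {host}:{port}")
    except OSError as exc:
        print(f"not reachable: {host}:{port} ({exc})")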

Glossator answered 25/7, 2023 at 18:46 Comment(0)

When you use the test connection button in the UI, it invokes the AWS Security Token Service API GetCallerIdentity. This is not supported by all S3-compatible services; refer to the Apache Airflow documentation:

If you use the Amazon Provider to communicate with AWS API compatible services (MinIO, LocalStack, etc.) test connection failure doesn’t mean that your connection has wrong credentials. Many compatible services provide only a limited number of AWS API services, and most of them do not implement the AWS STS GetCallerIdentity method.
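To illustrate (a sketch only; the endpoint URL and credentials are placeholders), this is roughly the call the test button makes, and it is expected to fail against MinIO even with valid credentials:

import boto3

sts = boto3.client(
    "sts",
    endpoint_url="http://minio:9000",              # placeholder MinIO endpoint
    aws_access_key_id="your_minio_access_key",
    aws_secret_access_key="your_minio_secret_key",
)

try:
    print(sts.get_caller_identity())
except Exception as exc:
    # On most S3-compatible services this STS method is simply not implemented
    print(f"test-style call failed as expected: {exc}")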

Oilbird answered 30/4 at 17:20 Comment(0)

I was getting the same error. Just enter the details below in a new connection:

  • AWS Access Key ID,
  • AWS Secret Access Key,
  • Extra: {"host": "http://host.docker.internal:9000"}

Don't test the connection in Airflow. Just save the connection and use it directly in the DAG.

For me it worked.

Brown answered 5/10 at 5:4 Comment(0)
