Using an AWS profile with fs.S3FileSystem

I'm trying to use a specific AWS profile with Apache PyArrow. The documentation shows no option to pass a profile name when instantiating S3FileSystem using pyarrow.fs [https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html]

I tried to get around this by creating a session with boto3 and using that:

import boto3
from pyarrow import fs

# include MFA profile
session = boto3.session.Session(profile_name="custom_profile")

# create filesystem with session
bucket = fs.S3FileSystem(session_name=session)

bucket.get_file_info(fs.FileSelector('bucket_name', recursive=True))

but this too fails:

OSError: When listing objects under key '' in bucket 'bucket_name': AWS Error [code 15]: Access Denied

Is it possible to use pyarrow.fs with a custom AWS profile?

~/.aws/credentials:

[default]
aws_access_key_id = <access_key>
aws_secret_access_key = <secret_key>

[custom_profile]
aws_access_key_id = <access_key>
aws_secret_access_key = <secret_key>
aws_session_token = <token>

Additional context: all user actions require MFA. The custom AWS profile in the credentials file stores the session token generated after MFA-based authentication on the CLI; I need to use that profile in the script.

Ketcham answered 22/6, 2022 at 16:50 Comment(0)
-2

One can specify a token, but must also specify the access key and secret key:

from pyarrow import fs

s3 = fs.S3FileSystem(access_key="",
                     secret_key="",
                     session_token="")

One would also have to implement some way of parsing the ~/.aws/credentials file to get these values, or fill them in manually each time.
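
For illustration, a minimal sketch of that parsing step, assuming the profile layout shown in the question; the credentials file is INI-formatted, so the standard library's configparser can read it:

import configparser
import os

from pyarrow import fs

# read the MFA-backed profile directly from the credentials file
config = configparser.ConfigParser()
config.read(os.path.expanduser("~/.aws/credentials"))
profile = config["custom_profile"]

s3 = fs.S3FileSystem(
    access_key=profile["aws_access_key_id"],
    secret_key=profile["aws_secret_access_key"],
    session_token=profile.get("aws_session_token"))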

Ketcham answered 22/6, 2022 at 19:21 Comment(0)
7

I think it is better this way:

import boto3
from pyarrow import fs

session = boto3.session.Session(profile_name="custom_profile")
credentials = session.get_credentials()

s3_files = fs.S3FileSystem(
    secret_key=credentials.secret_key,
    access_key=credentials.access_key,
    region=session.region_name,
    session_token=credentials.token)
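
With the filesystem built this way, the listing from the question should work as well; a short usage sketch continuing the snippet above, where 'bucket_name' is the placeholder from the question:

# list the bucket recursively with the filesystem constructed above
file_infos = s3_files.get_file_info(fs.FileSelector('bucket_name', recursive=True))
for info in file_infos:
    print(info.path, info.size)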
Alfredoalfresco answered 25/10, 2022 at 8:32 Comment(0)
1

You should use environment variables. For example,

import os

from pyarrow import fs

os.environ["AWS_PROFILE"] = "custom_profile"

s3fs = fs.S3FileSystem()
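
Note that AWS_PROFILE needs to be set before the S3FileSystem instance is created, since that appears to be when the credentials are resolved. A short usage sketch continuing the snippet above, with the question's placeholder bucket name:

# list the bucket using the profile picked up from AWS_PROFILE
print(s3fs.get_file_info(fs.FileSelector("bucket_name", recursive=True)))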
Sackett answered 28/6, 2023 at 23:46 Comment(0)
1

If you are using pyarrow.fs.S3FileSystem (which is different from the S3FileSystem in the s3fs package), then the only way to use a named SAML profile or SSO profile is the workaround suggested by Rafael.

Note that the path does not start with 's3://'

import boto3
import pyarrow.parquet as pq
from pyarrow.fs import S3FileSystem

session = boto3.session.Session(profile_name="custom_profile")
credentials = session.get_credentials()

fs = S3FileSystem(
    secret_key=credentials.secret_key,
    access_key=credentials.access_key,
    region='your-region-3',
    session_token=credentials.token)

df = pq.read_table("bucket_name/path1/path2/", filesystem=fs)
print(df)
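
The same filesystem object can be reused for writing as well; a minimal sketch continuing the snippet above, where "path3/output.parquet" is a hypothetical output key:

# write the table back to the bucket through the same filesystem object
pq.write_table(df, "bucket_name/path3/output.parquet", filesystem=fs)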
Raynor answered 11/8, 2023 at 9:49 Comment(0)