Unable to read from s3 bucket using spark
val spark = SparkSession
        .builder()
        .appName("try1")
        .master("local")
        .getOrCreate()

val df = spark.read
        .json("s3n://BUCKET-NAME/FOLDER/FILE.json")
        .select($"uid").show(5)

I have provided AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as environment variables. I get the error below when trying to read from S3.

Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/FOLDER%2FFILE.json' - ResponseCode=400, ResponseMessage=Bad Request

I suspect the error is caused by "/" being converted to "%2F" by some internal function, since the error shows '/FOLDER%2FFILE.json' instead of '/FOLDER/FILE.json'.

Shumate answered 16/6, 2017 at 12:43 Comment(0)
Your Spark (JVM) application cannot read those environment variables unless you tell it to, so a quick workaround:

spark.sparkContext
     .hadoopConfiguration.set("fs.s3n.awsAccessKeyId", awsAccessKeyId)
spark.sparkContext
     .hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", awsSecretAccessKey)

You'll also need to specify the S3 endpoint:

spark.sparkContext
     .hadoopConfiguration.set("fs.s3a.endpoint", "<<ENDPOINT>>");

To learn more about what an AWS S3 endpoint is, refer to the AWS documentation on S3 regions and endpoints.
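Note that the snippets above mix the older `s3n` connector (which uses jets3t and only supports V2 request signing) with `s3a` properties. A 400 Bad Request is what V4-signing-only regions (e.g. eu-central-1, ap-south-1) return to V2-signed requests, so switching entirely to `s3a` with an explicit endpoint may help. A minimal sketch, assuming the `hadoop-aws` jar is on the classpath; the region endpoint and bucket path are placeholders:

```scala
// Sketch: read via the s3a connector with explicit credentials and endpoint.
// The endpoint value below is an example -- use your bucket's region.
val hc = spark.sparkContext.hadoopConfiguration
hc.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
hc.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
hc.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")

val df = spark.read.json("s3a://BUCKET-NAME/FOLDER/FILE.json")
df.select("uid").show(5)
```

Note that `s3a` uses different property names (`fs.s3a.access.key` / `fs.s3a.secret.key`) than the `fs.s3n.awsAccessKeyId` style shown above.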

Rachellerachis answered 16/6, 2017 at 13:11 Comment(2)
Thanks @elisah, I tried including the AWS credentials in the code as you mentioned, but I still get the same error with code 400. I'm assuming this is not a credentials issue, as that would throw an authentication error instead (error code 403)? - Shumate
There's a section on S3A troubleshooting in the Hadoop docs; you should start there. Let's just say "bad auth" has a lot of possible causes. - Alceste
The bucket may be encrypted. In that case you need to define the SSE algorithm and key.
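A minimal sketch of what that configuration could look like, using the S3A server-side-encryption properties from the Hadoop S3A connector; the KMS key ARN is a placeholder, and the exact property names vary across Hadoop versions:

```scala
// Sketch: configuring S3A for a bucket encrypted with SSE-KMS.
// The key ARN below is a placeholder -- substitute your own.
val hc = spark.sparkContext.hadoopConfiguration
hc.set("fs.s3a.server-side-encryption-algorithm", "SSE-KMS")
hc.set("fs.s3a.server-side-encryption.key",
  "arn:aws:kms:REGION:ACCOUNT-ID:key/KEY-ID")
```

For buckets using the default SSE-S3 encryption, reads generally work without extra configuration; explicit settings matter mainly for SSE-KMS and SSE-C.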

Whitsun answered 29/11, 2023 at 5:56 Comment(0)
