File already exists error while writing Spark dataframe to S3 using AWS Glue
I'm using this command to write a dataframe to S3:

df.write.option("delimiter", "|") \
    .option("header", True) \
    .option("compression", "gzip") \
    .mode("overwrite") \
    .format("csv") \
    .save("s3://bucketname/metrics/parsed/")

But I always get this error; only the filename changes:

An error occurred while calling o293.save. File already exists:s3://bucketname/metrics/parsed/part-01195-6ef08750-dbf5-41c6-b024-501403820268-c000.csv.gz

Full error:

"Failure Reason": "JobFailed(org.apache.spark.SparkException: Job aborted due to stage failure: Task 1195 in stage 11.0 failed 4 times, most recent failure: 
Lost task 1195.3 in stage 11.0 (TID 3023) (172.36.67.235 executor 9):
 org.apache.hadoop.fs.FileAlreadyExistsException: File already exists

I tried the following, but each attempt failed with the same error:

  1. Adding coalesce(100) to the write command (see the sketch after this list)
  2. Writing to a new destination, with and without the .mode("overwrite") option
  3. Exporting the data in parquet format
  4. Writing with the .mode("append") option
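
For reference, a minimal sketch of attempt 1, assuming the same dataframe and destination path as above (the coalesce count of 100 is the one from my attempt):

    # Attempt 1: reduce the number of output files before writing.
    # Path and options are copied from the original write command.
    (df.coalesce(100)
        .write.option("delimiter", "|")
        .option("header", True)
        .option("compression", "gzip")
        .mode("overwrite")
        .format("csv")
        .save("s3://bucketname/metrics/parsed/"))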

I couldn't find anything helpful for resolving this, except this post, but since I'm using Glue 3.0 (Spark 3.1), that fix shouldn't be applicable.

Zecchino answered 26/10, 2022 at 16:59
It turns out the error displayed by Glue was not the real exception. The task stages did fail with this error, but before that there was an earlier stage failure caused by an exception in the code.

After setting up the Spark UI on Glue, I was able to find the first failure and its cause.

Here's how to set up the Spark UI.
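
A minimal sketch of enabling it with boto3, using the documented --enable-spark-ui and --spark-event-logs-path Glue job parameters; the job name and log path below are placeholders:

    import boto3

    glue = boto3.client("glue")

    # "my-glue-job" is a placeholder; use your own job's name.
    job = glue.get_job(JobName="my-glue-job")["Job"]

    # Turn on Spark event logging so the Spark UI can replay the job
    # and show the first failed stage.
    args = dict(job.get("DefaultArguments", {}))
    args["--enable-spark-ui"] = "true"
    args["--spark-event-logs-path"] = "s3://bucketname/sparkui-logs/"  # placeholder prefix

    glue.update_job(
        JobName="my-glue-job",
        JobUpdate={
            "Role": job["Role"],        # Role and Command are required in JobUpdate
            "Command": job["Command"],
            "DefaultArguments": args,
        },
    )

After a re-run, the event logs land under that S3 prefix, and a Spark history server pointed at the prefix can replay the job.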

Zecchino answered 27/10, 2022 at 16:9
