Could not find any valid local directory for s3ablock-0001-

I'm facing a problem running jobs on Amazon EMR when I try to write data to S3.

This is the stacktrace:

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for s3ablock-0001-
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:463)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createTmpFileForWrite(LocalDirAllocator.java:477)
    at org.apache.hadoop.fs.LocalDirAllocator.createTmpFileForWrite(LocalDirAllocator.java:213)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.createTmpFileForWrite(S3AFileSystem.java:589)
    at org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory.create(S3ADataBlocks.java:811)
    at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.createBlockIfNeeded(S3ABlockOutputStream.java:190)
    at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.<init>(S3ABlockOutputStream.java:168)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:822)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1125)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1105)
    at org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74)
    at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:248)
    at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:390)
    at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:158)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:126)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:111)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:264)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:205)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:127)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

I built an image using Amazon ECS for EMR with Apache Spark 3.0.1 and Hadoop 3.2.1. I've already tried Apache Spark 2.4.5 with Hadoop 2.7.1, with no success.

When we set up the EMR cluster manually, without the ECS image, the job finishes successfully and writes everything it needs to S3.

What do I need to do to get this up and running? Thanks a lot.

Nd answered 13/10, 2020 at 20:14

You need to give the app a directory to store data in:

spark.hadoop.fs.s3a.buffer.dir /tmp,/drive1/tmp

Normally it picks up what hadoop.tmp.dir is set to. Maybe you just don't have enough disk space, or the option is set to somewhere on a small root drive.

Better: include an entry for every disk you have; it will try to use any disk which has enough space.
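
For example (a sketch, not from the original answer; /mnt/s3a and /mnt1/s3a are placeholder paths for whatever local disks your nodes actually have), the option can be passed on spark-submit:

    spark-submit --conf "spark.hadoop.fs.s3a.buffer.dir=/mnt/s3a,/mnt1/s3a" ...

or set as fs.s3a.buffer.dir in core-site.xml so every Hadoop client picks it up.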

Further reading: How S3A writes data to S3.

On Hadoop 3.2.1 you can tell S3A to buffer in heap or bytebuffer, so it does not use the local disk at all:

spark.hadoop.fs.s3a.fast.upload.buffer bytebuffer

We do that in some deployments where the process doesn't have write access to the local FS and/or there's no capacity. But you then need to put in effort tuning some of the other related parameters to avoid buffering too much data; the limited bandwidth from EC2 to S3 can build up big backlogs.

Actually, that may be the problem with disk buffering too: maybe you are just creating data faster than it can be uploaded. Try limiting the number of blocks which a single output stream (here: a Spark worker thread) can have queued for upload before writes to the stream block:

spark.hadoop.fs.s3a.fast.upload.active.blocks 1

That, and/or a smaller number of worker threads.
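
If you prefer to set these options programmatically, here is a minimal sketch (the app name and values are illustrative, not from the original answer) using the SparkSession builder:

    import org.apache.spark.sql.SparkSession

    // Sketch: buffer S3A uploads off-heap and allow at most one queued block
    // per output stream, so a writer cannot outrun the upload bandwidth.
    val spark = SparkSession.builder()
      .appName("s3a-upload-tuning") // illustrative name
      .config("spark.hadoop.fs.s3a.fast.upload.buffer", "bytebuffer")
      .config("spark.hadoop.fs.s3a.fast.upload.active.blocks", "1")
      .getOrCreate()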

Please update this post with whatever worked, so others can make use of your findings.

Superiority answered 14/10, 2020 at 8:49 Comment(5)
Thanks @stevel, your explanation was great and helped me find my solution, but the solution was related to the S3 protocol that I used, as described in the documentation that you sent. – Nd
@BrunoBernardes May I ask what the final solution was? I got the same error in my job as well. Thanks! – Enneagon
I am encountering this issue when Impala tries to INSERT OVERWRITE a table whose location is set to S3. Any leads? Thanks. – Primero
Where do I set spark.hadoop.fs.s3a.fast.upload.buffer bytebuffer? – Cinerator
Great answer! I could fix the same issue using bytebuffer, but might have to fall back to disk again; we will see. – Account

Try using offheap or bytebuffer instead of disk for S3A uploads. Most likely another process is evicting the s3ablock files during upload.

I ran into this issue while writing to a Delta table when the Delta IO cache was enabled (while reading another Delta table).
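
For reference, a sketch of where such a setting can live (not part of the original answer): in spark-defaults.conf as

    spark.hadoop.fs.s3a.fast.upload.buffer    bytebuffer

or as fs.s3a.fast.upload.buffer in core-site.xml, so non-Spark Hadoop clients pick it up too.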

Shingle answered 9/9, 2021 at 8:45 Comment(1)
Valid point there; we need to make sure that blocks are created with unique IDs, risk of corruption otherwise. Be aware that the memory storage options can run out of memory fast; reduce the number of blocks which can be queued for upload if you play with this. – Superiority

For me this was showing up due to a permission issue with the directory set in hadoop.tmp.dir. When I changed the permissions to 777 for testing, my job started running successfully.
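
A quick way to check this (a sketch; the actual path depends on your configuration) is to print the configured value and inspect that directory's ownership and permissions:

    hdfs getconf -confKey hadoop.tmp.dir    # show where hadoop.tmp.dir points
    ls -ld /tmp/hadoop-$USER                # example path only; use the directory printed above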

Krp answered 4/8, 2022 at 18:42

If you are running a Spark container via Docker and getting this error, it might be an out-of-space issue on the Docker side rather than a problem with Spark or Hadoop itself. Therefore, before trying the other solutions, it would be better to try:

docker system prune -a --volumes

This will remove:
- all stopped containers
- all networks not used by at least one container
- all anonymous volumes not used by at least one container
- all images without at least one container associated to them
- all build cache

Or you can manually delete unused volumes, images, etc. via Docker Desktop. Then retry running your Spark job.
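
To confirm that Docker disk usage is actually the problem before deleting anything, a quick check (a sketch; /var/lib/docker is the default data root on Linux and may differ in your setup):

    docker system df          # space used by images, containers, local volumes and build cache
    df -h /var/lib/docker     # free space on the filesystem backing Docker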

Annaleeannaliese answered 19/4, 2024 at 6:40
