Spark Thrift Server does not clean up shuffle files

We are running SQL queries against a Spark cluster on EMR using Spark Thrift Server. When a SQL query (translated to a Spark job) finishes, its shuffle files under /mnt/yarn/usercache/root/appcache are not cleaned up. After several queries, this eventually causes "No space left on device" errors.
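To confirm which application directories are consuming the space, a small script along the following lines can total the bytes under each entry of the appcache directory. This is only a diagnostic sketch: the function name `appcache_usage` is hypothetical, and the path is the one quoted above, which may differ on other node layouts.

```python
import os


def appcache_usage(appcache_dir):
    """Return {entry_name: total_bytes} for each directory under appcache_dir."""
    usage = {}
    for app in os.listdir(appcache_dir):
        app_path = os.path.join(appcache_dir, app)
        if not os.path.isdir(app_path):
            continue
        total = 0
        for root, _dirs, files in os.walk(app_path):
            for name in files:
                try:
                    total += os.path.getsize(os.path.join(root, name))
                except OSError:
                    # Files can disappear while the scan is running; skip them.
                    pass
        usage[app] = total
    return usage


if __name__ == "__main__":
    # Path taken from the question; adjust it for your nodes.
    path = "/mnt/yarn/usercache/root/appcache"
    if os.path.isdir(path):
        ranked = sorted(appcache_usage(path).items(), key=lambda kv: -kv[1])
        for app, size in ranked:
            print(f"{app}\t{size / 1024 ** 2:.1f} MiB")
```

Running it on a worker node before and after a query should show whether the Thrift Server's application directory keeps growing.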

If we stop the Spark Thrift Server, the shuffle files are then cleaned up. Is there any way to make the cleanup run after every job, not only after the application is stopped? We tried setting the following parameters:

yarn.nodemanager.localizer.cache.cleanup.interval-ms=6000
yarn.nodemanager.localizer.cache.target-size-mb=1000

but the files are still not cleaned up. Any idea why this happens and how we can avoid it?
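One direction that may be worth testing (it is not confirmed anywhere in this thread): as far as I understand, the YARN localizer cache settings govern localized resources rather than shuffle data, while Spark's own context cleaner deletes shuffle files only after the driver garbage-collects the objects that reference them. In a long-running driver such as the Thrift Server, that garbage collection can be infrequent. Spark documents `spark.cleaner.periodicGC.interval` (default `30min`) to force a periodic driver GC. The sketch below just assembles a launch command with a shorter interval; the `5min` value, the script path, and the helper name `thriftserver_command` are assumptions for illustration.

```python
import shlex

# Assumed value for illustration only; tune it for your workload.
CLEANER_CONF = {
    "spark.cleaner.periodicGC.interval": "5min",
}


def thriftserver_command(script="sbin/start-thriftserver.sh", conf=None):
    """Build the argument list for launching the Thrift Server with extra --conf flags."""
    conf = CLEANER_CONF if conf is None else conf
    args = [script]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print a shell-safe command line that can be reviewed before running it.
    print(shlex.join(thriftserver_command()))
```

Whether this frees disk space quickly enough depends on the workload, so it is worth checking disk usage on the nodes after the change.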

Vitality answered 9/11, 2017 at 13:20

Comments (2)
Did you find a solution in the end? - Vannoy
Did you find a solution in the end? - Chauffer
