We are running SQL queries against a Spark cluster on EMR using Spark Thrift Server. When a SQL query (translated into a Spark job) finishes, its shuffle files under /mnt/yarn/usercache/root/appcache are not cleaned up. After several queries, this eventually causes a "No space left on device" error.
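To confirm which application directories keep growing, a rough sketch like the following can be run on a node. It assumes Python 3 is available there; `dir_sizes` is just an illustrative helper name, not part of Spark or YARN.

```python
import os


def dir_sizes(root):
    """Return {subdirectory name: total bytes of the files beneath it}."""
    sizes = {}
    for entry in os.scandir(root):
        # Only per-application subdirectories are of interest here.
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    # A file may disappear while the tree is being scanned.
                    pass
        sizes[entry.name] = total
    return sizes


if __name__ == "__main__":
    appcache = "/mnt/yarn/usercache/root/appcache"  # path from the question
    if os.path.isdir(appcache):
        # Largest directories first.
        for app, size in sorted(dir_sizes(appcache).items(), key=lambda kv: -kv[1]):
            print(f"{size / 2**20:10.1f} MiB  {app}")
    else:
        print(f"{appcache} not found on this machine")
```

Running it before and after a query should show whether the Thrift Server's application directory only ever grows.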
If we stop the Spark Thrift Server, the shuffle files are then cleaned up. Is there any way to make the cleanup run after every job, not only after the application is stopped? We tried setting the following parameters:
yarn.nodemanager.localizer.cache.cleanup.interval-ms=6000
yarn.nodemanager.localizer.cache.target-size-mb=1000
but the files are still not cleaned up. Any idea why this happens and how we can avoid it?