I am trying to change the location Spark writes temporary files to. Everything I've found online says to set this via the SPARK_LOCAL_DIRS parameter in the spark-env.sh file, but I am not having any luck with the changes actually taking effect.
Here is what I've done:
- Created a 2-worker test cluster on Amazon EC2 instances. I'm using Spark 2.2.0 and the R sparklyr package as a front end. The worker nodes are spun up by an auto scaling group.
- Created a directory for temporary files at /tmp/jaytest. There is one of these on each worker and one on the master.
- Puttied into the Spark master machine and the two workers, navigated to home/ubuntu/spark-2.2.0-bin-hadoop2.7/conf/spark-env.sh, and modified the file to contain this line: SPARK_LOCAL_DIRS="/tmp/jaytest"
Permissions for each of the spark-env.sh files are -rwxr-xr-x, and for the jaytest folders are drwxrwxr-x.
As far as I can tell this is in line with all the advice I've read online. However, when I load some data into the cluster, it still ends up in /tmp rather than /tmp/jaytest.
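For reference, here is roughly how I am connecting and copying data in from R; the master URL and the dataset below are placeholders rather than my actual setup:

```r
library(sparklyr)

# Connect to the standalone cluster (master URL is a placeholder)
sc <- spark_connect(master = "spark://<master-ip>:7077",
                    spark_home = "/home/ubuntu/spark-2.2.0-bin-hadoop2.7")

# Copy a small example dataset into the cluster
iris_tbl <- copy_to(sc, iris, "iris_tbl", overwrite = TRUE)
```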
I have also tried setting the spark.local.dir parameter to the same directory, but no luck there either.
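From the sparklyr side, one way to pass that parameter is through the connection config, as in this sketch (again, the master URL is a placeholder):

```r
library(sparklyr)

# Pass spark.local.dir through the connection config
config <- spark_config()
config$spark.local.dir <- "/tmp/jaytest"

sc <- spark_connect(master = "spark://<master-ip>:7077",
                    spark_home = "/home/ubuntu/spark-2.2.0-bin-hadoop2.7",
                    config = config)
```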
Can someone please advise on what I might be missing here?
Edit: I'm running this as a standalone cluster (as the answer below indicates, the correct parameter to set depends on the cluster type).