How to set SPARK_LOCAL_DIRS parameter using spark-env.sh file
I am trying to change the location Spark writes temporary files to. Everything I've found online says to do this by setting the SPARK_LOCAL_DIRS parameter in the spark-env.sh file, but I'm not having any luck getting the changes to actually take effect.

Here is what I've done:

  1. Created a 2-worker test cluster using Amazon EC2 instances. I'm using Spark 2.2.0 and the R sparklyr package as a front end. The worker nodes are spun up using an auto scaling group.
  2. Created a directory to store temporary files at /tmp/jaytest. There is one of these on each worker and one on the master.
  3. SSHed (via PuTTY) into the Spark master machine and the two workers, navigated to /home/ubuntu/spark-2.2.0-bin-hadoop2.7/conf/spark-env.sh, and modified the file to contain this line (sketched below): SPARK_LOCAL_DIRS="/tmp/jaytest"
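
For concreteness, here is a minimal sketch of the edit (the file path is from step 3; the export keyword is my addition, since spark-env.sh is sourced as a shell script and the variable needs to be exported to be visible to the launched daemons):

    # /home/ubuntu/spark-2.2.0-bin-hadoop2.7/conf/spark-env.sh
    # spark-env.sh is sourced when the daemons start, so export the
    # variable to make it visible to the master/worker processes.
    export SPARK_LOCAL_DIRS="/tmp/jaytest"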

Permissions for each of the spark-env.sh files are -rwxr-xr-x, and for the jaytest folders are drwxrwxr-x.

As far as I can tell this is in line with all the advice I've read online. However, when I load some data into the cluster it still ends up in /tmp, rather than /tmp/jaytest.

I have also tried setting the spark.local.dir parameter to the same directory, but with no luck there either.
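
For reference, that attempt looked like this (assuming the default conf/spark-defaults.conf location; per the docs quoted in the answer below, SPARK_LOCAL_DIRS takes precedence over this setting in standalone mode):

    # /home/ubuntu/spark-2.2.0-bin-hadoop2.7/conf/spark-defaults.conf
    # Overridden by SPARK_LOCAL_DIRS (standalone/Mesos) or LOCAL_DIRS
    # (YARN) when those environment variables are set.
    spark.local.dir    /tmp/jaytest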

Can someone please advise on what I might be missing here?

Edit: I'm running this as a standalone cluster (as the answer below indicates that the correct parameter to set depends on the cluster type).

Gutenberg answered 29/8/2018 at 2:41. Comments (2):
You restarted the Spark service after making the changes? – Hundley
Yes, restarted the service. – Gutenberg
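
(For a standalone cluster, a restart would look roughly like this; the paths assume the stock sbin scripts shipped with the Spark distribution:

    # Run on the master node; stops and restarts the standalone
    # master and all workers listed in conf/slaves.
    /home/ubuntu/spark-2.2.0-bin-hadoop2.7/sbin/stop-all.sh
    /home/ubuntu/spark-2.2.0-bin-hadoop2.7/sbin/start-all.sh
)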
As per the Spark documentation, if you have configured YARN as the cluster manager, it will override the spark-env.sh setting. Check the yarn-env or yarn-site file for the local dir setting.

"this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the cluster manager." source - https://spark.apache.org/docs/2.3.1/configuration.html

Afc answered 29/8/2018 at 8:31. Comments (1):
Thanks Vijay. I'm running a standalone cluster, so as per the documentation I'm trying to set the SPARK_LOCAL_DIRS parameter rather than the spark.local.dir parameter (since the former overrides the latter). – Gutenberg
On a Mac environment with spark-2.1.0, spark-env.sh contains:

export SPARK_LOCAL_DIRS=/Users/kylin/Desktop/spark-tmp

Using spark-shell, it works.
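
A quick way to confirm it took effect (a sketch; blockmgr-* and spark-* are the directory-name patterns Spark uses for its per-application scratch space): run a job, then check the new location:

    # While an application is running, its scratch dirs appear under
    # each configured local dir:
    ls -d /Users/kylin/Desktop/spark-tmp/blockmgr-* /Users/kylin/Desktop/spark-tmp/spark-*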

Did you use the right format?

Khalid answered 4/9/2018 at 8:28. Comments (1):
Thanks kylin, I tried adding export SPARK_LOCAL_DIRS=/path/to/dir to the conf file, but it didn't work with sparklyr (I'm not using the shell; I need a solution that works via sparklyr). What do you mean by 'using the right format'? – Gutenberg
