How to set up Spark with Zookeeper for HA?
I want to configure the Apache Spark master to connect to ZooKeeper.

I have installed both, and ZooKeeper is running.

In spark-env.sh, I added two lines:

-Dspark.deploy.recoveryMode=ZOOKEEPER

-Dspark.deploy.zookeeper.url=localhost:2181

But when I start Spark with ./sbin/start-all.sh, it shows these errors:

/home/deploy/spark-1.0.0/sbin/../conf/spark-env.sh: line 46: -Dspark.deploy.recoveryMode=ZOOKEEPER: command not found

/home/deploy/spark-1.0.0/sbin/../conf/spark-env.sh: line 47: -Dspark.deploy.zookeeper.url=localhost:2181: command not found

I want to know how to add the ZooKeeper settings to spark-env.sh.

Principium answered 12/6, 2014 at 12:3 Comment(3)
It looks like you didn't add those to the run command but as separate lines, so bash is interpreting them as commands. (Relief)
@maasg: I did not add them to a run command; I thought I could add them to spark-env.sh and then use ./sbin/start-all.sh. (Principium)
Could you post the complete file? (Relief)
Most probably you added these lines directly to the file, like so:

export SPARK_PREFIX=`dirname "$this"`/..
export SPARK_CONF_DIR="$SPARK_HOME/conf"
...
-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=localhost:2181

And when it is invoked by start-all.sh, bash complains that those -Dspark... lines are not valid commands. Note that spark-env.sh is a bash script and must contain only valid bash expressions.

Following the High Availability section of the configuration guide, you should set SPARK_DAEMON_JAVA_OPTS with the options spark.deploy.recoveryMode, spark.deploy.zookeeper.url, and (optionally) spark.deploy.zookeeper.dir.

Using your values, add a single line to spark-env.sh like so:

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=localhost:2181"
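For actual failover you also need a second master pointing at the same ZooKeeper ensemble. A minimal sketch of the full setup (the host names master1 and master2 are placeholders, not from the original post):

```shell
# conf/spark-env.sh on BOTH master hosts contains the same line:
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=localhost:2181"

# Start a master on each host. ZooKeeper elects one leader;
# the other master stays in STANDBY and takes over if the leader dies.
./sbin/start-master.sh    # run on master1
./sbin/start-master.sh    # run on master2

# Workers and applications pass a comma-separated list of masters,
# so they can fail over to whichever one is the current leader:
./bin/spark-shell --master spark://master1:7077,master2:7077
```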
Relief answered 12/6, 2014 at 17:46 Comment(0)
Try adding the line below to spark-env.sh:

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=ZK1:2181,ZK2:2181,ZK3:2181 -Dspark.deploy.zookeeper.dir=/sparkha"

Replace ZK1, ZK2, and ZK3 with your ZooKeeper quorum hosts and ports. Here /sparkha is the znode where Spark stores its recovery state; by default it is /spark. Just tested; it worked for us. HTH.
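To confirm that the master actually registered its recovery state, you can inspect the znode with the stock ZooKeeper CLI. A quick sketch, assuming the /sparkha directory from the line above (replace ZK1 with one of your quorum hosts):

```shell
# Connect to a ZooKeeper quorum member
bin/zkCli.sh -server ZK1:2181

# Inside the CLI: list the children of Spark's recovery directory.
# After a leader is elected, this znode should exist and be non-empty.
ls /sparkha
```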

Pentachlorophenol answered 15/9, 2016 at 8:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.