Spark cluster Master IP address not binding to floating IP
I'm trying to configure a Spark cluster using OpenStack. Currently I have two servers named

  • spark-master (IP: 192.x.x.1, floating IP: 87.x.x.1)
  • spark-slave-1 (IP: 192.x.x.2, floating IP: 87.x.x.2)

I am running into problems when trying to use these floating IPs instead of the standard private IPs.

On the spark-master machine, the hostname is spark-master and /etc/hosts looks like

127.0.0.1 localhost
127.0.1.1 spark-master

The only change made to spark-env.sh is export SPARK_MASTER_IP='192.x.x.1'. If I run ./sbin/start-master.sh I can view the web UI.

The catch is that I view the web UI via the floating IP 87.x.x.1, yet it lists the Master URL as spark://192.x.x.1:7077.

From the slave I can run ./sbin/start-slave.sh spark://192.x.x.1:7077 and it connects successfully.
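For reference, the working setup can be sketched as follows (a sketch using the placeholder IPs above; spark-env.sh lives in Spark's conf/ directory):

```shell
# On spark-master, in conf/spark-env.sh:
# bind the master to the private IP
export SPARK_MASTER_IP='192.x.x.1'

# On spark-master: start the master.
# The web UI is then reachable via the floating IP at http://87.x.x.1:8080,
# but it advertises the Master URL spark://192.x.x.1:7077.
./sbin/start-master.sh

# On spark-slave-1: register the worker with the master
# over the private network.
./sbin/start-slave.sh spark://192.x.x.1:7077
```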

If I try to use the floating IP by changing spark-env.sh on the master to export SPARK_MASTER_IP='87.x.x.1' then I get the following error log

Spark Command: /usr/lib/jvm/java-7-openjdk-amd64/bin/java -cp /usr/local/spark-1.6.1-bin-hadoop2.6/conf/:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip 87.x.x.1 --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/05/12 15:05:33 INFO Master: Registered signal handlers for [TERM, HUP, INT]
16/05/12 15:05:33 WARN Utils: Your hostname, spark-master resolves to a loopback address: 127.0.1.1; using 192.x.x.1 instead (on interface eth0)
16/05/12 15:05:33 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/05/12 15:05:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/12 15:05:33 INFO SecurityManager: Changing view acls to: ubuntu
16/05/12 15:05:33 INFO SecurityManager: Changing modify acls to: ubuntu
16/05/12 15:05:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7078. Attempting port 7079.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7079. Attempting port 7080.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7080. Attempting port 7081.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7081. Attempting port 7082.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7082. Attempting port 7083.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7083. Attempting port 7084.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7084. Attempting port 7085.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7085. Attempting port 7086.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7086. Attempting port 7087.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7087. Attempting port 7088.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7088. Attempting port 7089.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7089. Attempting port 7090.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7090. Attempting port 7091.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7091. Attempting port 7092.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7092. Attempting port 7093.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkMaster' failed after 16 retries!
  at sun.nio.ch.Net.bind0(Native Method)
  at sun.nio.ch.Net.bind(Net.java:463)
  at sun.nio.ch.Net.bind(Net.java:455)
  at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
  at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
  at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
  at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
  at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
  at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
  at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
  at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
  at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
  at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
  at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
  at java.lang.Thread.run(Thread.java:745)

The obvious takeaway for me is the warning

Your hostname, spark-master resolves to a loopback address: 127.0.1.1; using 192.x.x.1 instead (on interface eth0)
Set SPARK_LOCAL_IP if you need to bind to another address

but no matter what approach I take from there, I just run into more errors.

If I set both export SPARK_MASTER_IP='87.x.x.1' and export SPARK_LOCAL_IP='87.x.x.1' and try ./sbin/start-master.sh I get the following error log

16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7078. Attempting port 7079.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7079. Attempting port 7080.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7080. Attempting port 7081.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7081. Attempting port 7082.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7082. Attempting port 7083.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7083. Attempting port 7084.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7084. Attempting port 7085.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7085. Attempting port 7086.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7086. Attempting port 7087.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7087. Attempting port 7088.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7088. Attempting port 7089.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7089. Attempting port 7090.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7090. Attempting port 7091.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7091. Attempting port 7092.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7092. Attempting port 7093.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkMaster' failed after 16 retries!

This happens despite the fact that my security group rules seem correct:

ALLOW IPv4 443/tcp from 0.0.0.0/0
ALLOW IPv4 80/tcp from 0.0.0.0/0
ALLOW IPv4 8081/tcp from 0.0.0.0/0
ALLOW IPv4 8080/tcp from 0.0.0.0/0
ALLOW IPv4 18080/tcp from 0.0.0.0/0
ALLOW IPv4 7077/tcp from 0.0.0.0/0
ALLOW IPv4 4040/tcp from 0.0.0.0/0
ALLOW IPv4 to 0.0.0.0/0
ALLOW IPv6 to ::/0
ALLOW IPv4 22/tcp from 0.0.0.0/0
Gabfest answered 12/5, 2016 at 15:09

Comment: Were you able to solve your problem? I can create a cluster from computers that share the same private network, but when I try to do something similar (assigning public IPs to the nodes), it doesn't work. Link to my question: #48021157 — Helsinki

I've set up a Spark standalone cluster on OpenStack myself, and in my /etc/hosts file on the master I have:

127.0.0.1 localhost
192.168.1.2 spark-master

That is, the hostname spark-master maps to the private IP rather than a loopback address such as 127.0.1.1.

Now, since I have a virtual private network for my master and my slaves, I only work with the private IPs. The only time I use the floating IP is on my host computer, when I launch spark-submit --master spark://spark-master (spark-master there resolves to the floating IP). I don't think you need to try to bind to the floating IP. I hope that helps!
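Concretely, this setup can be sketched like this (the hostname, IPs, and the application file name are just examples):

```shell
# /etc/hosts on the master and slaves (private network only):
#   127.0.0.1    localhost
#   192.168.1.2  spark-master

# /etc/hosts on the host computer outside OpenStack, where the same
# hostname resolves to the floating IP instead:
#   87.x.x.1     spark-master

# From the host computer, submit against the hostname; it resolves
# to the floating IP there (my_app.py is a hypothetical application):
spark-submit --master spark://spark-master:7077 my_app.py
```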

Bruno

Delicatessen answered 18/5, 2016 at 13:06

Comment: What do you set SPARK_LOCAL_IP and SPARK_MASTER_IP to in the conf file? — Weigela
Comment: I didn't set SPARK_LOCAL_IP, but in my conf file spark.master is set to spark://spark-master:7077 (spark-master is 192.168.1.2). Hope it helps! — Delicatessen

As the log shows,

Your hostname, spark-master resolves to a loopback address: 127.0.1.1; using 192.x.x.1 instead (on interface eth0)

Spark automatically tries to determine the host's IP, and it picks the private IP 192.x.x.1 rather than the floating IP 87.x.x.1.

To resolve this problem you should set SPARK_LOCAL_IP=87.x.x.1 (preferably in spark-env.sh) and start your master again.
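In spark-env.sh that would look like the following sketch (the floating IP is the placeholder from the question; note that binding can only succeed if the address is actually assigned to a local interface):

```shell
# conf/spark-env.sh on the master (floating IP from the question)
export SPARK_LOCAL_IP='87.x.x.1'
export SPARK_MASTER_IP='87.x.x.1'

# then restart the master:
# ./sbin/stop-master.sh && ./sbin/start-master.sh
```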

Ipswich answered 16/5, 2016 at 9:26

Comment: So if I set SPARK_LOCAL_IP=87.x.x.1, do I also set SPARK_MASTER_IP=87.x.x.1 in that same spark-env.sh? — Weigela
Comment: Yes, that's exactly what I meant. — Ipswich
Comment: Could you force Spark to use IPv4? In spark-env.sh add the following line: export SPARK_DAEMON_JAVA_OPTS="-Djava.net.preferIPv4Stack=true" — Ipswich
Comment: If I set SPARK_LOCAL_IP and SPARK_MASTER_IP to 87.x.x.1, then I get the same error I described above. That is with /etc/hosts containing 127.0.1.1 spark-master. — Weigela
Comment: I see. But if you force IPv4 for Spark, we can at least be sure the problem isn't Spark trying to use IPv6. — Ipswich
Comment: Forcing IPv4 as you suggest doesn't change anything for me. — Weigela
Comment: Would you please set the log level to INFO or even DEBUG? Then we could see the actual address Spark is trying to bind to. — Ipswich
Comment: In the logs it shows the attempted command: --ip 87.x.x.1 --port 7077 --webui-port 8080 — Weigela
Comment: Yes, I could see that in the logs. But I've had this problem before: the Spark command shows the host and port to bind to, yet that is just the command issued by Spark. Afterwards Spark does hostname resolution and checks whether the ports are available, which can be seen in the INFO logs. — Ipswich

© 2022 - 2024 — McMap. All rights reserved.