Worker failed to connect to master in Apache Spark

2

7

I'm deploying an Apache Spark application using the standalone cluster manager. My architecture uses 2 Windows machines: one set up as the master, and the other as a slave (worker).

Master: on which I run: \bin>spark-class org.apache.spark.deploy.master.Master and this is what the web UI shows:

Slave: on which I run: \bin>spark-class org.apache.spark.deploy.worker.Worker spark://192.*.*.186:7077 and this is what the web UI shows:

The problem is that the worker node cannot connect to the master node and shows the following error:

17/09/26 16:05:17 INFO Worker: Connecting to master 192.*.*.186:7077...
17/09/26 16:05:22 WARN Worker: Failed to connect to master 192.*.*.186:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
    at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:241)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
  Caused by: java.io.IOException: Failed to connect to /192.*.*.186:7077
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:232)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
    ... 4 more
 Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: no further information: /192.*.*.186:7077
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:257)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:291)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:631)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    ... 1 more

What can be the cause of this error, given that the firewall is disabled on both machines and I tested the connection between them (using nmap) and everything is OK? Using telnet, however, I receive this error: Connecting To 192.*.*.186...Could not open connection to the host, on port 23: Connect failed

Emendation answered 26/9, 2017 at 15:9 Comment(7)
Did you try to connect manually using telnet? - Revisory
How do I do that, and what is it useful for? - Emendation
You have to activate telnet (see social.technet.microsoft.com/wiki/contents/articles/…) and then run telnet 192.*.*.186 7077. - Bedwell
On the master or on the worker node? - Emendation
This is what I receive when running telnet: Connecting To 192.*.*.186...Could not open connection to the host, on port 23: Connect failed - Emendation
Is your master up and running? From the worker node, run: telnet masterhost port - Revisory
Yes, the master is up and running, and this is the result of running telnet: >telnet 192.*.*.186 7077 Connecting To 192.*.*.186...Could not open connection to the host, on port 7077: Connect failed - Emendation
5

Can you show me your spark-env.sh conf? This would help to pinpoint your problem.

My first idea is that you need to export SPARK_MASTER_HOST=(master ip) instead of SPARK_MASTER_IP in the spark-env.sh file. Do this on both the master and the slave. Also export SPARK_LOCAL_IP on both the master and the slave.
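
Before changing the configuration, it may help to confirm from the worker machine whether the master's port accepts TCP connections at all. Note that telnet without an explicit port tries port 23, which would explain the first telnet error in the question; the relevant check is against 7077. The following is a minimal sketch in Python, not part of Spark; the function name `can_connect` and the 5-second default timeout are assumptions made for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests TCP reachability
    and says nothing about whether a Spark master is behind the port.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False
```

Calling `can_connect(master_ip, 7077)` on the worker, with `master_ip` set to the master's address, should return True once the master is listening on an address the worker can reach. If it returns False while the master process is running, one possible explanation is that the master is bound to a different interface or hostname than the one the worker is using, which is what setting SPARK_MASTER_HOST is meant to address.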

Blackdamp answered 2/10, 2017 at 7:16 Comment(1)
Hi. Could you please help me resolve this issue: #58767918 - Slovakia
1

You need to set the environment variables SPARK_MASTER_HOST and SPARK_LOCAL_HOST to localhost.

SPARK_LOCAL_IP and SPARK_MASTER_IP are now deprecated.

Sheets answered 5/7, 2022 at 13:5 Comment(0)
