Kafka startup fails with zookeeper timeout (remote server), yet the machine can connect to zookeeper directly

When I start Kafka, it fails quickly, complaining that it cannot connect to ZooKeeper. I am running ZooKeeper as a standalone cluster/ensemble. I am confused because there is no firewall between the servers (as evidenced by the zookeeper-shell.sh test).

From /var/log/kafka/server.log:

2016-02-24 16:07:12,101 INFO kafka.server.KafkaServer: [Kafka Server 1], Connecting to zookeeper on 10.7.20.100:2181,10.7.20.101:2181,10.7.20.102:2181
2016-02-24 16:07:20,291 FATAL kafka.server.KafkaServerStartable: Fatal error during KafkaServerStable startup. Prepare to shutdown
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 6000
    at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:84)
    at kafka.server.KafkaServer.initZk(KafkaServer.scala:113)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:69)
    at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
    at kafka.Kafka$.main(Kafka.scala:46)
    at kafka.Kafka.main(Kafka.scala)
2016-02-24 16:07:20,294 INFO kafka.server.KafkaServer: [Kafka Server 1], shutting down
2016-02-24 16:07:20,312 INFO kafka.server.KafkaServer: [Kafka Server 1], shut down completed
2016-02-24 16:07:20,317 INFO kafka.server.KafkaServer: [Kafka Server 1], shutting down

However, from the /opt/kafka install directory I am able to connect to ZooKeeper using the ensemble connection string, so I really doubt it is a network or firewall problem.

[me@dckafka01 kafka]$ cd /opt/kafka
[me@dckafka01 kafka]$ bin/zookeeper-shell.sh 10.7.20.100:2181,10.7.20.101:2181,10.7.20.102:2181

Connecting to 10.7.20.100:2181,10.7.20.101:2181,10.7.20.102:2181
Welcome to ZooKeeper!
JLine support is disabled
WATCHER::WatchedEvent state:SyncConnected type:None path:null

get /blah
null
cZxid = 0x400000009
ctime = Tue Feb 16 09:00:28 EST 2016
mZxid = 0x400000009
mtime = Tue Feb 16 09:00:28 EST 2016
pZxid = 0x40000017e
cversion = 2
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 2

ls /blah
[applications, registry]

Which is as expected. Does anybody have an angle for me to investigate?

Synectics answered 24/2, 2016 at 21:19 Comment(0)

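Well, changing the timeout helped: raising zookeeper.connection.timeout.ms from the 6000 ms reported in the error to 16000 ms let the broker start. Now I need to chase down the network delays.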

cat config/server.properties

# coding: UTF-8 
# This file created by Chef from template. Do not hand edit this file

log.dirs=/var/kafka
port=9092
num.partitions=4
default.replication.factor=3
log.flush.interval.messages=1
log.retention.minutes=43200
log.retention.check.interval.ms=3600000
num.replica.fetchers=4
replica.fetch.wait.max.ms=5000
replica.lag.max.messages=10000
auto.leader.rebalance.enable=true
num.network.threads=8
advertised.host.name=10.7.20.71
zookeeper.connection.timeout.ms=16000
broker.id=1
zookeeper.connect=10.7.20.100:2181,10.7.20.101:2181,10.7.20.102:2181
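
One possible starting point for chasing the delay is to time a plain TCP connect from the broker host to each server listed in zookeeper.connect. The sketch below is only an illustration: the function names (parse_connect_string, time_connect, report) are made up for this example, and it measures TCP connection setup only, not the ZooKeeper session handshake that Kafka's client also waits for, so a fast result here does not rule out a slow handshake.

```python
import socket
import time

DEFAULT_ZK_PORT = 2181  # ZooKeeper's conventional client port


def parse_connect_string(connect):
    """Split a zookeeper.connect value into (host, port) pairs.

    Any chroot suffix such as "/kafka" is dropped. IPv6 literals are not handled.
    """
    hosts_part = connect.split("/", 1)[0]
    servers = []
    for entry in hosts_part.split(","):
        entry = entry.strip()
        if ":" in entry:
            host, port = entry.rsplit(":", 1)
        else:
            host, port = entry, DEFAULT_ZK_PORT
        servers.append((host, int(port)))
    return servers


def time_connect(host, port, timeout=6.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # refused, unreachable, or timed out
        return None


def report(connect, timeout=6.0):
    """Print one line per server with its connect time."""
    for host, port in parse_connect_string(connect):
        elapsed = time_connect(host, port, timeout)
        status = "FAILED" if elapsed is None else "%.3f s" % elapsed
        print("%s:%d  %s" % (host, port, status))


# Example (run on the broker host):
# report("10.7.20.100:2181,10.7.20.101:2181,10.7.20.102:2181")
```

Running report a few times in a row may show whether one server is consistently slower than the others or whether the delay is intermittent.
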
Synectics answered 24/2, 2016 at 22:52 Comment(4)
Did you manage to find out why your Kafka was timing out? - Amigo
@radoslaw.busz Unfortunately, no. - Synectics
@Synectics In my case, I noticed that whenever I left Kafka idle, it would time out and not reconnect. - Atwitter
@JohnStrood: What version of Kafka did you see this with, and are you aware of the problem being documented in a JIRA issue? - Chump

In my case, I found that the command prompt running ZooKeeper had hung (this happens often on Windows).

I just had to press a key and the cmd window became active again. After that, running the command gave me no errors.

Allred answered 21/9, 2020 at 8:4 Comment(2)
What OS are you running on? - Synectics
It's on Windows. - Allred

I found this thread while looking for a solution to my own problem with Kafka not being able to connect to ZooKeeper. In your case, I think ZooKeeper is simply slow to start up and begin listening on its socket. A better solution is probably to wait before starting Kafka or, better still, to have a script check that the ZooKeeper nodes are ready to receive connections and only then start Kafka.

I don't think it's good to change the configured timeout only to work around Kafka startup. It should be changed if, for example, your network is too slow (you would need a higher value) or fast enough to cope with a shorter timeout.
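
A rough sketch of such a readiness check follows. It is not an official tool: the function names are invented for this example, and the start script and config locations under /opt/kafka are assumptions based on the paths in the question. The probe sends ZooKeeper's "ruok" four-letter command and expects "imok", which assumes that command is enabled on the servers (newer ZooKeeper releases require it to be whitelisted). An "imok" reply only says the server process is running, not that the ensemble has a quorum.

```python
import os
import socket
import subprocess
import time


def zk_is_ready(host, port, timeout=2.0):
    """True if the node accepts a TCP connection and answers 'ruok' with 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            return sock.recv(16) == b"imok"
    except OSError:  # refused, unreachable, or timed out
        return False


def wait_for_zookeeper(servers, deadline_s=60.0, interval_s=1.0, probe=zk_is_ready):
    """Poll until every (host, port) in servers passes the probe.

    Returns True once all nodes are ready, False if deadline_s elapses first.
    """
    deadline = time.monotonic() + deadline_s
    while True:
        if all(probe(host, port) for host, port in servers):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def start_kafka_when_ready(servers, kafka_home="/opt/kafka"):
    """Start the broker only after ZooKeeper is ready (paths are assumptions)."""
    if not wait_for_zookeeper(servers):
        raise SystemExit("ZooKeeper not ready; not starting Kafka")
    return subprocess.call([
        os.path.join(kafka_home, "bin", "kafka-server-start.sh"),
        os.path.join(kafka_home, "config", "server.properties"),
    ])


# Example:
# start_kafka_when_ready([("10.7.20.100", 2181),
#                         ("10.7.20.101", 2181),
#                         ("10.7.20.102", 2181)])
```

Waiting for every node is the strict choice here. Since a client only needs one reachable server, replacing all with any in wait_for_zookeeper would be a looser alternative.
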

Flyleaf answered 1/2, 2019 at 20:22 Comment(0)
