Why do we need to mention Zookeeper details even though Apache Kafka configuration file already has it?

I am using Apache Kafka in (Plain Vanilla) Hadoop Cluster for the past few months and out of curiosity I am asking this question. Just to gain additional knowledge about it.

Kafka server.properties file already has the below parameter :

zookeeper.connect=localhost:2181

And I am starting Kafka Server/Broker with the following command :

bin/kafka-server-start.sh config/server.properties

So I assume that Kafka automatically infers the Zookeeper details by the time we start the Kafka server itself. If that's the case, then why do we need to explicitly mention the zookeeper properties while we create Kafka topics the syntax for which is given below for your reference :

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

As per the Kafka documentation we need to start zookeeper before starting Kafka server. So I don't think Kafka can be started by commenting out the zookeeper details in Kafka's server.properties file

But atleast can we use Kafka to create topics and to start Kafka Producer/Consumer without explicitly mentioning about zookeeper in their respective commands ?

The zookeeper.connect parameter in the Kafka properties file is needed for having each Kafka broker in the cluster connecting to the Zookeeper ensemble. Zookeeper will keep information about connected brokers and handling the controller election. Other than that, it keeps information about topics, quotas and ACL for example.

When you use the kafka-topics.sh tool, the topic creation happens at Zookeeper level first and then thanks to it, information are propagated to Kafka brokers and topic partitions are created and assigned to them (thanks to the elected controller). This connection to Zookeeper will not be needed in the future thanks to the new Admin Client API which provides some admin operations executed against Kafka brokers directly. For example, there is a opened JIRA (https://issues.apache.org/jira/browse/KAFKA-5561) and I'm working on it for having the tool using such API for topic admin operations.

Regarding producer and consumer ... the producer doesn't need to connect to Zookeeper while only the "old" consumer (before 0.9.0 version) needs Zookeeper connection because it saves topic offsets there; from 0.9.0 version, the "new" consumer saves topic offsets in real topics (__consumer_offsets). For using it you have to use the bootstrap-server option on the command line insteand of the zookeeper one.

Recommended topics

Hot tags