How does consumer rebalancing work in Kafka?

2

43

When a new consumer or broker is added or goes down, Kafka triggers a rebalance operation. Is Kafka rebalancing a blocking operation? Are Kafka consumers blocked while a rebalance is in progress?

Djerba answered 28/11, 2014 at 4:1 Comment(0)
56

It depends on what you mean by "blocked". If you mean "are existing connections closed when a rebalance is triggered", then the answer is yes. Kafka's current rebalancing algorithm is unfortunately imperfect.

Here is what happens during a consumer rebalance.

Assume we have a topic with 10 partitions (0-9) and one consumer (let's call it consumer1) consuming it. When a second consumer (consumer2) appears, the rebalance task is triggered for both of them (consumer1 gets an event, consumer2 does its initial rebalance). consumer1 now closes all of its existing connections (even those that will be reopened soon) and releases the partition ownership in Zookeeper for all 10 partitions.

Then it runs the partition assignment algorithm, decides which partitions it should claim, and claims the partition ownership in Zookeeper again. If the claim is successful, consumer1 starts fetching from its new partitions.

Meanwhile, consumer2 also runs the partition assignment algorithm and tries to claim its partitions in Zookeeper. The claim succeeds only once consumer1 has released ownership of those partitions. When the claim is successful, consumer2 starts fetching; if it fails to claim the partitions within a given number of retries, you get a "rebalance failed after n retries" exception.

As you may have noticed, instead of closing connections and releasing ownership only for the partitions consumer1 no longer owns, it unnecessarily closes ALL of its connections and restarts with a smaller number of partitions. The same happens when partitions are added (for example, when we consume by a wildcard filter and a new topic appears): ALL connections are closed and then opened again instead of just opening the new ones.
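The walk-through above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `range_assign` is a hypothetical helper that splits partitions into contiguous ranges, which is a simplification and not Kafka's actual assignment code, and no Zookeeper or network calls are involved.

```python
def range_assign(partitions, consumers):
    """Hypothetical helper: split partitions into contiguous ranges, one per consumer."""
    consumers = sorted(consumers)
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment = {}
    start = 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers each take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


partitions = list(range(10))  # partitions 0-9, as in the example above

before = range_assign(partitions, ["consumer1"])
after = range_assign(partitions, ["consumer1", "consumer2"])

# Behaviour described in the answer: consumer1 releases everything it owned...
released = set(before["consumer1"])
# ...and then reclaims the subset it is still assigned.
reopened = released & set(after["consumer1"])
# Only these partitions actually had to change hands.
moved = released - set(after["consumer1"])

print("consumer1 released:", sorted(released))
print("consumer1 reopened:", sorted(reopened))
print("actually moved to consumer2:", sorted(moved))
```

In this sketch consumer1 closes and reopens connections for partitions it keeps, even though only the partitions in `moved` needed to be handed over.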

So I hope this answers your question: fetching stops when a rebalance kicks in.

Virgulate answered 2/12, 2014 at 12:25 Comment(1)
Very detailed answer. Do you happen to know why, with a topic that has 3 partitions and a single consumer, it takes about one hour for the consumer group to rebalance when I restart the consumer? Sudhir
8

The accepted response above (from serejja) was correct in the past. Kafka has implemented "Incremental Cooperative Rebalancing" since version 2.3 (released June 2019) for Kafka Connect, and since version 2.4 for the consumer client. So there is no longer any need for all consumers to stop processing (a "stop the world" event) in order to rebalance work in the group, e.g. when a new consumer joins the group or a consumer goes offline.
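To make the difference concrete, here is a minimal sketch comparing which partitions each consumer gives up under the two approaches. The function names `eager_revocations` and `cooperative_revocations` are hypothetical and only model the idea; they are not part of any Kafka client API. In the Java consumer, the cooperative behaviour is typically enabled by setting `partition.assignment.strategy` to `CooperativeStickyAssignor`.

```python
def eager_revocations(old, new):
    """Eager protocol: every consumer gives up everything it owned."""
    return {consumer: set(parts) for consumer, parts in old.items()}


def cooperative_revocations(old, new):
    """Cooperative idea: give up only partitions that move to another consumer."""
    return {
        consumer: set(parts) - set(new.get(consumer, []))
        for consumer, parts in old.items()
    }


# Same scenario as in the accepted answer: 10 partitions, a second consumer joins.
old = {"consumer1": list(range(10))}
new = {"consumer1": [0, 1, 2, 3, 4], "consumer2": [5, 6, 7, 8, 9]}

eager = eager_revocations(old, new)
cooperative = cooperative_revocations(old, new)

print("eager revokes:", sorted(eager["consumer1"]))
print("cooperative revokes:", sorted(cooperative["consumer1"]))
```

Under the cooperative model in this sketch, consumer1 keeps fetching from the partitions it retains while only the moved partitions are handed over.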

For more info see: From Eager to Smarter in Apache Kafka Consumer Rebalances

Hairdresser answered 7/10, 2020 at 9:48 Comment(0)
