What is the need of consumer group in kafka?

I don't understand the practical use case for consumer groups in Kafka. A partition can be read by only one consumer within a consumer group, so each consumer reads only a subset of the topic's records.

Can someone describe a practical scenario where a consumer group helps?

Hummer answered 8/2, 2023 at 19:39 Comment(3)
What happens when a consumer restarts? What offset will be used? Have you researched how "offset commits" are maintained? - Teasley
@Teasley I think that whenever a consumer reads, it keeps a log of its position, and when it starts reading again it checks that log and resumes from the stored offset. - Hummer
Correct... But consumer groups make that possible. Specifically, by restarting a consumer with the same group id rather than a new id. - Teasley
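
The point made in these comments can be illustrated with a small simulation. This is only a sketch in plain Python, not the Kafka client API: `OffsetStore` and `consume` are hypothetical names, the "log" is a list, and a group with no committed offset is assumed to start from the beginning (in real Kafka the starting position for a new group depends on the `auto.offset.reset` setting).

```python
class OffsetStore:
    """Toy stand-in for the committed-offset store (hypothetical, not a Kafka API)."""

    def __init__(self):
        # Committed offsets are keyed by group id, topic and partition.
        self._committed = {}

    def commit(self, group_id, topic, partition, next_offset):
        self._committed[(group_id, topic, partition)] = next_offset

    def fetch(self, group_id, topic, partition, default=0):
        # Assumption: a group without a committed offset starts at offset 0.
        return self._committed.get((group_id, topic, partition), default)


def consume(store, log, group_id, topic, partition, max_records):
    """Read up to max_records from the group's last committed position, then commit."""
    start = store.fetch(group_id, topic, partition)
    batch = log[start:start + max_records]
    store.commit(group_id, topic, partition, start + len(batch))
    return batch


log = ["r0", "r1", "r2", "r3", "r4"]  # records of one partition
store = OffsetStore()

first = consume(store, log, "billing", "orders", 0, 3)
# A "restarted" consumer with the same group id resumes where the group left off.
after_restart = consume(store, log, "billing", "orders", 0, 3)
# A consumer with a different group id has no committed offset of its own.
new_group = consume(store, log, "audit", "orders", 0, 3)

print(first, after_restart, new_group)
```

Here the second call for `billing` continues at `r3`, while `audit` starts over at `r0`, because the position is tracked per group rather than per consumer process.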

It's for parallel processing of the messages in a given topic.

Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.

If all the consumer instances have the same consumer group, then the records will effectively be load balanced over the consumer instances.

If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.
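
The two cases above (same group versus different groups) can be sketched as a toy model. This is plain Python rather than a Kafka client; `fan_out`, the group names and the partition-to-consumer maps are all made up for illustration, and the partition ownership is simply given as input.

```python
from collections import defaultdict


def fan_out(records, group_assignments):
    """Deliver each record to exactly one consumer in every subscribing group.

    records: list of (partition, value) pairs.
    group_assignments: {group_id: {partition: consumer_name}} (hypothetical layout).
    """
    received = defaultdict(list)
    for partition, value in records:
        for group_id, owners in group_assignments.items():
            # Within a group, only the partition's owner sees the record.
            received[(group_id, owners[partition])].append(value)
    return dict(received)


records = [(0, "a"), (1, "b"), (0, "c"), (1, "d")]

# Both consumers share one group: the records are split between them.
same_group = {"workers": {0: "c1", 1: "c2"}}
balanced = fan_out(records, same_group)

# Each consumer has its own group: every consumer sees every record.
different_groups = {"g1": {0: "c1", 1: "c1"}, "g2": {0: "c2", 1: "c2"}}
broadcast = fan_out(records, different_groups)

print(balanced)
print(broadcast)
```

With a shared group each consumer gets half of the records; with separate groups both consumers get all four.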

Read more here:

Prunella answered 8/2, 2023 at 19:57 Comment(3)
Groups are also used to track committed offsets. - Teasley
Can I declare how many consumers I need in a group through code? Where I work, there is no mention of the number of consumers we want in a group; we just declare the groupId and that's it. - Hummer
In spring-kafka there's a concurrency setting... Confluent maintains a "parallel consumer" external repo... Otherwise, you can create a thread pool (for example with an ExecutorService), but the better option is to deploy multiple instances of your app, for example in an auto scaling group or Kubernetes. - Teasley

It is used for parallel processing.

Single Consumer: A single consumer can read from multiple partitions of a topic. However, this can limit throughput because the consumer must handle messages sequentially from all assigned partitions, which might become a bottleneck.

Consumer Group: In a consumer group, multiple consumers work together to read messages from a topic’s partitions. Each partition is assigned to only one consumer within the group at any given time. This allows for parallel processing of messages:

=> For example, if you have a topic with four partitions and you deploy two consumers in a consumer group, Consumer 1 might read from Partition 0 and Partition 1, while Consumer 2 reads from Partition 2 and Partition 3. This setup allows both consumers to process messages simultaneously, thus increasing throughput.
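
A minimal sketch of how such a split could be computed, assuming a simple contiguous ("range"-style) division of partitions among sorted group members. `assign_ranges` is a hypothetical helper written for this example, not the code the Kafka client actually runs.

```python
def assign_ranges(num_partitions, consumers):
    """Give each consumer a contiguous block of partitions (illustrative only)."""
    members = sorted(consumers)
    base, extra = divmod(num_partitions, len(members))
    assignment = {}
    start = 0
    for index, member in enumerate(members):
        # The first `extra` members take one additional partition each.
        count = base + (1 if index < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


four_by_two = assign_ranges(4, ["consumer-1", "consumer-2"])
print(four_by_two)
```

For four partitions and two consumers this reproduces the split described above: `consumer-1` owns partitions 0 and 1, `consumer-2` owns partitions 2 and 3.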

Fault Tolerance: If one consumer in the group goes down, the system automatically handles the situation to maintain message processing. In this scenario:

=> The partitions assigned to the failed consumer are reassigned to the remaining consumers in the group.

=> If there is an inactive or idle third consumer available in the group, it can take over the partitions of the failed consumer.

This rebalancing ensures that message consumption continues smoothly without significant interruptions or delays.
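
The failover behaviour described here can be sketched with a toy group that recomputes ownership whenever membership changes. `ToyGroup` is a hypothetical class using a simple round-robin split; a real Kafka group detects failures through heartbeats and session timeouts and uses a configurable assignment strategy, none of which is modeled here.

```python
class ToyGroup:
    """Toy consumer group that rebalances on every join or leave (illustrative only)."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.members = []
        self.assignment = {}

    def _rebalance(self):
        members = sorted(self.members)
        self.assignment = {member: [] for member in members}
        if not members:
            return
        # Round-robin: partition p goes to member p modulo the member count.
        for partition in range(self.num_partitions):
            owner = members[partition % len(members)]
            self.assignment[owner].append(partition)

    def join(self, member):
        self.members.append(member)
        self._rebalance()

    def leave(self, member):
        # Stands in for a crash or shutdown of that consumer.
        self.members.remove(member)
        self._rebalance()


# Two partitions, three consumers: the third one has nothing to read.
group = ToyGroup(num_partitions=2)
for name in ("c1", "c2", "c3"):
    group.join(name)
before_failure = dict(group.assignment)

# c2 fails; the previously idle c3 picks up its partition.
group.leave("c2")
after_failure = dict(group.assignment)

print(before_failure)
print(after_failure)
```

In this run `c3` is idle until `c2` disappears, after which it owns partition 1, so every partition still has exactly one reader in the group.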

Cheapjack answered 9/8 at 5:18 Comment(0)
