What is the need for consumer groups in Kafka?

Asked 8/2, 2023 at 19:39 Answered 9/8, 2024 at 5:18

I don't understand the practical use case of consumer groups in Kafka. A partition can be read by only one consumer in a consumer group, so each consumer reads only a subset of the topic's records.

Can someone help with any practical scenario where the consumer group helps?

Hummer answered 8/2, 2023 at 19:39 Comment(3)

What happens when a consumer restarts? What offset will be used? Have you researched how "offset commits" are maintained? – Teasley 9/2, 2023 at 0:39

@Teasley I think that whenever a consumer reads, it maintains a log of its position, and the next time the consumer starts reading it checks that log and resumes from that offset. – Hummer 9/2, 2023 at 15:23

Correct... But consumer groups make that possible. Specifically, it works when you restart a consumer with the same group id, rather than a new id – Teasley 9/2, 2023 at 21:30

It's for parallel processing of event messages from a given topic.

Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.

If all the consumer instances have the same consumer group, then the records will effectively be load balanced over the consumer instances.

If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.
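The two cases above can be illustrated with a small simulation. This is only a sketch in plain Python, not Kafka client code: the `deliver` function, the consumer names, and the group ids `billing` and `audit` are all made up for illustration, and the round-robin split inside a group is a simplification of what Kafka's real assignors do.

```python
from collections import defaultdict


def deliver(records, consumers):
    """Simulate delivery: each record goes to exactly one consumer per group.

    `records` is a list of (partition, value) pairs and `consumers` maps a
    hypothetical consumer name to its group id. Inside a group, partitions
    are split round-robin across members (a simplification).
    """
    groups = defaultdict(list)
    for name, group_id in consumers.items():
        groups[group_id].append(name)

    received = defaultdict(list)
    for partition, value in records:
        # Every group gets the record, but only one member of each group does.
        for members in groups.values():
            owner = members[partition % len(members)]
            received[owner].append(value)
    return dict(received)


records = [(0, "a"), (1, "b"), (0, "c"), (1, "d")]

# Same group id: the records are load balanced across the two consumers.
same_group = deliver(records, {"c1": "billing", "c2": "billing"})

# Different group ids: every consumer receives every record (broadcast).
different_groups = deliver(records, {"c1": "billing", "c2": "audit"})

print(same_group)
print(different_groups)
```

With one shared group id, `c1` and `c2` each see half of the records; with two different group ids, both see all four.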

Read more here:

https://docs.confluent.io/5.3.3/kafka/introduction.html#consumers

Prunella answered 8/2, 2023 at 19:57 Comment(3)

Groups are also used to track committed offsets – Teasley 9/2, 2023 at 0:40

Can I declare how many consumers I need in a group through code? Right now, where I work, there is no mention of the number of consumers we want in a group. We just declare the groupId and that's it. – Hummer 9/2, 2023 at 15:22

In spring-kafka there's a concurrency setting... Confluent maintains a "parallel consumer" external repo... Otherwise, you can create an ExecutorPool of threads, but the better option is to deploy multiple instances of your app, for example in auto scaling group or Kubernetes – Teasley 8/8, 2023 at 12:59

It is used for parallel processing.

Single Consumer: A single consumer can read from multiple partitions of a topic. However, this can limit throughput because the consumer must handle messages sequentially from all assigned partitions, which might become a bottleneck.

Consumer Group: In a consumer group, multiple consumers work together to read messages from a topic’s partitions. Each partition is assigned to only one consumer within the group at any given time. This allows for parallel processing of messages:

=> For example, if you have a topic with four partitions and you deploy two consumers in a consumer group, Consumer 1 might read from Partition 0 and Partition 1, while Consumer 2 reads from Partition 2 and Partition 3. This setup allows both consumers to process messages simultaneously, thus increasing throughput.
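The four-partition example can be sketched as a contiguous split of partitions across group members. This is a toy model loosely in the spirit of range-style assignment, not the actual Kafka assignor; the function name `assign_contiguous` and the consumer names are assumptions made for illustration.

```python
def assign_contiguous(num_partitions, consumers):
    """Split partitions 0..num_partitions-1 into contiguous chunks, one per consumer.

    Toy model only: consumers are sorted by name, and the first few get one
    extra partition when the split is uneven.
    """
    members = sorted(consumers)
    base, extra = divmod(num_partitions, len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = base + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


# Four partitions, two consumers in the same group.
assignment = assign_contiguous(4, ["consumer-1", "consumer-2"])
print(assignment)

# More consumers than partitions: the extra consumer gets nothing and sits idle.
crowded = assign_contiguous(4, [f"consumer-{n}" for n in range(1, 6)])
print(crowded)
```

The second call shows why adding more consumers than partitions does not add throughput: each partition still has exactly one owner in the group, so the fifth consumer is left without work.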

Fault Tolerance: If one consumer in the group goes down, the system automatically handles the situation to maintain message processing. In this scenario:

=> The partitions assigned to the failed consumer are reassigned to the remaining consumers in the group.

=> If there is an inactive or idle third consumer available in the group, it can take over the partitions of the failed consumer.

This rebalancing ensures that message consumption continues smoothly without significant interruptions or delays.
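The failover behaviour can be sketched the same way. The function below is a simplified model, not how the Kafka group coordinator is implemented: `rebalance_after_failure` and the consumer names are hypothetical, and a real rebalance may also move partitions that were not orphaned.

```python
def rebalance_after_failure(assignment, failed):
    """Return a new assignment with the failed consumer's partitions handed to survivors."""
    survivors = {m: list(p) for m, p in assignment.items() if m != failed}
    if not survivors:
        raise RuntimeError("no consumers left in the group")
    for partition in assignment.get(failed, []):
        # Give each orphaned partition to the survivor that currently owns the fewest.
        target = min(sorted(survivors), key=lambda m: len(survivors[m]))
        survivors[target].append(partition)
    return survivors


# Case 1: an idle third consumer takes over the failed consumer's partitions.
with_idle = {"consumer-1": [0, 1], "consumer-2": [2, 3], "consumer-3": []}
after_with_idle = rebalance_after_failure(with_idle, "consumer-1")
print(after_with_idle)

# Case 2: no idle consumer, so the remaining one ends up owning everything.
no_idle = {"consumer-1": [0, 1], "consumer-2": [2, 3]}
after_no_idle = rebalance_after_failure(no_idle, "consumer-1")
print(after_no_idle)
```

In both cases every partition still has exactly one owner after the failure, which is what keeps consumption going.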

Cheapjack answered 9/8, 2024 at 5:18 Comment(0)
