I have one Kafka topic with five partitions. There will be five consumer groups, and each consumer group has one service instance consuming from that topic.
Will the offset of a given record be the same for every consumer?
If by offset you mean the ordering of messages, then yes. It is the same for all consumers, because the order is fixed when the broker appends each message to a partition. So if a partition holds msg-1, msg-2, ..., msg-1000, all five consumers will read them in that order. Note that Kafka guarantees ordering only within a partition, not across the five partitions of the topic. The rate of consumption may still vary between consumers, because it depends on many variables (e.g. network latency, network topology, consumer logic).
The offset is assigned by the broker when the message is appended to the partition, so it is unique within that partition and has nothing to do with consumers or consumer groups. It identifies the position of the record inside the partition. Separately, each consumer in a consumer group that reads from a partition tracks its own position in that partition, and that position can differ from the positions of consumers in other consumer groups. In this second sense, the offset marks the point in the partition from which the group will read next. In both cases it is still a message offset.
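To make the distinction concrete, here is a minimal sketch in plain Python that does not talk to Kafka at all. `PartitionLog` and `GroupCursor` are hypothetical names invented for this illustration: the log assigns each record its offset on append, and each consumer group only keeps its own read position.

```python
class PartitionLog:
    """Toy stand-in for one Kafka partition: an append-only list."""

    def __init__(self):
        self.records = []

    def append(self, value):
        # The "broker" assigns the offset: the record's index in the log.
        offset = len(self.records)
        self.records.append(value)
        return offset


class GroupCursor:
    """Toy stand-in for one consumer group's position in a partition."""

    def __init__(self, log):
        self.log = log
        self.position = 0  # offset of the next record this group will read

    def poll(self):
        if self.position >= len(self.log.records):
            return None  # nothing new to read
        offset = self.position
        self.position += 1
        return offset, self.log.records[offset]


log = PartitionLog()
for i in range(3):
    log.append(f'msg-{i}')

fast_group = GroupCursor(log)
slow_group = GroupCursor(log)

fast_reads = [fast_group.poll() for _ in range(3)]
slow_reads = [slow_group.poll()]

# Both groups see the same record at the same offset ...
print(fast_reads[0], slow_reads[0])
# ... but their tracked positions differ because they read at different rates.
print(fast_group.position, slow_group.position)
```

The record's offset is a property of the log, so every group sees the same offset for the same record. What differs per group is only how far it has read.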
Is the offset different in the producer context and the consumer context? Let's say the message the producer sent was given offset 5. When a consumer reads that message, it just marks offset 5 as read. How are they different?
I think the question you are asking is this:
Can the same offset appear in more than one partition?
The answer is yes.
The screenshot below from Conduktor demonstrates this. The code to generate these events is provided below. Note that no key is provided for the messages, which means they are spread across the partitions by the producer's default partitioner (round-robin or random, depending on the client). If a fixed key is provided, all messages go to the same partition and you will not see this effect.
#!/usr/bin/env python3
from confluent_kafka import Producer


def main():
    topic = 'test_topic'
    producer = create_producer()
    number_of_messages = 30
    for i in range(number_of_messages):
        # No key is set, so the partitioner picks the partition.
        producer.produce(
            topic=topic,
            value=f'message {i} of {number_of_messages}')
    # Serve delivery events, then wait for outstanding messages to be sent.
    producer.poll(3)
    producer.flush(10)


def create_producer():
    config = {
        'bootstrap.servers': 'localhost:29092',
        'client.id': 'produce_test',
        'enable.idempotence': True,
    }
    producer = Producer(config)
    return producer


if __name__ == '__main__':
    main()
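If you do not have Conduktor at hand, a rough way to see the same effect is to record the partition and offset that the broker reports back for each message. The sketch below assumes the same broker address and topic as the producer above. The names `on_delivery`, `shared_offsets`, `offsets_by_partition` and `produce_and_report` are made up for this example. The `on_delivery` argument of `produce()` is the delivery-report callback of `confluent_kafka`.

```python
from collections import defaultdict

# partition number -> list of offsets reported by the broker
offsets_by_partition = defaultdict(list)


def on_delivery(err, msg):
    """Delivery-report callback: record where each message landed."""
    if err is not None:
        print(f'delivery failed: {err}')
        return
    offsets_by_partition[msg.partition()].append(msg.offset())


def shared_offsets(by_partition):
    """Return the offsets that occur in more than one partition."""
    counts = defaultdict(int)
    for offsets in by_partition.values():
        for offset in set(offsets):
            counts[offset] += 1
    return sorted(offset for offset, n in counts.items() if n > 1)


def produce_and_report(topic='test_topic', number_of_messages=30):
    # Imported here so the helpers above can be used without Kafka installed.
    from confluent_kafka import Producer

    producer = Producer({'bootstrap.servers': 'localhost:29092'})
    for i in range(number_of_messages):
        producer.produce(
            topic=topic,
            value=f'message {i} of {number_of_messages}',
            on_delivery=on_delivery)
        producer.poll(0)  # serve any pending delivery callbacks
    producer.flush(10)

    for partition, offsets in sorted(offsets_by_partition.items()):
        print(partition, offsets)
    print('offsets seen in more than one partition:',
          shared_offsets(offsets_by_partition))
```

Calling `produce_and_report()` against a running broker should print one list of offsets per partition. With keyless messages spread over several partitions, low offsets such as 0 and 1 will typically show up in more than one of those lists.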