Kafka, will different partitions have the same offset number
L

3

5

I have one Kafka topic with five partitions. There will be 5 consumer groups, each with one service instance consuming from that topic.

Will the offset be the same in each consumer for the same record in Kafka?

Lurid answered 17/12, 2018 at 15:49 Comment(0)
J
2

If by offset you mean the ordering of messages, then yes. It'd be the same for all consumers, because the ordering is determined by producers and brokers. So, if you have msg-1, msg-2, ..., msg-1000 in the topic, all 5 consumers will consume them in that specific order. The rate of consumption might vary, though: many variables (e.g. network latency, network topology, consumer logic) determine it.

Jedidiah answered 18/12, 2018 at 5:27 Comment(2)
What I mean is will the same message in the 5 different partitions have the same offset number?Lurid
If the replication factor for this topic is set to 1, then a message will only go to one partition. If it's more than one, then one replica is chosen as leader and the others become followers. The leader takes writes from the producer, and the followers just copy the messages in order. So in that case the same message will have the same offset. But I don't think this has any implication for your question, because the replicas are only used in a failover scenario.Jedidiah
U
3

The offset is assigned by the broker when the message arrives in the partition, so it's unique within that partition and is not related to consumers (or consumer groups). It identifies the position of the record inside the partition. On the other hand, each consumer (in a consumer group) reading from a specific partition tracks its own offset, which can differ from that of consumers in other consumer groups; in this case the offset concept is used to track the position in the partition from which to read messages. Either way, it's always a message offset.
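To make the two roles of "offset" concrete, here is a minimal pure-Python sketch. It does not use Kafka at all: `PartitionLog` and `GroupCursor` are hypothetical stand-ins for a single partition and for a consumer group's tracked position, introduced only for illustration.

```python
class PartitionLog:
    """Toy stand-in for one partition: an append-only list of records."""

    def __init__(self):
        self.records = []

    def append(self, value):
        # "Broker" side: the offset is simply the next position in the log.
        self.records.append(value)
        return len(self.records) - 1


class GroupCursor:
    """Toy stand-in for the position one consumer group tracks on a partition."""

    def __init__(self, log):
        self.log = log
        self.position = 0  # next offset this group will read

    def poll(self, max_records):
        # Read up to max_records starting at this group's own position.
        end = min(self.position + max_records, len(self.log.records))
        batch = [(off, self.log.records[off]) for off in range(self.position, end)]
        self.position = end
        return batch


log = PartitionLog()
for i in range(5):
    log.append(f"msg-{i}")

group_a = GroupCursor(log)
group_b = GroupCursor(log)
batch_a = group_a.poll(4)  # group A reads faster
batch_b = group_b.poll(2)  # group B lags behind
```

Both groups see `msg-1` at offset 1, because that is its position in the log, yet each group's tracked position differs depending on how far it has read.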

Universe answered 18/12, 2018 at 6:18 Comment(5)
OK, so the same message in the five different partitions can have a different offset number.Lurid
When a message comes into a topic, it goes to just one partition; it isn't replicated to the 5 partitions.Universe
@Universe Do you mean the offset is different in the producer context and the consumer context? Let's say the producer gave offset 5 to a message. When a consumer reads that message, it just marks offset 5 as read. How are they different?Grecism
Also, what I feel the OP meant is: if we have 5 partitions and send 10 messages, does the offset start at 0 and end at 9? Or is it that the offsets are 0 and 1 in each of the 5 partitions?Grecism
The producer doesn't assign an offset to the message. The producer just sends a message, and the broker appends it to the partition (which is a log) at the next available offset. Of course, when a consumer consumes that message it will get the same message offset, because that is its position in the partition. Finally, offsets aren't assigned across partitions. So in your example (5 partitions and 10 messages) the messages will get offsets 0 and 1 in each partition (assuming, of course, that you are using the default round-robin partitioner).Universe
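The 5-partition, 10-message example from this comment can be simulated in a few lines. This is a toy model, not Kafka client code: `append_round_robin` is a hypothetical helper, and strict round-robin placement is assumed, as in the comment.

```python
from itertools import cycle


def append_round_robin(num_partitions, values):
    """Place each value on the next partition in turn.

    Returns a list of (partition, offset, value) tuples, where the offset
    is the position of the value inside its own partition.
    """
    partitions = [[] for _ in range(num_partitions)]
    placed = []
    targets = cycle(range(num_partitions))
    for value in values:
        p = next(targets)
        offset = len(partitions[p])  # next free position in this partition only
        partitions[p].append(value)
        placed.append((p, offset, value))
    return placed


placed = append_round_robin(5, [f"msg-{i}" for i in range(10)])

offsets_by_partition = {}
for p, off, _ in placed:
    offsets_by_partition.setdefault(p, []).append(off)
```

Under that assumption, every one of the five partitions ends up holding offsets 0 and 1; no message ever gets offset 9.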
I
1

I think the question you are asking is this:

Can the same offset appear in more than one partition?

The answer is yes.

  • The same offset value can appear in more than one partition
  • The combination of partition number + offset value is a unique identifier
  • The offset alone is not
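As a quick illustration of these three points, the sketch below checks uniqueness over a handful of made-up `(partition, offset, value)` records. The data is invented for the example and does not come from a real topic.

```python
from collections import Counter

# Hypothetical records as (partition, offset, value).
records = [
    (0, 0, "a"),
    (1, 0, "b"),
    (2, 0, "c"),
    (0, 1, "d"),
    (1, 1, "e"),
]

# Count how often each bare offset and each (partition, offset) pair occurs.
offset_counts = Counter(off for _, off, _ in records)
coordinate_counts = Counter((p, off) for p, off, _ in records)

offset_is_unique = all(n == 1 for n in offset_counts.values())
coordinate_is_unique = all(n == 1 for n in coordinate_counts.values())
```

Here offset 0 occurs in three partitions, so `offset_is_unique` is false, while every `(partition, offset)` pair occurs exactly once.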

The screenshot below from Conduktor demonstrates this, and the code that generates these events follows. Note that no "key" is provided for the messages, which means they are distributed round-robin across the partitions. If a fixed key is provided, then all messages go to the same partition and you will not see this effect.

Conduktor

#!/usr/bin/env python3

from confluent_kafka import Producer

def main():

    topic = 'test_topic'

    producer = create_producer()

    number_of_messages = 30

    for i in range(number_of_messages):
        producer.produce(
            topic=topic,
            value=f'message {i} of {number_of_messages}')

    # Serve delivery callbacks, then block until all queued messages are sent
    producer.poll(3)
    producer.flush(10)


def create_producer():

    config = {
        'bootstrap.servers': 'localhost:29092',
        'client.id': 'produce_test',
        'enable.idempotence': True,
    }

    producer = Producer(config)

    return producer

if __name__ == '__main__':
    main()
Inquisitionist answered 15/2 at 23:13 Comment(0)