What is the difference between Kafka earliest and latest offset values?
T

3

64

producer sends messages 1, 2, 3, 4

consumer receives messages 1, 2, 3, 4

consumer crashes/disconnects

producer sends messages 5, 6, 7

consumer comes back up and should receive messages starting from 5 instead of 7

To get this behavior, which offset value should I use, and what other changes or configuration are needed?

Tysontyumen answered 18/1, 2018 at 11:50 Comment(0)
T
97

When a consumer joins a consumer group, it fetches the last committed offset, so it will resume reading with 5, 6, 7 if it committed the latest offset (so 4) before crashing.

The earliest and latest values of the auto.offset.reset property are used when a consumer starts but there is no committed offset for the assigned partition.
In this case you can choose whether to re-read all the messages from the beginning (earliest)
or to start just after the last one (latest).
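
The rule above can be sketched as a small toy model in Python. This is not the Kafka client API: the function name `resolve_start_offset` and its parameters are made up for illustration. It also assumes that messages 1 to 7 sit at offsets 0 to 6 of a single partition, and that the committed value is the offset of the next record to read, so a consumer that finished message 4 has committed offset 4.

```python
from typing import Optional


def resolve_start_offset(committed: Optional[int],
                         log_start: int,
                         log_end: int,
                         auto_offset_reset: str = "latest") -> int:
    """Toy model of where a consumer starts reading one partition.

    committed  -- offset stored for the group (next record to read), or None
    log_start  -- offset of the oldest record still retained
    log_end    -- offset the next produced record will receive
    """
    # A valid committed offset always wins; auto.offset.reset is ignored.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # No committed offset (or it is out of range): apply the reset policy.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    if auto_offset_reset == "none":
        raise LookupError("no valid committed offset and reset policy is 'none'")
    raise ValueError(f"unknown auto.offset.reset value: {auto_offset_reset!r}")


# Messages 1..7 are assumed to live at offsets 0..6, so log_end is 7.
# The consumer committed after message 4, i.e. committed offset 4 (message 5).
restart_latest = resolve_start_offset(4, 0, 7, "latest")
restart_earliest = resolve_start_offset(4, 0, 7, "earliest")

# A brand-new group has nothing committed, so the policy decides.
new_group_earliest = resolve_start_offset(None, 0, 7, "earliest")
new_group_latest = resolve_start_offset(None, 0, 7, "latest")
```

In this model both restart cases resume at offset 4 (message 5) regardless of the policy, while a group with no committed offset starts at offset 0 with earliest and at offset 7 (only new messages) with latest. For the restart case to work in a real deployment, the consumer needs a stable group.id and must actually commit its offsets before it goes down.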

Tull answered 18/1, 2018 at 12:1 Comment(7)
The producer is sending messages continuously. I checked the offset before stopping the consumer; it was 8023. After 10 minutes I started the consumer, and the first offset it read was 8020. Later I stopped the consumer again, at offset 9239; after an hour I started it, and the first message offset was 9299. I am setting a groupId, and auto.offset.reset is latest. I am also logging the partition, and it is always 0. - Tysontyumen
So if you set it to latest, it will read 7. After it has committed 7, will it then read 6 and 5? Or is there a scenario in which they won't get processed if there is a constant stream of new records coming in with higher priority? - Wojak
When you commit an offset, it means that you have read all the previous messages. So committing 7 means that next you won't read 6 and 5 but the new incoming message 8 sent by the producer. - Tull
I think @Tull does not answer the question. For the question from Sat: the value for auto.offset.reset should be latest. When auto.offset.reset is set to latest, two scenarios can happen. The first time the consumer subscribes to the topic, it only receives messages that arrive after it subscribed. When the consumer reconnects to the topic (after a crash, for example), it receives messages 5, 6, 7 because the last commit was 4. - Tarsia
For @Yoker's question: the sequence of messages is immutable. The consumer will receive the messages in this order: 5, 6, 7. - Tarsia
@EscaTran The answer is correct. docs.confluent.io/current/clients/…. "After the consumer receives its assignment from the coordinator, it must determine the initial position for each assigned partition. When the group is first created, before any messages have been consumed, the position is set according to a configurable offset reset policy (auto.offset.reset). Typically, consumption starts either at the earliest offset or the latest offset." - Genovevagenre
Can you repeat the conclusion? If I want to consume 5, 6, should the offset be earliest or latest? - Palladino
B
-1

To get a clear idea about this scenario we need to understand what happens when a consumer joins the same consumer group.

  1. Join the consumer group which triggers rebalance and assigns partitions to the new consumer.
  2. Look for committed offsets of the partitions assigned to the consumer.
  3. Check the auto.offset.reset configuration parameter to decide where to start consuming messages from.

We can set two values for auto.offset.reset configuration.

i. earliest - start consuming from the point where it stopped consuming before. (According to your example starts from 5)

ii. latest - starts consuming from the latest offsets in the assigned partitions. (According to your example starts from 7)

Bathypelagic answered 20/4, 2022 at 8:25 Comment(2)
Isn't this answer completely wrong if the accepted answer below is correct? - Jacobin
This is incorrect. This setting only takes effect when no valid offset is available. - Crafton
P
-1

In actual production, both values may be invalid. It is best to check the offset before restarting.

startingOffsets = " offset value of 4"
Palladino answered 12/4 at 5:2 Comment(0)
