Kafka log.segment.bytes vs log.retention.hours

I was reading the book "Kafka: The Definitive Guide" (First Edition) to understand when log segments are deleted by the broker.

As I understood the text, a segment does not become eligible for deletion until it is closed. A segment can be closed only when it has reached log.segment.bytes in size (assuming log.roll.ms is not set). Once a segment becomes eligible for deletion, the log.retention.ms policy applies to finally decide when to delete it.

However, this seems to contradict the behaviour I see in our production cluster (Kafka 2.5).

The log segment gets deleted as soon as log.retention.ms is satisfied, even when the segment size is less than log.segment.bytes.

[2020-12-24 15:51:17,808] INFO [Log partition=Topic-2, dir=/Folder/Kafka_data/kafka] Found deletable segments with base offsets [165828] due to retention time 604800000ms breach (kafka.log.Log)

[2020-12-24 15:51:17,808] INFO [Log partition=Topic-2, dir=/Folder/Kafka_data/kafka] Scheduling segments for deletion List(LogSegment(baseOffset=165828, size=895454171, lastModifiedTime=1608220234000, largestTime=1608220234478)) (kafka.log.Log)

The segment size (895454171 bytes) is still less than 1 GB, yet the segment got deleted.
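As a rough sanity check, the two timestamps in the log above can be compared directly. This sketch assumes that the broker writes its log timestamps in UTC (the log does not say) and that time-based retention measures a segment's age from its largestTime; it only does the date arithmetic and does not model the broker itself.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

RETENTION_MS = 604_800_000           # "retention time 604800000ms" from the log (7 days)
LARGEST_TIME_MS = 1_608_220_234_478  # largestTime of the segment with baseOffset 165828

# Timestamp of the newest record in the segment, using exact integer milliseconds.
largest = EPOCH + timedelta(milliseconds=LARGEST_TIME_MS)
# Earliest moment at which "now - largestTime > retention" can hold.
breach_at = largest + timedelta(milliseconds=RETENTION_MS)
# Assumption: the "[2020-12-24 15:51:17,808]" log prefix is in UTC.
logged_at = datetime(2020, 12, 24, 15, 51, 17, 808_000, tzinfo=timezone.utc)

print("newest record in segment:", largest.isoformat())
print("retention limit reached: ", breach_at.isoformat())
print("deletion logged after:   ", logged_at - breach_at)
```

Under that assumption, the deletion was logged about 43 seconds after the 7-day limit was reached, which fits within the default retention check interval (log.retention.check.interval.ms, 5 minutes).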

The book mentions that, at the time of publication, the Kafka version was 0.9.0.1. Was this behaviour changed in later versions of Kafka? (I could not find any specific mention of such a change in the Kafka docs.) Below is the snippet from the book.

[Image: snippet from "Kafka: The Definitive Guide", First Edition]

Distichous asked 30/12/2020 at 12:48. Comments:
What's the actual segment size for this topic? You can find out with the command $KAFKA_HOME/bin/kafka-configs.sh --bootstrap-server :9092 --entity-type topics --entity-name my-topic --describe --all | grep segment.bytes – Event
segment.bytes=1073741824 sensitive=false synonyms={STATIC_BROKER_CONFIG:log.segment.bytes=1073741824, DEFAULT_CONFIG:log.segment.bytes=1073741824}, so all are at the default, i.e. 1 GB. – Distichous

Broker Configs: log.retention.ms and log.retention.bytes

The most common configuration for how long a Kafka broker will retain messages (actually, “log segments”) is by time (in ms), specified using the log.retention.ms parameter (default: 1 week). If set to -1, no time limit is applied.
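The question's title mentions log.retention.hours: the broker also accepts log.retention.minutes and log.retention.hours, and the most specific one that is set wins (ms, then minutes, then hours, whose default is 168). A minimal sketch of that precedence, with a hypothetical helper name and broker properties modelled as a plain dict of strings:

```python
def effective_retention_ms(props: dict[str, str]) -> int:
    """Resolve time-based retention from broker properties (hypothetical helper).

    Precedence: log.retention.ms, then log.retention.minutes,
    then log.retention.hours (default 168 hours = 1 week).
    A result of -1 means no time limit.
    """
    if "log.retention.ms" in props:
        ms = int(props["log.retention.ms"])
    elif "log.retention.minutes" in props:
        ms = int(props["log.retention.minutes"]) * 60_000
    else:
        ms = int(props.get("log.retention.hours", "168")) * 3_600_000
    # Any negative value is treated as "no time limit".
    return -1 if ms < 0 else ms


print(effective_retention_ms({}))                                # 604800000
print(effective_retention_ms({"log.retention.minutes": "30",
                              "log.retention.hours": "24"}))     # 1800000
```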

Another way to expire messages is based on the total number of bytes of messages retained. This value is set using the log.retention.bytes parameter, and it is applied per partition. Its default value is -1, which allows for infinite retention. This means that if you have a topic with 8 partitions and log.retention.bytes is set to 1 GB, the amount of data retained for the topic will be 8 GB at most. If you have specified both log.retention.bytes and log.retention.ms, messages may be removed when either criterion is met.
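To make the "either criterion" rule concrete, here is a simplified model (not the broker's actual code): closed segments are checked oldest first, and a segment expires if it is older than the time limit or if dropping it still leaves at least the byte limit in the partition. The tuple shape, the function name and the -1 convention for "disabled" are assumptions for illustration; the active segment and the check interval are ignored.

```python
Segment = tuple[int, int]  # (size_bytes, largest_timestamp_ms), hypothetical shape


def expired_segments(segments: list[Segment], now_ms: int,
                     retention_ms: int = -1, retention_bytes: int = -1) -> list[Segment]:
    """Return the oldest closed segments that either retention rule would remove.

    Simplified model: `segments` holds closed segments only, oldest first.
    A value of -1 disables the corresponding rule.
    """
    total = sum(size for size, _ in segments)
    expired = []
    for size, largest_ts in segments:
        too_old = retention_ms >= 0 and now_ms - largest_ts > retention_ms
        too_big = retention_bytes >= 0 and total - size >= retention_bytes
        if not (too_old or too_big):
            break  # only a prefix of the oldest segments is ever removed
        expired.append((size, largest_ts))
        total -= size
    return expired
```

For example, with three 100-byte segments whose newest records are at 1,000, 2,000 and 3,000 ms, `expired_segments(segs, 3_500, retention_ms=1_000)` returns the first two, while a byte limit of 150 alone removes only the first.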

Broker Configs: log.segment.bytes and log.roll.ms

As messages are produced to the Kafka broker, they are appended to the current log segment for the partition. Once the log segment has reached the size specified by the log.segment.bytes parameter (default: 1 GB), the log segment is closed and a new one is opened. Only once a log segment has been closed can it be considered for expiration (by log.retention.ms or log.retention.bytes).

Another way to control when log segments are closed is by using the log.roll.ms parameter (default: 1 week), which specifies the amount of time after which a log segment should be closed. Kafka will close a log segment either when the size limit is reached or when the time limit is reached, whichever comes first.
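The two roll conditions can be sketched as a single check. This is a simplified model, not the broker's code: the function and parameter names are hypothetical, the check is assumed to run when a new batch is appended, and details such as log.roll.jitter.ms and index-file limits are ignored.

```python
def should_roll(segment_size: int, batch_size: int, segment_age_ms: int,
                segment_bytes: int, roll_ms: int) -> bool:
    """Decide whether to close the active segment before appending a batch."""
    # Size limit: appending this batch would push the segment past segment_bytes.
    size_limit_hit = segment_size > segment_bytes - batch_size
    # Time limit: a non-empty segment older than roll_ms is closed.
    time_limit_hit = segment_size > 0 and segment_age_ms > roll_ms
    # Whichever limit is reached first triggers the roll.
    return size_limit_hit or time_limit_hit
```

With the defaults (1 GiB and 7 days), a segment holding about 895 MB, like the one in the question's log, would be rolled by the time limit rather than the size limit.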

A smaller log-segment size means that files must be closed and allocated more often, which reduces the overall efficiency of disk writes. Adjusting the size of the log segment can be important if topics have a low produce rate. For example, if a topic receives only 100 megabytes of messages per day and log.segment.bytes is set to the default, it will take 10 days to fill one segment (if no time-based roll limit closes it earlier). As messages cannot be expired until the log segment is closed, if log.retention.ms is set to 1 week, up to 17 days of messages will actually be retained before the closed segment expires. This is because once the log segment is closed with the current 10 days of messages, that log segment must be retained for 7 days before it expires under the time-based policy.
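The 17-day arithmetic above can be reproduced with a back-of-the-envelope model. It assumes a constant produce rate, decimal units (1 GB = 1,000,000,000 bytes, as in the example) and ignores the retention check interval; the function names are hypothetical. Passing a roll limit shows how a time-based roll caps how long a segment stays open:

```python
from typing import Optional


def days_to_close(segment_bytes: int, bytes_per_day: int,
                  roll_days: Optional[float] = None) -> float:
    """Days until the active segment is closed at a constant produce rate."""
    fill_days = segment_bytes / bytes_per_day
    # Without a time-based roll limit, only the size limit closes the segment.
    return float(fill_days if roll_days is None else min(fill_days, roll_days))


def worst_case_retention_days(segment_bytes: int, bytes_per_day: int,
                              retention_days: float,
                              roll_days: Optional[float] = None) -> float:
    """Approximate age of the oldest message when its segment finally expires."""
    # The first message waits for its segment to close, then the closed segment
    # must age past the retention limit before it can be deleted.
    return days_to_close(segment_bytes, bytes_per_day, roll_days) + retention_days


print(worst_case_retention_days(1_000_000_000, 100_000_000, 7))               # 17.0
print(worst_case_retention_days(1_000_000_000, 100_000_000, 7, roll_days=7))  # 14.0
```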

Note on Topic Configs: Both retention and segment roll-over behavior can also be overridden by topic properties. The names of these topic properties are slightly different: retention.ms, retention.bytes, segment.bytes, and segment.ms.
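A minimal sketch of how the broker-level names map to the topic-level ones and how a topic override takes precedence, assuming configs are plain string dicts. The helper is hypothetical, and it skips the broker's own fallbacks (for example log.retention.hours and log.roll.hours when the .ms variants are not set):

```python
# Broker-level defaults and the per-topic properties that override them.
BROKER_TO_TOPIC = {
    "log.retention.ms": "retention.ms",
    "log.retention.bytes": "retention.bytes",
    "log.segment.bytes": "segment.bytes",
    "log.roll.ms": "segment.ms",
}


def effective_topic_config(broker: dict[str, str], topic: dict[str, str]) -> dict[str, str]:
    """Return the effective value of each topic property (hypothetical helper).

    A topic-level override wins; otherwise the broker default applies.
    """
    effective = {}
    for broker_key, topic_key in BROKER_TO_TOPIC.items():
        if topic_key in topic:
            effective[topic_key] = topic[topic_key]
        elif broker_key in broker:
            effective[topic_key] = broker[broker_key]
    return effective
```

The synonyms list in the question's kafka-configs.sh --describe --all output reports the same kind of resolution for a real topic, showing where each effective value comes from.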

Wertheimer answered 5/8/2021 at 5:31. Comments:
log.segment.ms does not exist, but log.roll.ms (and log.roll.hours) do. This appears to be the case for both Kafka 2.0 and 3.2 (see kafka.apache.org/documentation/#brokerconfigs). The per-topic config names are slightly different. – Fluorescence
I don't understand why this answer got so many points, because it is just a copy-paste from "Kafka: The Definitive Guide". What fascinates me is that the author of the question chose this one as the best answer even though he references the same book. Am I missing something? – Oospore
@ScottCarey updated the answer from log.segment.ms to log.roll.ms. – Exchange
...and added a note on topic configs. – Exchange

Hope this makes it clearer.

segment.ms => the maximum age of the segment file (from the date of creation)

retention.ms => the maximum age of any message in a segment (that is closed) beyond which this segment is eligible for deletion (if the delete cleanup policy is set)

So if the segment is the "active segment", it can be rolled over based on segment.ms (or segment.bytes) but NOT based on retention.ms. Retention only comes into play for closed (non-active) segments.

So the behavior quoted from the book is correct. However, you assume that the segment is active, while the INFO logs show that it was scheduled for deletion. This cannot happen to an active segment (assuming no bug). The segment has to be closed (not active) before any of the retention.* properties can take effect.

See this.

Kassey answered 12/5/2021 at 22:33. Comment:
That's a helpful summary and thread, @Kassey. – Distichous

What you observe is the expected behavior. In short, if you have an active segment that is not full yet and segment.ms has passed, it will be closed and turn into an "old log segment" even though it is not full.

Event answered 30/12/2020 at 14:40. Comments:
Thanks for the link. I couldn't really understand it at first, but after multiple reads I think I get it (hopefully). The statement that a log segment is deletable only once it is closed is true. There is a configuration, segment.ms, which defaults to 7 days; hence my log segments are getting rolled. Closed segments that have been idle for retention.ms (again 7 days in my case) are eligible for deletion (disregarding the size, as it was less than 1 GB in my case). Hopefully this makes sense; @fvaleri, please confirm. – Distichous
I think you fully got it. – Event
retention.ms has no effect on active segments being deleted or closed. Please see my answer. – Kassey
