I was following the book "Kafka: The Definitive Guide" First Edition to understand when log segments are deleted by the broker.
As per the text, I understood that a segment does not become eligible for deletion until it is closed, and that a segment is closed only once it reaches log.segment.bytes in size (assuming log.segment.ms is not set). Once a segment is closed, the log.retention.ms policy then decides when it is finally deleted.
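My mental model of that rule, sketched in Python (my own illustration, not actual Kafka broker code; the sizes and ages are hypothetical):

```python
# Hypothetical model of the deletion rule as I read it in the book.
SEGMENT_BYTES = 1_073_741_824   # log.segment.bytes default (1 GiB)
RETENTION_MS = 604_800_000      # log.retention.ms (7 days in our cluster)

def is_closed(segment_size: int) -> bool:
    # A segment closes only once it reaches log.segment.bytes
    # (assuming log.segment.ms is not set).
    return segment_size >= SEGMENT_BYTES

def is_deletable(segment_size: int, age_ms: int) -> bool:
    # Only a closed segment is then subject to log.retention.ms.
    return is_closed(segment_size) and age_ms > RETENTION_MS

# Under this model, a 895454171-byte segment is never deletable,
# no matter how old it is:
print(is_deletable(895_454_171, 700_000_000))  # → False
```

That is the behaviour I expected, which is what the log output below contradicts.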
However, this seems to contradict the behaviour I see in our production cluster (Kafka 2.5): a log segment gets deleted as soon as log.retention.ms is breached, even when the segment is smaller than log.segment.bytes.
```
[2020-12-24 15:51:17,808] INFO [Log partition=Topic-2, dir=/Folder/Kafka_data/kafka] Found deletable segments with base offsets [165828] due to retention time 604800000ms breach (kafka.log.Log)
[2020-12-24 15:51:17,808] INFO [Log partition=Topic-2, dir=/Folder/Kafka_data/kafka] Scheduling segments for deletion List(LogSegment(baseOffset=165828, size=895454171, lastModifiedTime=1608220234000, largestTime=1608220234478)) (kafka.log.Log)
```
The size (895454171 bytes) is still less than the 1GB log.segment.bytes, yet the segment was deleted.
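As a quick sanity check on the numbers in those log lines (a sketch; I am interpreting the epoch-millisecond timestamps as UTC):

```python
from datetime import datetime, timedelta, timezone

# Numbers copied from the broker log lines above.
retention_ms = 604_800_000            # "retention time 604800000ms"
largest_time_ms = 1_608_220_234_478   # largestTime of the deleted segment

# 604800000 ms is exactly 7 days (i.e. 168 hours of retention).
print(timedelta(milliseconds=retention_ms))  # → 7 days, 0:00:00

# The segment breaches time-based retention once "now" passes
# largestTime + retention.
breach_at = datetime.fromtimestamp(
    (largest_time_ms + retention_ms) / 1000, tz=timezone.utc
)
print(breach_at)  # close to the 15:51:17 deletion timestamp in the log
```

So the broker deleted the segment almost exactly seven days after its last record, ignoring the fact that it never reached log.segment.bytes.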
The book mentions that, at the time it went to press, the Kafka version was 0.9.0.1. So was this behaviour changed in a later version of Kafka? (I could not find any specific mention of such a change in the Kafka docs.) Below is the snippet from the book.
```sh
$KAFKA_HOME/bin/kafka-configs.sh --bootstrap-server :9092 --entity-type topics --entity-name my-topic --describe --all | grep segment.bytes
```