When you enable change logging for a state store, Kafka Streams captures each change to the state and writes it to a changelog topic in Kafka. This changelog topic acts as durable, fault-tolerant storage for the state, so the state can be restored after an application restart or failure.
Let's take a word count example.
Initial State:
- Word: "hello", Count: 1
- Word: "world", Count: 1
Change Log Entries:
When a word is processed multiple times, the state store updates the count for that word, and these updates are written to the changelog topic.
- Update for "hello": the count goes from 1 to 2
- Update for "world": the count goes from 1 to 2
- Update for "hello" again: the count goes from 2 to 3
Changelog Topic:
The changelog topic for the word-count-store might contain records like the following:
- Key: "hello", Value: 1 (Initial state)
- Key: "world", Value: 1 (Initial state)
- Key: "hello", Value: 2 (Update)
- Key: "world", Value: 2 (Update)
- Key: "hello", Value: 3 (Update)
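To make the sequence above concrete, here is a minimal plain-Java sketch of a store that appends one changelog record per update. It does not use the Kafka Streams API; the class name ChangelogWriteSketch, the method process, and the use of an in-memory list as the "topic" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation only; every name here is hypothetical, not a Kafka Streams API.
class ChangelogWriteSketch {

    // Counts each word in the store and appends one (word, new count) record
    // to the simulated changelog for every state update.
    static List<Map.Entry<String, Long>> process(List<String> words, Map<String, Long> store) {
        List<Map.Entry<String, Long>> changelog = new ArrayList<>();
        for (String word : words) {
            long updated = store.merge(word, 1L, Long::sum); // 1 on first sight, else previous + 1
            changelog.add(Map.entry(word, updated));
        }
        return changelog;
    }

    public static void main(String[] args) {
        Map<String, Long> store = new LinkedHashMap<>();
        List<Map.Entry<String, Long>> changelog =
                process(List.of("hello", "world", "hello", "world", "hello"), store);
        for (Map.Entry<String, Long> entry : changelog) {
            System.out.println("Key: " + entry.getKey() + ", Value: " + entry.getValue());
        }
    }
}
```

Feeding in "hello", "world", "hello", "world", "hello" produces the same five key/value pairs as the list above, in the same order.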
Restore State:
If the Kafka Streams application restarts or fails over to another instance, it can restore the state of the word-count-store by replaying the changelog topic from the beginning. This ensures that the state is consistent and up-to-date across application instances.
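Replaying amounts to applying every record in offset order and letting later values overwrite earlier ones. The sketch below illustrates that idea in plain Java using the example records from the word count changelog; RestoreSketch and restore are hypothetical names, and a real Kafka Streams restore reads from the topic rather than from an in-memory list.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation only; names are hypothetical, not a Kafka Streams API.
class RestoreSketch {

    // Rebuilds the store by applying changelog records in order;
    // a later record for the same key overwrites the earlier one.
    static Map<String, Long> restore(List<Map.Entry<String, Long>> changelog) {
        Map<String, Long> store = new HashMap<>();
        for (Map.Entry<String, Long> entry : changelog) {
            store.put(entry.getKey(), entry.getValue());
        }
        return store;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> changelog = List.of(
                Map.entry("hello", 1L), Map.entry("world", 1L),
                Map.entry("hello", 2L), Map.entry("world", 2L),
                Map.entry("hello", 3L));
        Map<String, Long> restored = restore(changelog);
        // prints "hello=3, world=2"
        System.out.println("hello=" + restored.get("hello") + ", world=" + restored.get("world"));
    }
}
```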
Compact Topics:
To limit storage and the volume of changelog data, the changelog topic can use log compaction, which Kafka Streams enables by default on the changelog topics it creates. Compaction retains at least the latest update for each key, so the state can still be fully restored while storage requirements stay small.
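As a rough illustration of the effect of compaction, the sketch below reduces the example changelog to the latest record per key. It is a plain-Java simulation with hypothetical names (CompactionSketch, compact); the real broker compacts in the background, so older records may remain in the topic for some time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation only; names are hypothetical, not a Kafka API.
class CompactionSketch {

    // Keeps only the most recent record for each key, ordered by the
    // position of that most recent record in the original log.
    static List<Map.Entry<String, Long>> compact(List<Map.Entry<String, Long>> changelog) {
        Map<String, Long> latest = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : changelog) {
            latest.remove(entry.getKey()); // drop the older record so the key moves to the end
            latest.put(entry.getKey(), entry.getValue());
        }
        List<Map.Entry<String, Long>> compacted = new ArrayList<>();
        for (Map.Entry<String, Long> entry : latest.entrySet()) {
            compacted.add(Map.entry(entry.getKey(), entry.getValue()));
        }
        return compacted;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> changelog = List.of(
                Map.entry("hello", 1L), Map.entry("world", 1L),
                Map.entry("hello", 2L), Map.entry("world", 2L),
                Map.entry("hello", 3L));
        for (Map.Entry<String, Long> entry : compact(changelog)) {
            System.out.println("Key: " + entry.getKey() + ", Value: " + entry.getValue());
        }
    }
}
```

For the five example records this leaves two: "world" with value 2 and "hello" with value 3. Replaying those two records yields the same store contents as replaying all five.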
The changelog is used by Kafka. In the event of a failure, the state of your application can be recreated from the changelog. That's why the state store writes to the changelog. This page makes it pretty clear – Roadblock