This question is about architecture and kafka topics migrating.
Original problem: schema evolution without backward compatibility.
https://docs.confluent.io/current/schema-registry/avro.html
I am asking the community to give me an advice or share articles from which I can get inspired and maybe think of a solution to my problem. Maybe there is an architecture or streaming pattern. It is not necessary to give me a language specific solution; just give me a direction into which I can go... My question is big, it may be interesting for those who later want
- a) change message format and produce message into a new topic.
- b) stop producing message into one topic and start producing messages into another topic "instantly"; in other words once a message in
v2
was produced, no new messages are appended intov1
.
Problem
I am changing message format, which is not compatible with the pervious version. In order not to break existing consumers, I decided to produce message to a new topic.
Up-caster idea
I have read about an up-caster.
https://docs.axoniq.io/reference-guide/operations-guide/production-considerations/versioning-events
Formal task
Let v1
and v2
be the topics. Currently, I produce messages in the format format_v1
into the topic v1
. I want to produce messages in the format format_v2
into the topic v2
. The switch should happen at some moment of time which I can choose.
In other words, at some moment of time, all instances of the producer stop sending messages into v1
, and start sending messages into v2
; thus the last message m1
in v1
is produced before the first message of m2
in v2
.
Details
I got an idea, that I can produce messages to the topic v1
have a kafka steam up-caster that is subscribed to v1
and pushes transformed messages to v2
. Let assume that the transformer (in my case of course) is able to transform message of format_v1
into format_v2
without errors.
As described in the link above about avro schema evolution, by the time I have added an up-caster and produce messages into v1
, I have all my consumers of v1
changed into v2
.
Now, a tricky part. We have two requirements:
1. No production down-time.
2. Preserve message ordering.
It means:
1) We are not allowed to lose messages; a client may use our system at any time, so our system should produce a message at any time.
2) We are running multiple instances of the producer. At some moment of time there can (potentially) be producers that may produce messages of format format_v1
into the topic v1
, and some instances that produce messages of format format_v2
into the topic v2
.
As we know, kafka does not guarantee message ordering for different partitions and topics.
I can solve the problem with partitions by writing message into v2 with the same partition selector as for v1. Or for now, I can imagine that we use just one partition for v1
and one partition for v2
.
My simplifications and attempts
1) I imagined that by the moment I want to change the producer to produce messages into a new topic, I have an up-caster (kafka stream component) that is capable of transforming messages from v1
into v2
without error. This kafka stream component is scalable.
2) All my consumers have been already switched into v2
topic. They constantly receive messages from v2
. At this moment of time, my producer instances are producing messages into the topic v1
and up-caster does its job well.
3) To simplify the problem, let's imagine that for now format_v1
and format_v2
do not matter, and they are the same.
4) Let's imagine we have one partition for v1
and one partition for v2
.
Now my problem, how to instantly switch all producers that from a given point of time; all the instances produce messages into the topic v2.
My colleague and kafka expert told me that with down-time it can be done
If you rely on the order of the messages in the partitions, you cannot switch to the new version without down time. To make down time minimal we can do the following.
Upcaster component must write the data to the same partitions and should try to make the same offsets. However it is not always possible, as offsets may have gaps, so the mapping between old offsets, and new offsets must be kept. No all the records, only the last bulk for each partition. If upcaster crashes, just start again, producer is still not involved in v2.
Start the v2 consumer. If it starts with the same consumer group as v1, nothing should be done, if it has new consumer group, update offsets in Kafka according to the new offsets.
Now Producers writes to v1, upcaster converts the data, consumer consumes from v2
Here comes down time. When the lag of upcaster is close to 0, shutdown v1 producer, wait until upcaster converts rest of the records, shutdown upcaster, start v2 producer, which writes to v2 topic.
I though of manual manipulation in the database (via some rest-endpoint or etc) to change a flag; producers always check the flag before they produce messages. When the flag says v2
or true
, the producer will start writing messages into v2
. However, what if at moment of time the flag is false one producers starts producing message into v1
, then the flag has changed and another producer has sent a message into v2
before the previous producer finished producing into v1
.