Problem
I want to understand how to design the message passing for microservices so that they remain elastic and scalable.
Goal
- Microservices allow an elastic number of instances which are scaled up and down automatically based on the current load. This characteristic should not be limited by Kafka.
- We need to guarantee at-least-once delivery
- We need to guarantee the delivery order of events concerning the same entity
Running example
- For simplicity, let's say there are 2 microservices, A and B;
- A1 and A2 are instances of microservice A; B1 and B2 are instances of microservice B.
- A1 and A2 publish events describing CRUD operations on entities of A, e.g. entity was created, updated, or deleted.
Design I
- A1 publishes events under topic `a.entity.created`, including information like the `id` of the created entity in the message body.
- A1 has to specify how many partitions (p) there will be in order to allow parallel consumption by consumers. This allows for scalability.
- B1 and B2 subscribe to the topic `a.entity.created` as consumer group `b-consumers`. This leads to load distribution between the instances of B.
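Question (a) below can be illustrated with a small simulation. Without a message key, the producer has no per-entity partition affinity (strict round-robin is used here as a simplification; newer Kafka clients use sticky batching, but the point stands), so two events for the same entity can land in different partitions. This is a minimal sketch, not real Kafka client code:

```python
# Minimal simulation of Design I: events published without a key are
# spread across partitions round-robin, so events for the same entity
# can end up in different partitions (and be consumed out of order).

NUM_PARTITIONS = 5  # p, chosen when the topic is created

def publish_round_robin(events):
    """Assign each event to a partition round-robin, ignoring the entity id."""
    partitions = {p: [] for p in range(NUM_PARTITIONS)}
    for i, event in enumerate(events):
        partitions[i % NUM_PARTITIONS].append(event)
    return partitions

# Two events for entity "42", interleaved with other traffic
events = [
    {"entityId": "42", "op": "created"},
    {"entityId": "7",  "op": "created"},
    {"entityId": "42", "op": "updated"},
]
partitions = publish_round_robin(events)

# The two events for entity 42 land in partitions 0 and 2, so once
# different consumers read those partitions there is no ordering
# guarantee between them.
placement = {p: [e["entityId"] for e in evs] for p, evs in partitions.items()}
print(placement)
```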
Questions:
- a) Events concerning the same entity might be processed in parallel and might get out of order, right?
- b) This will lead to instances of B dealing with requests in parallel. The limiting factor is how p (the number of partitions) is defined by the producer. If there are 5 partitions but I'd need 8 consumers to cope with the load, it won't work, since 3 consumers would not receive any events. Did I understand this correctly? This would IMO be unusable for elastic microservices which might want to scale further.
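The situation in question (b) can be made concrete: within a consumer group each partition is assigned to at most one consumer, so any consumers beyond p sit idle. This is a simplified sketch of the assignment (the real Kafka assignors are more elaborate, but the constraint is the same):

```python
# Simplified consumer-group assignment: each partition goes to exactly
# one consumer; with more consumers than partitions, the surplus is idle.

def assign(partitions, consumers):
    """Round-robin assignment of partitions to consumers."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

partitions = list(range(5))                  # p = 5
consumers = [f"B{i}" for i in range(1, 9)]   # scaled out to 8 instances

assignment = assign(partitions, consumers)
idle = [c for c, parts in assignment.items() if not parts]
print(idle)  # B6, B7, B8 receive no partitions at all
```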
Design II
- A1 publishes events under topic `a.entity.created.{entityId}`, leading to a lot of different topics.
- Each topic's partition count is set to 1.
- B1 and B2 subscribe to the topics `a.entity.created.*` with a wildcard, as consumer group `b-consumers`. This leads to load distribution between the instances of B.
Questions:
- c) Events concerning the same entity should be guaranteed to be delivered in order since there is only one partition, right?
- d) How is scalability handled here? The number of partitions shouldn't limit the number of consumers, or does it?
- e) Is there a better way to guarantee the goals mentioned above?
Design III (thx to StuartLC)
- A1 publishes events under topic `a.entity.created`, with the partition key based on the entityId.
- The partition count is set to 10.
- B1 and B2 subscribe to the topic `a.entity.created` as consumer group `b-consumers`. This leads to load distribution between the instances of B. Events regarding the same entity will be delivered in order thanks to the partition key.
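The ordering guarantee in Design III comes from the producer hashing the key: the same key always maps to the same partition, and within a partition Kafka preserves order. A minimal sketch of that idea (Kafka's default partitioner actually uses murmur2 on the key bytes; `crc32` stands in here as a deterministic hash):

```python
# Sketch of key-based partitioning: a deterministic hash of the entity id
# maps every event for that entity to the same partition, so the events
# keep their relative order within that partition.
import zlib

NUM_PARTITIONS = 10  # p = 10 as in Design III

def partition_for(entity_id: str) -> int:
    """Deterministic key -> partition mapping (crc32 in place of murmur2)."""
    return zlib.crc32(entity_id.encode()) % NUM_PARTITIONS

p1 = partition_for("entity-42")
p2 = partition_for("entity-42")  # same key -> guaranteed same partition
other = partition_for("entity-7")
print(p1, p2, other)
```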
Questions:
- f) With p=10 I can have at most 10 consumers. This means I have to estimate the number of consumers at design time / deploy time (if using env variables). Can I move that to runtime somehow, so that it is adjusted dynamically?