Is Kafka reliable when used as a message bus in microservices?

I am using Kafka as a message bus in a microservice architecture, so multiple services listen on a topic for messages. The services are therefore highly dependent on the topic being available.

However, there are many instances where I get "leader not available", "broker not available" and leader = -1 on the topics.

Now I am not sure whether I can rely on Kafka topics, as services get interrupted when there are issues on the topics, which in turn causes issues in the platform.

Can someone shed some light on the reliability and dependability of topics, and on whether we can recover if we come across the above issues?

Mitre answered 24/1, 2018 at 9:52 Comment(1)
Kafka is very reliable, but if you don't configure it correctly it won't work (like many other things). For example, if you put all Kafka instances on a single physical server and that server goes down, Kafka cannot be reliable. - Cottontail

I'll answer your question by explaining how Kafka works in general and how it deals with failures.

Every topic is a particular stream of data (similar to a table in a database). Topics are split into partitions (as many as you like), and each message within a partition gets an incremental ID, known as its offset, as shown below.

Partition 0:

+---+---+---+-----+
| 0 | 1 | 2 | ... |
+---+---+---+-----+

Partition 1:

+---+---+---+---+----+
| 0 | 1 | 2 | 3 | .. |
+---+---+---+---+----+
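
To make the offset idea concrete, here is a minimal in-memory sketch in Python. It is a toy model and not Kafka code; the class ToyTopic, its methods and the topic name "orders" are made up for illustration. It only shows that each partition numbers its own messages independently, starting at 0.

```python
class ToyTopic:
    """In-memory stand-in for a topic: a fixed number of append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Append a message to one partition and return the offset it was given."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1  # offsets start at 0 and grow by 1 per message

    def read(self, partition, offset):
        """Read the message stored at a given offset of a partition."""
        return self.partitions[partition][offset]


topic = ToyTopic("orders", num_partitions=2)  # hypothetical topic name
first = topic.append(0, "a")
second = topic.append(0, "b")
other = topic.append(1, "c")
print(first, second, other)  # 0 1 0
```

Note that the offset 0 appears in both partitions: an offset only identifies a message together with its partition.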

Now a Kafka cluster is composed of multiple brokers. Each broker is identified with an ID and can contain certain topic partitions.

Example of 2 topics (with 3 and 2 partitions respectively):

Broker 1:

+-------------------+
|      Topic 1      |
|    Partition 0    |
|                   |
|                   |
|     Topic 2       |
|   Partition 1     |
+-------------------+

Broker 2:

+-------------------+
|      Topic 1      |
|    Partition 2    |
|                   |
|                   |
|     Topic 2       |
|   Partition 0     |
+-------------------+

Broker 3:

+-------------------+
|      Topic 1      |
|    Partition 1    |
|                   |
|                   |
|                   |
|                   |
+-------------------+

Note that data is distributed (and Broker 3 doesn't hold any data of topic 2).

Topics should have a replication factor > 1 (usually 2 or 3) so that when a broker is down, another one can serve the data of the topic. For instance, assume that we have a topic with 2 partitions and a replication factor of 2, as shown below:

Broker 1:

+-------------------+
|      Topic 1      |
|    Partition 0    |
|                   |
|                   |
|                   |
|                   |
+-------------------+

Broker 2:

+-------------------+
|      Topic 1      |
|    Partition 0    |
|                   |
|                   |
|     Topic 1       |
|   Partition 1     |
+-------------------+

Broker 3:

+-------------------+
|      Topic 1      |
|    Partition 1    |
|                   |
|                   |
|                   |
|                   |
+-------------------+

Now assume that Broker 2 has failed. Brokers 1 and 3 can still serve the data for topic 1. A replication factor of 3 is usually a good idea, since it allows one broker to be taken down for maintenance while still tolerating the unexpected loss of another. Configured this way, Apache Kafka offers strong durability and fault-tolerance guarantees.
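
The effect of losing a broker can be sketched in a few lines of Python. This uses a simplified round-robin placement for illustration only, not Kafka's actual assignment algorithm, and the function names are made up for the example. It places 2 partitions with a replication factor of 2 on brokers 1, 2 and 3, then checks which replicas are left once broker 2 is gone.

```python
def assign_replicas(broker_ids, num_partitions, replication_factor):
    """Place each partition's replicas on consecutive brokers (simplified round-robin)."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    n = len(broker_ids)
    return {
        p: [broker_ids[(p + i) % n] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


def surviving_replicas(assignment, failed_brokers):
    """For each partition, list the replicas that are still on live brokers."""
    return {
        p: [b for b in replicas if b not in failed_brokers]
        for p, replicas in assignment.items()
    }


assignment = assign_replicas([1, 2, 3], num_partitions=2, replication_factor=2)
print(assignment)                           # {0: [1, 2], 1: [2, 3]}
print(surviving_replicas(assignment, {2}))  # {0: [1], 1: [3]}
```

Every partition still has one copy after a single failure. With a replication factor of 1, the same failure would leave any partition that lived on broker 2 with an empty replica list.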

Note about leaders: At any time, only one broker can be the leader of a partition, and only that leader can receive and serve data for that partition. The remaining replicas just synchronize the data (in-sync replicas). Also note that when the replication factor is set to 1, the leader cannot be moved elsewhere when a broker fails. In general, when all replicas of a partition fail or go offline, the leader is automatically set to -1.
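
As a rough illustration of where leader -1 comes from, here is a toy leader election in Python. It is a sketch under simplifying assumptions, not Kafka's controller logic: every replica on a live broker is treated as in sync, and the function name elect_leader is made up for the example.

```python
NO_LEADER = -1  # the value reported as "leader: -1" when no replica is available


def elect_leader(replicas, live_brokers):
    """Return the first replica hosted on a live broker, or -1 if none is left."""
    for broker_id in replicas:
        if broker_id in live_brokers:
            return broker_id
    return NO_LEADER


live = {1, 3}  # assumption for the example: broker 2 is down

# Replication factor 2: another replica can take over.
print(elect_leader([2, 3], live))  # 3

# Replication factor 1: the only replica was on the failed broker.
print(elect_leader([2], live))     # -1
```

Once broker 2 is back in the live set, the single-replica partition gets its leader again, which is why a leader of -1 with a replication factor of 1 usually clears only when that specific broker returns.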

Lonesome answered 24/1, 2018 at 13:21 Comment(2)
There were instances where I saw leader:-1 when I ran "./bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic topicname", which was my concern. Is there a way to recover from it? - Mitre
@rajesh_pudota This happens because you've picked a replication factor of 1, so when a broker is down, the leader cannot be moved elsewhere. - Lonesome
