What is the difference between Kafka partitions and Kafka replicas?

I set up 3 Kafka brokers with broker ids 20, 21, and 22. Then I created this topic:

bin/kafka-topics.sh --zookeeper localhost:2181 \
  --create --topic zeta --partitions 4 --replication-factor 3

which resulted in:

[Screenshot of the resulting topic description; image not preserved]

When a producer sends the message "hello world" to topic zeta, which partition does Kafka write it to first?

Does the "hello world" message get replicated to all 4 partitions?

Does each of the 3 brokers contain all 4 partitions? How does that relate to the replication factor of 3 in the setup above?

If I have 8 consumers running in parallel, each in its own process or thread, all subscribed to topic zeta, how does Kafka assign partitions or brokers to serve them in parallel?

Poynter answered 30/7, 2020 at 6:24

Replication and partitions are two different things.

Replication copies identical data across the cluster for higher availability and durability. Partitions are Kafka's way of distributing non-redundant data across the cluster; a topic's throughput and parallelism scale with its number of partitions.

When a producer sends the message "hello world" to topic zeta, which partition does Kafka write it to first?

When you send a "hello world" message to a topic, by default, your producer applies a hashing algorithm to the key of that message (along the lines of hash(key) % number_of_partitions). If you did not provide a key, the producer spreads messages across partitions on its own (round-robin in older clients, sticky batching since Kafka 2.4), so you cannot predict which partition the message will be sent to. There is no guarantee that the first message ends up in partition 0.
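To make the hash(key) % number_of_partitions idea concrete, here is a minimal sketch. It is a toy model, not Kafka's partitioner: zlib.crc32 is a stand-in hash picked for this example (Kafka uses a different hash function over the serialized key), and choose_partition is a hypothetical helper name.

```python
import zlib

NUM_PARTITIONS = 4  # topic zeta was created with --partitions 4


def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition: hash(key) % number_of_partitions.

    zlib.crc32 is only a stand-in hash for this sketch. Kafka's own
    partitioner uses a different hash function, so the partition numbers
    computed here will not match a real cluster.
    """
    return zlib.crc32(key) % num_partitions


# The same key always maps to the same partition, which is what keeps
# all records for one key in order on a single partition.
first = choose_partition(b"user-42")
second = choose_partition(b"user-42")
print(first == second)  # True: the mapping is deterministic
```

Note that the mapping depends on the partition count, so adding partitions to a topic later changes where existing keys land.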

Does the "hello world" message get replicated to all 4 partitions?

This one message gets replicated to every replica of the partition it was written to, but it is not copied to the other partitions.

You will find the message on brokers 20, 21, and 22. However, each partition has a leader, which by default handles all reads and writes for that partition. In your screenshot you can also spot the broker id of each partition's leader: from Leader: 21 for partition 0 you can tell that the leader of that partition sits on broker 21.

Does each of the 3 brokers contain all 4 partitions? How does that relate to the replication factor of 3 in the setup above?

As you have set the replication factor to 3 while having 3 brokers in total in your cluster, all three brokers contain all four partitions. Again, there is a difference between partitions and replicas: the partition count does not depend on the number of brokers. You could have a Kafka "cluster" with a single broker and still have, say, 20 partitions in the topic.
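The arithmetic behind this can be sketched with a toy placement function. This is an assumption-laden illustration: assign_replicas is a hypothetical helper using plain round-robin placement, not Kafka's actual replica assignment algorithm, so the replica order it prints need not match your screenshot.

```python
BROKERS = [20, 21, 22]
NUM_PARTITIONS = 4
REPLICATION_FACTOR = 3


def assign_replicas(brokers, num_partitions, replication_factor):
    """Toy round-robin placement: partition p starts at broker index p % len(brokers)."""
    if replication_factor > len(brokers):
        # Kafka also rejects a replication factor larger than the broker count.
        raise ValueError("replication factor cannot exceed the number of brokers")
    layout = {}
    for p in range(num_partitions):
        layout[p] = [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
    return layout


layout = assign_replicas(BROKERS, NUM_PARTITIONS, REPLICATION_FACTOR)

# Which partitions does each broker host a copy of?
hosted = {b: [p for p, replicas in layout.items() if b in replicas] for b in BROKERS}
print(hosted)  # every broker hosts partitions 0, 1, 2 and 3

# A single broker can still hold a topic with 20 partitions (replication factor 1).
single = assign_replicas([20], 20, 1)
print(len(single))  # 20
```

With 4 partitions and replication factor 3 there are 12 partition replicas in total, which on 3 brokers works out to one copy of every partition per broker.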

If I have 8 consumers running in parallel, each in its own process or thread, all subscribed to topic zeta, how does Kafka assign partitions or brokers to serve them in parallel?

It depends on whether those 8 consumers belong to the same consumer group or not. It is important to know that, within a particular consumer group, a partition is read by at most one consumer.

If all 8 consumers belong to the same group, 4 of them will each read from one partition (by default from the partition leader) and the other four will be idle. If each consumer belongs to its own group, every one of them reads all 4 partitions.

Hindustan answered 30/7, 2020 at 6:36

Kafka topics are internally divided into a number of partitions. Partitions allow you to parallelize a topic by splitting its data across multiple brokers. The replication factor, on the other hand, is the number of copies of each partition you wish to keep for fault tolerance in case of a failure. Each partition has a preferred leader, which handles all the write and read requests coming from Kafka clients.

In case of a leader node failure, one of the replicas from the in-sync replica (ISR) list is promoted to leader until the preferred leader node has recovered and caught up on all the new data generated since the failure.
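The failover sequence can be sketched as a small state machine. This is a toy model under stated assumptions, not Kafka's controller logic: PartitionState and its method names are hypothetical, and the replica list [21, 22, 20] is made up for illustration (only "Leader: 21 for partition 0" comes from the thread).

```python
class PartitionState:
    """Toy model of one partition's leadership; not Kafka's controller logic."""

    def __init__(self, replicas):
        self.replicas = list(replicas)  # first entry is the preferred leader
        self.isr = list(replicas)       # replicas currently in sync
        self.leader = self.replicas[0]

    def broker_failed(self, broker):
        if broker in self.isr:
            self.isr.remove(broker)
        if self.leader == broker:
            # Promote a surviving in-sync replica; None means no leader is available.
            self.leader = self.isr[0] if self.isr else None

    def broker_caught_up(self, broker):
        # A recovered broker rejoins the ISR only after it has caught up.
        if broker in self.replicas and broker not in self.isr:
            self.isr.append(broker)

    def elect_preferred_leader(self):
        preferred = self.replicas[0]
        if preferred in self.isr:
            self.leader = preferred


p0 = PartitionState([21, 22, 20])
p0.broker_failed(21)
print(p0.leader)  # 22: an in-sync follower took over

p0.broker_caught_up(21)
p0.elect_preferred_leader()
print(p0.leader)  # 21: leadership returns to the preferred leader
```

The key point the sketch tries to capture is that only replicas still in the ISR are candidates for promotion, and the preferred leader regains leadership only once it is back in sync.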

To answer your queries:

When a producer sends the message "hello world" to topic zeta, which partition does Kafka write it to first?

The producer is responsible for choosing which partition within the topic each record is assigned to. This can be done in a round-robin fashion simply to balance load, or according to some semantic partition function (for example, one based on a key in the record).

Does the "hello world" message get replicated to all 4 partitions?

The message "hello world" is written to only 1 partition of the topic and then replicated to all the replicas of that partition.
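A rough sketch of that write path follows. It is a toy model with assumed names: the Partition class is hypothetical, only the leader of partition 0 (broker 21) is taken from the thread, and the leaders of partitions 1 to 3 are made up. Real followers fetch from the leader rather than being pushed to; the push loop here just keeps the example short.

```python
class Partition:
    """Toy partition: one log per broker that hosts a replica."""

    def __init__(self, leader, followers):
        self.leader = leader
        self.logs = {broker: [] for broker in [leader, *followers]}

    def append(self, message):
        # The write lands on the leader's log first...
        self.logs[self.leader].append(message)
        # ...and is then copied to each follower replica (simplified as a push).
        for broker, log in self.logs.items():
            if broker != self.leader:
                log.append(message)


# Leader of partition 0 is broker 21 as in the question; the rest are assumed.
topic_zeta = {
    0: Partition(21, [20, 22]),
    1: Partition(22, [20, 21]),
    2: Partition(20, [21, 22]),
    3: Partition(21, [20, 22]),
}

topic_zeta[0].append("hello world")

for number, partition in topic_zeta.items():
    print(number, partition.logs)
```

After the append, all three brokers hold the message in their copy of partition 0, while the logs of partitions 1, 2 and 3 stay empty.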

Does each of the 3 brokers contain all 4 partitions? How does that relate to the replication factor of 3 in the setup above?

Each partition has one leader broker, and the other 2 brokers hold follower replicas of that partition. With 4 partitions spread over 3 brokers, at least one broker leads more than one partition. In your output, the Leader field shows the current leader of each partition, the first id in the Replicas list is the preferred leader, and the ISR list shows which replicas are currently in sync.

If I have 8 consumers running in parallel, each in its own process or thread, all subscribed to topic zeta, how does Kafka assign partitions or brokers to serve them in parallel?

If all 8 consumers are in the same consumer group, only 4 of them will receive data, because the topic has only 4 partitions. Kafka makes sure that each partition is assigned to only 1 consumer in the consumer group. When a consumer that has a partition assigned crashes, the partition is reassigned to another consumer. If the consumers are all standalone clients (not sharing a group), each consumer reads all 4 partitions of the topic.
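The group behaviour can be sketched like this. It is a simplified model, not one of Kafka's real assignors: assign is a hypothetical helper that deals partitions out in order, and the consumer names are made up. A real rebalance may move partitions differently.

```python
def assign(partitions, consumers):
    """Toy group assignment: every partition gets exactly one owner in the group."""
    assignment = {c: [] for c in consumers}
    if not consumers:
        return assignment
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = [0, 1, 2, 3]
group = [f"consumer-{i}" for i in range(8)]

before = assign(partitions, group)
idle = [c for c, owned in before.items() if not owned]
print(len(idle))  # 4: more consumers than partitions leaves the extras idle

# consumer-0 crashes; the group rebalances over the surviving members.
survivors = [c for c in group if c != "consumer-0"]
after = assign(partitions, survivors)
print(sorted(p for owned in after.values() for p in owned))  # [0, 1, 2, 3]
```

The idle consumers are not wasted: they act as standbys that can pick up a partition when an active member leaves the group.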

I hope this was helpful :)

Argumentum answered 30/7, 2020 at 9:35

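This is an older question, but note that the default partitioner no longer uses round-robin for messages without a key. It uses sticky partitioning instead, which was introduced in Kafka 2.4 (KIP-480) and reworked in Kafka 3.3 (KIP-794). See the linked Kafka enhancement proposal.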
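To show the difference in behaviour for keyless messages, here is a toy comparison. Both classes are hypothetical simplifications, not the Kafka client's code: batch_size is counted in records here, whereas the real producer closes a batch based on its size in bytes and on the linger time.

```python
import random


class RoundRobinChooser:
    """Cycle through partitions, one record at a time."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.counter = 0

    def next_partition(self):
        partition = self.counter % self.num_partitions
        self.counter += 1
        return partition


class StickyChooser:
    """Stay on one partition until the current batch is full, then switch."""

    def __init__(self, num_partitions, batch_size, rng=None):
        self.num_partitions = num_partitions
        self.batch_size = batch_size  # records per batch (a simplification)
        self.rng = rng or random.Random()
        self.current = self.rng.randrange(num_partitions)
        self.sent_in_batch = 0

    def next_partition(self):
        if self.sent_in_batch == self.batch_size:
            # Batch is full: move to a different partition if there is one.
            others = [p for p in range(self.num_partitions) if p != self.current]
            self.current = self.rng.choice(others or [self.current])
            self.sent_in_batch = 0
        self.sent_in_batch += 1
        return self.current


round_robin = RoundRobinChooser(4)
rr_sequence = [round_robin.next_partition() for _ in range(8)]
print(rr_sequence)  # [0, 1, 2, 3, 0, 1, 2, 3]

sticky = StickyChooser(4, batch_size=4, rng=random.Random(0))
sticky_sequence = [sticky.next_partition() for _ in range(8)]
print(sticky_sequence)  # four records on one partition, then four on another
```

Sticking to one partition per batch yields fewer, fuller batches, which is the motivation given for the change; over many batches the records still spread across all partitions.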

Garvin answered 9/5 at 12:8