What is the ideal number of partitions in a Kafka topic?

2

6

I am learning Kafka and trying to create a topic for my recent search application. The volume of data pushed to this Kafka topic is expected to be high.

My Kafka cluster has 3 brokers, and topics have already been created for other requirements.

How many partitions should I choose for my recent search topic? What happens if I do not provide the partition count explicitly? And what needs to be considered when choosing the partition count?

Keffiyeh answered 11/11, 2019 at 17:45 Comment(0)
16

This will depend on how fast messages are produced relative to the throughput of your consumers. If you are producing 100 messages a second and each consumer instance can process 10 messages a second, then you'll want at least 10 partitions (produce rate / consume rate) with 10 instances of your consumer. If you want this topic to be able to handle future growth, then you'll want to set the partition count even higher so that you can add more instances of your consumer to handle the new volume.
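As a rough sketch of the arithmetic above (the function name `min_partitions` and the `headroom` growth factor are made up for illustration and are not part of any Kafka API), the lower bound is the produce rate divided by the per-consumer rate, rounded up:

```python
import math

def min_partitions(produce_rate, consume_rate, headroom=1.0):
    """Lower bound on the partition count.

    produce_rate: messages per second written to the topic.
    consume_rate: messages per second ONE consumer instance can process.
    headroom:     hypothetical growth factor (1.0 = no allowance for growth).
    """
    if produce_rate < 0 or consume_rate <= 0 or headroom < 1:
        raise ValueError("rates must be positive and headroom >= 1")
    # Round up: a fractional consumer still needs a whole partition.
    return max(1, math.ceil(produce_rate * headroom / consume_rate))

print(min_partitions(100, 10))                # 10, the example above
print(min_partitions(100, 10, headroom=2.0))  # 20, allowing for the volume to double
```

The result is only a floor derived from measured rates. Key ordering and broker resources may still push the final number up or down.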

Another piece of advice would be to make your partition count a highly divisible number so that you can scale up/down consumers while keeping their load balanced. For example, if you choose 10 partitions then you would have to have 1, 2, 5, or 10 instances of your consumer to keep them each processing from the same number of partitions. If you choose 12 partitions instead then you could be balanced with either 1, 2, 3, 4, 6, or 12 instances of your consumer.
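The balanced consumer counts are simply the divisors of the partition count. A minimal sketch (the helper name `balanced_consumer_counts` is hypothetical):

```python
def balanced_consumer_counts(partitions):
    """Consumer-group sizes at which every consumer gets the same number of partitions."""
    return [n for n in range(1, partitions + 1) if partitions % n == 0]

print(balanced_consumer_counts(10))  # [1, 2, 5, 10]
print(balanced_consumer_counts(12))  # [1, 2, 3, 4, 6, 12]
```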

Electromyography answered 11/11, 2019 at 19:6 Comment(5)
Ordering of message keys also needs to be considered. - Garter
But what if my single consumer can handle 1000 messages per second? I don't think I'd need more than one partition, but multiple resources advise against that. I'm not sure if I should just set up idle consumers. - Devindevina
They wouldn't be idle if you had multiple partitions. They just wouldn't be utilizing their full resources, which is good because it allows for growth. Re-partitioning can require downtime if you need to maintain message ordering, so starting with more partitions than you need at the beginning is probably a good idea. - Electromyography
More partitions can be added (but not removed) later #45498378 - Varion
Could you please elaborate on "you'll want at least 10 partitions (produce / consume) with 10 instances of your consumer"? Let's say we are reading the data from Spark; would we need 10 Spark jobs running in parallel to consume all 100 messages? - Marianmariana
3

I would evaluate two main things before deciding on the number of partitions.

  1. The first point is how partitions and the consumers of a consumer group work together. In simple words, one consumer can consume messages from more than one partition, but within a consumer group one partition can't be consumed by more than one consumer. That means it makes sense to have number of partitions >= number of consumers in a consumer group. Otherwise you will end up with consumers that have no partition assigned.

  2. The second point is your requirement from a latency vs. throughput point of view. In simple words, latency is the time required to perform some action or to produce some result. Latency is measured in units of time: hours, minutes, seconds, nanoseconds or clock periods. Throughput is the number of such actions executed or results produced per unit of time.

Now, coming back to the comparison from the Kafka standpoint: in general, more partitions in a Kafka cluster lead to higher throughput. But you should be careful with this number if you are really looking for low latency.
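To illustrate the first point, here is a simplified simulation of how the partitions of one topic might be spread across a consumer group. It hands partitions out round-robin, which is an assumption made for illustration and not Kafka's actual assignor logic. The names `assign_partitions` and `c1`..`c4` are hypothetical.

```python
def assign_partitions(num_partitions, consumers):
    """Give each partition to exactly one consumer in the group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment

# More consumers than partitions: the extra consumer gets nothing.
group = assign_partitions(3, ["c1", "c2", "c3", "c4"])
print(group)  # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
print([c for c, parts in group.items() if not parts])  # ['c4']

# Fewer consumers than partitions: each consumer reads several partitions.
print(assign_partitions(4, ["c1", "c2"]))  # {'c1': [0, 2], 'c2': [1, 3]}
```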

Riesling answered 26/12, 2019 at 21:21 Comment(0)
