Apache Kafka: batch.size vs buffer.memory

I'm trying to figure out the difference between the settings batch.size and buffer.memory in Kafka Producer.

As I understand it, batch.size is the maximum size of a batch that can be sent.

The documentation describes buffer.memory as: the bytes of memory the Producer can use to buffer records waiting to be sent.

I don't understand the difference between these two. Can someone explain?

Thanks

Shanaeshanahan answered 4/4, 2018 at 10:56 Comment(1)

In my opinion,

batch.size: The maximum amount of data that can be sent in a single request. If batch.size is 32*1024 (32 KB), then up to 32 KB can be sent out in a single request.

buffer.memory: If the Kafka producer is not able to send messages (batches) to the Kafka broker (say the broker is down), it starts accumulating the batches in the buffer memory (default 32 MB). Once the buffer is full, it waits for up to max.block.ms (default 60,000 ms) for the buffer to be cleared out. If it is not, it throws an exception.
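
A minimal sketch of how these two settings are configured on the Java producer (broker address, topic name, and the 32 KB batch size are assumed example values, not recommendations):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ProducerBufferingExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024);           // 32 KB per (partition) batch
            props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 32 * 1024 * 1024); // 32 MB for all unsent batches
            props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 60_000);            // how long send() may block when the buffer is full

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
            }
        }
    }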

Hanselka answered 4/1, 2019 at 16:23 Comment(2)
What role does buffer.memory play for producers that have compression enabled? Does buffer.memory need to be bigger than the uncompressed messages or the compressed messages? – Neuter
So, the key difference is: batch.size applies to each batch; buffer.memory applies to all batches inside the buffer. Am I right? – Fermium

The Kafka producer and consumer have many configuration options that help with performance tuning, such as achieving low latency and high throughput. buffer.memory and batch.size are two of them, and both are specific to the Kafka producer. Let's look at these configurations in more detail.

  1. buffer.memory: This sets the amount of memory the producer will use to buffer messages waiting to be sent to the broker. If the application sends messages faster than they can be delivered to the server, the producer may run out of buffer space, and any additional send() call will either block or throw an exception, depending on max.block.ms, which allows blocking for a certain time before the exception is thrown. Another case is when all brokers are down for some reason: the producer cannot deliver anything and has to keep the messages in the memory allocated by buffer.memory, which fills up quickly; if the brokers do not return to a normal state, the same max.block.ms timeout applies before an exception is thrown. The default value for max.block.ms is 60,000 ms, and the default value for buffer.memory is 32 MB (33554432 bytes).

  2. batch.size: When multiple records are sent to the same partition, the producer puts them in a batch. This configuration controls the amount of memory in bytes (not the number of messages) used for each batch. When a batch is full, all the messages in it are sent. However, this does not mean the producer waits for a batch to become full; it will send half-full batches and even batches containing just a single message. Therefore, setting the batch size too large will not cause delays in sending messages; it will just use more memory for the batches. Setting the batch size too small adds extra overhead because the producer needs to send requests more frequently. The default batch size is 16384 bytes. (A small sketch relating batch.size and buffer.memory follows this list.)
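
To make the relationship between the two settings concrete, here is a small back-of-the-envelope sketch using the default values quoted above:

    public class BufferVsBatch {
        public static void main(String[] args) {
            long bufferMemory = 33_554_432L; // buffer.memory default: 32 MB shared by ALL unsent batches
            int batchSize = 16_384;          // batch.size default: 16 KB for ONE per-partition batch

            // Upper bound on how many completely full batches can sit in the buffer at once.
            long fullBatches = bufferMemory / batchSize;
            System.out.println("Full batches that fit in buffer.memory: " + fullBatches); // 2048
        }
    }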

batch.size also works together with linger.ms, which controls how long to wait for additional messages before sending the current batch. The Kafka producer sends a batch of messages either when the current batch is full or when the linger.ms time is reached. By default the producer sends messages as soon as there is a sender thread available to send them, even if there is just one message in the batch.
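
A minimal tuning sketch of the batch.size / linger.ms interplay (the 64 KB and 50 ms values are illustrative assumptions, not recommendations):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class BatchingTuningSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Throughput-leaning settings: allow larger per-partition batches
            // and give each batch up to 50 ms to fill before it is sent.
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024); // 64 KB per batch
            props.put(ProducerConfig.LINGER_MS_CONFIG, 50);         // wait up to 50 ms for more records

            // With the default linger.ms=0, the producer sends as soon as a sender
            // thread is available, even if the batch holds a single record.
            System.out.println(props);
        }
    }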

Callboy answered 26/5, 2021 at 10:46 Comment(1)
Error about batch.size! The documentation at kafka.apache.org/07/documentation/#configuration says batch.size (default 200) is the number of messages batched at the producer before being dispatched to the event.handler – Innoxious

Both of these producer configurations are described in the Confluent documentation as follows:

  • batch.size

Kafka producers attempt to collect sent messages into batches to improve throughput. With the Java client, you can use batch.size to control the maximum size in bytes of each message batch.

  • buffer.memory

Use buffer.memory to limit the total memory that is available to the Java client for collecting unsent messages. When this limit is hit, the producer will block on additional sends for as long as max.block.ms before raising an exception.
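
As a rough sketch of how that blocking behavior surfaces in the Java client (broker address and topic name are assumed placeholders): when the buffer.memory limit is hit, send() blocks for up to max.block.ms and then raises a TimeoutException.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.TimeoutException;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class BufferFullSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33_554_432L); // 32 MB for unsent records
            props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 60_000);       // block this long before throwing

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                try {
                    producer.send(new ProducerRecord<>("demo-topic", "value"));
                } catch (TimeoutException e) {
                    // Raised when the buffer stayed full for the whole max.block.ms window.
                    System.err.println("Buffer memory exhausted: " + e.getMessage());
                }
            }
        }
    }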

Abate answered 5/9, 2020 at 19:54 Comment(1)
What role does buffer.memory play for producers that have compression enabled? Does buffer.memory need to be bigger than the uncompressed messages or the compressed messages? – Neuter

The buffer stores multiple batches, and each batch stores records for the same partition. The point of batching is to be able to send multiple records at the same time, to the same Kafka broker. If the records inside a batch did not share the same partition, it would not be possible to group them into the same TCP request, because the producer would have to send them to multiple Kafka brokers.
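
A small sketch of that grouping (topic, key, and broker address are made up for illustration): records sent with the same key hash to the same partition, so the producer can accumulate them into one batch destined for that partition's leader.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class PartitionBatchingSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.LINGER_MS_CONFIG, 20); // give batches a moment to fill

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Same key => same partition => these records can share one batch,
                // which is sent to that partition's leader in a single request.
                for (int i = 0; i < 100; i++) {
                    producer.send(new ProducerRecord<>("demo-topic", "same-key", "value-" + i));
                }
            } // close() flushes any partially filled batches
        }
    }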

Conclusion

  • batch.size: maximum number of bytes in a single batch (records for the same partition)
  • buffer.memory: maximum number of bytes available to store all batches
Fearsome answered 28/7 at 17:28 Comment(0)
