Understanding the max.inflight property of the Kafka producer

I'm benchmarking my Kafka cluster (version 1.0.0-cp1).

Part of my benchmark focuses on the maximum throughput possible with an ordering guarantee and no data loss (a topic with only one partition). Do I need to set the max.in.flight.requests.per.connection property to 1?

I've read this article

And I understand that I only have to set max.in.flight to 1 if I enable the retry feature in my producer with the retries property.

Another way to ask my question: is a single partition plus retries=0 (producer props) sufficient to guarantee ordering in Kafka?

I need to know because increasing max.in.flight drastically increases throughput.

Vinni answered 12/4, 2018 at 17:29 Comment(1)
If you look at the comments at the bottom, you will see the correction from max.in.flight.requests.per.connection to max.in.flight.requests.per.session. – Shepherd

Your use case is slightly unclear. You mention ordering and no data loss but don't specify whether you tolerate duplicate messages, so it's unclear whether you want At Least Once (QoS 1) or Exactly Once.

Either way, since you're using 1.0.0 and only a single partition, you should look at the Idempotent Producer instead of tweaking the producer configs. It lets you guarantee ordering and no data loss properly and efficiently.

From the documentation:

Idempotent delivery ensures that messages are delivered exactly once to a particular topic partition during the lifetime of a single producer.

Early versions of the Idempotent Producer forced max.in.flight.requests.per.connection to 1 (for the same reasons you mentioned), but in the latest releases it can be used with max.in.flight.requests.per.connection set to as much as 5 and still keep its guarantees.
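As a rough sketch of what that means in configuration, here are idempotent-producer settings written as a plain Python dict keyed by the producer's real config names. The dict and the `check_idempotent_config` helper are hypothetical; they only mirror, as I understand them, the constraints the Java producer enforces from 1.0.0 on when `enable.idempotence=true`: `acks` must be `all`, `retries` must be greater than 0, and `max.in.flight.requests.per.connection` must be at most 5. The real client performs this validation itself.

```python
# Hypothetical sketch: idempotent-producer settings as a plain dict.
# Keys are real Kafka producer config names; values are strings, as in a .properties file.
IDEMPOTENT_PRODUCER = {
    "enable.idempotence": "true",
    "acks": "all",
    "retries": "2147483647",                       # Integer.MAX_VALUE: keep retrying
    "max.in.flight.requests.per.connection": "5",  # up to 5 allowed with idempotence
}


def check_idempotent_config(cfg):
    """Return a list of problems; an empty list means the settings fit together.

    Assumption: this mirrors the constraints the Java producer (1.0.0+) enforces
    for enable.idempotence=true; it is not the client's own validation code.
    """
    problems = []
    if cfg.get("enable.idempotence") != "true":
        problems.append("enable.idempotence must be true")
    if cfg.get("acks") not in ("all", "-1"):
        problems.append("acks must be all")
    # A missing "retries" is treated as 0 here (a conservative choice for this sketch).
    if int(cfg.get("retries", "0")) <= 0:
        problems.append("retries must be > 0")
    # 5 is the client's default for max.in.flight.requests.per.connection.
    if int(cfg.get("max.in.flight.requests.per.connection", "5")) > 5:
        problems.append("max.in.flight.requests.per.connection must be <= 5")
    return problems
```

`check_idempotent_config(IDEMPOTENT_PRODUCER)` returns an empty list, while raising `max.in.flight.requests.per.connection` to `"6"` yields exactly one problem.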

Using the Idempotent Producer you'll not only get stronger delivery semantics (Exactly Once instead of At least Once) but it might even perform better!

I recommend checking the delivery semantics in the docs: http://kafka.apache.org/documentation/#semantics


Back to your question

Yes: without the idempotent (or transactional) producer, if you want to avoid data loss (QoS 1) and preserve ordering, you have to set max.in.flight.requests.per.connection to 1, allow retries, and use acks=all. As you saw, this comes at a significant performance cost.
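For comparison, a minimal sketch of that non-idempotent recipe, again as a hypothetical Python dict using the real producer config names, plus a hypothetical `preserves_order_without_loss` helper that encodes the three conditions (acks=all, retries > 0, exactly one request in flight). It assumes a single partition, as in the question, and models nothing else that can affect durability.

```python
# Hypothetical sketch: ordered, at-least-once settings WITHOUT idempotence.
ORDERED_AT_LEAST_ONCE = {
    "acks": "all",                                 # wait for all in-sync replicas
    "retries": "2147483647",                       # retry transient failures instead of dropping
    "max.in.flight.requests.per.connection": "1",  # a retried batch cannot be overtaken
}


def preserves_order_without_loss(cfg):
    """Check the three conditions from the answer (single-partition assumption)."""
    acks_all = cfg.get("acks") in ("all", "-1")
    retries_on = int(cfg.get("retries", "0")) > 0  # missing "retries" treated as 0 here
    one_in_flight = int(cfg.get("max.in.flight.requests.per.connection", "5")) == 1
    return acks_all and retries_on and one_in_flight
```

With `retries` set to `"0"` (the alternative raised in the question) the check fails: ordering would still hold, but a failed batch is simply dropped.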

Augur answered 18/4, 2018 at 11:47 Comment(4)
I'm not sure you can set max.in.flight.requests.per.connection to 5 if you want to preserve message order. If one message is rejected and needs a retry, but in the meantime a second message has been dispatched and written to the topic, the first message will be written to the topic after the second one. To avoid this scenario you can only have one message in flight. I understand you can set this property to 5 if you only want exactly-once delivery. – Galatea
@PabloAntequera: I don't know about former versions, but with recent versions of Kafka, enable.idempotence=true prevents duplication and keeps the ordering of messages even if max.in.flight.requests.per.connection > 1. The explanations are here: docs.confluent.io/cloud/current/client-apps/optimizing/… – Wawro
Does it keep the send sequence when sending to different partitions over the same connection? I mean, if one send fails, is it possible that a later message will succeed? – Giusto
If idempotency is not needed, set max in-flight to 1 in the producer; otherwise set idempotency to true. In fact, setting idempotency to true should actually be more performant than max in-flight = 1, plus it gives you idempotency for free. The reason, I think, is that the broker knows the producer is working in an idempotent session, so there is a sequence number. If it finds that a sequence number is missing, it informs the producer and stops processing messages. The producer then temporarily switches to max in-flight = 1. – Wingless
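The reordering scenario from the first comment can be reproduced with a toy model. This is a hypothetical simulation, not Kafka's client code: it sends up to `max_in_flight` batches at once to a single partition without idempotence, lets the batches listed in `flaky` fail on their first attempt, and retries them (if `retries` allows) only after the whole window has been answered.

```python
def simulate(batches, flaky, max_in_flight, retries):
    """Toy model: one non-idempotent producer writing to one partition.

    Returns the order in which the broker appends the batches.
    Batches in `flaky` fail on their first attempt only.
    """
    log = []
    pending = list(batches)
    already_failed = set()
    retries_used = {b: 0 for b in batches}
    while pending:
        # Send up to max_in_flight batches without waiting for answers.
        window, pending = pending[:max_in_flight], pending[max_in_flight:]
        to_retry = []
        for b in window:
            if b in flaky and b not in already_failed:
                already_failed.add(b)
                if retries_used[b] < retries:
                    retries_used[b] += 1
                    to_retry.append(b)
                # Otherwise the batch is dropped: data loss.
            else:
                log.append(b)  # broker appends in arrival order
        # Retries go out after the batches that were already in flight.
        pending = to_retry + pending
    return log
```

With `simulate([1, 2, 3], {1}, max_in_flight=2, retries=1)` the broker log is `[2, 1, 3]` (reordered); with `max_in_flight=1` it is `[1, 2, 3]`; with `retries=0` it is `[2, 3]`, so order is kept but batch 1 is lost.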

Yes, you must set the max.in.flight.requests.per.connection property to 1. The article you read contained a mistake (since corrected) where the author wrote:

max.in.flights.requests.per.session

which doesn't exist in the Kafka documentation.

This erratum probably comes from the book "Kafka: The Definitive Guide" (1st edition), where you can read on page 52:

<...so if guaranteeing order is critical, we recommend setting in.flight.requests.per.session=1 to make sure that while a batch of messages is retrying, additional messages will not be sent ...>

Assize answered 18/1, 2019 at 11:17 Comment(0)

IMO, it is invaluable to also know about this issue, which makes things far more interesting and slightly more complicated.

When you set enable.idempotence=true, every time you send a message to the broker you also send a sequence number, starting from zero. The broker stores that sequence number on its side too. Say the broker currently has sequence_id=3 stored; when your next request arrives, the broker compares that request's sequence number with the stored one and can say:

  • if it's 4 - good, it's a new batch of records
  • if it's 3 - it's a duplicate
  • if it's 5 (or higher), it means messages were lost
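The comparison in the list above can be sketched as a small function. This is a deliberately simplified, hypothetical model (one producer, one partition, one sequence number per request); the real broker tracks sequence numbers per producer id and partition.

```python
def classify(stored_seq, incoming_seq):
    """Simplified broker-side check for an idempotent producer.

    stored_seq: the last sequence number the broker has stored.
    incoming_seq: the sequence number carried by the new request.
    """
    if incoming_seq == stored_seq + 1:
        return "new"           # the next expected batch: store it
    if incoming_seq <= stored_seq:
        return "duplicate"     # already stored (simplified: anything at or below)
    return "out_of_order"      # a gap: some earlier batch is missing
```

With a stored sequence of 3, `classify(3, 4)`, `classify(3, 3)` and `classify(3, 5)` return `"new"`, `"duplicate"` and `"out_of_order"`, matching the three bullets.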

Now consider max.in.flight.requests.per.connection. A producer can have up to this many requests in flight at once without waiting for an answer from the broker. When we reach the limit (say max.in.flight.requests.per.connection=3), we start waiting for the broker's results for the previous requests (and meanwhile we can't send any more batches, even if they are ready).

Now, for the sake of the example, say the broker answers: “1 was OK, I stored it”, “2 has failed”, and now the important part: because 2 failed, the only possible answer for 3 is “out of order”, which means the broker did not store it. The client now knows that it needs to reprocess 2 and 3, so it puts them in a list and resends them in that exact order (if retries are enabled).
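Here is a toy replay of that example under the same simplifications (hypothetical code, not the producer's actual implementation). Batches carry sequence numbers 1..n, batch 0 is assumed to be stored already, up to `max_in_flight` batches go out before any answer is processed, and each batch listed in `fail_first_attempt` fails exactly once. Anything sent after a failed batch arrives with a gap and is rejected as out of order, so it is resent together with the failed batch, in the original order.

```python
def deliver(n_batches, max_in_flight, fail_first_attempt):
    """Return (broker_log, sends): what got stored, and every send attempt in order."""
    stored_seq = 0                      # assume batch 0 was stored earlier
    failing = set(fail_first_attempt)   # each of these fails exactly once
    pending = list(range(1, n_batches + 1))
    log, sends = [], []
    while pending:
        window, rest = pending[:max_in_flight], pending[max_in_flight:]
        sends.extend(window)
        resend = []
        for seq in window:
            if seq in failing:
                failing.discard(seq)    # "2 has failed"; its retry will succeed
                resend.append(seq)
            elif seq != stored_seq + 1:
                resend.append(seq)      # gap before it: "out of order", not stored
            else:
                log.append(seq)         # stored; the broker advances its sequence
                stored_seq = seq
        pending = resend + rest         # resend in the original order, before newer batches
    return log, sends
```

`deliver(3, 3, {2})` returns `([1, 2, 3], [1, 2, 3, 2, 3])`: 1 is stored, 2 fails, 3 is rejected as out of order, and then 2 and 3 are resent in that order, so the final log stays ordered.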

This explanation is probably oversimplified, but it is my basic understanding after reading the source code a bit.

Topdrawer answered 29/9, 2022 at 19:54 Comment(1)
My understanding is the same as yours, based on this: docs.confluent.io/platform/current/installation/configuration/…, except that the OP's question is about an older version of Kafka, right? – Tacky
