How to handle backpressure in a Kafka Connect Sink? [closed]

We're building a custom Kafka Connect sink which in turn calls a remote REST API. How do I propagate backpressure to the Kafka Connect infrastructure, so that put() is called less often when the remote system is slower than the rate at which the internal consumer delivers messages to put()?

The Kafka Connect documentation says that we should not block in put() but may block in flush(). But not blocking in put() means we have to buffer data, which will surely lead to OOM errors at some point if put() is called more often than flush().

I've seen that a plain Kafka consumer is allowed to call pause() or to block in its poll loop. Is it possible to leverage this in a Kafka Connect sink?
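To make the dilemma concrete, here is a minimal sketch of the contract we are struggling with (the class name and the sendToRemoteApi helper are just illustrative):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class RestSinkTask extends SinkTask {
    private final List<SinkRecord> buffer = new ArrayList<>();

    @Override
    public void put(Collection<SinkRecord> records) {
        // Must not block: all we can do is queue, so the buffer grows
        // without bound whenever the remote API is slower than the consumer.
        buffer.addAll(records);
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) {
        // Blocking is allowed here, but put() may have been called
        // (and buffered) many times since the last flush().
        sendToRemoteApi(buffer);
        buffer.clear();
    }

    private void sendToRemoteApi(List<SinkRecord> records) { /* REST client call */ }

    @Override public void start(Map<String, String> props) {}
    @Override public void stop() {}
    @Override public String version() { return "1.0"; }
}
```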

Hallucinate answered 19/4, 2018 at 6:19 Comment(0)

I've seen that a plain Kafka consumer is allowed to call pause() or to block in its poll loop. Is it possible to leverage this in a Kafka Connect sink?

The raw consumer is not exposed to a sink task, so no. You could call /pause on the whole connector through the Connect REST API, though I'm not sure what happens to un-flushed messages at that point.
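For example, pausing a connector through the worker's REST API (the /pause endpoint is part of the standard Connect REST API; the host, port, and connector name my-sink are assumptions):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PauseConnector {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // PUT /connectors/{name}/pause stops the connector's tasks from
        // being given new records until /resume is called.
        HttpRequest pause = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors/my-sink/pause"))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<String> resp =
                client.send(pause, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode()); // 202 Accepted on success
    }
}
```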

But not blocking in put() means we have to buffer data, which will surely lead to OOM errors at some point

It can, sure, but buffering is really the only viable option when you need to hold onto data for longer than a single put() call. For instance, this is how the S3 and HDFS connectors work.

rotate.interval.ms
The time interval in milliseconds to invoke file commits...
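If you cap the buffer, you can push back on the framework itself: throwing RetriableException from put() makes Connect redeliver the same batch later, and context.timeout() controls the retry delay. A sketch (MAX_BUFFERED and sendToRemoteApi are made-up names):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.errors.RetriableException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class BoundedBufferSinkTask extends SinkTask {
    private static final int MAX_BUFFERED = 10_000; // assumed cap; size it to your heap
    private final List<SinkRecord> buffer = new ArrayList<>();

    @Override
    public void put(Collection<SinkRecord> records) {
        if (buffer.size() + records.size() > MAX_BUFFERED) {
            // Refuse the batch: Connect will redeliver these records
            // after the timeout instead of us buffering them now.
            context.timeout(5_000);
            throw new RetriableException("Buffer full, remote API is behind");
        }
        buffer.addAll(records);
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) {
        sendToRemoteApi(buffer); // hypothetical blocking REST call
        buffer.clear();
    }

    private void sendToRemoteApi(List<SinkRecord> records) { /* REST client call */ }

    @Override public void start(Map<String, String> props) {}
    @Override public void stop() {}
    @Override public String version() { return "1.0"; }
}
```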

Your HTTP client connection is likely blocking anyway to make the request, is it not?

The alternative would be to embed a Kafka consumer in your HTTP server, so it can poll messages itself and act on them locally rather than needing to be sent requests over HTTP.
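A minimal sketch of that embedding (topic, group id, and handleLocally are placeholders). Backpressure comes for free here: poll() is not called again until the previous batch is fully handled, so a slow handler throttles consumption by itself (just mind max.poll.interval.ms if a batch can take minutes):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class EmbeddedWorker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "rest-worker");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events"));
            while (true) {
                // The next poll() happens only after this batch is done.
                for (ConsumerRecord<String, String> rec :
                        consumer.poll(Duration.ofSeconds(1))) {
                    handleLocally(rec);
                }
            }
        }
    }

    private static void handleLocally(ConsumerRecord<String, String> rec) {
        // hypothetical: whatever the HTTP handler used to do with the request body
    }
}
```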

Evincive answered 19/1, 2020 at 7:58 Comment(2)
"Your HTTP client connection is likely blocking anyway to make the request, is it not?" The clients I usually use definitely do not block; otherwise I wouldn't worry about any of this. – Hallucinate
If they have an async response, then I suppose you don't care what order your records arrive in Kafka? – Evincive

If you can use some sort of autoscaler, one idea is to collect metrics on the sinks and scale the connector accordingly, either at the worker level or at the task level (in the latter case, through the Connect REST API).
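For instance, bumping tasks.max through the REST API. Note that PUT /connectors/{name}/config replaces the whole configuration, so the other keys shown are placeholders for your existing config, and the connector name my-sink is an assumption:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ScaleConnector {
    public static void main(String[] args) throws Exception {
        // Resubmit the existing keys along with the new tasks.max value.
        String config = "{"
                + "\"connector.class\": \"com.example.MySinkConnector\"," // placeholder
                + "\"topics\": \"events\","                               // placeholder
                + "\"tasks.max\": \"4\""
                + "}";
        HttpRequest scale = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors/my-sink/config"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(config))
                .build();
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(scale, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode()); // 200 OK when the update is accepted
    }
}
```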

Skye answered 18/10, 2021 at 9:20 Comment(0)

Unfortunately, there are no built-in backpressure mechanisms in Kafka itself; you need help from other frameworks, such as Akka's Alpakka Kafka connector, which layers Akka Streams' backpressure on top of the Kafka consumer.
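For a flavour of what that looks like, here is a rough sketch using Alpakka Kafka's Java DSL (not taken from the blogs linked below; the topic, group id, and callRestApi function are assumptions). mapAsync propagates the stream's demand back to the consumer, so only a bounded number of REST calls are ever in flight:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import akka.Done;
import akka.actor.ActorSystem;
import akka.kafka.ConsumerSettings;
import akka.kafka.Subscriptions;
import akka.kafka.javadsl.Consumer;
import akka.stream.javadsl.Sink;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AlpakkaRestSink {
    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("alpakka-sink");
        ConsumerSettings<String, String> settings =
                ConsumerSettings.create(system,
                                new StringDeserializer(), new StringDeserializer())
                        .withBootstrapServers("localhost:9092")
                        .withGroupId("rest-sink");

        Consumer.plainSource(settings, Subscriptions.topics("events"))
                // At most 8 outstanding REST calls; when they are slow,
                // the stream stops pulling from the Kafka consumer.
                .mapAsync(8, AlpakkaRestSink::callRestApi)
                .runWith(Sink.ignore(), system);
    }

    // hypothetical non-blocking HTTP call
    private static CompletionStage<Done> callRestApi(ConsumerRecord<String, String> rec) {
        return CompletableFuture.completedFuture(Done.getInstance());
    }
}
```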

If you need sample code, you can have a look at my blog posts: blog1, blog2.

Griner answered 17/6, 2022 at 7:7 Comment(0)
