Kafka Producer: Got error produce response with correlation NETWORK_EXCEPTION

We are running Kafka in distributed mode across 2 servers. I'm sending messages to Kafka through the Java SDK to a topic that has a replication factor of 2 and 1 partition.

We are running in async mode. I don't find anything abnormal in the Kafka logs. Can anyone help find out what the cause could be?

    Properties props = new Properties();
    props.put("bootstrap.servers", serverAdress);
    props.put("acks", "all");
    props.put("retries", "1");
    props.put("linger.ms", 0);
    props.put("buffer.memory", 10240000);
    props.put("max.request.size", 1024000);
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    Producer<String, Object> producer = new org.apache.kafka.clients.producer.KafkaProducer<>(props);

Exception trace:

    -2017-08-15T02:36:29,148 [kafka-producer-network-thread | producer-1] WARN producer.internals.Sender - Got error produce response with correlation id 353736 on topic-partition BPA_BinLogQ-0, retrying (0 attempts left). Error: NETWORK_EXCEPTION

Siusan answered 17/8, 2017 at 14:29 Comment(1)
Hey, how did you solve this issue? Would appreciate your help on this. - Nudity

You are getting a NETWORK_EXCEPTION, which tells you that something is wrong with the network connection to the Kafka broker you were producing to. Either the broker shut down or the TCP connection was closed for some reason.
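
A basic reachability check from the client host can help tell "broker unreachable" apart from "connection dropped mid-stream". The sketch below is a minimal example using only the JDK. `BrokerProbe` and `canConnect` are hypothetical names, not part of the Kafka client, and the host and port are assumed to be whatever you pass in `bootstrap.servers`.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of the Kafka client API.
class BrokerProbe {
    /** Returns true if a TCP connection to host:port opens within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable hosts.
            System.err.println("Cannot reach " + host + ":" + port + " - " + e);
            return false;
        }
    }
}
```

A successful connect only shows that the port was reachable at that moment. It will not catch transient drops, so running it periodically during the problem window is likely to be more informative than a single call.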

Elodia answered 18/8, 2017 at 2:37 Comment(7)
Is there a way to get the specific reason/trace? NETWORK_EXCEPTION is far too generic; I can't identify what went wrong. The brokers were not shut down, for sure. - Siusan
Do the broker logs show anything at the same time? Is the error transient or does it happen all the time? Are you connected on a plaintext port or SSL? - Elodia
I don't find anything abnormal in the logs at the same time. I'm connecting by giving the serverAddress property as serverIP:port. This is the first time we got this error. - Siusan
Try TRACE level logging for more details. This is the KIP that added this feature starting in 0.9: issues.apache.org/jira/browse/KAFKA-2120. Is it possible you are having network outages? - Elodia
There may be network outages, but I couldn't report them to our team, as we don't have any logs. Thanks, will try TRACE level. - Siusan
Hey, how did you solve this? Would appreciate the help. - Nudity
I got this error on localhost with versions 2.6.3 and 3.0.0 under load. The error required restarting the client. - Fisticuffs

A quick code dive shows the most probable cause: a lost connection to the upstream broker, which causes the delivery to fail internally inside the Sender. You might want to enable TRACE logging for org.apache.kafka.clients.producer.internals.Sender to confirm that:

    if (response.wasDisconnected()) {
        log.trace("Cancelled request with header {} due to node {} being disconnected",
            requestHeader, response.destination());
        for (ProducerBatch batch : batches.values())
            completeBatch(batch, new ProduceResponse.PartitionResponse(Errors.NETWORK_EXCEPTION, String.format("Disconnected from node %s", response.destination())),
                    correlationId, now);
    }

Once the batch has completed with this error, the producer retries it. The warning in your log is emitted on that retry path, and "0 attempts left" means this was the last retry allowed; if that attempt fails as well, the error propagates up to your code:

        if (canRetry(batch, response, now)) {
            log.warn(
                "Got error produce response with correlation id {} on topic-partition {}, retrying ({} attempts left). Error: {}",
                ....
            reenqueueBatch(batch, now);
        }
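
To make the "attempts left" counter concrete, here is a simplified model of a retry budget. It is only a sketch: `RetryModel` and `simulate` are hypothetical names, the arithmetic for the remaining count is an assumption chosen to match the log line in the question, and the real `canRetry` check in the Kafka client considers more than this counter.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a retry budget; hypothetical names, not Kafka's Sender code.
class RetryModel {
    /**
     * Simulates one batch whose first `failures` send attempts fail,
     * with a budget of `retries` retries. Returns the events in order.
     */
    static List<String> simulate(int retries, int failures) {
        List<String> events = new ArrayList<>();
        int attempts = 0; // retries already used
        for (int failed = 0; failed < failures; failed++) {
            if (attempts < retries) {
                // Assumed arithmetic: retries remaining after the one scheduled now.
                int left = retries - attempts - 1;
                events.add("retrying (" + left + " attempts left)");
                attempts++;
            } else {
                // Budget exhausted: the error surfaces to the caller.
                events.add("failed: error returned to caller");
                return events;
            }
        }
        events.add("delivered");
        return events;
    }
}
```

With `retries` set to 1 as in the question, `simulate(1, 1)` records one "retrying (0 attempts left)" event followed by "delivered", while `simulate(1, 2)` ends with the error being returned to the caller.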

So the ideas are:

  • investigate your network connectivity. Unfortunately, this might mean enabling trace logging at least on the client side (especially for NetworkClient, which handles all the upstream broker connection management) to see whether there is any connection loss;
  • increase the producer's retries value (newer Kafka clients, 2.1 and later, already default it to Integer.MAX_VALUE and bound the total retry time with delivery.timeout.ms instead).
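
As a rough illustration of the second point, the configuration from the question could be adjusted along these lines. This is a sketch under assumptions: `ProducerSettings` and `withMoreRetries` are hypothetical names, the numeric values are illustrative rather than recommendations, and `delivery.timeout.ms` is only recognized by clients 2.1 and later.

```java
import java.util.Properties;

// Hypothetical helper; the values are illustrative, not recommendations.
class ProducerSettings {
    static Properties withMoreRetries(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("acks", "all");
        // More than the single retry used in the question.
        props.put("retries", "5");
        // Pause between retries so a brief network blip has time to clear.
        props.put("retry.backoff.ms", "500");
        // How long to wait for a broker response to one request.
        props.put("request.timeout.ms", "30000");
        // Upper bound on the total time to deliver a record, retries included.
        props.put("delivery.timeout.ms", "120000");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
}
```

The resulting `Properties` object can be passed to the `KafkaProducer` constructor as in the question. Note that without idempotence enabled, retries combined with more than one in-flight request per connection can reorder records.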
Publius answered 24/6, 2022 at 19:9 Comment(0)
