PRECONDITION_FAILED: Delivery Acknowledge Timeout on Celery & RabbitMQ with Gevent and concurrency

I just switched from ForkPool to gevent with a concurrency of 5 as the pool method for Celery workers running in Kubernetes pods. Since the switch I've been getting a non-recoverable error in the worker:

amqp.exceptions.PreconditionFailed: (0, 0): (406) PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more

The broker log gives basically the same message:

2021-11-01 22:26:17.251 [warning] <0.18574.1> Consumer None4 on channel 1 has timed out waiting for delivery acknowledgement. Timeout used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more

I have CELERY_ACKS_LATE set, but was not aware that the acknowledgement period needs a timeout. This never happened when I was using processes. Tasks can be fairly long (60-120 seconds sometimes), but I can't find a specific setting to allow for that.

I've read a post on another forum by a user who set the timeout in the broker configuration to a huge number (like 24 hours) and still had the same problem, which makes me think something else may be contributing to the issue.

Any ideas or suggestions on how to make the worker more resilient?

Proclus answered 3/11, 2021 at 16:36 Comment(0)

For future reference, it seems that newer RabbitMQ versions (3.8+) introduced a tight default for consumer_timeout (15 minutes at first, I think; the error in the question reports 1800000 ms, i.e. 30 minutes).

The solution I found (which was also added to the Celery docs not long ago) was to set consumer_timeout in RabbitMQ to a large number.

In another question, someone mentions setting consumer_timeout to false so that a large number is not needed, but apparently the configuration has to be in a specific format for that to work.

I'm running RabbitMQ in k8s and just did something like:

rabbitmq.conf: |
  consumer_timeout = 31622400000
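
consumer_timeout is given in milliseconds, so large values are easy to misread. As a sanity check, here is a minimal Python sketch that renders such a value in days, hours and minutes; the helper name describe_timeout is made up for illustration and is not part of RabbitMQ or Celery.

```python
from datetime import timedelta


def describe_timeout(ms: int) -> str:
    """Render a millisecond timeout as days/hours/minutes for a sanity check."""
    delta = timedelta(milliseconds=ms)
    hours, remainder = divmod(delta.seconds, 3600)
    minutes = remainder // 60
    return f"{delta.days}d {hours}h {minutes}m"


print(describe_timeout(1_800_000))       # timeout reported in the error: 0d 0h 30m
print(describe_timeout(31_622_400_000))  # value used in rabbitmq.conf: 366d 0h 0m
```

So the value above amounts to roughly one (leap) year.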

Proclus answered 5/3, 2022 at 1:40 Comment(1)

Wouldn't the timeout only be for the amount of time it takes for the consumer to ack the task, not run it? If that's the case (unless you have acks_late set), it should ack immediately, not after the task is run. – Rozanna 31/8, 2023 at 4:7
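
To make the comment's point concrete: the question says late acknowledgement is enabled, in which case the ack is sent only after the task finishes, so the unacknowledged window covers at least the task's run time. A minimal sketch of the relevant Celery settings, written as a celeryconfig.py-style module, follows. Lowering the prefetch multiplier is an assumption on my part about what might help (fewer messages sitting reserved but not yet started), not something confirmed in this thread.

```python
# celeryconfig.py-style settings (lowercase names used by Celery 4+).

# Acknowledge each message only after its task has finished running.
# With this on, RabbitMQ's consumer_timeout must exceed the longest time
# a message can stay unacknowledged in the worker.
task_acks_late = True

# Assumption: reserve only one message per concurrency slot, so prefetched
# messages spend less time waiting unacknowledged before they start.
worker_prefetch_multiplier = 1
```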

The accepted answer is the correct answer. However, if you have an existing RabbitMQ server running and do not want to restart it, you can dynamically set the configuration value by running the following command on the RabbitMQ server:

rabbitmqctl eval 'application:set_env(rabbit, consumer_timeout, 36000000).'

This sets the new timeout to 10 hours (36000000 ms). For it to take effect you need to restart your workers, though: existing worker connections will continue to use the old timeout.

You can check the current configured timeout value as well:

rabbitmqctl eval 'application:get_env(rabbit, consumer_timeout).'
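
If you want a different duration, the millisecond value is just hours × 60 × 60 × 1000. Here is a small Python sketch that builds the set_env command for a given number of hours; the function name consumer_timeout_eval is hypothetical, and the sketch only prints the command rather than running it.

```python
import shlex


def consumer_timeout_eval(hours: float) -> list[str]:
    """Build the rabbitmqctl argv that sets consumer_timeout to `hours`."""
    ms = int(hours * 60 * 60 * 1000)  # consumer_timeout is in milliseconds
    expr = f"application:set_env(rabbit, consumer_timeout, {ms})."
    return ["rabbitmqctl", "eval", expr]


cmd = consumer_timeout_eval(10)
print(shlex.join(cmd))
```

For 10 hours this prints the same command as shown above. Pass the list to subprocess.run(cmd, check=True) on the RabbitMQ host if you prefer to run it from Python.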

If you are running RabbitMQ from the Docker image, add -e RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="-rabbit consumer_timeout 36000000" to your docker run command, or set the environment variable RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS to "-rabbit consumer_timeout 36000000".

Hope this helps!

Blinders answered 2/10, 2022 at 19:40 Comment(0)

When configuring Airflow to use RabbitMQ, set the value as follows in airflow.cfg:

 [celery_broker_transport_options]
 consumer_timeout = 31622400000

Vachel answered 30/5 at 16:31 Comment(0)
