I just switched from ForkPool to gevent with concurrency (5) as the pool method for Celery workers running in Kubernetes pods. After the switch I've been getting a non recoverable erro in the worker:
amqp.exceptions.PreconditionFailed: (0, 0): (406) PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
The broker logs gives basically the same message:
2021-11-01 22:26:17.251 [warning] <0.18574.1> Consumer None4 on channel 1 has timed out waiting for delivery acknowledgement. Timeout used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
I have the CELERY_ACK_LATE
set up, but was not familiar with the necessity to set a timeout for the acknowledgement period. And that never happened before using processes. Tasks can be fairly long (60-120 seconds sometimes), but I can't find a specific setting to allow that.
I've read in another post in other forum a user who set the timeout on the broker configuration to a huge number (like 24 hours), and was also having the same problem, so that makes me think there may be something else related to the issue.
Any ideas or suggestions on how to make worker more resilient?