Google Cloud Pub/Sub Retry Count
B

3

10

We are moving from an unstable messaging queue service to Google Cloud Pub/Sub in Node.js. It seems to work well, but we would like to add error handling.

We would like to limit the number of retries for a particular message, say to 10 in our test environment and 100 in production. If a message fails 10 times (in test), then instead of it sitting in our queue, being reprocessed, and failing for 7 days, we would like to move it to a separate error queue and send ourselves an email.

We currently have all of this set up in our previous messaging queue, but we have yet to find a retry count attribute on Pub/Sub messages. Does anyone know if this exists?

We do use task queues in Google App Engine, and they have everything we need, but Pub/Sub seems to be missing a lot. Any solution must be in Node.js.

Barragan answered 26/7, 2016 at 20:24 Comment(0)
C
17

Update 04/21/2020: As of today, the dead letter queue feature for Cloud Pub/Sub has been released. This feature lets you set the maximum number of times delivery of a message should be attempted and specify a topic to which messages are published once they have been delivered more than that number of times. When enabled, the feature also exposes the number of delivery attempts as a field. For example, in Node.js it is exposed as the deliveryAttempt property on the message passed into the subscriber callback.
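A minimal sketch of how this might look in Node.js, assuming the `@google-cloud/pubsub` client. The helper names `deadLetterOptions` and `handleMessage`, the topic and subscription names, and the `processMessage` callback are hypothetical and only for illustration; `deadLetterPolicy`, `maxDeliveryAttempts`, and `deliveryAttempt` are the names the client library uses. The calls into the real client appear only in comments so the block runs without credentials.

```javascript
// Per-environment limits from the question: 10 in test, 100 in production.
// Pub/Sub accepts maxDeliveryAttempts values from 5 to 100.
const MAX_ATTEMPTS = { test: 10, production: 100 };

// Builds the options object passed when creating the subscription.
// `deadLetterTopic` is a full topic path; the one used below is a placeholder.
function deadLetterOptions(env, deadLetterTopic) {
  const maxDeliveryAttempts = MAX_ATTEMPTS[env];
  if (maxDeliveryAttempts === undefined) {
    throw new Error(`unknown environment: ${env}`);
  }
  return { deadLetterPolicy: { deadLetterTopic, maxDeliveryAttempts } };
}

// Subscriber callback. `processMessage` is a hypothetical handler
// that throws when the message cannot be processed.
function handleMessage(message, processMessage, log = console.log) {
  try {
    processMessage(message);
    message.ack();
  } catch (err) {
    // deliveryAttempt is exposed once dead lettering is enabled.
    log(`attempt ${message.deliveryAttempt} failed for ${message.id}: ${err.message}`);
    // Request redelivery; Pub/Sub forwards the message to the
    // dead-letter topic once the configured limit is exceeded.
    message.nack();
  }
}

// Wiring with the real client would look roughly like this (not executed here):
//   const { PubSub } = require('@google-cloud/pubsub');
//   const pubsub = new PubSub();
//   await pubsub.topic('orders').createSubscription(
//     'orders-sub',
//     deadLetterOptions('test', 'projects/my-project/topics/orders-dead-letter'));
//   pubsub.subscription('orders-sub')
//     .on('message', (m) => handleMessage(m, processOrder));
```

Note that the Pub/Sub service account also needs permission to publish to the dead-letter topic and to subscribe to the source subscription; without those, messages are not forwarded.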

Previous answer

Cloud Pub/Sub does not have a limit to the number of times it will retry delivery of a message to a subscriber. If your subscriber never acknowledges the message within the ack deadline, it will be redelivered until the message expires 7 days later.

If you want to stop receiving these messages, then you will need to ack them at some point. If you want to protect against "messages of death" that cannot be processed by your subscribers, I recommend the following:

  1. Keep track of message failure counts in a database, keyed by message id. Hopefully, failures are not frequent, so this database should not be too large and queries to it will only be made when there is actually a failure.

  2. When a message fails, query the database and see how many failures have occurred before. Increment the counter and do not acknowledge the message if the count is below your threshold.

  3. If a message fails more times than your threshold, publish the message to a separate "failed messages" topic, send an email, and acknowledge the message.

  4. If necessary, have a means by which to publish messages from the "failed messages" topic back to your main topic when the problems that caused the message to fail in the first place have been remedied.

You now have the message saved in a separate topic (for 7 days or until you ack it) and the message won't be redelivered to the subscribers on your main topic.
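The first three steps above can be sketched roughly as follows in Node.js. This is an illustration under stated assumptions, not a definitive implementation: `handleWithLimit` and the `deps` callbacks (`process`, `publishToFailed`, `sendEmail`) are hypothetical names, and an in-memory `Map` stands in for the database, which in practice would have to be a persistent store shared by all subscribers.

```javascript
// In-memory stand-in for the failure-count database, keyed by message id.
// A real deployment would need a shared, persistent store instead.
const failureCounts = new Map();

// `deps` bundles hypothetical collaborators:
//   process(message)          throws or rejects when the message cannot be handled
//   publishToFailed(message)  publishes to the "failed messages" topic
//   sendEmail(message, count) notifies the team
async function handleWithLimit(message, maxFailures, deps) {
  try {
    await deps.process(message);
    failureCounts.delete(message.id);
    message.ack();
    return 'acked';
  } catch (err) {
    // Step 2: look up the previous failure count and increment it.
    const count = (failureCounts.get(message.id) || 0) + 1;
    failureCounts.set(message.id, count);

    if (count < maxFailures) {
      // Below the threshold: do not ack, so the message is redelivered.
      // nack() asks for redelivery right away; doing nothing and letting
      // the ack deadline expire has the same end result.
      message.nack();
      return 'retry';
    }

    // Step 3: threshold reached. Park the message, alert, then ack.
    // If publishing or emailing fails, the error propagates and the
    // message stays unacked, so it will be redelivered.
    await deps.publishToFailed(message);
    await deps.sendEmail(message, count);
    failureCounts.delete(message.id);
    message.ack();
    return 'dead-lettered';
  }
}
```

With `maxFailures` set to 10, the tenth failure of a given message id moves it to the failed topic, which matches the limit described in the question.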

Citizenship answered 27/7, 2016 at 12:38 Comment(4)
Thanks for the response, Kamal. I think your approach would work, but we will be implementing it a little differently. Since keeping message failure counts in a database seems like useless information, especially if the actual message disappears, we will be storing an enqueue or publish date on every object. If the object stays in the topic for a day or longer, then we will add the message to the database and acknowledge/remove it from the topic. This allows us to quiet the logs, limit the retries, and keep the failed message viewable and available for longer than 7 days. Barragan
We realize task queues in App Engine would have been perfect, but they will not be supported for Node. Thanks again for the response! Barragan
Task queues in App Engine have a REST API (still with a beta label, though): cloud.google.com/appengine/docs/python/taskqueue/rest Tympanum
@Tympanum Yep! We use task queues for our other projects, which are currently written in Python. The task queues for Node.js are still being worked on, and a private alpha is planned for the foreseeable future. Barragan
J
2

There is a simple "hack" to achieve this.

Use Dead Lettering

Once the limit is reached, the subscription publishes your message to the configured dead-letter topic and does not retry it again.


In this new topic, you can use a subscription that does not have a retry setting. This is also good for reducing your log clutter as you can address these failures in a new topic with only the failed messages.
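As a rough sketch, a subscriber on that dead-letter topic could send the alert email and then acknowledge the message. The function `handleDeadLetter` and the `notify` callback are hypothetical. The `CloudPubSubDeadLetterSource...` attribute names are the ones I recall Pub/Sub attaching to forwarded messages; treat them as an assumption and verify them against the current documentation.

```javascript
// Handler for a subscription on the dead-letter topic.
// `notify` is a hypothetical function that sends the alert email.
async function handleDeadLetter(message, notify) {
  const attrs = message.attributes || {};
  await notify({
    messageId: message.id,
    // Assumed attribute names added by Pub/Sub when it forwards a message.
    sourceSubscription: attrs.CloudPubSubDeadLetterSourceSubscription,
    deliveryCount: attrs.CloudPubSubDeadLetterSourceDeliveryCount,
    body: message.data.toString(), // message.data is a Buffer
  });
  // Ack only after the alert has gone out, so a failed
  // notification leaves the message available for redelivery.
  message.ack();
}
```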

Jannery answered 7/11, 2022 at 9:58 Comment(0)
D
-2

In Python, see the 'num_retries' parameter on .execute():

pubsub_client.projects().topics().publish(topic='projects/xxxx',body=body).execute(num_retries=0)

I'm not sure whether the same thing exists in Node.js, but I hope this points you in the right direction.

Dismay answered 26/7, 2016 at 23:30 Comment(2)
Hi Aerodyno, this got me super excited. I also saw it in the Python docs, but when we looked at the GitHub repositories for both Python and Node.js, it was not implemented or documented at all. Barragan
The num_retries property affects the number of times the publish is retried on failure, e.g., if the publisher cannot reach Cloud Pub/Sub for some reason. It does not affect the number of times messages are delivered to subscribers when the subscriber cannot process and ack the message. Citizenship
