Cloud Run: 429: The request was aborted because there was no available instance
We (as a company) experience large traffic spikes every day. We use a Pub/Sub -> Cloud Run combination.

The issue we experience is that when high traffic hits, Pub/Sub tries to push messages to Cloud Run all at the same time without any flow control. The result?

429: The request was aborted because there was no available instance.

Although this is logged as a warning, every 4xx HTTP response results in the message being redelivered.

Messages, therefore, come back to the queue and wait. If a message repeats this process and the instances are still occupied, Cloud Run returns 429 again and the message is sent back to the queue. This process repeats x times (depending on the value we set for Maximum delivery attempts). After that, the message goes to the dead-letter queue.
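For reference, a minimal sketch of how a push subscription with this kind of retry and dead-letter behaviour can be created with the Node.js Pub/Sub client; the project, topic, subscription, and endpoint names below are placeholders, not our real setup:

```js
// Sketch only: push subscription with retry backoff and a dead-letter topic.
// All names here are placeholders.
const {PubSub} = require('@google-cloud/pubsub');

const pubsub = new PubSub();

async function createPushSubscription() {
  await pubsub.topic('work-topic').createSubscription('work-sub', {
    pushConfig: {
      // The Cloud Run service URL Pub/Sub pushes to.
      pushEndpoint: 'https://my-service-xyz-uc.a.run.app/handle',
    },
    // Exponential backoff between redelivery attempts after a non-2xx response.
    retryPolicy: {
      minimumBackoff: {seconds: 10},
      maximumBackoff: {seconds: 600},
    },
    // "Maximum delivery attempts": after this many failed deliveries
    // the message is forwarded to the dead-letter topic.
    deadLetterPolicy: {
      deadLetterTopic: 'projects/my-project/topics/work-dead-letter',
      maxDeliveryAttempts: 5,
    },
    ackDeadlineSeconds: 60,
  });
}

createPushSubscription().catch(console.error);
```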

We want to avoid this and ideally not get any 429s at all, so that messages don't travel back and forth and don't end up in the dead-letter subscription. A 429 is not one of the application errors we want to keep there; it is a warning caused by Pub/Sub not controlling the flow and not coordinating with Cloud Run.

Neither Pub/Sub nor its push subscriptions (which are required for Cloud Run) offer any flow-control feature.


Is there any way to control how many messages are sent to Cloud Run to avoid getting the 429 response? And also, why does Pub/Sub even try to deliver when it is obvious that Cloud Run has hit its instance limit? The best option would be to keep the messages in the queue until instances free up.


Most answers would probably suggest increasing the instance limit. We have already set it to 1000. This is not scalable: even if we raised the limit to 1500, a big enough spike would exceed it and we would get the 429 responses again.

The only option I can think of is some kind of flow control. So far, we have read about Cloud Tasks, but we are not sure whether it can help us. Ideally, we don't want to introduce a new service, but we will if necessary.

Thank you for all your tips and time! :)

Iodide answered 16/9, 2021 at 13:13 Comment(13)
Did you have messages published in the dead-letter topic because of the 429 error? Or is your question a guess about what could happen in the worst case? – Hernando
You can set the max instances to 1,000, but you can also set the number of concurrent requests per instance to 1,000 (cloud.google.com/run/docs/about-concurrency). That means one million requests can be processed at the same time. The key is how long it takes to process an event and how many events one instance can process at the same time. Improving/rewriting the Cloud Run application to be more efficient is a possible solution. Switching from a scripting language like Python to Go/C# could improve overall performance, but that depends on details not present in your question. – Drown
@guillaumeblaquiere Not an assumption, it happens. – Iodide
@JohnHanley I know what concurrency is; we need to have 1 request per instance for now. There is this option in GCP, so I assume it should work. But it does not; it constantly results in various problems. The number of connections to Cloud SQL can be at most 100 per Cloud Run instance, so 1000 would not work anyway. About the programming language: I didn't mention the use case, so Go/C# might be worse in our case. We don't plan to switch from Node.js for now. Thank you for your suggestions though. – Iodide
Pub/Sub push subscriptions follow the slow-start algorithm, well explained by Priyanka here. So yes, you can get 429 errors when you reach the limit of your system, but Pub/Sub will adapt the throughput to your system and its capacity to absorb the traffic. You should see several 429s, but no dead-letter delivery (which requires more than 5 rejections of the same message). Thus my question about whether it happens (dead letter, not 429). – Hernando
@guillaumeblaquiere The slow-start algorithm handles the cases where you have multiple requests per instance. I use just 1 (on purpose), so the slow-start algorithm has no effect here: it cannot increase or decrease the number of requests per instance because of the limit of 1. Pub/Sub tries to deliver all the messages at once and then throws the 429 error. It does not coordinate with the settings of the service (concurrency and number of instances). – Iodide
@guillaumeblaquiere 4xx by definition acknowledges the message. I have seen a lot of messages going to the dead letter because they ran out of retries. We then increased the number of retries before sending to the dead letter and also the time between tries, but this is not scalable in the long run. Also, the messages just travel back and forth. – Iodide
We will definitely work on having more than 1 request per instance. But I can imagine applications where you must have 1 request per instance; the option is there, so people will use it. I also asked in GCP's Slack group, and it seems we are not the only company experiencing "429 - no available instance". – Iodide
Thank you for your suggestions and your time. I can see that there is no solution for this now. I hope it will come with time. – Iodide
@MichalMoravik The behavior that you observe is not the expected one. First, a Pub/Sub push subscription pushes to an endpoint; whether it is on GCP or not, it doesn't know. It could be on-prem and you could have the same availability/quota issue, so no correlation between the services is possible. Then, the slow-start algorithm should wait and reduce the concurrent tries if you have a lot of errors. Something strange here... – Hernando
@guillaumeblaquiere There was no other error; those messages moved there for sure. We were observing the behavior in real time. – Iodide
This honestly looks like an issue that should be handled by the Google engineering team, as it is caused by pushing GCP products toward their limits. I would advise you to reach the team by following this link. – Gynaecomastia
Did you find the reason for this happening? I have the same issue and am trying to handle it with Google support, but it seems to happen only in my production environment and cannot easily be replicated with a minimal reproducible example. – Catlaina

Here are some options:

  1. Use a 1st gen event-driven Cloud Function: https://mcmap.net/q/607793/-how-to-rate-limit-google-cloud-pub-sub-queue
  2. Use Cloud Tasks to rate limit (see the Cloud Tasks sketch after this list). But then you don't get dead lettering.
  3. Disable dead lettering and have Pub/Sub continuously attempt delivery. You may want to set an end condition to avoid infinite retry loops, and you also need to set up some alerting to ensure messages don't expire (if the service cannot keep up with the load).
  4. Handle it in application code by tracking the attempt count for each messageId (e.g. in Redis) and publishing to a dead-letter topic when the attempt count exceeds a threshold (see the Redis sketch after this list). This is easy to abstract, doesn't introduce a new service, and checks all the boxes, but it definitely adds the overhead of state management.
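
For option 2, a rough sketch of the Cloud Tasks approach with the Node.js @google-cloud/tasks client: create a queue whose dispatch rate matches what the Cloud Run service can absorb, then enqueue one HTTP task per message (for example from a small relay subscriber). The project, location, queue, URL, and rate values are placeholders:

```js
// Sketch of option 2: a Cloud Tasks queue acting as the flow-control layer
// in front of Cloud Run. All names and limits below are placeholders.
const {CloudTasksClient} = require('@google-cloud/tasks');

const client = new CloudTasksClient();
const project = 'my-project';
const location = 'us-central1';
const queueId = 'cloud-run-work-queue';

// One-time setup: cap the dispatch rate at what Cloud Run can absorb.
async function createRateLimitedQueue() {
  await client.createQueue({
    parent: client.locationPath(project, location),
    queue: {
      name: client.queuePath(project, location, queueId),
      rateLimits: {
        maxDispatchesPerSecond: 100,   // tune to the service's real throughput
        maxConcurrentDispatches: 500,  // <= max instances * concurrency
      },
      retryConfig: {maxAttempts: 10},
    },
  });
}

// Called for each incoming message (e.g. by a small relay subscriber).
async function enqueue(messagePayload) {
  await client.createTask({
    parent: client.queuePath(project, location, queueId),
    task: {
      httpRequest: {
        httpMethod: 'POST',
        url: 'https://my-service-xyz-uc.a.run.app/handle', // Cloud Run endpoint
        headers: {'Content-Type': 'application/json'},
        body: Buffer.from(JSON.stringify(messagePayload)).toString('base64'),
      },
    },
  });
}

module.exports = {createRateLimitedQueue, enqueue};
```

The queue's rate limits become the flow control that Pub/Sub push lacks; tasks simply wait in the queue until Cloud Run can take them.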
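
And for option 4, a sketch of tracking delivery attempts per messageId in Redis inside the Cloud Run handler and dead-lettering manually once a threshold is exceeded; the endpoint path, topic name, Redis setup, and processMessage are all placeholders:

```js
// Sketch of option 4: count delivery attempts per messageId in Redis and
// dead-letter manually once a threshold is exceeded. Names are placeholders.
const express = require('express');
const {createClient} = require('redis');
const {PubSub} = require('@google-cloud/pubsub');

const MAX_ATTEMPTS = 5;
const app = express();
app.use(express.json());

const redis = createClient({url: process.env.REDIS_URL});
const deadLetterTopic = new PubSub().topic('work-dead-letter');

async function processMessage(message) {
  // Placeholder for the real business logic.
}

app.post('/handle', async (req, res) => {
  const message = req.body.message; // Pub/Sub push envelope
  try {
    await processMessage(message);
    return res.status(204).send(); // 2xx = ack
  } catch (err) {
    const key = `attempts:${message.messageId}`;
    const attempts = await redis.incr(key);
    await redis.expire(key, 24 * 60 * 60);
    if (attempts >= MAX_ATTEMPTS) {
      // Give up: park the message ourselves, then ack so Pub/Sub stops retrying.
      await deadLetterTopic.publishMessage({
        json: {original: message, error: String(err)},
      });
      return res.status(200).send();
    }
    return res.status(500).send(); // non-2xx = Pub/Sub will redeliver
  }
});

redis.connect().then(() => app.listen(process.env.PORT || 8080));
```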
Ka answered 17/5, 2023 at 8:19 Comment(0)
