How to fix CloudRun error 'The request was aborted because there was no available instance'
I'm using managed Cloud Run to deploy a container with concurrency=1. Once deployed, I'm firing four long-running requests in parallel. Most of the time everything works fine, but occasionally one of the requests fails with a 500 within a few seconds; the logs only contain the error message quoted in the title.

Retrying with exponential back-off did not improve the situation; the retries also end up with 500s. The Stackdriver logs do not provide further information either.
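For context, a minimal sketch of the kind of back-off wrapper meant here (names and parameters are illustrative, not the actual client code):

```python
import time

def call_with_backoff(fn, max_attempts=4, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(); on a retryable error, sleep base_delay * 2**attempt and retry.

    Re-raises the last exception once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

In my case the wrapped call is the HTTP request to the service; the point is that even with this schedule, every retry hits the same error.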

Potentially relevant gcloud beta run deploy arguments:

--memory 2Gi --concurrency 1 --timeout 8m --platform managed

What does the error message mean exactly, and how can I solve the issue?

Multifoil answered 12/7, 2019 at 12:51 Comment(5)
Do you deploy in us-central1?Cyd
Yes, us-central1, as it's still the only choice (for me?) when trying to create a new service through console.cloud.google.com / UI; the CLI offered more choices long ago, but it always resulted in errors for me, making me believe it's really only available there?Multifoil
The UI only offers central, but the CLI lets you use others as well. We tried east with success (but it doesn't show up in the UI)Roundshouldered
Many new regions are now available: cloud.google.com/run/docs/release-notes#july_10_2019Cyd
The main question is: why does it fail with a 500 status rather than a 429 or something else that is easy to capture and handle?...Bounder
This error message can appear when the infrastructure didn't scale fast enough to keep up with a traffic spike. The infrastructure only keeps a request in the queue for a certain amount of time (about 10 s) and then aborts it.

This usually happens when:

  1. traffic suddenly increases sharply
  2. cold start time is long
  3. request time is long
Exhortative answered 12/7, 2019 at 20:51 Comment(4)
Can you improve this answer with a way to fix this error, not just why it happens?Roundshouldered
There are performance tips in the docs that could help with thisAntoine
Accepting the answer as helpful despite the room for improvement. Haven't seen the error for a few days now... Should it reoccur, I'll try to add preliminary warm-up requests. IMHO, request time being long should not lead to this error (given I have specified a relatively long timeout).Multifoil
This is only a half-answer: it explains "what does this error mean" but not "how can I solve the issue". Corinne White links to docs which is helpful, but they're pretty generic.Lectern
We also faced this issue when traffic suddenly increased during business hours. It is usually caused by a sudden traffic spike combined with a long instance start-up time to accommodate the incoming requests. One way to handle this is to keep warmed-up instances always running, i.e. to set the --min-instances parameter in the gcloud run deploy command. Another, recommended, way is to reduce the service's cold start time (which is difficult to achieve in some languages like Java and Python).
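The --min-instances mitigation maps to a deploy flag like this (service name and image are placeholders; keeping warm instances is billed even while they are idle):

```shell
# Keep one warm instance around to absorb spikes
gcloud run deploy my-service \
  --image gcr.io/my-project/my-image \
  --min-instances 1 \
  --concurrency 1 \
  --platform managed
```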

Etna answered 12/4, 2021 at 7:14 Comment(0)
I also experienced the problem, and it is easy to reproduce. I have a Fibonacci container that computes fibo(45) in 6 s. I used Hey to perform 200 requests, and I set my Cloud Run concurrency to 1.

Out of 200 requests I got 8 such errors. In my case: a sudden traffic spike and a long processing time. (The cold start is short for me, since it's in Go.)
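For reference, the load generation can be done with something like this (the service URL and path are placeholders; -n is the total number of requests and -c the number of in-flight clients):

```shell
# 200 requests, 50 concurrent clients, against a concurrency=1 service
hey -n 200 -c 50 https://fibo-service-xyz.a.run.app/fib/45
```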

Cyd answered 14/7, 2019 at 12:48 Comment(0)
I was able to resolve this on my service by raising the max autoscaling container count from 2 to 10. There really should be no reason that 2 would be even close to too low for the traffic, but I suspect something about the Cloud Run internals was tying up the 2 containers somehow.
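Raising the limit can be done without redeploying (service name is a placeholder):

```shell
gcloud run services update my-service --max-instances 10
```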

Turbary answered 19/11, 2019 at 3:35 Comment(3)
Where can you see "max autoscaling"? I cannot find any documentation on it.Lachrymose
In YAML, use autoscaling.knative.dev/maxScale: '4'; I couldn't find GUI knobs and suspect YAML is the design. For me, Cloud Run can spike far over its max of 4, to 12! I think it's because my site is new/unused, so it scales to zero when idle; when GoogleBot drives past, it scrambles to scale up, and being .NET it takes a while to start, so it overshoots. I suspect a smaller VM with autoscaling.knative.dev/minScale: '1' might prevent this, but I'm not sure which is cheaper, brief overscaling or always on.Georgy
See cloud.google.com/run/docs/configuring/max-instances and the sibling doc pages for all this.Georgy
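Putting the annotations from the comments above together, the service YAML looks roughly like this (service name is a placeholder; the annotations sit on the revision template):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: '1'  # keep one warm instance
        autoscaling.knative.dev/maxScale: '4'  # cap scale-out
```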
This error can be caused by one of the following.

  1. A huge sudden increase in traffic.
  2. A long cold start time.
  3. A long request processing time.
  4. A sudden increase in request processing time.
  5. The service reaching its maximum container instance limit (HTTP 429).

We faced a similar issue sporadically; it was due to a long request processing time when DB latencies were high for a few requests.

Spout answered 25/8, 2023 at 19:17 Comment(0)
Setting the Max Retry Attempts to anything but zero will remedy this, as it did for me.

Onder answered 18/12, 2022 at 18:14 Comment(3)
Do you still have these errors or did they vanish?Rina
@AndreasDyballa the errors/issue was resolved fullyOnder
I found that very strange. Google says they guarantee exactly-once delivery even if retry is switched off, so these errors should be warnings and don't mean the handler was not run. But I also saw a drop in these errors when I switched it on. I'm puzzled. I will ignore those errors.Rina
This error can be business as usual for Cloud Run during scaling.

During scale-up, the GCP networking stack routes your request to the cold-starting instance even though it hasn't passed its health check yet. So the client is left hanging for the duration of the cold start plus the duration of the request.

This is suboptimal, since you might have existing Cloud Run instances that could serve the request immediately. Ideally there would be no routing to cold-starting instances as long as the current instances aren't overloaded.

The load balancer keeps the client waiting until the cold start plus the request is finished. These error messages pop up at client timeout. The timeout is a combination of the load-balancer timeout, the Cloud Run service timeout, the client timeout, and the GCP infra timeout (10 s?). On timeout, the load balancer reports response_sent_by_backend with status 500, even though your "backend" instance, i.e. your container, never got the request due to networking.

For me the main problem is: why are Cloud Run instances scaling in scenarios where, according to the docs, they shouldn't be?

Based on the autoscaling logic which is brought out here and here, you might have zero reason for Cloud Run to scale up, yet it suddenly might.

e.g.

  • CPU usage is 10% (Not 60%)
  • You have concurrency set to 80, max concurrent requests is 5
  • You have 1 instance currently, min instances is set to 1, max instances is 5

For log entries where a duration is displayed, it's important to compare the log's receiveTimestamp with its timestamp: timestamp is the time the request arrived, receiveTimestamp the time the response was sent.
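A quick way to measure that gap from an exported log entry (a sketch; the field names follow the Cloud Logging LogEntry JSON format, the values are illustrative):

```python
from datetime import datetime

def gap_seconds(entry):
    """Seconds between a log entry's timestamp and receiveTimestamp."""
    parse = lambda s: datetime.fromisoformat(s.replace("Z", "+00:00"))
    return (parse(entry["receiveTimestamp"]) - parse(entry["timestamp"])).total_seconds()

entry = {  # illustrative values
    "timestamp": "2024-06-26T12:00:00Z",
    "receiveTimestamp": "2024-06-26T12:00:14Z",
}
```

A gap far larger than your handler's processing time suggests the request sat waiting through a cold start.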

Kisor answered 26/6, 2024 at 12:11 Comment(1)
I got an answer from GCP support. They say that slow CPU-intensive requests lower the effective concurrency capacity of the container, which is then compared to the active concurrent request count; if it is exceeded, scaling is triggered. Basically, 3-4 CPU-heavy requests mixed into 15 fast, light requests will trigger scaling.Kisor
