This error can be business as usual for Cloud Run during scale-up.
During scale-up, the GCP networking stack can route your request to a cold-starting instance even though it hasn't passed its health check yet, so the client is left hanging for the duration of the cold start plus the duration of the request itself.
This is suboptimal: you might have existing Cloud Run instances with spare capacity that could serve the request immediately. Ideally, requests would never be routed to cold-starting instances while the current instances aren't overloaded.
The load balancer keeps the client waiting until the cold start and the request finish. These error messages pop up when a timeout fires, and the effective timeout is the shortest of several layers: the load balancer timeout, the Cloud Run service timeout, the client timeout, and (apparently) a GCP infrastructure timeout of around 10 seconds. On timeout, the load balancer logs `response_sent_by_backend`
with status 500, even though your "backend" instance, i.e. your container, never received the request due to the networking layer.
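The layering above boils down to a minimum: whichever timeout in the chain is shortest decides when the client sees the error. A minimal sketch, with made-up placeholder values (not GCP defaults; the ~10 s infra timeout is only a guess above, so it is left out):

```python
# Hypothetical timeout values for each layer a request passes through.
timeouts_s = {
    "client": 10,              # e.g. an HTTP client configured with a 10 s timeout
    "load_balancer": 30,       # backend service timeout on the load balancer
    "cloud_run_service": 300,  # Cloud Run request timeout
}

# The layer with the smallest timeout is the one that fires first.
first_to_fire = min(timeouts_s, key=timeouts_s.get)
effective_timeout = timeouts_s[first_to_fire]
print(first_to_fire, effective_timeout)  # → client 10
```

So if a cold start plus the request takes longer than the smallest value in that chain, the client gets the 500 regardless of how the other layers are configured.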
For me the main problem is: why are Cloud Run instances scaling up in scenarios where, according to the docs, they shouldn't be?
This is based on the autoscaling logic described here and here.
There may be no apparent reason for Cloud Run to scale up, yet it suddenly does, e.g. when:
- CPU usage is 10% (not the 60% target)
- Concurrency is set to 80, but the max concurrent requests observed is 5
- You currently have 1 instance, with min instances set to 1 and max instances set to 5
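To see why no scale-up should be expected in that scenario, here is a back-of-the-envelope estimate of the documented scaling inputs (instances are added when CPU approaches the ~60% target or concurrency approaches the configured maximum). This is my reading of the docs, not the actual autoscaler algorithm:

```python
import math

def expected_instances(concurrent_requests, max_concurrency,
                       cpu_utilization, min_instances, max_instances):
    # Instances needed so no instance exceeds its configured concurrency.
    by_concurrency = math.ceil(concurrent_requests / max_concurrency)
    # Instances needed so per-instance CPU stays under the ~60% target.
    by_cpu = math.ceil(cpu_utilization / 0.60)
    needed = max(by_concurrency, by_cpu, min_instances)
    return min(needed, max_instances)

# The scenario above: 5 concurrent requests, concurrency=80, CPU at 10%,
# min=1, max=5 — by this estimate, 1 instance should be enough.
print(expected_instances(5, 80, 0.10, 1, 5))  # → 1
```

By either documented signal, 1 instance comfortably covers the load, yet a scale-up (and the cold-start routing problem above) can still happen.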
For log entries that display a duration, it's important to compare the log's `receiveTimestamp` with its `timestamp`: `timestamp` is when the request arrived, while `receiveTimestamp` is when the response was sent.
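Given that reading of the two fields, subtracting them approximates the total latency the client experienced (cold start plus request handling). A small sketch with made-up example values:

```python
from datetime import datetime

# Hypothetical log entry fields; real entries use RFC 3339 timestamps.
entry = {
    "timestamp": "2023-05-01T12:00:00.000Z",         # request arrived
    "receiveTimestamp": "2023-05-01T12:00:42.500Z",  # response sent
}

def parse(ts: str) -> datetime:
    # datetime.fromisoformat needs '+00:00' rather than a trailing 'Z'.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

duration = parse(entry["receiveTimestamp"]) - parse(entry["timestamp"])
print(duration.total_seconds())  # → 42.5
```

A gap much larger than your handler's normal latency is a hint that the request sat through a cold start before being served.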