Cloud Run finishes but Cloud Scheduler thinks that job has failed

Asked 3/10, 2019 at 17:4 Answered 14/12, 2021 at 1:17

google-cloud-platform google-cloud-run google-cloud-scheduler

I have a Cloud Run service set up, and a Cloud Scheduler task that calls an endpoint on that service. When the task completes (the HTTP handler returns), I see the following error:

The request failed because the HTTP connection to the instance had an error.

However, the actual handler returns HTTP 200 and exits successfully. Does anyone know what this error means and under what circumstances it shows up?

I'm also attaching a screenshot of the logs.

Imgur

Gulf answered 3/10, 2019 at 17:4 Comment(4)

Can you elaborate on the nature of your cloud run service? How are you sending a 200 response? Are you flushing/closing the connection? – Cordalia 3/10, 2019 at 17:31

It's a simple Go app that returns 200 at the end of the request. It takes about 30 seconds to complete. I just realized that I cut off the right side of the screenshot, which includes the status code. The 4th line in the logs above comes from the app and is emitted at the end of the request by the app's logging middleware. It's a 200 response. The line below it is a 503, which seems to be a log from the Cloud Scheduler service. I ran Cloud Scheduler against a different endpoint and it succeeded. That endpoint takes about 8 seconds to complete. – Gulf 3/10, 2019 at 17:41

same issue here – Happ 6/10, 2020 at 4:21

I'm having the same issue and I'm also using Go, not Node.js – Videogenic 26/6, 2021 at 13:27

Does your job take longer than 120 seconds? I was having the same issue and figured out that Node.js versions prior to 13 have a default server.timeout of 120 seconds. I installed Node 13 in Docker and the problem is gone.

Leeannaleeanne answered 6/4, 2020 at 13:15 Comment(2)

If you are using Express, you can just call res.connection.setTimeout(0). – Menispermaceous 1/9, 2020 at 9:33

I have the same issue with a Go app – Happ 6/10, 2020 at 4:21

I ran an incremental sleep test on my Flask endpoint, which returns 200 after waiting 1 minute, 2 minutes, and 10 minutes. When I triggered the endpoint from Cloud Scheduler, the job failed only in the 10-minute test. The cause turned out to be one of the properties of my Cloud Scheduler job. The following solved my issue.

gcloud scheduler jobs describe <my_test_scheduler>

There, you'll see a property called 'attemptDeadline', which was set to 180 seconds by default in my case.

You can update that property using:

gcloud scheduler jobs update http <my_test_scheduler> --attempt-deadline 1000s

Ref: scheduler update
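
For anyone who wants to reproduce the sleep test, here is a minimal sketch. It uses Python's standard-library http.server instead of Flask so it runs without extra packages, and the names SleepHandler, start_server, probe, and exceeds_deadline are made up for illustration. The 180-second figure is the attemptDeadline value I saw on my job; check your own job's value before relying on it.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


class SleepHandler(BaseHTTPRequestHandler):
    """Sleeps for ?seconds=N, then answers 200 (hypothetical test endpoint)."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        seconds = float(query.get("seconds", ["0"])[0])
        time.sleep(seconds)  # stand-in for the long-running job
        body = f"slept {seconds:g}s\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_server(port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), SleepHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(server, seconds):
    """Call the endpoint and return (status, elapsed_seconds)."""
    url = f"http://127.0.0.1:{server.server_address[1]}/?seconds={seconds}"
    started = time.monotonic()
    with urllib.request.urlopen(url) as resp:
        status = resp.status
    return status, time.monotonic() - started


def exceeds_deadline(duration_s, attempt_deadline_s=180.0):
    """True if a request of this length outlives the scheduler's deadline.

    180.0 is the value observed in this answer, not a guaranteed default.
    """
    return duration_s > attempt_deadline_s
```

With a 180-second deadline, exceeds_deadline(60) and exceeds_deadline(120) are False while exceeds_deadline(600) is True, which matches the pattern I saw: only the 10-minute test failed. Locally, probe(start_server(), 0.2) should return status 200 after roughly 0.2 seconds.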

Gerous answered 14/12, 2021 at 1:17 Comment(0)

Error 503 is returned by the Google Frontend (GFE). The Cloud Run service either has a transient issue, or the GFE has determined that your service is not ready or not working correctly.
In your log entries, I see a POST request, and 7 ms later the 503 error. This tells me your Cloud Run application is not yet ready (in a ready state as determined by Cloud Run).
One minute, 8 seconds before, I see ReplaceService. This tells me that your service is not yet in a running state and that if you retry later, you will see success.
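
If the 503 really is transient, retrying with a delay is usually enough. A minimal sketch of that idea, assuming a caller-supplied function that performs the request and returns the HTTP status code (call_with_retry and its parameters are hypothetical names, not part of any Google client library):

```python
import time


def call_with_retry(send, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a status other than 503.

    send is any zero-argument callable returning an HTTP status code
    (hypothetical; plug in your own client call). Gives up after
    `attempts` calls and returns the last status seen.
    """
    status = send()
    for attempt in range(1, attempts):
        if status != 503:
            break
        # Exponential backoff: base_delay, 2x, 4x, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status = send()
    return status
```

Passing sleep as a parameter keeps the helper easy to test without real waiting. Cloud Scheduler jobs also have their own retry settings, so this sketch is mainly useful when you call the service from your own code.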

Lucullus answered 3/10, 2019 at 18:56 Comment(1)

In addition, I get a 503 when my container crashes. Can you paste the Cloud Run logs? – Hoshi 3/10, 2019 at 19:20
