Google App Engine deployment debug "Error Response: [4] Timed out waiting for the app infrastructure to become healthy"

In short: we have a web app running on the Google App Engine Node.js runtime, flexible environment. Starting 5 days ago, all our deployments began to fail with this error:

ERROR: (gcloud.app.deploy) Error Response: [4] Timed out waiting for the app infrastructure to become healthy.

Full error stack trace:

Updating service [default] (this may take several minutes)...\DEBUG: Operation [apps/PROJECT_ID/operations/45d6fec1-9261-41d2-943a-648976b971ed] not complete. Waiting to retry.
Updating service [default] (this may take several minutes)...-DEBUG: Operation [apps/PROJECT_ID/operations/45d6fec1-9261-41d2-943a-648976b971ed] complete. Result: {
    "metadata": {
        "user": "[email protected]", 
        "target": "apps/PROJECT_ID/services/default/versions/release-0-6-3",
        "@type": "type.googleapis.com/google.appengine.v1.OperationMetadataV1", 
        "insertTime": "2018-02-19T06:08:56.439Z", 
        "method": "google.appengine.v1.Versions.CreateVersion"
    }, 
    "done": true, 
    "name": "apps/PROJECT_ID/operations/45d6fec1-9261-41d2-943a-648976b971ed", 
    "error": {
        "message": "Timed out waiting for the app infrastructure to become healthy.", 
        "code": 4
    }
}
Updating service [default] (this may take several minutes)...failed.                                                                    
DEBUG: (gcloud.app.deploy) Error Response: [4] Timed out waiting for the app infrastructure to become healthy.
Traceback (most recent call last):
  File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 797, in Execute
    resources = calliope_command.Run(cli=self, args=args)
  File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 757, in Run
    resources = command_instance.Run(args)
  File "/usr/lib/google-cloud-sdk/lib/surface/app/deploy.py", line 65, in Run
    parallel_build=False)
  File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/deploy_util.py", line 588, in RunDeploy
    flex_image_build_option=flex_image_build_option)
  File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/deploy_util.py", line 394, in Deploy
    extra_config_settings)
  File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/app/appengine_api_client.py", line 188, in DeployService
    message=message)
  File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/app/operations_util.py", line 246, in WaitForOperation
    sleep_ms=retry_interval)
  File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 266, in WaitFor
    sleep_ms=sleep_ms)
  File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/core/util/retry.py", line 222, in RetryOnResult
    if not should_retry(result, state):
  File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 260, in _IsNotDone
    return not poller.IsDone(operation)
  File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/app/operations_util.py", line 171, in IsDone
    encoding.MessageToPyValue(operation.error)))
OperationError: Error Response: [4] Timed out waiting for the app infrastructure to become healthy.
ERROR: (gcloud.app.deploy) Error Response: [4] Timed out waiting for the app infrastructure to become healthy.

Before that, about 3 weeks ago, deploys had become really slow (5-20 minutes).

Describing the operation with this command:

gcloud beta app operations describe OPERATION_ID

gives this:

done: true
error:
  code: 4
  message: Timed out waiting for the app infrastructure to become healthy.
metadata:
  '@type': type.googleapis.com/google.appengine.v1.OperationMetadataV1
  endTime: '2018-02-19T06:36:02.752Z'
  insertTime: '2018-02-19T06:08:56.439Z'
  method: google.appengine.v1.Versions.CreateVersion
  target: apps/PROJECT_ID/services/default/versions/release-0-6-3
  user: [email protected]
name: apps/PROJECT_ID/operations/45d6fec1-9261-41d2-943a-648976b971ed
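
For reference, the insertTime and endTime values above show how long the operation ran before it was reported as timed out. A quick sketch of that arithmetic in Python (the helper names are only illustrative, not part of any Google tooling):

```python
from datetime import datetime, timezone


def parse_ts(ts: str) -> datetime:
    # Timestamps in the operation metadata look like 2018-02-19T06:08:56.439Z
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def operation_duration_seconds(insert_time: str, end_time: str) -> float:
    # Elapsed wall-clock time between operation creation and completion
    return (parse_ts(end_time) - parse_ts(insert_time)).total_seconds()


# Values copied from the operation description above
elapsed = operation_duration_seconds(
    "2018-02-19T06:08:56.439Z",
    "2018-02-19T06:36:02.752Z",
)
print(f"{elapsed / 60:.1f} minutes")
```

So the deploy waited roughly 27 minutes before giving up.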

Any ideas on how to get more information about the operation and the actions it performs?

Best,

Alex

Wanderjahr answered 19/2, 2018 at 8:42 Comment(3)
UPDATE: this bug happens randomly, e.g. today I was able to update the app. It looks like there is no point in getting more info about the operation because it won't affect the result. Here is what I got from the Google Issue Tracker: "This issue is known to Engineering, who are working towards a fix. There is not estimated time to resolution as yet. You may follow this issue to keep up-to-date with eventual developments." - Wanderjahr
We still had this problem in 2019... I hope they now have an estimate for resolving it. - Amphictyony
@Amphictyony increasing this quota helped us to get through: cloud.google.com/compute/quotas#ip_addresses, console.cloud.google.com/iam-admin/quotas. More info is in the thread here: issuetracker.google.com/issues/73583699 - Wanderjahr

At the time of writing, I haven't found a way to get more verbose error messages during a Google App Engine deployment from the CLI on Ubuntu, other than passing the verbosity option (gcloud app deploy --verbosity=debug) or getting more info about the operation with gcloud beta app operations describe OPERATION_ID.

However, Google, replying on the issue tracker, found the cause of the problem: "The reason for the Compute Engine VMs never becoming healthy is because your 'In-use IP addresses' quota [1] in the region of your App Engine Flexible application has reached its limit." More here: https://issuetracker.google.com/issues/73583699. They also promised to provide better error messages in the next update of the CLI.
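
To check whether a regional quota such as in-use IP addresses is exhausted, one option is to inspect the quota list for the region. The sketch below assumes the JSON printed by gcloud compute regions describe REGION --format=json contains a "quotas" list whose entries have "metric", "limit" and "usage" fields, and that the relevant metric is named IN_USE_ADDRESSES. The function name and the sample numbers are made up for illustration.

```python
import json


def exhausted_quotas(region: dict, threshold: float = 1.0) -> list:
    """Return (metric, usage, limit) for quotas whose usage/limit >= threshold."""
    hits = []
    for quota in region.get("quotas", []):
        limit = quota.get("limit", 0)
        usage = quota.get("usage", 0)
        # Skip quotas with no positive limit to avoid division by zero
        if limit > 0 and usage / limit >= threshold:
            hits.append((quota.get("metric"), usage, limit))
    return hits


# Hypothetical sample shaped like the assumed "quotas" section (numbers invented)
sample = json.loads("""
{
  "name": "us-central1",
  "quotas": [
    {"metric": "CPUS", "limit": 24.0, "usage": 4.0},
    {"metric": "IN_USE_ADDRESSES", "limit": 8.0, "usage": 8.0}
  ]
}
""")

for metric, usage, limit in exhausted_quotas(sample):
    print(f"{metric}: {usage:g} of {limit:g} in use")
```

With real data, the dictionary would come from parsing the command's JSON output instead of the inline sample. Lowering threshold (for example to 0.8) also flags quotas that are close to their limit.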

Wanderjahr answered 20/3, 2018 at 13:13 Comment(0)
