Continuous integration/deployment/delivery on Google App Engine, too risky?

We recently set up continuous integration/deployment/delivery of a Node.js web app on Google App Engine. The CI server (GitLab CI) runs dependency installation, build, tests, and deployment to integration or prod depending on the branch (develop/master).

Until now, the only failures we had faced were during the dependency installation step, so we did not worry much about them. But yesterday (21/10/16) there was a wide-scale DNS outage and the pipeline failed in the middle of the deployment step, breaking prod. Simply re-running the pipeline fixed it, but the problem can recur at any time.

My questions are:

  • How can we handle this sort of network issue in the continuous deployment process?
  • Is continuous deployment on Google App Engine really a good idea?
  • If so, what is the recommended App Engine deployment method? I can't find any relevant documentation about it...

For the moment we have only two versions, "dev" and "prod", which are updated after commits, but at random times I have observed strange behaviours.

Any response, suggestion or feedback is very welcome!

Example of a stack trace for the networking issues I am talking about:

DEBUG: Error sending result: 'MetadataServerException(HTTPError(),)'. Reason: 'PicklingError("Can't pickle <type 'cStringIO.StringO'>: attribute lookup cStringIO.StringO failed",)'
Traceback (most recent call last):
  File "/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 733, in Execute
    resources = args.calliope_command.Run(cli=self, args=args)
  File "/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 1630, in Run
    resources = command_instance.Run(args)
  File "/google-cloud-sdk/lib/surface/app/deploy.py", line 53, in Run
    return deploy_util.RunDeploy(self, args)
  File "/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/deploy_util.py", line 387, in RunDeploy
    all_services)
  File "/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/deploy_util.py", line 247, in Deploy
    manifest = _UploadFiles(service, code_bucket_ref)
  File "/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/deploy_util.py", line 115, in _UploadFiles
    service, code_bucket_ref)
  File "/google-cloud-sdk/lib/googlecloudsdk/api_lib/app/deploy_app_command_util.py", line 277, in CopyFilesToCodeBucketNoGsUtil
    _UploadFiles(files_to_upload, bucket_ref)
  File "/google-cloud-sdk/lib/googlecloudsdk/api_lib/app/deploy_app_command_util.py", line 219, in _UploadFiles
    results = pool.map(_UploadFile, tasks)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
MaybeEncodingError: Error sending result: 'MetadataServerException(HTTPError(),)'. Reason: 'PicklingError("Can't pickle <type 'cStringIO.StringO'>: attribute lookup cStringIO.StringO failed",)'
DEBUG: Exception captured in Error
Traceback (most recent call last):
  File "/google-cloud-sdk/lib/googlecloudsdk/core/metrics.py", line 411, in Wrapper
    return func(*args, **kwds)
TypeError: Error() takes exactly 3 arguments (1 given)
ERROR: gcloud crashed (MaybeEncodingError): Error sending result: 'MetadataServerException(HTTPError(),)'. Reason: 'PicklingError("Can't pickle <type 'cStringIO.StringO'>: attribute lookup cStringIO.StringO failed",)'
Traceback (most recent call last):
  File "/google-cloud-sdk/lib/gcloud.py", line 65, in <module>
    main()
  File "/google-cloud-sdk/lib/gcloud.py", line 61, in main
    sys.exit(googlecloudsdk.gcloud_main.main())
  File "/google-cloud-sdk/lib/googlecloudsdk/gcloud_main.py", line 145, in main
    crash_handling.HandleGcloudCrash(err)
  File "/google-cloud-sdk/lib/googlecloudsdk/command_lib/crash_handling.py", line 107, in HandleGcloudCrash
    _ReportError(err)
  File "/google-cloud-sdk/lib/googlecloudsdk/command_lib/crash_handling.py", line 86, in _ReportError
    util.ErrorReporting().ReportEvent(error_message=stacktrace,
  File "/google-cloud-sdk/lib/googlecloudsdk/api_lib/error_reporting/util.py", line 28, in __init__
    self._API_NAME, self._API_VERSION)
  File "/google-cloud-sdk/lib/googlecloudsdk/core/apis.py", line 254, in GetClientInstance
    http_client = http.Http()
  File "/google-cloud-sdk/lib/googlecloudsdk/core/credentials/http.py", line 60, in Http
    creds = store.Load()
  File "/google-cloud-sdk/lib/googlecloudsdk/core/credentials/store.py", line 282, in Load
    if account in c_gce.Metadata().Accounts():
  File "/google-cloud-sdk/lib/googlecloudsdk/core/credentials/gce.py", line 122, in Accounts
    gce_read.GOOGLE_GCE_METADATA_ACCOUNTS_URI + '/')
  File "/google-cloud-sdk/lib/googlecloudsdk/core/util/retry.py", line 160, in TryFunc
    return func(*args, **kwargs), None
  File "/google-cloud-sdk/lib/googlecloudsdk/core/credentials/gce.py", line 45, in _ReadNoProxyWithCleanFailures
    raise MetadataServerException(e)
googlecloudsdk.core.credentials.gce.MetadataServerException: HTTP Error 503: Service Unavailable
DEBUG: Uploading [/builds/apps/webapp/lib/jinja2/defaults.pyc] to [151c77b4e5bdd2c38b6a2bf914fffa3a6ffa71a6]
INFO: Uploading [/builds/apps/webapp/lib/jinja2/defaults.pyc] to [151c77b4e5bdd2c38b6a2bf914fffa3a6ffa71a6]
INFO: Refreshing access_token
Gaillard answered 22/10, 2016 at 13:10 Comment(0)

Good/bad? Subjective - thus off-topic for SO. Assuming the question is how to make continuous deployment reliable :)

Well, the trouble is that you're using app versions as your CI environments, which means you can't avoid breakages due to a specific version being bad. You can only hope to recover as fast as possible by re-deploying the version (when the outage ends) - this can be automated.
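The automated re-deploy mentioned above could be sketched roughly like this. It is only an illustration, not something the gcloud SDK provides: `run_with_retry`, the attempt count and the delays are made-up names and values, and it assumes the failure is transient (like the DNS outage), so that running the same command again is enough.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, base_delay=5.0,
                   runner=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits with status 0, backing off between tries.

    Hypothetical helper: the name and defaults are assumptions.
    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(
        "command failed after %d attempts: %s" % (attempts, " ".join(cmd))
    )
```

In the pipeline this would be called as `run_with_retry(["gcloud", "app", "deploy", "--version", "prod", "--quiet"])`, where "prod" is the fixed version name from the question. Note that this only shortens the outage; it does not prevent it.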

You should not have your production site running directly off the version overwritten by the CI production pipeline; otherwise you risk a site outage on a bad deployment. Instead, you could use a new, unique version for each execution of the CI production pipeline, and only after that completes successfully switch site traffic to its version using the flow described below (which can also be used inside the CI pipelines if you use different apps instead of app versions as CI environments).

From Deploying your program:

By default the deploy command automatically generates a new version ID each time that you use it and will route any traffic to the new version.

To override this behavior, you can specify the version ID with the version flag:

gcloud app deploy --version myID

You can also specify not to send all traffic to the new version immediately with the --no-promote flag:

gcloud app deploy --no-promote

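So make sure you never deploy a version and make it the default traffic destination in the same step (which may not be atomic if driven from the client side), especially for the production app. Instead, deploy the new version without promoting it, and switch traffic to it only once the deployment has completed successfully.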

This way the only critical operation is traffic switching, which (hopefully) is atomic: it either succeeds or is completely rolled back on the GAE side (if not, it's a GAE bug). If this step fails, the app should continue to work with the old version.
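As a rough sketch of that two-step flow (an illustration under assumptions, not the official method): `release`, `deploy_command`, `promote_command` and the `smoke_test` callable are hypothetical names, the service is assumed to be `default`, and traffic is switched with `gcloud app services set-traffic`.

```python
import subprocess


def deploy_command(version):
    # Upload a new, unique version without routing any traffic to it.
    return ["gcloud", "app", "deploy",
            "--version", version, "--no-promote", "--quiet"]


def promote_command(version, service="default"):
    # Send 100% of the service's traffic to the given version.
    return ["gcloud", "app", "services", "set-traffic", service,
            "--splits", "%s=1" % version, "--quiet"]


def release(version, smoke_test, runner=subprocess.check_call):
    """Deploy first, verify, and only then switch traffic.

    smoke_test is a hypothetical callable that takes the version ID and
    returns True when the freshly deployed version looks healthy.
    """
    # If this step fails (e.g. a network outage), the live version is untouched.
    runner(deploy_command(version))
    if not smoke_test(version):
        raise RuntimeError("version %s failed its smoke test" % version)
    # The only step that affects live traffic.
    runner(promote_command(version))
```

A call could look like `release("ci-1234", smoke_test=my_check)`, where the version ID is derived from something unique per pipeline run and `my_check` requests the version-specific URL. If the upload breaks halfway, as in the stack trace from the question, nothing has been promoted and the old version keeps serving.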

Of course, this assumes the networking issues are only between you and GAE; if they also affect GAE's internal ops, all bets are off (but I trust those would be fixed in a timely manner).

Cincture answered 22/10, 2016 at 14:35 Comment(6)
Thank you for your very detailed response. You're probably right about using different apps as CI environments; that's a better idea and it could solve the various issues we're facing. I have one last question: the app is autoscaled, so I cannot start/stop versions (according to the docs). When a build creates a version, the traffic will be billed. Should I set up basic scaling? Or should I delete the previous version when a new one is created? - Gaillard
No need to explicitly start the new version with autoscaling. Just use the respective version's URLs for testing that the version works and GAE will start the instances itself: cloud.google.com/appengine/docs/flexible/python/… - Cincture
+1000000 Good answer as it comes right there @DanCornilescu - Overindulge
Does gcloud app deploy --no-promote generate a version ID if the version ID is not included in the command? - Overindulge
@GeekGuy yes, it should - the deployment needs a version ID. That's the default behaviour. - Cincture
Great, I prefer the auto-generated ones then; that way I see their order in the console. - Overindulge