About once a week, our Node.js app on the App Engine flexible environment goes offline and the following line appears in the logs:

    Restarting batch of VMs for version 20181008t134234 as part of rolling restart.

We have our app set to automatic scaling with the following settings:
    runtime: nodejs
    env: flex
    beta_settings:
      cloud_sql_instances: tuzag-v2:us-east4:tuzag-db
    automatic_scaling:
      min_num_instances: 1
      max_num_instances: 3
    liveness_check:
      path: "/"
      check_interval_sec: 30
      timeout_sec: 4
      failure_threshold: 2
      success_threshold: 2
    readiness_check:
      path: "/"
      check_interval_sec: 15
      timeout_sec: 4
      failure_threshold: 2
      success_threshold: 2
      app_start_timeout_sec: 300
    resources:
      cpu: 1
      memory_gb: 1
      disk_size_gb: 10
I understand the rolling restarts of GCP/GAE, but I am confused as to why Google isn't spinning up another VM before taking our only one offline. Do we have to run with a minimum of 2 instances to prevent this from happening? Is there a way I can configure my app.yaml to make sure another instance is spun up before it reboots the only running instance? After the reboot finishes, everything comes back online fine, but there are still 10 minutes of downtime, which isn't acceptable, especially considering we can't control when it reboots.
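For reference, the workaround I'm asking about would be a small change to the scaling block. This is only a sketch: it assumes that with two instances a rolling restart leaves at least one serving while the other is restarted, which I have not confirmed is guaranteed. The rest of the app.yaml would stay as shown above.

```yaml
# Sketch of the workaround in question, not a confirmed fix.
# Assumption: a rolling restart does not take both instances down at once.
automatic_scaling:
  min_num_instances: 2   # raised from 1 so one instance can keep serving
  max_num_instances: 3   # unchanged
```

The obvious cost is paying for a second VM around the clock, even though the app comfortably runs on one most of the time.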
min_num_instances: 1, min_num_instances: 3 and the majority of the time it comfortably runs on 1. Insanely, when GAE restarts the instance and there's only 1, it doesn't bother spinning up a new one beforehand, taking the service offline. Did you ever find a solution other than min_instances: 2? – Sizemore