We have a GKE cluster with auto-upgrading nodes. We recently noticed a node become unschedulable and eventually get deleted, and we suspect it was being upgraded automatically for us. Is there a way to confirm (or rule out) in Stackdriver that this was indeed what was happening?
You can use the following advanced logs queries with Cloud Logging (previously Stackdriver) to detect upgrades to node pools:
protoPayload.methodName="google.container.internal.ClusterManagerInternal.UpdateClusterInternal"
resource.type="gke_nodepool"
and to the master:
protoPayload.methodName="google.container.internal.ClusterManagerInternal.UpdateClusterInternal"
resource.type="gke_cluster"
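The same two filters can also be run from a terminal. This is only a minimal sketch, assuming the gcloud CLI is installed and authenticated against the right project; the helper names upgrade_filter and find_upgrades, the 7-day lookback, and the 20-entry limit are illustrative choices, not part of the answer above:

```shell
# Method name quoted in the queries above.
METHOD='google.container.internal.ClusterManagerInternal.UpdateClusterInternal'

# Build the Cloud Logging filter for a given resource type:
# "gke_nodepool" for node pools, "gke_cluster" for the master.
upgrade_filter() {
  printf 'protoPayload.methodName="%s" AND resource.type="%s"' "$METHOD" "$1"
}

# Hypothetical helper: read matching entries from the last 7 days (at most 20).
find_upgrades() {
  gcloud logging read "$(upgrade_filter "$1")" --freshness=7d --limit=20 --format=json
}

# Usage (needs gcloud credentials):
#   find_upgrades gke_nodepool
#   find_upgrades gke_cluster
```

The explicit AND is equivalent to putting the two conditions on separate lines, as in the queries above.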
Additionally, you can control when updates are applied with Maintenance Windows (as the user aurelius mentioned).
I know it's not Cloud Logging, but another method to list the auto-upgrade operations is with gcloud. In Cloud Logging I could only find the completion of the upgrade, not the start.
gcloud container operations list
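To narrow that listing down to node upgrades, something like the following may help. It is a sketch under assumptions: that node pool upgrades are reported with the UPGRADE_NODES operation type, and that the helper name list_node_upgrades is purely illustrative. It does not by itself tell an automatic upgrade apart from a manual one.

```shell
# Hypothetical helper: list only node-upgrade operations, oldest first.
# Extra arguments (for example --region or --zone) are passed through to gcloud.
list_node_upgrades() {
  gcloud container operations list \
    --filter="operationType=UPGRADE_NODES" \
    --sort-by="startTime" "$@"
}

# Usage (needs gcloud credentials):
#   list_node_upgrades --zone=us-central1-a
```

The start and end times in the output can then be compared with the time the node became unschedulable.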
Just to add more value to the post, you may find different GKE-related sample log queries here.
I think your question has already been answered in the comments. Just as an addition: automatic upgrades occur at regular intervals at the discretion of the GKE team. To get more control, you can create a maintenance window as explained here. This is basically a time frame that you choose in which automatic upgrades should occur.
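For illustration, a daily maintenance window can be set from the command line. This is a minimal sketch, assuming the gcloud CLI is available; the helper name set_maintenance_window and the cluster name in the usage line are hypothetical, and the start time is interpreted as UTC:

```shell
# Hypothetical helper: set a daily maintenance window on a cluster.
# $1 = cluster name, $2 = start time as HH:MM (UTC); remaining args go to gcloud.
set_maintenance_window() {
  cluster="$1"
  start="$2"
  shift 2
  gcloud container clusters update "$cluster" --maintenance-window="$start" "$@"
}

# Usage (needs gcloud credentials):
#   set_maintenance_window my-cluster 03:00 --zone=us-central1-a
```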
cordon and drain. In which case the kubelet would produce the below if Stackdriver is scraping that: kubelet[1319]: I0624 18:41:04.771532 1319 kubelet_node_status.go:447] Recording NodeNotSchedulable event message for node gke-squareroute-default-pool-9f095a99-s6z9 – Cotenant
NodeNotSchedulable popping up at that time. I guess that's not entirely sufficient to know that it was caused by the automatic node upgrade (e.g. someone could be doing it manually), but it gets us a good way towards it. – Garry
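If the kubelet logs are available as plain text, the cordoned node names can be pulled out of lines like the one quoted in the comment above. This is only a sketch: the helper name cordoned_nodes is hypothetical, and it assumes the message wording stays exactly as in that sample line.

```shell
# Sample kubelet line quoted in the comment above.
line='kubelet[1319]: I0624 18:41:04.771532 1319 kubelet_node_status.go:447] Recording NodeNotSchedulable event message for node gke-squareroute-default-pool-9f095a99-s6z9'

# Hypothetical helper: read log lines on stdin and print the node name
# from every NodeNotSchedulable event line; other lines are ignored.
cordoned_nodes() {
  sed -n 's/.*Recording NodeNotSchedulable event message for node \([^ ]*\).*/\1/p'
}

printf '%s\n' "$line" | cordoned_nodes
```

As noted in the comment, a match only shows that the node was cordoned at that time, not who or what cordoned it.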