I am attempting to run a kubernetes nginx deployment with zero downtime. Part of that process has been to configure a RollingUpdate strategy, which ensures that at least one pod is serving nginx at all times. This works perfectly well.
I am running into errors when the old nginx pod is terminating. According to the kubernetes docs on termination, kubernetes will:
- remove the pod from the endpoints list for the service, so it is not receiving any new traffic when termination begins
- invoke a pre-stop hook if it is defined, and wait for it to complete
- send SIGTERM to all remaining processes
- send SIGKILL to any remaining processes after the grace period expires.
I understand that the command nginx -s quit is supposed to gracefully terminate nginx: the master waits for all workers to complete their in-flight requests before it exits. nginx responds gracefully to the SIGQUIT signal, while SIGTERM results in a fast shutdown that drops connections. Other forums say that it is as easy as adding the following preStop hook to your deployment:
lifecycle:
  preStop:
    exec:
      command: ["/usr/sbin/nginx", "-s", "quit"]
However, from testing this command I have found that nginx -s quit returns immediately, instead of waiting for the workers to complete. It also does not return the PID of the master process, which is what I was hoping for.

What happens is: kubernetes invokes nginx -s quit, which sends a proper SIGQUIT to the worker children but does not wait for them to complete. Because the preStop hook has returned, kubernetes jumps right to step 3 and SIGTERMs those processes instead, resulting in a fast shutdown and, thus, lost connections.
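The "returns immediately" behavior can be demonstrated without nginx at all. In this stand-in sketch, a child shell that ignores SIGUSR1 plays the role of a draining worker, and sending it a signal plays the role of nginx -s quit: the kill command returns at once while the target keeps running (all names here are illustrative, not from the original post).

```shell
# Stand-in demonstration (no nginx needed): a "quit"-style signal send
# returns immediately; the target process can still be running afterwards.
sh -c 'trap "" USR1; sleep 2' &    # child ignores USR1, like a worker still draining
pid=$!
sleep 0.2                          # give the child time to install its trap
kill -USR1 "$pid"                  # analogous to `nginx -s quit`: returns at once
if kill -0 "$pid" 2>/dev/null; then
  state="still running after signal"
else
  state="already gone"
fi
echo "$state"
wait "$pid"
```

This is exactly the race described above: the hook command exits, but the processes it signaled are still working, so the SIGTERM from step 3 lands on live workers.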
QUESTION: Has anyone figured out a good way to gracefully shut down their nginx controller during a rolling deployment and achieve zero downtime? A sleep workaround isn't good enough; I'm looking for something more robust.
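For context, one pattern I have seen suggested (a sketch only, assuming pgrep is available in the controller image) is to make the preStop hook itself block until the master process has actually exited:

```yaml
# Sketch: wrap `nginx -s quit` in a shell loop so the hook does not
# return until every nginx process is gone. Assumes pgrep exists in the image.
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - nginx -s quit; while pgrep -x nginx >/dev/null 2>&1; do sleep 1; done
```

This still polls with sleep, but the wait is bounded by the actual worker drain (capped by terminationGracePeriodSeconds) rather than a fixed guess.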
Below is the full deployment yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-ingress-controller
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: nginx-ingress-lb
    spec:
      terminationGracePeriodSeconds: 60
      serviceAccount: nginx
      containers:
        - name: nginx-ingress-controller
          image: gcr.io/google_containers/nginx-ingress-controller:0.9.0-beta.8
          imagePullPolicy: Always
          readinessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
          livenessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 5
          args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/default-backend
            - --v=2
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          ports:
            - containerPort: 80
          lifecycle:
            preStop:
              exec:
                command: ["/usr/sbin/nginx", "-s", "quit"]