SIGTERM not sent on pod delete

When deleting a pod or deploying a new version of a pod, Kubernetes should send a SIGTERM to the running process and then wait terminationGracePeriodSeconds (30 seconds by default) before it sends a SIGKILL.

I have encountered the problem that this first SIGTERM never seems to be sent. The default settings in my cluster were never changed (the SIGKILL is sent as expected after 30 seconds), so my assumption is that there might be something wrong, permissions or similar, with my Dockerfile (see below).

I've ruled out an error in the executable's graceful shutdown logic: when I kubectl exec into the pod and run kill -15 on the process, it catches the SIGTERM and shuts down as expected.

The Dockerfile looks as follows:

FROM debian:bullseye-slim AS app

ARG USERNAME=app
ARG USER_UID=1000
ARG USER_GID=$USER_UID
RUN apt update && apt install -y libssl-dev zstd ca-certificates pkg-config

RUN groupadd --gid $USER_GID $USERNAME \
    && useradd --uid $USER_UID --gid $USER_GID -m $USERNAME
WORKDIR /home/$USERNAME

ARG RELEASE_DIR
ARG SERVICE 

USER $USERNAME

COPY $RELEASE_DIR .

EXPOSE 8080

ENV CMD=./${SERVICE}
CMD ${CMD}

Is there something blatantly wrong here? Or does Kubernetes require some additional config to actually send the termination signal as expected?

Uncork answered 8/6, 2022 at 10:57 Comment(0)

For termination to work correctly, you need to ensure your application is the main container process. With the shell form CMD, the command your container runs is /bin/sh -c '${CMD}', and depending on what's in that environment variable and what /bin/sh actually is, that shell wrapper might keep running as the main container process and intercept the termination signal.

The same mechanisms apply in both plain Docker and Kubernetes and you should see a similar issue if you docker stop the container locally. This may be easier to debug and iterate on.

The easiest way to work around this is to use the exec form of CMD, which looks like a JSON array. Since this won't run a shell, it also can't do variable expansions, and you'll have to spell out what you want the command to actually be:

CMD ["./service"]

This is still easy to override at runtime, and you in fact don't need that CMD at all:

# instead of `docker run -e CMD='...'`
docker run --rm my-image \
  ls -l /home/app
# or in a Kubernetes pod spec
command:
  - /home/app/another_app
args:
  - --option

You can probably similarly remove pretty much all of the ARG declarations in the Dockerfile (the name or numeric uid of the container user shouldn't matter, for example, and the compiled application filename and host build path are usually fixed) which will simplify the setup.

Moton answered 8/6, 2022 at 11:11 Comment(4)
Uncork: Hmm, that sounds sensible. So if I want to deploy different services from the same Dockerfile via CI and just supply the correct SERVICE arg, that's not possible then?
Moton: I'd expect each service to have its own Dockerfile, rather than this "Mad-Libs" style over-generic Dockerfile. Dockerfile environment replacement doesn't happen on CMD, so you need to use a shell there; possibly the shell's exec built-in could be helpful here as well.
Uncork: Maybe another dumb question, but simply running something like [ "`echo ${CMD}`" ] would not work, right?
Moton: That looks for an executable file in $PATH (/bin, /usr/bin, etc.) whose file name includes the backquotes, space, dollar sign, and curly braces.
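
The exec idea from the comment above can be tried locally without Docker. The following is only a sketch under assumptions: service.sh and entrypoint.sh are hypothetical stand-ins (the first plays the role of the real service, the second is a possible entrypoint script), and SERVICE is the variable name from the question. The entrypoint lets the shell expand the variable and then uses exec so that the service replaces the shell instead of running underneath it.

```shell
#!/bin/sh
# Self-contained demo; service.sh and entrypoint.sh are hypothetical stand-ins.
workdir=$(mktemp -d)

# Stand-in for the real service: logs SIGTERM and exits cleanly.
cat >"$workdir/service.sh" <<'EOF'
#!/bin/sh
trap 'echo "service received SIGTERM"; exit 0' TERM
while :; do sleep 1; done
EOF

# Possible entrypoint: the shell expands $SERVICE, then exec replaces the
# shell with the service, so no wrapper process is left in between.
cat >"$workdir/entrypoint.sh" <<'EOF'
#!/bin/sh
exec "./${SERVICE}" "$@"
EOF
chmod +x "$workdir/service.sh" "$workdir/entrypoint.sh"

# Start the entrypoint in the background. The subshell also uses exec, so
# $! is the PID of the process that ends up running service.sh.
(cd "$workdir" && SERVICE=service.sh && export SERVICE && exec ./entrypoint.sh) \
  >"$workdir/log" 2>&1 &
pid=$!

sleep 2             # give the service time to install its trap
kill -TERM "$pid"   # roughly what the container runtime does on stop
wait "$pid"
status=$?
output=$(cat "$workdir/log")
rm -rf "$workdir"

echo "exit status: $status"
echo "output: $output"
```

If the signal reaches the service, the log should contain its SIGTERM message and the exit status should be 0. In an image, a script like entrypoint.sh would be referenced with the exec form, so the image can stay generic while the service still ends up as the main container process.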

As your ENTRYPOINT, use a script like the following:

#!/bin/bash

# Start the main process in the background and save its PID.
# exec is not used here: the shell has to stay alive to forward the signal.
my_main_process &
pid=$!

# Trap the SIGTERM signal and forward it to the main process
trap 'kill -SIGTERM "$pid"; wait "$pid"' SIGTERM

# Wait for the main process to complete
wait "$pid"

Source: https://www.linkedin.com/pulse/propagating-sigterm-signal-main-process-kubernetes-pod-chidambaram/

Imbibition answered 4/4 at 20:8 Comment(0)

I would not rely on the shell form of CMD; I start the main process from an ENTRYPOINT script instead.

I use it this way, with exec, so that node replaces the shell and becomes the main process. There is then no separate bash process left to terminate, and node receives the signal directly. Starting node in the background also works, but then the script has to forward the signal and wait for node itself, so be cautious when terminating it.

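#!/bin/bash
# Note: exec replaces this shell with node, so the trap below is dropped at
# that point. The handler only matters if node is started in the background.
handle_sigterm() {
  # The pattern must match the command line used to start node below.
  pid=$(pgrep -f "node index.js")
  if [ -n "$pid" ]; then
    kill -SIGTERM "$pid"
    wait "$pid"
  fi
}
trap 'handle_sigterm' SIGTERM
exec node index.js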
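
One detail worth checking with a script of this shape: a trap set in the shell does not carry over once exec has replaced the shell with another program. The sketch below illustrates this with a hypothetical wrapper.sh that uses sleep in place of node; it is a local experiment, not part of the answer's setup.

```shell
#!/bin/sh
# Demo: a trap installed before exec is gone after exec.
workdir=$(mktemp -d)

# Hypothetical wrapper: sets a trap, then execs a long-running command.
cat >"$workdir/wrapper.sh" <<'EOF'
#!/bin/sh
trap 'echo "wrapper trap ran"' TERM
exec sleep 30
EOF

sh "$workdir/wrapper.sh" >"$workdir/log" 2>&1 &
pid=$!

sleep 1             # let the wrapper reach its exec
kill -TERM "$pid"   # delivered to sleep, which now owns this PID
wait "$pid"
status=$?
trap_output=$(cat "$workdir/log")
rm -rf "$workdir"

echo "exit status: $status"
echo "trap output: '$trap_output'"
```

The trap output should stay empty, and the exit status should be above 128 because sleep is terminated by the signal's default action. With node in place of sleep, whatever SIGTERM handling the application registers itself is what runs.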
Huntsman answered 24/10 at 5:42 Comment(0)
