Kubernetes: stop CloudSQL-proxy sidecar container in multi container Pod/Job

8

39

I have a Kubernetes Job that runs database migrations on a Cloud SQL database.
One way to access the Cloud SQL database from GKE is to use the CloudSQL-proxy container and then connect via localhost. Great - that's working so far. But because I'm doing this inside a Kubernetes Job, the Job is never marked as successfully finished because the proxy keeps running.

$ kubectl get po
NAME                      READY     STATUS      RESTARTS   AGE
db-migrations-c1a547      1/2       Completed   0          1m

Even though the output says 'Completed', one of the initial two containers is still running: the proxy.

How can I make the proxy exit on completing the migrations inside container 1?

Cytherea answered 16/1, 2017 at 15:6 Comment(3)
downvote without a comment? it would be helpful to add a note on what could be improved ...Cytherea
Hi @Phillip, did you find a solution in the meantime, or any alternative approach to do the migrations?Ingurgitate
@MarkVincze I went with Christian's approach of creating a separate proxy service that the job can connect to - no need for the sidecar thenCytherea
38

The best way I have found is to share the process namespace between containers and use the SYS_PTRACE securityContext capability to allow you to kill the sidecar.

apiVersion: batch/v1
kind: Job
metadata:
  name: my-db-job
spec:
  template:
    spec:
      restartPolicy: OnFailure
      shareProcessNamespace: true
      containers:
      - name: my-db-job-migrations
        image: <your migration image>  # placeholder for your migrations image
        command: ["/bin/sh", "-c"]
        args:
          - |
            <your migration commands>;
            sql_proxy_pid=$(pgrep cloud_sql_proxy) && kill -INT $sql_proxy_pid;
        securityContext:
          capabilities:
            add:
              - SYS_PTRACE
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.17
        command:
          - "/cloud_sql_proxy"
        args:
          - "-instances=$(DB_CONNECTION_NAME)=tcp:5432"
          
Dorcus answered 2/11, 2020 at 17:9 Comment(4)
This answer deserves way more votes. It's the only option that worked for me!Choriamb
This did not work for me at all. I don't know why. I can't see what's happening but I get operation not permitted if I exec into the container and try to run this command manually.Crabbing
Thanks, this answer pointed us in the right direction! Please note that the code in above answer won't fail your job if the migration command(s) exit(s) with an error code. If you want your job to fail when the migration fails, you could do something like this: <your migration commands>;migration_exit_code=$?; sql_proxy_pid=$(pgrep cloud_sql_proxy) && kill -INT $sql_proxy_pid && exit $migration_exit_codeLebanon
This works only if you're running the my-db-job-migrations container as the root userSudhir
14

One possible solution would be a separate cloudsql-proxy deployment with a matching service. Your Job would then only need the migration container, which connects to the proxy service.

This comes with some downsides:

  • higher network latency, since there is no pod-local MySQL communication
  • a possible security issue if you expose the SQL port to your whole Kubernetes cluster

If you want to open cloudsql-proxy to the whole cluster, you have to replace tcp:3306 with tcp:0.0.0.0:3306 in the -instances parameter of the cloudsql-proxy.
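
For illustration, here is a minimal sketch of such a standalone proxy Deployment plus Service, assuming the v1 proxy image and a credentials Secret named cloudsql-instance-credentials; the names and the instance connection string are placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudsql-proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cloudsql-proxy
  template:
    metadata:
      labels:
        app: cloudsql-proxy
    spec:
      containers:
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.17
        command: ["/cloud_sql_proxy"]
        args:
          # listen on 0.0.0.0 so other pods can reach the proxy through the service
          - "-instances=<project>:<region>:<instance>=tcp:0.0.0.0:3306"
          - "-credential_file=/secrets/cloudsql/credentials.json"
        volumeMounts:
        - name: cloudsql-instance-credentials
          mountPath: /secrets/cloudsql
          readOnly: true
      volumes:
      - name: cloudsql-instance-credentials
        secret:
          secretName: cloudsql-instance-credentials
---
apiVersion: v1
kind: Service
metadata:
  name: cloudsql-proxy
spec:
  selector:
    app: cloudsql-proxy
  ports:
  - port: 3306
    targetPort: 3306

The migration Job then connects to cloudsql-proxy:3306 instead of localhost, so no sidecar is needed and the Job completes on its own.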

Tartaric answered 26/1, 2017 at 20:32 Comment(4)
yes, I also tried that, but it took me a while to figure out the 0.0.0.0 part ;)Cytherea
having a standalone cloudsqlproxy deployment is not recommended by Google SupportEquestrienne
It also worked for me. It took me a long time to figure out that the SQL proxy listens on 127.0.0.1 by default, and with the above architecture (a service in front of the SQL proxy pods) it needs to listen on 0.0.0.0:3306Caretaker
here's how to do this securely (basically service=pgbouncer+cloudsql proxy), and configure your app to securely connect to pgbouncer github.com/GoogleCloudPlatform/cloud-sql-proxy/tree/main/…Asuncionasunder
11

There are 3 ways of doing this.

1- Use private IP to connect your K8s job to Cloud SQL, as described by @newoxo in one of the answers. To do that, your cluster needs to be a VPC-native cluster. Mine wasn't, and I was not willing to move all my stuff to a new cluster, so I wasn't able to do this.

2- Put the Cloud SQL Proxy container in a separate deployment with a service, as described by @Christian Kohler. This looks like a good approach, but it is not recommended by Google Cloud Support.

I was about to head in this direction (solution #2) but I decided to try something else.

And here is the solution that worked for me:

3- You can communicate between different containers in the same Pod/Job using the file system. The idea is to tell the Cloud SQL Proxy container when the main job is done so it can kill the proxy process and exit. Here is how to do it:

In the yaml file (my-job.yaml)

apiVersion: v1
kind: Pod
metadata:
  name: my-job-pod
  labels:
    app: my-job-app
spec:
  restartPolicy: OnFailure
  containers:
  - name: my-job-app-container
    image: my-job-image:0.1
    command: ["/bin/bash", "-c"]
    args:
      - |
        trap "touch /lifecycle/main-terminated" EXIT
        { your job commands here }
    volumeMounts:
      - name: lifecycle
        mountPath: /lifecycle
  - name: cloudsql-proxy-container
    image: gcr.io/cloudsql-docker/gce-proxy:1.11
    command: ["/bin/sh", "-c"]
    args:
      - |
        /cloud_sql_proxy -instances={ your instance name }=tcp:3306 -credential_file=/secrets/cloudsql/credentials.json &
        PID=$!
        # Poll the shared volume until the main container signals completion
        while true; do
            if [ -f "/lifecycle/main-terminated" ]; then
                kill $PID
                exit 0
            fi
            sleep 1
        done
    securityContext:
      runAsUser: 2  # non-root user
      allowPrivilegeEscalation: false
    volumeMounts:
      - name: cloudsql-instance-credentials
        mountPath: /secrets/cloudsql
        readOnly: true
      - name: lifecycle
        mountPath: /lifecycle
  volumes:
  - name: cloudsql-instance-credentials
    secret:
      secretName: cloudsql-instance-credentials
  - name: lifecycle
    emptyDir: {}

Basically, when your main job is done, it creates a file in /lifecycle; the watch loop added to the cloud-sql-proxy container detects it, kills the proxy, and exits, which terminates the container.

I hope it helps! Let me know if you have any questions.

Based on: https://mcmap.net/q/409566/-control-order-of-container-termination-in-a-single-pod-in-kubernetes

Sammer answered 9/5, 2020 at 15:6 Comment(1)
kill: (7) - Operation not permitted, every time, kill commands do not workCrabbing
3

It doesn't look like Kubernetes can do this on its own; you would need to manually kill the proxy once the migration exits. A similar question was asked here: Sidecar containers in Kubernetes Jobs?

Potash answered 16/1, 2017 at 23:12 Comment(0)
3

Google Cloud SQL has recently launched private IP address connectivity for Cloud SQL. If the Cloud SQL instance and the Kubernetes cluster are in the same region, you can connect to Cloud SQL without using the Cloud SQL Proxy.

https://cloud.google.com/sql/docs/mysql/connect-kubernetes-engine#private-ip
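
For illustration, a minimal sketch of a migration Job that connects over private IP, with no proxy container at all; the image name, the private IP value, and the Secret name are placeholders:

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrations
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrations
        image: <your migration image>        # placeholder
        env:
        - name: DB_HOST
          value: "10.0.0.5"                  # the Cloud SQL instance's private IP
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: cloudsql-db-credentials  # placeholder Secret
              key: password

Because there is no sidecar, the Job completes as soon as the migration container exits.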

Ayotte answered 5/1, 2019 at 9:14 Comment(2)
Link to documentation or an example?Homochromatic
@Archonic: You can find the docs here cloud.google.com/sql/docs/mysql/…Ayotte
2

A possible solution would be to set concurrencyPolicy: Replace in the CronJob spec ... this will replace the currently running pod with a new instance whenever the job needs to run again. But you have to make sure that subsequent cron runs are spaced far enough apart.
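
For illustration, a minimal sketch of a CronJob using this setting, assuming a cluster where CronJob is available at batch/v1; the schedule and image are placeholders:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-migrations
spec:
  schedule: "0 3 * * *"          # space the runs far enough apart
  concurrencyPolicy: Replace     # a new run replaces a still-running pod
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: migrations
            image: <your migration image>   # placeholder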

Chuckle answered 16/11, 2018 at 15:52 Comment(0)
1

Cloud SQL Proxy v2 supports a quitquitquit endpoint that can be used to shut down the sidecar proxy. There is no longer any need for workarounds such as sharing process namespaces or writing to a file.

Simply add the --quitquitquit flag to cloud-sql-proxy and stop the sidecar by sending a POST request in the args of the job definition:

<your migration command>; exit_code=$?; curl -X POST localhost:9091/quitquitquit && exit $exit_code

Read more here: https://github.com/GoogleCloudPlatform/cloud-sql-proxy/issues/828
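
For illustration, a minimal sketch of a Job wired up this way, assuming the v2 proxy image, curl available in the migration image, and the proxy's admin port at its default of 9091; the image names and the instance connection name are placeholders:

apiVersion: batch/v1
kind: Job
metadata:
  name: my-db-job
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrations
        image: <your migration image>   # placeholder, must contain curl
        command: ["/bin/sh", "-c"]
        args:
          - |
            <your migration commands>; exit_code=$?;
            curl -s -X POST localhost:9091/quitquitquit && exit $exit_code
      - name: cloud-sql-proxy
        image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.11.0   # any v2 tag
        args:
          - "--quitquitquit"        # exposes POST localhost:9091/quitquitquit
          - "--port=5432"
          - "<project>:<region>:<instance>"

When the migration shell exits, the POST tells the proxy to shut down, both containers stop, and the Job is marked Completed (or Failed, since the migration's exit code is propagated).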

Donyadoodad answered 28/8, 2023 at 1:11 Comment(1)
I needed to upgrade to v2 cloud-sql-proxy. There is a guide but beware the launch command is /cloud_sql_proxy (no leading . as stated in guide).Endosmosis
0

Unfortunately, the other answers didn't work for me, because the Cloud SQL Proxy runs in a distroless environment where there is no shell.

I managed to get around this by bundling a Cloud SQL Proxy binary into my migration image and running a shell script that starts the Cloud SQL Proxy and then runs my app.

Dockerfile:

FROM golang:1.19.4

RUN apt update
COPY . /etc/mycode/
WORKDIR /etc/mycode
RUN chmod u+x ./scripts/run_migrations.sh
RUN chmod u+x ./bin/cloud_sql_proxy.linux-amd64

RUN go install
ENTRYPOINT ["./scripts/run_migrations.sh"]

Shell Script (run_migrations.sh):

#!/bin/sh

# This script is run from the parent directory
dbConnectionString=$1
cloudSQLProxyPort=$2

echo "Starting Cloud SQL Proxy"
./bin/cloud_sql_proxy.linux-amd64 -instances=${dbConnectionString}=tcp:5432 -enable_iam_login -structured_logs &
CHILD_PID=$!
echo "CloudSQLProxy PID: $CHILD_PID"

echo "Migrating DB..."
go run ./db/migrations/main.go
MAIN_EXIT_CODE=$?

kill $CHILD_PID;
echo "Migrations complete.";

exit $MAIN_EXIT_CODE

K8s (via Pulumi):

import * as k8s from '@pulumi/kubernetes'

const jobDBMigrations = new k8s.batch.v1.Job("job-db-migrations", {
      metadata: {
        namespace: namespaceName,
        labels: appLabels,
      },
      spec: {
        backoffLimit: 4,
        template: {
          spec: {
            containers: [
              {
                image: pulumi.interpolate`gcr.io/${gcpProject}/${migrationsId}:${migrationsVersion}`,
                name: "server-db-migration",
                args: [
                  dbConnectionString,
                ],
              },
            ],
            restartPolicy: "Never",
            serviceAccount: k8sSAMigration.metadata.name,
          },
        },
      },
    },
    {
      provider: clusterProvider,
    });
Flaring answered 20/1, 2023 at 0:1 Comment(0)
