Init script for Cassandra with docker-compose

I would like to create keyspaces and column-families at the start of my Cassandra container.

I tried the following in a docker-compose.yml file:

# shortened for clarity
cassandra:
    hostname: my-cassandra
    image: my/cassandra:latest
    command: "cqlsh -f init-database.cql"

The image my/cassandra:latest contains init-database.cql in /. But this does not seem to work.

Is there a way to make this happen?

Obsolesce answered 5/11, 2016 at 21:55 Comment(1)

Possible duplicate of Create keyspace automatically inside docker container with cassandra – Kellikellia 1/12, 2017 at 10:5

We recently tried to solve a similar problem in KillrVideo, a reference application for Cassandra. We are using Docker Compose to spin up the environment needed by the application which includes a DataStax Enterprise (i.e. Cassandra) node. We wanted that node to do some bootstrapping the first time it was started to install the CQL schema (using cqlsh to run the statements in a .cql file just like you're trying to do). Basically the approach we took was to write a shell script for our Docker entrypoint that:

1. Starts the node normally but in the background.
2. Waits until port 9042 is available (this is where clients connect to run CQL statements).
3. Uses cqlsh -f to run some CQL statements and init the schema.
4. Stops the node that's running in the background.
5. Continues on to the usual entrypoint for our Docker image that starts up the node normally (in the foreground like Docker expects).

We just use the existence of a file to indicate whether the node has already been bootstrapped and check that on startup to determine whether we need to do that logic above or can just start it normally. You can see the results in the killrvideo-dse-docker repository on GitHub.
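
The steps above, including the marker-file check, can be sketched roughly as follows. This is not the actual KillrVideo script: the function names, the marker path, and the schema path are hypothetical, and the command that starts the node is passed in as arguments rather than assumed.

```shell
#!/usr/bin/env bash
# Sketch only: names and paths below are assumptions, not the KillrVideo script.
MARKER_FILE="${MARKER_FILE:-/var/lib/cassandra/.schema_bootstrapped}"
SCHEMA_FILE="${SCHEMA_FILE:-/init-database.cql}"

# Poll with cqlsh until the node answers on the CQL port, or give up.
wait_for_cql() {
    local attempts="${1:-60}" delay="${2:-5}" i
    for ((i = 0; i < attempts; i++)); do
        cqlsh -e 'describe cluster' >/dev/null 2>&1 && return 0
        sleep "$delay"
    done
    return 1
}

# Usage: bootstrap_once <command that starts the node in the foreground>
bootstrap_once() {
    # A marker file from an earlier run means the schema is already installed.
    [ -f "$MARKER_FILE" ] && return 0

    # Step 1: start the node in the background.
    "$@" &
    local node_pid=$! status=0

    # Steps 2 and 3: wait for CQL to answer, then load the schema.
    if wait_for_cql; then
        cqlsh -f "$SCHEMA_FILE" || status=1
    else
        status=1
    fi

    # Step 4: stop the background node and wait for it to exit.
    kill "$node_pid" 2>/dev/null || true
    wait "$node_pid" 2>/dev/null || true

    # Only record success, so a failed bootstrap is retried on the next start.
    [ "$status" -eq 0 ] && touch "$MARKER_FILE"
    return "$status"
}
```

In an entrypoint script this would be called as bootstrap_once "$@" just before the final exec "$@", so that step 5 (the normal foreground start) stays exactly as it was.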

There is one caveat to this approach. This worked great for us because in our reference application, we're only spinning up a single node (i.e. we aren't creating a cluster with more than one node). If you're running multiple nodes, you'll probably want to make sure that only one of the nodes does the bootstrapping to create the schema because multiple clients modifying the schema simultaneously can cause some issues with your cluster. (This is a known issue and will hopefully be fixed at some point.)

Iloilo answered 6/11, 2016 at 17:17 Comment(2)

Great answer, thank you. I was just trying something similar to what you propose. It seems to be a satisfying enough solution for my needs. – Obsolesce 6/11, 2016 at 17:30

You're welcome! I updated the answer to include a caveat about multiple nodes, so be sure to keep that in mind if you're in that situation. – Iloilo 6/11, 2016 at 17:35

I was also looking for a solution to this, and here is how I did it.
A second Cassandra container mounts schema.cql as a volume and runs cqlsh against the first one.

My version, with a healthcheck so we can get rid of the sleep command:

version: '2.2'

services:
  cassandra:
      image: cassandra:3.11.2
      container_name: cassandra
      ports:
        - "9042:9042"
      environment:
        - "MAX_HEAP_SIZE=256M"
        - "HEAP_NEWSIZE=128M"
      restart: always
      volumes:
        - ./out/cassandra_data:/var/lib/cassandra
      healthcheck:
        test: ["CMD", "cqlsh", "-u", "cassandra", "-p", "cassandra", "-e", "describe keyspaces"]
        interval: 15s
        timeout: 10s
        retries: 10

  cassandra-load-keyspace:
      container_name: cassandra-load-keyspace
      image: cassandra:3.11.2
      depends_on:
        cassandra:
          condition: service_healthy
      volumes:
        - ./src/main/resources/cassandra_schema.cql:/schema.cql
      command: /bin/bash -c "echo loading cassandra keyspace && cqlsh cassandra -f /schema.cql"

Netflix version, using sleep:

version: '3.5'

services:
  cassandra:
      image: cassandra:latest
      container_name: cassandra
      ports:
        - "9042:9042"
      environment:
        - "MAX_HEAP_SIZE=256M"
        - "HEAP_NEWSIZE=128M"
      restart: always
      volumes:
        - ./out/cassandra_data:/var/lib/cassandra

  cassandra-load-keyspace:
      container_name: cassandra-load-keyspace
      image: cassandra:latest
      depends_on:
        - cassandra
      volumes:
        - ./src/main/resources/cassandra_schema.cql:/schema.cql 
      command: /bin/bash -c "sleep 60 && echo loading cassandra keyspace && cqlsh cassandra -f /schema.cql"

P.S. I found this approach in one of the Netflix repos.
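
A fixed sleep 60 can be too short on a slow machine and wasteful on a fast one. A minimal sketch of polling instead, which does not depend on the condition syntax under depends_on, might look like this. The function names are made up for illustration; the host name cassandra and the path /schema.cql are taken from the compose files above.

```shell
#!/usr/bin/env bash
# Retry a command until it succeeds or the attempts run out.
# Usage: retry_until_ready <attempts> <delay-seconds> <command...>
retry_until_ready() {
    local attempts="$1" delay="$2" i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        "$@" >/dev/null 2>&1 && return 0
        sleep "$delay"
    done
    return 1
}

# Hypothetical wrapper: wait for the cassandra service, then load the schema.
load_keyspace() {
    if ! retry_until_ready 30 5 cqlsh cassandra -e 'describe keyspaces'; then
        echo "cassandra did not become ready in time" >&2
        return 1
    fi
    echo "loading cassandra keyspace"
    cqlsh cassandra -f /schema.cql
}
```

Saved as a script, mounted into the cassandra-load-keyspace container, and ended with a call to load_keyspace, this would take the place of the sleep 60 && ... part of the command.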

Flotation answered 23/7, 2018 at 10:2 Comment(7)

Just a heads up that version 3+ of docker-compose no longer allows the condition syntax used in your config (docs.docker.com/compose/compose-file/#depends_on), which is why Netflix is using the sleep. – Memorable 20/5, 2019 at 11:48

@SachinGiri The basic idea is that Cassandra ships with its own tool for executing CQL, cqlsh. Once the first Cassandra container is up, a second container from the same image runs that tool against it, as you can see in this line -> command: /bin/bash -c "echo loading cassandra keyspace && cqlsh cassandra -f /schema.cql" – Flotation 18/8, 2020 at 9:45

wow.. just copy pasted and it saved me some time. Thank you for sharing! – Lamkin 4/9, 2021 at 7:50

Extremely good script, still useful in 2023 with Scylla. – Barton 7/2, 2023 at 13:4

@Barton can you share how you got it working with scylla? it doesn't work for me, the node never gets configured – Mendel 16/6, 2023 at 2:11

@Mendel I had the same question but eventually got it working with scylla. I posted my files as an answer to this SO #75869013 – Assassin 24/8, 2023 at 20:19

@DiasAbdraimov When you run the cqlsh on the second cassandra image, won't it store the data or create the keyspaces in its own storage space? Also, the cqlsh in the second cassandra container should only run if cassandra is up and running in it? – Procathedral 16/1 at 14:32

I solved this problem by patching cassandra's docker-entrypoint.sh so it will execute sh and cql files located in /docker-entrypoint-initdb.d on startup. This is similar to how MySQL docker containers work.

Basically, I add a small script at the end of the docker-entrypoint.sh (right before the last line, exec "$@"), that will run the cql scripts once cassandra is up. A simplified version is:

INIT_DIR=/docker-entrypoint-initdb.d
# this whole block will execute in the background
(
    # give up quietly if the init directory is not mounted
    cd "$INIT_DIR" || exit 1
    # wait for cassandra to be ready
    while ! cqlsh -e 'describe cluster' > /dev/null 2>&1; do sleep 6; done
    echo "$0: Cassandra cluster ready: executing cql scripts found in $INIT_DIR"
    # find and execute cql scripts, in name order
    # (file names must not contain whitespace)
    for f in $(find . -type f -name "*.cql" -print | sort); do
        echo "$0: running $f"
        cqlsh -f "$f"
        echo "$0: $f executed"
    done
) &

This solution works for all Cassandra versions (at least up to 3.11, as of the time of writing).

So you only have to build and use this patched Cassandra image, and then add the appropriate initialization scripts to the container using docker-compose volumes.
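
The simplified version above only handles cql files. A rough sketch of how both sh and cql files could be dispatched by extension is shown below. The function name run_init_scripts is hypothetical and this is not the code from the gist; it also assumes that sh files are run with bash rather than sourced, and that Cassandra is already accepting connections when it is called.

```shell
#!/usr/bin/env bash
# Run every *.sh and *.cql file directly inside a directory, in name order.
# Usage: run_init_scripts [directory]
run_init_scripts() {
    local init_dir="${1:-/docker-entrypoint-initdb.d}" f
    # Nothing to do when no init directory is mounted.
    [ -d "$init_dir" ] || return 0
    # Glob results are sorted, so 01-*.cql runs before 02-*.sh.
    for f in "$init_dir"/*; do
        [ -f "$f" ] || continue
        case "$f" in
            *.sh)  echo "running $f"; bash "$f" ;;
            *.cql) echo "running $f"; cqlsh -f "$f" ;;
            *)     echo "ignoring $f" ;;
        esac
    done
}
```

Prefixing the file names with numbers (01-schema.cql, 02-seed.sh) is one simple way to control the order.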

A complete gist with a more robust entrypoint patch (and example) is available here.

Osithe answered 25/8, 2020 at 12:10 Comment(0)
