How to change Docker stack restarting behaviour?

Asked 24/10, 2018 at 13:4 Answered 6/11, 2018 at 20:8

Solved docker docker-compose docker-stack

In our project we inherited a Docker environment with a service stack in it.

I've noticed that Docker restarts the stack once it hits its memory limit.

Unfortunately, I haven't found anything on Docker's website that answers my questions, so I'm asking here:

Is this behaviour configurable? For instance, I don't want Docker to restart my stack under any circumstances. If it is configurable, then how?
Is there any Docker journal that records stack restarts as its entries?

Hydracid answered 24/10, 2018 at 13:4 Comment(0)

Is this behaviour configurable? For instance, I don't want Docker to restart my stack under any circumstances. If it is configurable, then how?

With a version 3 stack, the restart policy moved to the deploy section:

version: '3'
services:
  crash:
    image: busybox
    command: sleep 10
    deploy:
      restart_policy:
        condition: none
        # max_attempts: 2

Documentation on this is available at: https://docs.docker.com/compose/compose-file/#restart_policy
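
Since the restarts in the question are triggered by a memory limit, it may help to see both settings side by side. The following is only a minimal sketch, assuming a hypothetical service named app, a placeholder image, and an arbitrary 512M limit. In the version 3 format the memory limit sits under deploy.resources.limits, next to restart_policy:

```yaml
version: '3'
services:
  app:                      # hypothetical service name
    image: some/image       # placeholder image
    deploy:
      resources:
        limits:
          memory: 512M      # arbitrary example value
      restart_policy:
        condition: none     # other documented values: on-failure, any
```

Deployed with docker stack deploy, a task that gets OOM-killed under this policy should stay shut down rather than being replaced.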

Is there any Docker journal that records stack restarts as its entries?

Depending on the task history limit (configurable with docker swarm update --task-history-limit), you can view the previously run tasks for a service:

$ docker service ps restart_crash
ID                  NAME                  IMAGE               NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
30okge1sjfno        restart_crash.1       busybox:latest      bmitch-asusr556l    Shutdown            Complete 4 minutes ago
papxoq1vve1a         \_ restart_crash.1   busybox:latest      bmitch-asusr556l    Shutdown            Complete 4 minutes ago
1hji2oko51sk         \_ restart_crash.1   busybox:latest      bmitch-asusr556l    Shutdown            Complete 5 minutes ago

And you can inspect the state for any one task:

$ docker inspect 30okge1sjfno --format '{{json .Status}}' | jq .
{
  "Timestamp": "2018-11-06T19:55:02.208633174Z",
  "State": "complete",
  "Message": "finished",
  "ContainerStatus": {
    "ContainerID": "8e9310bde9acc757f94a56a32c37a08efeed8a040ce98d84c851d4eef0afc545",
    "PID": 0,
    "ExitCode": 0
  },
  "PortStatus": {}
}

There's also an event history in the docker engine that you can query:

$ docker events --filter label=com.docker.swarm.service.name=restart_crash --filter event=die --since 15m --until 0s
2018-11-06T14:54:09.417465313-05:00 container die f17d945b249a04e716155bcc6d7db490e58e5be00973b0470b05629ce2cca461 (com.docker.stack.namespace=restart, com.docker.swarm.node.id=q44zx0s2lvu1fdduk800e5ini, com.docker.swarm.service.id=uqirm6a8dix8c2n50thmpzj06, com.docker.swarm.service.name=restart_crash, com.docker.swarm.task=, com.docker.swarm.task.id=1hji2oko51skhv8fv1nw71gb8, com.docker.swarm.task.name=restart_crash.1.1hji2oko51skhv8fv1nw71gb8, exitCode=0, image=busybox:latest@sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812, name=restart_crash.1.1hji2oko51skhv8fv1nw71gb8)
2018-11-06T14:54:32.391165964-05:00 container die d6f98b8aaa171ca8a2ddaf31cce7a1e6f1436ba14696ea3842177b2e5e525f13 (com.docker.stack.namespace=restart, com.docker.swarm.node.id=q44zx0s2lvu1fdduk800e5ini, com.docker.swarm.service.id=uqirm6a8dix8c2n50thmpzj06, com.docker.swarm.service.name=restart_crash, com.docker.swarm.task=, com.docker.swarm.task.id=papxoq1vve1adriw6e9xqdaad, com.docker.swarm.task.name=restart_crash.1.papxoq1vve1adriw6e9xqdaad, exitCode=0, image=busybox:latest@sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812, name=restart_crash.1.papxoq1vve1adriw6e9xqdaad)
2018-11-06T14:55:00.126450155-05:00 container die 8e9310bde9acc757f94a56a32c37a08efeed8a040ce98d84c851d4eef0afc545 (com.docker.stack.namespace=restart, com.docker.swarm.node.id=q44zx0s2lvu1fdduk800e5ini, com.docker.swarm.service.id=uqirm6a8dix8c2n50thmpzj06, com.docker.swarm.service.name=restart_crash, com.docker.swarm.task=, com.docker.swarm.task.id=30okge1sjfnoicd0lo2g1y0o7, com.docker.swarm.task.name=restart_crash.1.30okge1sjfnoicd0lo2g1y0o7, exitCode=0, image=busybox:latest@sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812, name=restart_crash.1.30okge1sjfnoicd0lo2g1y0o7)

See more details on the events command at: https://docs.docker.com/engine/reference/commandline/events/

The best practice at larger scale organizations is to send the container logs to a central location (e.g. Elastic) and monitor the metrics externally (e.g. Prometheus/Grafana).

Gelya answered 6/11, 2018 at 20:8 Comment(1)

One more issue worth mentioning is that the memory limits may be set too low. Ideally, the memory limits should be set to a reasonable amount so that Docker can make appropriate scheduling decisions and prevent any one instance of a scheduled service from using all available memory on the host. An OOM kill/restart is a far better outcome than a whole node failing and causing other nodes to go down when the downed workloads get scheduled elsewhere. – Gravesend 6/11, 2018 at 20:11

Since you haven't added any configuration snippet or runtime commands to your post, I'll have to make some assumptions about your actual question.

My assumptions:

you are running multiple services using docker-compose
these services have memory limits configured (in the docker-compose.yml file)
you see them restarting once they hit the configured memory limit, and you want to prevent them from restarting

I assume your docker-compose.yml looks like the following:

version: '2.1'
services:
   service1:
     image: some/image
     restart: always
     mem_limit: 512m
   service2:
     image: another/image
     restart: always
     mem_limit: 512m

With this configuration, any of the service containers would be OOM-killed by the kernel when it tries to use more than 512 MB of memory. Docker would then automatically restart it.

So to answer your first point: yes, it is configurable. Just change "restart" to "no", or simply remove the line (since "no" is the default value for this parameter). As for your second point, simply look for service restarts in the Docker daemon logs.
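
A sketch of the first option, reusing one service from the assumed file above. The value should be quoted, because a bare no is parsed as a YAML boolean rather than as a string:

```yaml
version: '2.1'
services:
  service1:
    image: some/image
    restart: "no"        # quoted: an unquoted no is read as boolean false by YAML
    mem_limit: 512m
```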

Yet if what you need is to keep your service up, this is not going to help: your service will still try to use more than its allowed memory, it will still get killed, and it will no longer be restarted automatically.

It would be better to review the memory usage pattern of your services and understand why they are attempting to use more than the configured limit. Ultimately, the solution is either to configure your services to use less memory or to raise the mem_limit in your docker-compose.yml.

For example:

for a database service, configure the memory options so that the engine does not use more RAM than mem_limit (SGA and PGA under Oracle, various buffer and cache sizes for MySQL/MariaDB, ...)
for Java applications, set -Xmx sufficiently below mem_limit (keeping in mind the need for non-heap memory), or preferably, with a recent JDK (latest 8 or 9), use -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap; from JDK 10 on, container memory limits are detected by default and this flag is no longer needed.
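
For the Java case, one way to pass the heap cap is through the container environment. This is only a sketch: the service name, the image, and the 384m value are assumptions chosen to leave some headroom under a 512m limit, and JAVA_TOOL_OPTIONS is the environment variable that HotSpot-based JVMs read at startup:

```yaml
version: '2.1'
services:
  javaapp:                  # hypothetical service name
    image: some/java-image  # placeholder image
    mem_limit: 512m
    environment:
      # Heap capped below mem_limit to leave room for non-heap memory
      # (metaspace, thread stacks, direct buffers); 384m is an arbitrary example.
      JAVA_TOOL_OPTIONS: "-Xmx384m"
```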

I hope this will help you; to be more precise I would really need more context.

Crosscut answered 6/11, 2018 at 18:57 Comment(1)

This answer assumes that a docker-compose style deployment is being used. The question's terminology was stack, which implies a swarm mode deployment that uses docker stack deploy and version: "3.*" syntax, where memory limits would be set under the deploy key instead. – Gravesend 6/11, 2018 at 20:9
