No space on device with Jenkins and Docker - how to cleanup properly
F

5

13

We're running Jenkins (version 2.60.1) on an Ubuntu 16.04.1 server. One of the issues we've been running into recently is that we routinely get the error "no space left on device".

I understand that Docker needs a strict clean-up process because of the files it leaves behind, which take up space unnecessarily.

We're using the CloudBees Docker Build and Publish plugin to handle the build and push to AWS ECS. I thought about removing all the unused images. The thing is, if I log in to the Jenkins instance (over SSH) and try to run the docker command, it gives - "Cannot connect to the Docker daemon. Is the docker daemon running on this host?"

I suppose somehow I need to do this from within Jenkins environment or part of the plugin?

Anyone dealt with this before or have some advice? - I'd really appreciate it.

Fairyfairyland answered 16/7, 2017 at 12:46 Comment(4)
Can you try removing a few intermediate docker images and then restarting the instance? In any case, it's better to take a backup of all the persisted directories on the host (if you used any volume mounts).Reminiscence
I just run a daily docker system prune --all when no jobs are running.Illuviation
@Illuviation this is only an option for Docker >= 1.13Fairyfairyland
I don't think "restarting instance" would be good practice. A better way could be setting up a cron job to clean unused/dangling images with "docker image prune" rather than pruning everything. You can also set a timeout for slave rotation.Sanderling
S
8

Docker < 1.13

For Docker older than 1.13, you can run the following to free up some space on your device:

docker ps -a | grep -i 'exited' | awk '{print $1}' | xargs docker rm > /dev/null 2>&1 &
docker images -a | grep "<none>" | awk '{print $3}' | xargs docker rmi > /dev/null 2>&1 &

Alternatively, you can try running the following docker command:

docker rmi $(docker images --filter "dangling=true" -q --no-trunc)

These commands clean up old orphan containers and remove images tagged with <none>. I use these two formulas on one of my CI servers and they work fine. Before that, I was facing an issue similar to yours (no space left on device).

Cleaning orphan volumes

docker volume rm $(docker volume ls -qf dangling=true)
docker volume ls -qf dangling=true | xargs -r docker volume rm
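When nothing matches, the $(...) forms above expand to nothing, and docker rmi / docker volume rm then exit with "requires at least 1 argument(s)". The xargs -r variant already avoids that for pipelines. For the $(...) style, a minimal sketch of a guard could look like the following, assuming bash; the function names remove_dangling_images and remove_dangling_volumes are made up for illustration and are not part of Docker.

```shell
#!/usr/bin/env bash
# Sketch: only call the remove command when the listing returned something.
# Function names are illustrative, not part of Docker.

remove_dangling_images() {
  local ids
  ids=$(docker images --filter "dangling=true" -q --no-trunc)
  if [ -z "$ids" ]; then
    echo "no dangling images"
    return 0
  fi
  # Intentionally unquoted: word splitting yields one argument per image ID.
  docker rmi $ids
}

remove_dangling_volumes() {
  local names
  names=$(docker volume ls -qf dangling=true)
  if [ -z "$names" ]; then
    echo "no dangling volumes"
    return 0
  fi
  # Intentionally unquoted: one argument per volume name.
  docker volume rm $names
}
```

Sourcing this file and calling the two functions from a cron script or a Jenkins shell step gives the same result as the one-liners, without the usage error on an already clean host.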

Docker >= 1.13

Docker 1.13 introduced the docker system prune command (https://docs.docker.com/engine/reference/commandline/system_prune/). Alternatively, you can run:

  • docker image prune
  • docker volume prune
  • docker container prune

You can run those commands as part of your Jenkins pipeline. In one of the projects I work on, we run cleanup after building new Docker images during the release process. Try that as well to work around the "Cannot connect to the Docker daemon. Is the docker daemon running on this host?" problem.
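As a rough sketch of such a pipeline step, the function below picks the cleanup commands based on the Docker server version. It assumes bash and GNU sort/xargs; version_ge and cleanup_docker are hypothetical helper names, and the pre-1.13 branch is a filter-based variant of the commands above rather than a copy of them.

```shell
#!/usr/bin/env bash
# Sketch: choose cleanup commands by Docker server version.
# version_ge and cleanup_docker are illustrative names.

# Succeeds when version $1 >= version $2 (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

cleanup_docker() {
  local version
  version=$(docker version --format '{{.Server.Version}}') || return 1
  if version_ge "$version" "1.13"; then
    # -f skips the confirmation prompt, which a Jenkins job cannot answer.
    docker container prune -f
    docker image prune -f
    docker volume prune -f
  else
    # Pre-1.13 fallback; xargs -r (GNU) does nothing on empty input.
    docker ps -aq -f status=exited | xargs -r docker rm
    docker images -q -f dangling=true | xargs -r docker rmi
    docker volume ls -qf dangling=true | xargs -r docker volume rm
  fi
}
```

Calling cleanup_docker as the last shell step of a build keeps the cleanup under the same user that runs the builds.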

Spinning answered 16/7, 2017 at 16:22 Comment(11)
thanks for the tip, in my case I have <1.13 so I tried the first two commands you listed but they didn't appear to affect the disk space. When I checked the syslog it gives me a bunch of messages like this: "dockerd[952]: time="2017-07-17T08:29:28.413567257Z" level=error msg="Handler for DELETE /v1.24/images/10a26800d95a returned error: conflict: unable to delete 10a26800d95a (cannot be forced) - image has dependent child images""Fairyfairyland
Those errors do not affect the result - Docker by default does not allow you to remove images that are still used by other ones. In this case (using the -a switch) those kinds of errors are expected. You can also try: docker rmi $(docker images --filter "dangling=true" -q --no-trunc) If it doesn't clean your disk space, it may mean that orphan images are not what is using up the free space.Spinning
I see, when I run the docker rmi command it gives me: "docker rmi" requires at least 1 argument(s). See 'docker rmi --help'. Usage: docker rmi [OPTIONS] IMAGE [IMAGE...] Remove one or more imagesFairyfairyland
It means that the following command $(docker images --filter "dangling=true" -q --no-trunc) returns nothing (no image candidates to remove). It means that there are no docker images to clean up in your case. I will update the answer with a command for cleaning docker volumes. If this doesn't help, it means that there is nothing to remove and you will have to search for what else is causing the problem.Spinning
fyi. "docker system prune -a" will execute all the commands you mentioned in the correct order and also remove unused images, not just dangling images (and clean networks). I suggest using docker image prune -a to remove all unused images if networks are to be preserved. docs.docker.com/engine/reference/commandline/system_prune/…Shemikashemite
Yeah, but this is for docker >= 1.13, Aaron mentioned he's using version < 1.13Spinning
Gotcha. I tried the command for cleaning orphan volumes but giving me "docker volume rm" requires at least 1 argument(s). See 'docker volume rm --help'. Usage: docker volume rm VOLUME [VOLUME...] Remove one or more volumes" - does this mean there's a completely different issue causing problem?Fairyfairyland
Again, it means that $(docker volume ls -qf dangling=true) returned nothing (no volumes to remove). I would suspect that there is something else that consumes most of your volume space. Try searching largest files unix.stackexchange.com/questions/140367/… and see if there is anything to remove. Alternatively you may end up with need of a larger volume size.Spinning
Thanks, I tried that link and it did give me a few files over 100MB but seems i'm still missing something. I've expanded my disk size so that I could run more builds, each time it continues eating up the space and none of these commands reduce it. Could it be logs? I did find the aufs-diff directory takes up nearly half my drive - here's a screenshot dropbox.com/s/t06vjloe9ya1h8o/docker-aufs-diff.png?dl=0Fairyfairyland
@SzymonStepniak - would it be okay if i script so that the aufs/diff dir is regularly cleared out? seems this would save quite a bit of space...just don't want to break something :)Fairyfairyland
Don't remove these files manually, you may remove something that is in use. Check this blog post to get more details about cleaning volumes: lebkowski.name/docker-volumesSpinning
R
4

Jenkins' Docker plugin has a Remove Volumes checkbox in its Docker Agent Template configuration for this:

Remove the volumes associated to the container during container remove.

Rame answered 26/6, 2019 at 12:59 Comment(2)
How to include this option in a declarative pipeline?Tamarind
This setting is agent global and can be found in <jenkins-url>/configureClouds/Orelle
F
3

After using the script provided by burnettk below, it seemed that some space was freed, but after running more builds for a while I was back at the same place: no space on my EBS volume. It simply does not make sense that I would have to add more storage and pay AWS even more on my monthly bill.

In doing some investigation I discovered that for EACH build there were approximately 7 images created (docker images -a) of about 1.4GB each, i.e. about 9GB/build. The first 2 are tagged with the build # and latest while the rest are tagged <none>.

It's really not important that all these images are stored on this server as the purpose is for build and anyway they are pushed to ECR. So I've added the following into my script so that only the latest docker image is kept:

docker rmi $(docker images | sed 1,3d | awk '{print $3}')
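The sed 1,3d step drops the header line plus the first two image rows, so it depends on how many rows belong to the current build. A sketch of the same idea with an explicit keep count is below. It assumes the default docker images table (image ID in the third column, newest images listed first); ids_beyond is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: read `docker images` output on stdin and print the IDs of every
# image listed after the first N rows. ids_beyond is an illustrative name.

ids_beyond() {
  local keep="${1:-2}"
  # NR == 1 is the header row; column 3 is IMAGE ID in the default table.
  # IDs seen in the kept rows are never printed, and repeats are dropped,
  # since one image can appear under several tags.
  awk -v keep="$keep" '
    NR > 1 {
      if (NR <= keep + 1) kept[$3] = 1
      else if (!kept[$3] && !seen[$3]++) print $3
    }'
}

# Example (not run here): keep the two newest rows, remove the rest.
#   docker images | ids_beyond 2 | xargs -r docker rmi
```

Skipping IDs that also appear in the kept rows avoids asking docker rmi to delete the image that was just built when it shows up again under an older tag.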

Lastly, I have also adjusted my docker build command by adding the --rm argument so that it will remove intermediate containers after building.

docker build --rm

Hope this is helpful!

Fairyfairyland answered 27/9, 2017 at 8:31 Comment(0)
P
2

in order to get past the "cannot connect to the docker daemon" issue, figure out what users are in the docker group

grep 'docker' /etc/group

and then run the docker cleanup commands (you'll want to turn it into a script you run on cron or something) as one of those users. or get sudo access with another user and use sudo:

sudo docker rmi [image_name_here]
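as a quick check before setting up cron, here's a small sketch that asks whether a given user is in a given group. it uses id -nG instead of grepping /etc/group, because /etc/group only lists supplementary members. user_in_group is a made-up helper name, and "jenkins" in the example comment is just the user from this question.

```shell
#!/usr/bin/env bash
# Sketch: check whether a user belongs to a group (for example "docker").
# user_in_group is an illustrative helper name.

user_in_group() {
  local user="$1" group="$2"
  # id -nG prints the user's group names separated by spaces;
  # grep -x matches the group name as a whole line only.
  id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$group"
}

# Example (not run here):
#   user_in_group jenkins docker && echo "jenkins is in the docker group"
```

if the check fails for the user you log in with, that matches the "cannot connect" symptom, assuming a default install where access to the daemon socket goes through the docker group.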

here's the contents of an example cleanup script (/usr/local/bin/clean_up_docker_stuff_on_ci_agent or similar):

#!/bin/bash

# stop containers that have been running for more than a day (may not be valid in your context if you intend to run things for a long time)
docker ps -a | egrep " days" | awk '{print $1}' | grep -v CONTAINER | xargs docker stop

# remove all exited containers
docker ps -a | egrep "Exited|Created" | awk '{print $1}' | grep -v CONTAINER | xargs docker rm

# remove old images
docker images | egrep 'weeks|months' | awk '{print $1 ":" $2}' | xargs docker rmi -f
docker images | egrep 'weeks|months' | grep '<none>' | awk '{ print $3 }' | xargs docker rmi -f

# kill stray volumes
docker volume ls -qf dangling=true | xargs -r docker volume rm

As Szymon Stepniak mentions in his answer, if you're using docker >= 1.13, there are simpler options.

cron example (20 after every hour):

20 * * * * /usr/local/bin/clean_up_docker_stuff_on_ci_agent > /dev/null 2>&1
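
since an hourly cron job can fire in the middle of a build, one option is to skip the run while containers are still up. a minimal sketch, assuming bash; cleanup_if_idle is a made-up name, and the default path is the example script path from above.

```shell
#!/usr/bin/env bash
# Sketch: run the cleanup script only when no containers are running.
# cleanup_if_idle is an illustrative name; the default path is the
# example script location used in the cron line.

cleanup_if_idle() {
  local script="${1:-/usr/local/bin/clean_up_docker_stuff_on_ci_agent}"
  local running
  # docker ps -q prints one ID per running container, nothing when idle.
  running=$(docker ps -q) || return 1
  if [ -n "$running" ]; then
    echo "containers still running, skipping cleanup"
    return 0
  fi
  "$script"
}
```

note that this conflicts with the first step of the script, which stops long-running containers: with the guard in place, a container that never exits blocks the cleanup, so pick one policy or the other.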
Philippe answered 16/7, 2017 at 13:35 Comment(7)
Thanks for quick reply. When I run the first command, it gives me "docker:x:999:jenkins" - so I take it's running under jenkins and this is why it cannot connect to docker daemon?Fairyfairyland
yeah, you mentioned that you ssh'ed to the box and ran docker. if you did that with a user other than the jenkins user, you were going to have a bad time unless you used sudo, since only the jenkins user is in the docker group in your setup.Philippe
yea, looks like once i switch to jenkins user its good. so i could then setup cron for jenkins user? which command do you recommend that I set under cron there?Fairyfairyland
Thanks a lot for the script. However seems that it's giving some errors, missing arguments. This is the output when I run the script manually: dropbox.com/s/uvxtmywpzc45xmo/jenkins-clean-up.png?dl=0 and I'm using Docker 1.12.4Fairyfairyland
yeah, when there is nothing to do, piping nothing to xargs docker blah produces that sort of output. this is part of the reason all output is going to /dev/null in the cron script. don't worry, it works. when you upgrade to docker 1.13 it will be easier to write a short script with cleaner output. :)Philippe
Let us continue this discussion in chat.Fairyfairyland
While it seems this script was working for awhile - now I'm back at the same place where the drive is maxed out and running the script does not free up any space. Any other ideas why docker might be doing this? :-/Fairyfairyland
F
0

The error you are having usually means that the user is not allowed to use the docker CLI or that docker is stopped.

Regardless, to answer your question about how to clean up properly:

  1. Ensure you set your jobs to discard old builds regularly.
  2. If you can't run docker (I have no idea why), then just have a cron job that cleans up the /var/lib/docker directory on your Jenkins server.
  3. Lastly, use Docker containers that run as Jenkins slaves. This way, build artifacts are stored in a temporary filesystem, and if you re-provision your Jenkins slaves regularly you won't run into the disk space issue anymore.
Ferguson answered 16/7, 2017 at 13:45 Comment(2)
I have the jobs set to discard old builds regularly. When you say clean up /var/lib/docker directory - do you mean remove everything there? I see directories: aufs, containers, image, network, swarm, tmp, trust, volumes...Fairyfairyland
If you are not running any container on that server then you won't need anything in the volumes directory, but I think you should first resolve why you can't run docker. There are docker commands that can help you clean up much more easily and safely.Ferguson
