kubelet fails to get cgroup stats for docker and kubelet services

Asked 13/10, 2017 at 9:2 Answered 20/4, 2020 at 18:5

I'm running kubernetes on bare-metal Debian (3 masters, 2 workers, PoC for now). I followed k8s-the-hard-way, and I'm running into the following problem on my kubelet:

Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"

And I have the same message for kubelet.service.

I have some files about those cgroups:

$ ls /sys/fs/cgroup/systemd/system.slice/docker.service
cgroup.clone_children  cgroup.procs  notify_on_release  tasks

$ ls /sys/fs/cgroup/systemd/system.slice/kubelet.service/
cgroup.clone_children  cgroup.procs  notify_on_release  tasks

And cadvisor tells me:

$ curl http://127.0.0.1:4194/validate
cAdvisor version: 

OS version: Debian GNU/Linux 8 (jessie)

Kernel version: [Supported and recommended]
    Kernel version is 3.16.0-4-amd64. Versions >= 2.6 are supported. 3.0+ are recommended.


Cgroup setup: [Supported and recommended]
    Available cgroups: map[cpu:1 memory:1 freezer:1 net_prio:1 cpuset:1 cpuacct:1 devices:1 net_cls:1 blkio:1 perf_event:1]
    Following cgroups are required: [cpu cpuacct]
    Following other cgroups are recommended: [memory blkio cpuset devices freezer]
    Hierarchical memory accounting enabled. Reported memory usage includes memory used by child containers.


Cgroup mount setup: [Supported and recommended]
    Cgroups are mounted at /sys/fs/cgroup.
    Cgroup mount directories: blkio cpu cpu,cpuacct cpuacct cpuset devices freezer memory net_cls net_cls,net_prio net_prio perf_event systemd 
    Any cgroup mount point that is detectible and accessible is supported. /sys/fs/cgroup is recommended as a standard location.
    Cgroup mounts:
    cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
    cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
    cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
    cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
    cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
    cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
    cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
    cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
    cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0


Managed containers: 
    /kubepods/burstable/pod76099b4b-af57-11e7-9b82-fa163ea0076a
    /kubepods/besteffort/pod6ed4ee49-af53-11e7-9b82-fa163ea0076a/f9da6bf60a186c47bd704bbe3cc18b25d07d4e7034d185341a090dc3519c047a
            Namespace: docker
            Aliases:
                    k8s_tiller_tiller-deploy-cffb976df-5s6np_kube-system_6ed4ee49-af53-11e7-9b82-fa163ea0076a_1
                    f9da6bf60a186c47bd704bbe3cc18b25d07d4e7034d185341a090dc3519c047a
    /kubepods/burstable/pod76099b4b-af57-11e7-9b82-fa163ea0076a/956911118c342375abfb7a07ec3bb37451bbc64a1e141321b6284cf5049e385f

EDIT

Disabling the cadvisor port on kubelet (--cadvisor-port=0) doesn't fix that.

Tumescent answered 13/10, 2017 at 9:2 Comment(1)

what's the Cgroup Driver for your docker? Should included in the output of docker info. – Rhea 24/10, 2017 at 5:43

Try to start kubelet with

--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice

I'm using this solution on RHEL7 with Kubelet 1.8.0 and Docker 1.12

Offset answered 15/10, 2017 at 9:36 Comment(0)

The angeloxx's workaround works also on AWS default image for kops (k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02 (ami-bd229ec4))

sudo vim /etc/sysconfig/kubelet

add at the end of DAEMON_ARGS string:

 --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice

finally:

sudo systemctl restart kubelet

Boring answered 30/12, 2017 at 10:35 Comment(2)

In CentOS7 I had to edit a different file: /etc/systemd/system/kubelet.service.d/10-kubeadm.conf – Gamosepalous 6/2, 2018 at 14:19

Using this pattern, I submitted a PR so the same /etc/sysconfig/kubelet method could be used in systemd-based RHEL systems. github.com/kubernetes/kubernetes/pull/61633 – Mattern 24/3, 2018 at 19:20

I had to do a yum update in addition to this change to make it work. Might be helpful for others trying this workaround.

Procyon answered 25/10, 2017 at 14:46 Comment(1)

Yes.It worked. Its saved my day even in 2019.Thanks – Escheat 5/7, 2019 at 12:40

Thanks angeloxx!

I'm following the kubernetes guide: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm/

In the instructions, they have you make a file: /usr/lib/systemd/system/kubelet.service.d/20-etcd-service-manager.conf

with the line:

ExecStart=/usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests --cgroup-driver=systemd

I took your answer and added it to the end of the ExecStart line:

ExecStart=/usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests --cgroup-driver=systemd --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice

I'm writing this in case it helps someone else

@ wolmi Thanks for the edit!

One more note: The config I have above is for my etcd cluster, NOT the kubernetes nodes. A file like 20-etcd-service-manager.conf on a node would override all the settings in the "10-kubeadm.conf" file, causing all kinds if missed configurations. Use the "/var/lib/kubelet/config.yaml" file for nodes and/or /var/lib/kubelet/kubeadm-flags.env.

Splashboard answered 20/4, 2020 at 18:5 Comment(1)

For ubuntu I ran kubelet --pod-manifest-path=/etc/kubernetes/manifests --cgroup-driver=systemd --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice and it worked! – Krever 29/8, 2020 at 2:40

For those a little further along, in kops AMI kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-02-08 as above, I had to add:

add at the end of DAEMON_ARGS string:

--runtime-cgroups=/lib/systemd/system/kubelet.service --kubelet-cgroups=/lib/systemd/system/kubelet.service

and then: sudo systemctl restart kubelet

but I found I was still getting:

Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"

restarting dockerd resolved this error: sudo systemctl restart docker

Thanks

After a little more digging around I found a better resolution to add this into the kops configuration:

https://github.com/kubernetes/kops/issues/4049

Gadid answered 26/3, 2018 at 23:19 Comment(1)

I am also getting same error restarting kubelet and docker didn't help me – Ras 20/4, 2020 at 6:2

Recommended topics

Hot tags