Docker daemon/container real-time scheduling with Ubuntu (Linux) host

Before I begin, I was in two minds as to whether this question should be raised on Super User or Stack Overflow; apologies in advance if it's in the wrong place.

I have a Docker container (containing C/C++ executable code) which performs audio/video processing. As a result, I would like to test the benefits of running the container with RT scheduling constraints. Searching the web, I've come across various bits of information, but I'm struggling to put all the pieces together.

System Environment:

  • Host: Ubuntu (stock) Zesty 17.04 (No RT Kernel patches, Kernel: 4.10.0-35-generic)
  • Docker Version: 17.05.0-ce
  • Docker Images OS: Ubuntu Zesty 17.04.

In an executable nested in the docker image/container, the following code is executed to change the scheduler from 'SCHED_OTHER' to 'SCHED_FIFO' (see docs):

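    // Needs <sched.h>, <cerrno>, <cstring>, <iostream> and <boost/algorithm/clamp.hpp>.
    struct sched_param sched = {};

    // Priority range the kernel allows for SCHED_FIFO.
    const int nMin = sched_get_priority_min(SCHED_FIFO);
    const int nMax = sched_get_priority_max(SCHED_FIFO);

    // Pick a priority just above the middle of that range.
    const int nHlf = (nMax - nMin) / 2;
    const int nPriority = nMin + nHlf + 1;

    sched.sched_priority = boost::algorithm::clamp(nPriority, nMin, nMax);

    // A pid of 0 applies the policy to the calling thread.
    if (sched_setscheduler(0, SCHED_FIFO, &sched) < 0)
        std::cerr << "SETSCHEDULER failed - err = " << strerror(errno) << std::endl;
    else
        std::cout << "Priority set to \"" << sched.sched_priority << "\"" << std::endl;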

I've been reading various bits of Docker documentation on using a realtime scheduler. One interesting page states:

Verify that CONFIG_RT_GROUP_SCHED is enabled in the Linux kernel by running zcat /proc/config.gz | grep CONFIG_RT_GROUP_SCHED or by checking for the existence of the file /sys/fs/cgroup/cpu.rt_runtime_us. For guidance on configuring the kernel realtime scheduler, consult the documentation for your operating system.

As per the aforementioned recommendation, the stock Ubuntu Zesty 17.04 OS seems to fail these checks.
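
Where /proc/config.gz is absent, the same check can be made against the text of a kernel config file such as /boot/config-4.10.0-35-generic. The following is only a minimal sketch of that lookup: `KernelOption` and `lookup_kernel_option` are names invented for this example, and it assumes the config text has already been read into a string.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Possible states of an option in a kernel config file.
enum class KernelOption { BuiltIn, Module, Missing };

// Scan kernel config text for "NAME=y" or "NAME=m".
// A "# NAME is not set" line, or no line at all, counts as Missing.
KernelOption lookup_kernel_option(const std::string& config_text,
                                  const std::string& name) {
    assert(!name.empty());
    const std::string prefix = name + "=";
    std::istringstream lines(config_text);
    std::string line;
    while (std::getline(lines, line)) {
        // Skip every line that does not start with "NAME=".
        if (line.compare(0, prefix.size(), prefix) != 0) continue;
        const std::string value = line.substr(prefix.size());
        if (value == "y") return KernelOption::BuiltIn;
        if (value == "m") return KernelOption::Module;
    }
    return KernelOption::Missing;
}
```

On a stock kernel like the one described here, looking up CONFIG_RT_GROUP_SCHED this way would be expected to yield `KernelOption::Missing`.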

First question(s): Can I not use the RT scheduler? What is 'CONFIG_RT_GROUP_SCHED'? One thing that confuses me is that there are some older posts on the web from 2010-2012 about patching kernels with an RT patch. It seems that there has been some work in the Linux kernel related to soft RT since then.

The quote here has sparked my question:

From kernel version 2.6.18 onward, however, Linux is gradually becoming equipped with real-time capabilities, most of which are derived from the former realtime-preempt patches developed by Ingo Molnar, Thomas Gleixner, Steven Rostedt, and others. Until the patches have been completely merged into the mainline kernel (this is expected to be around kernel version 2.6.30), they must be installed to achieve the best real-time performance. These patches are named:

Carrying on...

Having read additional information, I note that it is important to set ulimits. I've altered /etc/security/limits.conf:

#*               soft    core            0
#root            hard    core            100000
#*               hard    rss             10000

# NEW ADDITION
gavin            hard    rtprio          99

Second question: Presumably the above is required to enable the docker daemon to run RT? It looks as if the daemon is controlled via systemd.

I continued further with my investigation and on the same Docker docs page saw the following snippet:

To run containers using the realtime scheduler, run the Docker daemon with the --cpu-rt-runtime flag set to the maximum number of microseconds reserved for realtime tasks per runtime period. For instance, with the default period of 1000000 microseconds (1 second), setting --cpu-rt-runtime=950000 ensures that containers using the realtime scheduler can run for 950000 microseconds for every 1000000-microsecond period, leaving at least 50000 microseconds available for non-realtime tasks. To make this configuration permanent on systems which use systemd, see Control and configure Docker with systemd.
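
The budget arithmetic behind this flag can be sketched in a few lines of C++. This is only an illustration: `RtBudget`, `non_rt_us` and `rt_fraction` are hypothetical names, and the sample values in the usage note assume a period of 1000000 microseconds (1 second) with 950000 microseconds of real-time runtime.

```cpp
#include <cassert>

// A real-time bandwidth setting: runtime_us out of every period_us.
struct RtBudget {
    long period_us;
    long runtime_us;
};

// Microseconds per period that remain for non-real-time tasks.
long non_rt_us(const RtBudget& b) {
    return b.period_us - b.runtime_us;
}

// Fraction of each period that real-time tasks may consume.
double rt_fraction(const RtBudget& b) {
    assert(b.period_us > 0);
    return static_cast<double>(b.runtime_us) / static_cast<double>(b.period_us);
}
```

With `RtBudget{1000000, 950000}`, `non_rt_us` gives 50000 and `rt_fraction` gives 0.95, i.e. real-time tasks may use up to 95% of each second.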

Following this page, I discovered there were two parameters to the daemon that were of interest:

  --cpu-rt-period int                     Limit the CPU real-time period in microseconds
  --cpu-rt-runtime int                    Limit the CPU real-time runtime in microseconds

The same page indicates that docker daemon parameters can be specified via '/etc/docker/daemon.json', so I tried:

{
    "cpu-rt-period": 92500,
    "cpu-rt-runtime": 100000
}

Note: The docs do not specify the above options as 'allowed configuration options on Linux'. I thought I would give it a try nonetheless.
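
Independently of whether the daemon accepts these keys, the pair of values is worth sanity-checking, because the kernel rejects a real-time runtime that is larger than its period. A minimal sketch of such a check follows; `check_rt_budget` is a made-up helper and the returned messages are illustrative, not Docker or kernel output.

```cpp
#include <string>

// Return "ok" if the period/runtime pair is plausible, otherwise a short reason.
// Assumption: both values are given in microseconds.
std::string check_rt_budget(long period_us, long runtime_us) {
    if (period_us <= 0) return "period must be positive";
    if (runtime_us > period_us) return "runtime exceeds period";
    return "ok";
}
```

Applied to the values tried above, `check_rt_budget(92500, 100000)` reports "runtime exceeds period", which suggests the two numbers may have been swapped.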

Docker daemon output upon restart:

-- Logs begin at Wed 2017-10-04 09:58:38 BST, end at Wed 2017-10-04 10:01:32 BST. --
Oct 04 09:58:47 gavin systemd[1]: Starting Docker Application Container Engine...
Oct 04 09:58:47 gavin dockerd[1501]: time="2017-10-04T09:58:47.885882588+01:00" level=info msg="libcontainerd: new containerd process, pid: 1531"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.053986072+01:00" level=warning msg="failed to rename /var/lib/docker/tmp for background deletion: %!s(<nil>).
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.161303803+01:00" level=info msg="[graphdriver] using prior storage driver: aufs"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.303409053+01:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.304002725+01:00" level=warning msg="Your kernel does not support swap memory limit"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.304078792+01:00" level=warning msg="Your kernel does not support cgroup rt period"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.304201239+01:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.305534113+01:00" level=info msg="Loading containers: start."
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.730193030+01:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemo
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.784938130+01:00" level=info msg="Loading containers: done."
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.888035017+01:00" level=info msg="Daemon has completed initialization"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.888104120+01:00" level=info msg="Docker daemon" commit=89658be graphdriver=aufs version=17.05.0-ce
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.903280645+01:00" level=info msg="API listen on /var/run/docker.sock"
Oct 04 09:58:48 gavin systemd[1]: Started Docker Application Container Engine.

The particular lines of interest:

Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.304078792+01:00" level=warning msg="Your kernel does not support cgroup rt period"
Oct 04 09:58:48 gavin dockerd[1501]: time="2017-10-04T09:58:48.304201239+01:00" level=warning msg="Your kernel does not support cgroup rt runtime"

Not surprising given my earlier discoveries.

Final question: When this is finally working, how will I be able to determine that my container is truly running with RT scheduling? Will the likes of 'top' suffice?
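
One way to check from inside the process itself is to ask the kernel which policy is in effect via sched_getscheduler. The following is a minimal sketch, assuming a Linux host; `policy_name` and `current_policy_name` are hypothetical helper names, and only the three policies relevant here are mapped.

```cpp
#include <sched.h>
#include <string>

// Map a scheduling policy constant to a readable name.
std::string policy_name(int policy) {
    switch (policy) {
        case SCHED_OTHER: return "SCHED_OTHER";
        case SCHED_FIFO:  return "SCHED_FIFO";
        case SCHED_RR:    return "SCHED_RR";
        default:          return "unknown";
    }
}

// Query the policy of a thread; a pid of 0 means the calling thread.
std::string current_policy_name(pid_t pid = 0) {
    int policy = sched_getscheduler(pid);
    if (policy < 0) return "error";
#ifdef SCHED_RESET_ON_FORK
    // Linux may report this flag OR'ed into the policy value; strip it.
    policy &= ~SCHED_RESET_ON_FORK;
#endif
    return policy_name(policy);
}
```

Calling `current_policy_name()` after a successful sched_setscheduler call should report SCHED_FIFO. From the host, `chrt -p <pid>` reports the policy and priority of a running process.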

EDIT: I ran a kernel diagnostic script which I found through the moby project on GitHub. This is the output:

warning: /proc/config.gz does not exist, searching other paths for kernel config ...
info: reading kernel config from /boot/config-4.10.0-35-generic ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- apparmor: enabled and tools installed
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_NF_NAT_IPV4: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_NF_NAT_NEEDED: enabled
- CONFIG_POSIX_MQUEUE: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_MEMCG_SWAP: enabled
- CONFIG_MEMCG_SWAP_ENABLED: missing
    (cgroup swap accounting is currently not enabled, you can enable it by setting boot option "swapaccount=1")
- CONFIG_LEGACY_VSYSCALL_EMULATE: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_IOSCHED_CFQ: enabled
- CONFIG_CFQ_GROUP_IOSCHED: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled (as module)
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: enabled (as module)
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled (as module)
      - CONFIG_INET_ESP: enabled (as module)
      - CONFIG_INET_XFRM_MODE_TRANSPORT: enabled (as module)
  - "ipvlan":
    - CONFIG_IPVLAN: enabled (as module)
  - "macvlan":
    - CONFIG_MACVLAN: enabled (as module)
    - CONFIG_DUMMY: enabled (as module)
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_FTP: enabled (as module)
    - CONFIG_NF_NAT_TFTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_TFTP: enabled (as module)
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: enabled (as module)
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled (as module)
    - CONFIG_BTRFS_FS_POSIX_ACL: enabled
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled
    - CONFIG_DM_THIN_PROVISIONING: enabled (as module)
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

Line of significance:

- CONFIG_RT_GROUP_SCHED: missing
Natant answered 4/10, 2017 at 11:8 Comment(4)
Thanks for asking this - I'm on a similar quest. I have a containerized C++ program that worked great until I added a separate thread to accept commands via a TCP socket (i.e. it spends most of its time blocking and doing very little). The main thread decodes a compressed video stream. The program (with 2 threads) also works just fine outside a container. So, there's something about the combination of container + threads that causes the decoder to not get scheduled. Did you manage to solve your problem? I think I need the RT scheduler. - Demented
I have a container doing encoding/decoding and accepting HTTP requests, and I've not seen what you describe. I would be surprised if what you describe is a scheduling problem related to running containers. Of course, debugging problems within a container is not trivial; I often find myself falling back to using trace statements. - Natant
Thanks for your reply. I'll report back if I figure this out. - Demented
Hello @Natant, it looks like you were able to get it working. I'm trying to run OpenPLC logic on Ubuntu inside Docker, and I want to make the OpenPLC logic real time. However, I'm having the same issue as you. First of all, I'm not able to find the CONFIG_RT_GROUP_SCHED file. Can you tell me how you ran RT-ubuntu inside Docker? - Winegar

Container Level

There are two options to do RT scheduling within a container:

  1. Add the SYS_NICE capability:
      docker run --cap-add SYS_NICE ...
  2. Use privileged mode with the --privileged flag:
      docker run --privileged ...

NOTE: --privileged flag grants more permission than necessary!

The more limited --cap-add SYS_NICE option is much safer.
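
Whether the capability actually reached the process can be checked from inside the container by inspecting the effective capability mask. A minimal sketch, assuming the hex value has already been read from the `CapEff:` line of /proc/self/status; `has_cap_sys_nice` and `kCapSysNice` are names made up for this example, and the masks in the usage note are only sample values.

```cpp
#include <cstdint>
#include <string>

// CAP_SYS_NICE is capability number 23 in <linux/capability.h>.
constexpr unsigned kCapSysNice = 23;

// Test bit 23 of a hexadecimal capability mask such as "00000000a88425fb".
// Throws std::invalid_argument if the text is not a hex number.
bool has_cap_sys_nice(const std::string& cap_eff_hex) {
    const std::uint64_t mask = std::stoull(cap_eff_hex, nullptr, 16);
    return ((mask >> kCapSysNice) & 1ULL) != 0;
}
```

For example, `has_cap_sys_nice("00000000a80425fb")` is false while `has_cap_sys_nice("00000000a88425fb")` is true, since only the second mask has bit 23 set.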

OS System Configuration

You may also have to enable real-time scheduling in your sysctl. If you are running as the root user (default for Docker container):

sysctl -w kernel.sched_rt_runtime_us=-1

To make that permanent (update your image):

echo 'kernel.sched_rt_runtime_us=-1' >> /etc/sysctl.conf
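
To confirm that the setting took effect, the value can be read back from /proc/sys/kernel/sched_rt_runtime_us, where a negative value means real-time throttling is switched off. A minimal sketch, with `read_long` and `rt_throttling_disabled` as hypothetical helper names:

```cpp
#include <fstream>
#include <string>

// Read a single integer from a file such as
// /proc/sys/kernel/sched_rt_runtime_us. Returns false if that fails.
bool read_long(const std::string& path, long& out) {
    std::ifstream in(path);
    long value = 0;
    if (!(in >> value)) return false;
    out = value;
    return true;
}

// A runtime of -1 means real-time tasks are not throttled at all.
bool rt_throttling_disabled(long runtime_us) {
    return runtime_us < 0;
}
```

After the sysctl call above, reading that file and passing the result to `rt_throttling_disabled` would be expected to return true.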

https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities

Syce answered 27/12, 2017 at 23:21 Comment(0)

The solution given by Guy de Carufel did not work for me. What I had to do (after having compiled my kernel with support for control groups, of course) was to stop the Docker daemon first with the following commands:

$ sudo systemctl stop docker
$ sudo systemctl stop docker.socket

Then I could restart the daemon, assigning its control group a large real-time time slice (such as 950000 microseconds):

$ sudo dockerd --cpu-rt-runtime=950000

These changes to the Docker daemon can be made permanent by configuring it as described here and here.

Then finally I could launch my container with the real-time scheduler as follows:

$ sudo docker run -it --cpu-rt-runtime=950000 --ulimit rtprio=99 ubuntu:20.04

In a Docker-Compose file you can achieve this with the following settings (as pointed out in the following documentation: 1, 2, 3):

cpu_rt_runtime: 950000
ulimits:
  rtprio: 99
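
The rtprio limit caps the real-time priority a process without CAP_SYS_NICE may request, so it can be useful to read it back inside the container before calling sched_setscheduler. A minimal sketch, assuming Linux with glibc; `rtprio_soft_limit` and `highest_allowed_priority` are hypothetical helper names.

```cpp
#include <sys/resource.h>
#include <algorithm>

// Read the soft RLIMIT_RTPRIO of the current process.
bool rtprio_soft_limit(rlim_t& out) {
    struct rlimit lim{};
    if (getrlimit(RLIMIT_RTPRIO, &lim) != 0) return false;
    out = lim.rlim_cur;
    return true;
}

// Highest priority, up to `wanted`, that the given soft limit permits.
// A limit of 0 means no real-time priority may be requested.
int highest_allowed_priority(int wanted, rlim_t soft_limit) {
    if (wanted <= 0) return 0;
    if (soft_limit == RLIM_INFINITY) return wanted;
    const rlim_t w = static_cast<rlim_t>(wanted);
    return static_cast<int>(std::min(w, soft_limit));
}
```

With rtprio set to 99 as above, a request for priority 50 passes through unchanged, whereas a limit of 10 would reduce it to 10.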

Additionally launching the container with --privileged and --net=host helps reduce overhead, as discussed here and in this post.

The allocated real-time runtime cpu.rt_runtime_us for each control group can be inspected in the /sys/fs/cgroup/cpu,cpuacct folder. In case you have already allocated a large portion of real-time runtime to another cgroup this might result in the error message failed to write 95000 to cpu.rt_runtime_us: write /sys/fs/cgroup/cpu,cpuacct/system.slice/.../cpu.rt_runtime_us: invalid argument or similar (see here and here). For more details on control groups in general see the corresponding official documentation (4, 5).
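
The reason for that error can be illustrated with a simplified model of the constraint: the real-time runtime handed out to child groups must fit within the parent's runtime. The sketch below assumes all groups share the same period, which the kernel does not require (it compares runtime-to-period ratios), and `fits_in_parent` is a name invented for this example.

```cpp
#include <numeric>
#include <vector>

// Would a new allocation fit next to the sibling groups' allocations
// inside the parent's real-time runtime? All values in microseconds.
bool fits_in_parent(long parent_runtime_us,
                    const std::vector<long>& sibling_runtimes_us,
                    long requested_us) {
    const long used = std::accumulate(sibling_runtimes_us.begin(),
                                      sibling_runtimes_us.end(), 0L);
    return requested_us >= 0 && used + requested_us <= parent_runtime_us;
}
```

For instance, with a parent budget of 950000 and a sibling already holding 900000, a request for 95000 does not fit, which is the kind of situation that produces the "invalid argument" write error.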


For real-time processes inside a Docker container, I found the alternative to control groups, the PREEMPT_RT patch, far more useful: you can install it easily from a Debian package, and it is then sufficient to run the container with the privileged option in order to set real-time priorities for processes inside it. The advantage is mainly a significantly lower maximum latency compared to control groups. I have discussed this in more detail in this post and created a GitHub repository with guides and scripts that help with the installation of PREEMPT_RT.

Reduplication answered 5/1, 2022 at 18:35 Comment(0)
