Real-time Linux: disable local timer interrupts
TL;DR: Using a real-time Linux kernel with NO_HZ_FULL, I need to isolate a process in order to get deterministic results, but /proc/interrupts tells me there are still local timer interrupts (among others). How can I disable them?

Long version:

I want to make sure my program is not being interrupted, so I tried to use a real-time Linux kernel. I'm using the real-time version of Arch Linux (linux-rt on the AUR), and I modified the kernel configuration to select the following options:

CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ_FULL_ALL=y
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y

Then I rebooted my computer into this real-time kernel with the following boot parameters:

nmi_watchdog=0
rcu_nocbs=1
nohz_full=1
isolcpus=1
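
After booting, the parameters can be double-checked (the two sysfs files are assumed to be present on recent kernels):

cat /proc/cmdline                      # should list the four parameters above
cat /sys/devices/system/cpu/isolated   # expected: 1
cat /sys/devices/system/cpu/nohz_full  # expected: 1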

I also disabled the following options in the BIOS:

C-states
Intel SpeedStep
Turbo mode
VT-x
VT-d
Hyper-Threading

My CPU (i7-6700 @ 3.40 GHz) has 4 cores (8 logical CPUs with Hyper-Threading); with Hyper-Threading disabled, I can see CPU0, CPU1, CPU2 and CPU3 in the /proc/interrupts file.

CPU1 is isolated by the isolcpus kernel parameter, and I want to disable the local timer interrupts on this CPU. I thought a real-time kernel with CONFIG_NO_HZ_FULL and CPU isolation (isolcpus) was enough to do it, and I tried to check by running these commands:

cat /proc/interrupts | grep LOC > ~/tmp/log/overload_cpu1
taskset -c 1 ./overload
cat /proc/interrupts | grep LOC >> ~/tmp/log/overload_cpu1

where the overload process is:

overload.c:
int main()
{
  /* busy loop: burn CPU cycles without making any syscalls */
  for(int i=0;i<100;++i)
    for(int j=0;j<100000000;++j);
}

The file overload_cpu1 contains the result:

LOC:     234328        488      12091      11299   Local timer interrupts
LOC:     239072        651      12215      11323   Local timer interrupts

meaning 651 - 488 = 163 interrupts from the local timer, and not 0...

For comparison, I did the same experiment but changed the core my overload process runs on (I kept watching the interrupts on CPU1):

taskset -c 0 :   8 interrupts
taskset -c 1 : 163 interrupts
taskset -c 2 :   7 interrupts
taskset -c 3 :   8 interrupts

One of my questions is: why are there not 0 interrupts? Why is the number of interrupts bigger when my process runs on CPU1? (I thought NO_HZ_FULL would prevent interrupts when my process is alone: "The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to avoid sending scheduling-clock interrupts to CPUs with a single runnable task" - https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt.)

Maybe one explanation is that there are other processes running on CPU1. I checked using the ps command:
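
(Something like the following assumed invocation, with the psr column - the CPU a task last ran on - renamed to CPUID, produces this listing:)

ps -e -o class,psr=CPUID,rtprio,pri,ni,cmd,pid | awk 'NR==1 || $2==1'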

CLS CPUID RTPRIO PRI  NI CMD                           PID
TS      1      -  19   0 [cpuhp/1]                      18
FF      1     99 139   - [migration/1]                  20
TS      1      -  19   0 [rcuc/1]                       21
FF      1      1  41   - [ktimersoftd/1]                22
TS      1      -  19   0 [ksoftirqd/1]                  23
TS      1      -  19   0 [kworker/1:0]                  24
TS      1      -  39 -20 [kworker/1:0H]                 25
FF      1      1  41   - [posixcputmr/1]                28
TS      1      -  19   0 [kworker/1:1]                 247
TS      1      -  39 -20 [kworker/1:1H]                501

As you can see, there are kernel threads on CPU1. Is it possible to disable these processes? I guess it must be, because otherwise NO_HZ_FULL would never work, right?

Tasks with class TS don't disturb me, because they have no priority over SCHED_FIFO and I can set that policy on my program. The same goes for tasks with class FF and a priority less than 99.

However, you can see migration/1, which is SCHED_FIFO with priority 99. Maybe such processes can cause interrupts when they run. This would explain the few interrupts when my process is on CPU0, CPU2 and CPU3 (8, 7 and 8 interrupts respectively), but it also means these processes run rarely, and so it doesn't explain why there are so many interrupts when my process runs on CPU1 (163 interrupts).
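
For reference, one way to launch the test under SCHED_FIFO from the shell (assuming the util-linux chrt tool) is:

chrt -f 99 taskset -c 1 ./overload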

I also did the same experiment, but with my overload process scheduled as SCHED_FIFO, and I got:

taskset -c 0 : 1
taskset -c 1 : 4063
taskset -c 2 : 1
taskset -c 3 : 0

In this configuration there are more interrupts when my process uses the SCHED_FIFO policy on CPU1, and fewer on the other CPUs. Do you know why?

Pelligrini answered 6/9, 2017 at 14:15 Comment(4)
I've come across this issue before. If I recall, the problem stems from inter-core communication within the Intel processors I was using -- the interrupt functionality is built into the hardware itself and is not configurable from software. It didn't matter what configuration I set in Linux; the issue was not a kernel scheduling problem but inherent in my hardware. I could run FreeDOS and still see the SMI interrupts. Ultimately we decided to use an FPGA to achieve our deterministic requirements -- though understandably that may not suit your needs.Raggletaggle
Your answer doesn't solve my problem, but it is still good to know - thank you for that. SMIs are indeed non-maskable interrupts, so I guess they are included in the "NMI" line of /proc/interrupts, and I'm lucky because I don't see many of them. However, local timer interrupts should not be counted under NMI. I am also seeing some strange timing results; I need to continue my tests and will probably create a new question later, because I think that is not really related to local timer interrupts.Pelligrini
AFAIK, while SMIs are non-maskable, NMI and SMI are two separate things. I think the SMI is completely transparent to the OS and does not appear in any of these counters.Ferreous
Did you configure RT throttling (/proc/sys/kernel/sched_rt_period_us), which prevents RT tasks from completely occupying the core and starving non-RT tasks? Theoretically, no LOC interrupts should occur while an RT task with priority 99 is running and doesn't give up the core voluntarily.Oeillade

The thing is that a full-tickless CPU (a.k.a. adaptive-ticks, configured with nohz_full=) still receives some ticks.

Most notably, the scheduler requires a timer tick on an isolated full-tickless CPU in order to update some state every second or so.

This is a documented limitation (as of 2019):

Some process-handling operations still require the occasional scheduling-clock tick. These operations include calculating CPU load, maintaining sched average, computing CFS entity vruntime, computing avenrun, and carrying out load balancing. They are currently accommodated by scheduling-clock tick every second or so. On-going work will eliminate the need even for these infrequent scheduling-clock ticks.

(source: Documentation/timers/NO_HZ.txt, cf. the LWN article (Nearly) full tickless operation in 3.10 from 2013 for some background)

A more accurate method to measure the local timer interrupts (LOC row in /proc/interrupts) is to use perf. For example:

$ perf stat -a -A -e irq_vectors:local_timer_entry ./my_binary

where my_binary has threads pinned to the isolated CPUs that utilize the CPU non-stop without invoking syscalls, for, say, 2 minutes.
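
A concrete variant (a sketch assuming the overload binary from the question; sleep 120 just bounds the system-wide measurement to 2 minutes):

taskset -c 1 ./overload &    # busy loop pinned to the isolated CPU
perf stat -a -A -e irq_vectors:local_timer_entry sleep 120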

There are other sources of additional local timer ticks (when there is just 1 runnable task).

For example, the collection of VM stats - by default they are collected every second. Thus, I can decrease my LOC interrupts by setting a higher interval, e.g.:

# sysctl vm.stat_interval=60
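
To make that survive a reboot, the usual sysctl.d mechanism works (the file name here is just an example):

echo 'vm.stat_interval = 60' > /etc/sysctl.d/10-vmstat.conf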

Another source is the periodic check whether the TSCs of the different CPUs drift apart - you can disable those checks with the following kernel option:

tsc=reliable

(Only apply this option if you really know that your TSCs don't drift.)
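
You can verify that the TSC is in fact the active clocksource:

cat /sys/devices/system/clocksource/clocksource0/current_clocksource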

You might find other sources by recording traces with ftrace (while your test binary is running).
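
A minimal sketch, assuming tracefs is mounted at /sys/kernel/debug/tracing and CPU1 is the CPU under test:

cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo 2 > tracing_cpumask                           # hex mask: trace CPU1 only
echo 1 > events/timer/hrtimer_expire_entry/enable  # log each hrtimer expiry
echo > trace                                       # clear the ring buffer
echo 1 > tracing_on
# run the test binary pinned to CPU1, then:
cat trace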

Since it came up in the comments: Yes, the SMI is fully transparent to the kernel. It doesn't show up as NMI. You can only detect an SMI indirectly.
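
For example, on Intel CPUs the firmware increments an SMI counter MSR (0x34, MSR_SMI_COUNT) that can be sampled before and after a test run with msr-tools:

modprobe msr
rdmsr -a 0x34    # SMI count, one value per CPU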

Bridwell answered 15/9, 2019 at 10:6 Comment(0)
