TL;DR : Using Linux kernel real time with NO_HZ_FULL I need to isolate a process in order to have deterministic results but /proc/interrupts tell me there is still local timer interrupts (among other). How to disable it?
Long version :
I want to make sure my program is not being interrupt so I try to use a real time Linux kernel. I'm using the real time version of arch Linux (linux-rt on AUR) and I modified the configuration of the kernel to selection the following options :
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ_FULL_ALL=y
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y
then I reboot my computer to boot on this real time kernel with the folowing options:
nmi_watchdog=0
rcu_nocbs=1
nohz_full=1
isolcpus=1
I also disable the following option in the BIOS :
C state
intel speed step
turbo mode
VTx
VTd
hyperthreading
My CPU (i7-6700 3.40GHz) has 4 cores (8 logical CPU with hyperthreading technology) I can see CPU0, CPU1, CPU2, CPU3 in /proc/interrupts file.
CPU1 is isolated by isolcpus
kernel parameter and I want to disable the local timer interrupts on this CPU.
I though real-time kernel with CONFIG_NO_HZ_FULL and CPU isolation (isolcpus) was enough to do it and I try to check by running theses command :
cat /proc/interrupts | grep LOC > ~/tmp/log/overload_cpu1
taskset -c 1 ./overload
cat /proc/interrupts | grep LOC >> ~/tmp/log/overload_cpu1
where the overload process is:
***overload.c:***
int main()
{
for(int i=0;i<100;++i)
for(int j=0;j<100000000;++j);
}
The file overload_cpu1
contains the result:
LOC: 234328 488 12091 11299 Local timer interrupts
LOC: 239072 651 12215 11323 Local timer interrupts
meanings 651-488 = 163 interrupts from local timer and not 0...
For comparison I do the same experiment but I change the core where my process overload
run (I keep watching interrupts on CPU1):
taskset -c 0 : 8 interrupts
taskset -c 1 : 163 interrupts
taskset -c 2 : 7 interrupts
taskset -c 3 : 8 interrupts
One of my question is why there is no 0 interrupts ? why the number of interrupts is bigger when my process run on CPU1 ? (I mean I though NO_HZ_FULL will prevent interrupt if my process was alone : "The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to avoid sending scheduling-clock interrupts to CPUs with a single runnable task"(https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt)
Maybe an explaination is there is other process running on CPU1. I checked by using ps command :
CLS CPUID RTPRIO PRI NI CMD PID
TS 1 - 19 0 [cpuhp/1] 18
FF 1 99 139 - [migration/1] 20
TS 1 - 19 0 [rcuc/1] 21
FF 1 1 41 - [ktimersoftd/1] 22
TS 1 - 19 0 [ksoftirqd/1] 23
TS 1 - 19 0 [kworker/1:0] 24
TS 1 - 39 -20 [kworker/1:0H] 25
FF 1 1 41 - [posixcputmr/1] 28
TS 1 - 19 0 [kworker/1:1] 247
TS 1 - 39 -20 [kworker/1:1H] 501
As you can see, there is threads on the CPU1. Is that possible to disable these processes ? I guess it is because if it is not the case, NO_HZ_FULL will never work right ?
Tasks with class TS doesn't disturb me because they didn't have priority among SCHED_FIFO and I can set this policy to my program. Same things for tasks with class FF and priority less than 99.
However, you can see migration/1 that is in SCHED_FIFO and priority 99. Maybe these process can causes interrupts when they run . This explain the few interrupts when my process in on CPU0, CPU2 and CPU3 (respectively 8,7 and 8 interrupts) but it also mean these processes are not running very often and then doesn't explain why there is many interrupts when my process run on CPU1 (163 interrupts).
I also do the same experiment but with the SCHED_FIFO of my overload process and I get:
taskset -c 0 : 1
taskset -c 1 : 4063
taskset -c 2 : 1
taskset -c 3 : 0
In this configuration there is more interrupts in the case my process use SCHED_FIFO policy on CPU1 and less on other CPU. do you know why ?
RT throttling
( /proc/sys/kernel/sched_rt_period_us), which prevents your RT tasks from completely occupying the core and staving other non-RT tasks. Theoretically, no LOC would occur when an RT task with 99 priority is running and doesn't give up the core voluntarily. – Oeillade