Is it possible to change which core timer interrupts happen on?

On my Debian 8 system, when I run the command watch -n0.1 --no-title cat /proc/interrupts, I get the output below.

           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:         46          0          0      10215          0          0          0          0   IO-APIC-edge      timer
  1:          1          0          0          2          0          0          0          0   IO-APIC-edge      i8042
  8:          0          0          0          1          0          0          0          0   IO-APIC-edge      rtc0
  9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          0          0          0          4          0          0          0          0   IO-APIC-edge      i8042
 18:          0          0          0          0          8          0          0          0   IO-APIC-fasteoi   i801_smbus
 19:       7337          0          0          0          0          0          0          0   IO-APIC-fasteoi   ata_piix, ata_piix
 21:          0         66          0          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
 23:          0          0         35          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb2
 40:     208677          0          0          0          0          0          0          0  HPET_MSI-edge      hpet2
 41:          0       4501          0          0          0          0          0          0  HPET_MSI-edge      hpet3
 42:          0          0       2883          0          0          0          0          0  HPET_MSI-edge      hpet4
 43:          0          0          0       1224          0          0          0          0  HPET_MSI-edge      hpet5
 44:          0          0          0          0       1029          0          0          0  HPET_MSI-edge      hpet6
 45:          0          0          0          0          0          0          0          0   PCI-MSI-edge      aerdrv, PCIe PME
 46:          0          0          0          0          0          0          0          0   PCI-MSI-edge      PCIe PME
 47:          0          0          0          0          0          0          0          0   PCI-MSI-edge      PCIe PME
 48:          0          0          0          0          0          0          0          0   PCI-MSI-edge      PCIe PME
 49:          0          0          0          0          0       8570          0          0   PCI-MSI-edge      eth0-rx-0
 50:          0          0          0          0          0          0       1684          0   PCI-MSI-edge      eth0-tx-0
 51:          0          0          0          0          0          0          0          2   PCI-MSI-edge      eth0
NMI:          8          2          2          2          1          2          1         49   Non-maskable interrupts
LOC:         36         31         29         26         21       7611        886       1390   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          8          2          2          2          1          2          1         49   Performance monitoring interrupts
IWI:          0          0          0          1          1          0          1          0   IRQ work interrupts
RTR:          7          0          0          0          0          0          0          0   APIC ICR read retries
RES:        473       1027       1530        739       1532       3567       1529       1811   Rescheduling interrupts
CAL:        846       1012       1122       1047        984       1008       1064       1145   Function call interrupts
TLB:          2          7          5          3         12         15         10          6   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:          4          4          4          4          4          4          4          4   Machine check polls
THR:          0          0          0          0          0          0          0          0   Hypervisor callback interrupts
ERR:          0
MIS:          0

Observe that the timer interrupt is firing mostly on CPU3.

Is it possible to move the timer interrupt to CPU0?

Sword answered 2/8, 2017 at 22:50 Comment(4)
Why do you want to do this? - Delinda
To reduce interference on core 3. - Sword
This sounds like an X-Y problem. What are you really trying to achieve? - Delinda
I'm trying to fully allocate a core to a latency-sensitive application and minimize other activity on it. - Sword

The name of the concept is IRQ SMP affinity.

You can set the affinity of an IRQ by writing an affinity mask to /proc/irq/<IRQ_NUMBER>/smp_affinity or an affinity list to /proc/irq/<IRQ_NUMBER>/smp_affinity_list.
The affinity mask is a hexadecimal bit field in which each bit represents a core; the IRQ may be serviced only on the cores whose bits are set.
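
To make the mask format concrete, here is a minimal sketch that builds the hexadecimal value from a list of core numbers. The helper name cpus_to_mask is made up for this example, and the sketch assumes a machine with fewer than 32 cores, so the mask fits in a single hexadecimal word.

```shell
#!/bin/sh
# cpus_to_mask is a hypothetical helper, not part of any existing tool.
# It sets one bit per CPU index given as an argument and prints the result
# in hexadecimal, the format written to smp_affinity.
# Assumption: fewer than 32 CPUs, so one hexadecimal word is enough.
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

cpus_to_mask 0      # prints 1 (CPU0 only)
cpus_to_mask 0 3    # prints 9 (CPU0 and CPU3)
```

The smp_affinity_list file avoids this arithmetic, since it takes the core numbers directly.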

The command

echo 1 > /proc/irq/0/smp_affinity

executed as root should pin IRQ0 to CPU0.
The "should" is deliberate: setting the affinity of an IRQ is subject to several prerequisites. These include an interrupt controller that supports a redirection table (like the IO-APIC), an affinity mask that contains at least one active CPU, an IRQ whose affinity is not managed by the kernel, and the feature being enabled.
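
Since the write can be rejected, it is worth checking the result instead of assuming it took effect. The following is only a sketch: set_irq_affinity is a made-up helper, and its optional third argument (a directory to use in place of /proc/irq) exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# set_irq_affinity is a hypothetical helper for illustration only.
# Usage: set_irq_affinity IRQ HEXMASK [IRQ_DIR]
# IRQ_DIR defaults to /proc/irq; overriding it is an assumption of this
# sketch that allows a dry run against a scratch directory.
set_irq_affinity() {
    irq=$1
    mask=$2
    dir=${3:-/proc/irq}
    file="$dir/$irq/smp_affinity"

    if [ ! -w "$file" ]; then
        echo "IRQ $irq: $file is missing or not writable" >&2
        return 1
    fi
    # The kernel may refuse the new mask, in which case the write fails.
    if ! echo "$mask" > "$file"; then
        echo "IRQ $irq: the kernel rejected mask $mask" >&2
        return 1
    fi
    # Read the value back rather than trusting the write.
    echo "IRQ $irq affinity is now $(cat "$file")"
}

# Example (as root): set_irq_affinity 0 1
```

A non-zero return here means the affinity is unchanged, and the reason has to be found elsewhere (kernel log, source code).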

On my virtualised Debian 8 system I was unable to set the affinity of IRQ0; the write failed with an EIO error.
I was also unable to track down the exact reason.
If you are willing to dive into the Linux source code, you can start from write_irq_affinity in kernel/irq/proc.c.

Buckler answered 7/8, 2017 at 10:31 Comment(0)

Use isolcpus. It may not reduce your timer interrupts to 0, but on our servers they are greatly reduced.

If you use isolcpus, the kernel will not route interrupts to the isolated CPUs that it might otherwise send there. For example, we have dual-socket systems with 12-core CPUs. We noticed NVMe interrupts on the cores of CPU1 (the second socket), even with those cores isolated via tuned and its cpu-partitioning profile. The NVMe drives on our Dell systems are connected to the PCIe lanes of CPU1, hence the interrupts on those cores.

As per my ticket with Red Hat (and Margaret Bloom, who wrote an excellent answer here), if you don't want interrupts to be routed to your CPUs, you need to put isolcpus on the kernel command line. And lo and behold, I tried it and the NVMe interrupts went to 0 on all isolated cores.
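
After rebooting with isolcpus, you can confirm that a core really is isolated by reading /sys/devices/system/cpu/isolated, which holds a CPU list such as "3" or "1,4-7". Below is a rough sketch of such a check; cpu_in_list is a made-up helper, and core 3 in the usage comment is only an example taken from the question.

```shell
#!/bin/sh
# cpu_in_list is a hypothetical helper: it succeeds when CPU $1 appears in a
# kernel CPU list such as "3" or "1,4-7" (the format found in
# /sys/devices/system/cpu/isolated).
cpu_in_list() {
    cpu=$1
    old_ifs=$IFS
    IFS=,
    # Split the list on commas into the positional parameters.
    set -- $2
    IFS=$old_ifs
    for range in "$@"; do
        lo=${range%-*}    # "4-7" -> 4, "3" -> 3
        hi=${range#*-}    # "4-7" -> 7, "3" -> 3
        if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
            return 0
        fi
    done
    return 1
}

# Example:
#   cpu_in_list 3 "$(cat /sys/devices/system/cpu/isolated)" && echo "CPU3 is isolated"
```

An empty list means no cores are isolated, in which case the function returns failure for every core.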

I have not attempted to isolate ALL cores on CPU1; I don't know whether the interrupts would then simply be routed to CPU0 instead.

In short: any interrupt in /proc/interrupts with "MSI" in its name is managed by the kernel.
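
To see which IRQs fall under that rule on a given machine, a small sketch like the one below lists the IRQ numbers of every line that mentions MSI. The helper name list_msi_irqs is made up; the optional file argument is an assumption that lets you run it on a saved copy of /proc/interrupts.

```shell
#!/bin/sh
# list_msi_irqs is a hypothetical helper for illustration only.
# It prints the IRQ number (first field without its trailing colon) of each
# line containing "MSI". The input defaults to /proc/interrupts; a saved
# copy can be passed as the first argument.
list_msi_irqs() {
    awk '/MSI/ { sub(":", "", $1); print $1 }' "${1:-/proc/interrupts}"
}

# Example: list_msi_irqs
```

Run against the output in the question, this would list IRQs 40 to 51 (the hpet, PCIe PME and eth0 lines) and leave out IRQ0, the timer.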

Boff answered 3/9, 2021 at 17:2 Comment(0)
