I am also working on this issue and I haven't figured out a way to prevent those kernel threads from being scheduled on the isolated CPU set.
According to the Red Hat documentation, it does not seem to be feasible either; isolcpus is only described as keeping user-space threads off the isolated CPUs:
Isolating CPUs
You can isolate one or more CPUs from the scheduler with the isolcpus boot parameter. This prevents the scheduler from scheduling any user-space threads on this CPU.
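To check what the running kernel was actually booted with, something like the following may help. It is a minimal sketch: the helper name isolcpus_from_cmdline is made up for illustration, and the value 9-11 in the comment is only an example matching the CPUs used later in this answer.

```shell
#!/bin/sh
# Print the value of the isolcpus= parameter found in a kernel command line.
# $1: a command line string, e.g. the contents of /proc/cmdline.
# (isolcpus_from_cmdline is a hypothetical helper name.)
isolcpus_from_cmdline() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^isolcpus=//p'
}

# On a machine booted with isolcpus=9-11 this prints 9-11.
if [ -r /proc/cmdline ]; then
    isolcpus_from_cmdline "$(cat /proc/cmdline)"
fi

# The kernel also reports the isolated set through sysfs on recent kernels.
if [ -r /sys/devices/system/cpu/isolated ]; then
    cat /sys/devices/system/cpu/isolated
fi
```

An empty result from both checks means no CPU is isolated at boot.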
I have been using a combination of isolcpus and cset shield to keep most of the kernel's housekeeping threads off my isolated CPUs.
I used perf sched record to record the context switches on my CPUs and perf sched map to visualize them.
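The setup could be reproduced roughly as follows. This is a sketch, not the exact commands behind these traces: the CPU range 9-11 is inferred from the thread names in the output, the 30-second recording window is an assumption, and the DRY_RUN wrapper is added here so that the script only prints the commands unless told otherwise.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 (and run as root) to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
CPUS="${CPUS:-9-11}"        # assumed CPU range to shield
OUT="${OUT:-exp_1.sch}"     # output file name used in this answer

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Create the shield and ask cset to move movable kernel threads out of it.
run cset shield --cpu="$CPUS"
run cset shield --kthread=on

# Record scheduler events system-wide while the workload is running.
run perf sched record -- sleep 30

# Render the per-CPU context-switch map from the recorded perf.data.
if [ "$DRY_RUN" = "1" ]; then
    printf '+ perf sched map > %s\n' "$OUT"
else
    perf sched map > "$OUT"
fi
```

Per-CPU kernel threads cannot be moved by cset, which is why some of them still show up in the maps.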
In the first experiment, I used only cset shield:
$ grep -e '=>' exp_1.sch
*A0 445210.783227 secs A0 => kworker/11:1-ev:165
*. 445210.783275 secs . => swapper:0
*B0 445210.783304 secs B0 => kworker/u24:4-e:130904
*C0 445210.783420 secs C0 => WORKER2:160974
. *D0 C0 445210.783844 secs D0 => kworker/10:0-ev:1672
*E0 . C0 445210.784703 secs E0 => WORKER0:160969
*F0 . C0 445210.789628 secs F0 => kworker/9:1-eve:163
E0 *G0 . 445210.802886 secs G0 => WORKER1:160973
E0 *H0 . 445210.811638 secs H0 => ksoftirqd/10:76
E0 *I0 . 445210.939469 secs I0 => kworker/u24:2-e:158157
*J0 G0 . 445211.527639 secs J0 => ksoftirqd/9:70
E0 G0 *K0 445212.087622 secs K0 => ksoftirqd/11:82
E0 *L0 . 445212.347277 secs L0 => kworker/10:1H-k:277
*M0 I0 C0 445213.321971 secs M0 => kworker/u24:1-e:160121
E0 *N0 . 445214.463593 secs N0 => migration/10:75
*O0 N0 . 445214.463597 secs O0 => migration/9:69
O0 N0 *P0 445214.463598 secs P0 => migration/11:81
*Q0 G0 M0 445225.372366 secs Q0 => kworker/9:1H-kb:330
Here you can see my workload threads (WORKER{0,1,2}), the per-CPU kworker threads (kworker/{9,10,11}) bound to CPUs 9-11, and the rest: ksoftirqd/{9,10,11}, migration/{9,10,11}, the unbound kworker/u24 threads, and the "idle" thread swapper.
In the second experiment, I used cset shield together with isolcpus:
$ grep -e '=>' exp_2.sch
*A0 1033.342241 secs A0 => WORKER0:3646
A0 *B0 1033.342675 secs B0 => kworker/11:1-ev:165
A0 *. 1033.342694 secs . => swapper:0
A0 *C0 . 1033.343470 secs C0 => WORKER1:3647
A0 C0 *D0 1033.344634 secs D0 => WORKER2:3648
A0 *E0 D0 1033.346306 secs E0 => kworker/10:1-ev:164
*F0 . D0 1033.364736 secs F0 => kworker/9:1-eve:163
A0 *G0 . 1036.433541 secs G0 => migration/10:75
*H0 G0 . 1036.433541 secs H0 => migration/9:69
A0 G0 *I0 1036.433548 secs I0 => migration/11:81
In this case, you see only the WORKER{0,1,2}, kworker/{9,10,11}, migration/{9,10,11}, and swapper tasks. The ksoftirqd/{9,10,11} and kworker/u24 threads no longer appear.
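To compare two runs without reading the maps line by line, the distinct task names can be pulled out of each dump. The following is a small sketch; sched_tasks is a hypothetical helper name, and the three sample lines are copied from the second experiment.

```shell
#!/bin/sh
# List the distinct task names that appear in a `perf sched map` dump.
# $1: file containing the map output (e.g. exp_1.sch or exp_2.sch).
sched_tasks() {
    grep -e '=>' "$1" |
        sed -e 's/.*=> //' -e 's/:[0-9]*[[:space:]]*$//' |
        LC_ALL=C sort -u
}

# Demo on three lines copied from the second experiment.
sample=$(mktemp)
cat > "$sample" <<'EOF'
*A0 1033.342241 secs A0 => WORKER0:3646
A0 *B0 1033.342675 secs B0 => kworker/11:1-ev:165
A0 *. 1033.342694 secs . => swapper:0
EOF
sched_tasks "$sample"
rm -f "$sample"
```

Running it on both exp_1.sch and exp_2.sch and diffing the results shows which kernel threads disappeared once isolcpus was added.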
run_on every core since kernel code on the isolated CPU could still have been reading an RCU kernel variable. So this might be the best you can do. – Schonthal