Understanding Scheduling and /proc/[pid]/sched

I have a single-threaded application that is (spinning and) pinned to a core with taskset and CPU isolation (i.e. isolcpus=12-23 nohz_full=21,22,23 rcu_nocbs=12-23; I have 2 Skylake CPUs with 12 cores each):

exec taskset -c 22 setuidgid myuser envdir ./env -c = /opt/bin/UtilityServer > /tmp/logs/utility-server.log
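
For reference, the same pinning can also be requested from inside the process with sched_setaffinity(2). The following is only a minimal sketch, not the actual UtilityServer code; the core number 22 is taken from the command above and everything else is illustrative:

/* Minimal sketch (not the asker's code): pin the calling thread to core 22
 * from inside the program, roughly what `taskset -c 22` does from the shell. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(22, &set);                      /* core 22, as in the command above */

    if (sched_setaffinity(0, sizeof(set), &set) != 0) {  /* 0 = calling thread */
        perror("sched_setaffinity");
        return 1;
    }

    /* ... the spinning server loop would run here ... */
    return 0;
}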

After the application had been running for a few hours, I checked its stats in /proc/[pid]/sched:

UtilityServer (1934, #threads: 1)
-------------------------------------------------------------------
se.exec_start                                :      78998944.120048
se.vruntime                                  :      78337609.962134
se.sum_exec_runtime                          :      78337613.040860
se.nr_migrations                             :                    6
nr_switches                                  :                   41
nr_voluntary_switches                        :                   31
nr_involuntary_switches                      :                   10
se.load.weight                               :                 1024
policy                                       :                    0
prio                                         :                  120
clock-delta                                  :                   13
mm->numa_scan_seq                            :                  925
numa_migrations, 0
numa_faults_memory, 0, 0, 0, 0, 1
numa_faults_memory, 1, 0, 0, 0, 1
numa_faults_memory, 0, 1, 1, 1, 0
numa_faults_memory, 1, 1, 0, 1, 9

Questions:

  1. How can I stop the switching and migration entirely (i.e. keep se.nr_migrations, nr_switches, nr_voluntary_switches, and nr_involuntary_switches at zero), given that my application really wants the whole core to itself? (A small sketch for watching these counters follows this list.)
  2. Why did the kernel try to migrate the task at all, given that I have already isolated the core and assigned only this one single-threaded application to it?
  3. Does nr_voluntary_switches track the number of times my application voluntarily gave up the core? If so, under what circumstances would it give the core up? The application does some non-blocking disk I/O (e.g. fwrite_unlocked()) but no networking at all.
  4. Under what circumstances would my application be forced to switch? Does nr_involuntary_switches = 10 mean it was preempted 10 times?
  5. What do the numbers after the numa_faults_memory entries mean?
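
The counters asked about in questions 1 and 4 can be watched with a small, hypothetical helper like the one below (it only assumes that /proc/<pid>/sched keeps the format shown above); polling it makes it easier to see when nr_switches or se.nr_migrations grow, rather than only their totals:

/* Hypothetical helper (not part of UtilityServer): print the switch and
 * migration counters from /proc/<pid>/sched so their growth can be watched. */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    const char *pid = argc > 1 ? argv[1] : "self";
    char path[64], line[256];

    snprintf(path, sizeof(path), "/proc/%s/sched", pid);
    FILE *f = fopen(path, "r");
    if (!f) {
        perror(path);
        return 1;
    }
    while (fgets(line, sizeof(line), f)) {
        /* matches nr_switches, nr_*_switches and se.nr_migrations */
        if (strstr(line, "switches") || strstr(line, "migrations"))
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}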

I am on kernel 3.10.0-862.2.3.el7.x86_64, if it matters.

Thanks!

Spoofery asked 10/10, 2018 at 9:24. Comments (3):
Sorry, there are too many questions and no code ;) Have a look at fwrite_unlocked(). It usually ends with a write(2) syscall, and that is the source of context switches... For more info, look into the kernel's __schedule(); that is where the context-switch counters get incremented. -- Examinee
Haha, I just tried to ask all the questions at once instead of adding them one by one later; otherwise I would upset the people trying to answer. The code is a bit long -- a few thousand lines. I actually call __fsetlocking() to advise glibc not to lock unless the caller asks for it (see the sketch after these comments), and then I call fprintf() 3 times + fflush() every ~5 seconds, plus vfprintf(), fwrite_unlocked(), and fflush_unlocked() every few hours. I will take a look at the glibc sources of all those functions. Thanks for the pointers :) -- Spoofery
Actually, those switches should have nothing to do with all those writes, because I have cron jobs that capture the stats at 9AM, 3:15PM, 9PM, and 2:45AM, and the switch counts didn't increase for a few days. But you have inspired me -- a context switch due to I/O access. I believe those switches are related to my application reading its configuration file at startup, so #3 is kind of answered. Thanks! -- Spoofery
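
The stdio locking setup described in the comments might look roughly like the sketch below. This is an assumed reconstruction, not the asker's actual code: the log file name is taken from the command at the top, and the message payload is invented.

/* Assumed sketch of the setup described above: tell glibc that the caller
 * handles stdio locking, then use the *_unlocked variants for log writes. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdio_ext.h>               /* __fsetlocking(), FSETLOCKING_BYCALLER */

int main(void)
{
    FILE *log = fopen("/tmp/logs/utility-server.log", "a");
    if (!log)
        return 1;

    /* Single-threaded writer: skip glibc's internal per-stream locking. */
    __fsetlocking(log, FSETLOCKING_BYCALLER);

    const char msg[] = "heartbeat\n";            /* invented example payload */
    fwrite_unlocked(msg, 1, sizeof(msg) - 1, log);
    fflush_unlocked(log);            /* still ends in a write(2) syscall, which
                                        is where a voluntary switch can happen */

    fclose(log);
    return 0;
}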
