What does "cpu_time" represent exactly in libvirt?

I can pull the following CPU values from libvirt:

virsh domstats vm1 --cpu-total
Domain: 'vm1'
  cpu.time=6173016809079111
  cpu.user=26714880000000
  cpu.system=248540680000000

virsh cpu-stats vm1 --total
Total:
    cpu_time       6173017.263233824 seconds
    user_time        26714.890000000 seconds
    system_time     248540.700000000 seconds

What does the cpu_time figure represent here exactly?

I'm looking to calculate CPU utilization as a percentage using this data.

Thanks

Huai answered 7/11, 2016 at 15:5 Comment(0)

This was a surprisingly difficult question to answer! After poring over the kernel code for a good while, I've figured out what's going on here, and it's quite nice to learn how it works.

Normally for a process on Linux, the overall CPU usage is simply the sum of the time spent in userspace and the time spent in kernel space. So naively one would have expected user_time + system_time to equal cpu_time. What I've discovered is that Linux tracks time spent by vCPU threads executing guest code separately from both userspace and kernel-space time.

Thus cpu_time == user_time + system_time + guest_time

So you can think of system_time + user_time as the overhead of QEMU / KVM on the host side, and cpu_time - (user_time + system_time) as the actual amount of time the guest OS was running its CPUs.
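As a rough illustration of that split, here is a minimal Python sketch that parses the `virsh domstats vm1 --cpu-total` output from the question and applies the formula above. The helper names `parse_domstats` and `split_cpu_time` are made up for this example, and the values are assumed to be in nanoseconds (which is consistent with the `virsh cpu-stats` figures in seconds).

```python
def parse_domstats(text):
    """Parse 'key=value' lines from `virsh domstats` output into a dict of ints.

    Assumes every value is an integer, which holds for the --cpu-total fields.
    """
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" in line:
            key, _, value = line.partition("=")
            stats[key] = int(value)
    return stats


def split_cpu_time(stats):
    """Return (overhead_ns, guest_ns), assuming cpu_time = user + system + guest."""
    overhead = stats["cpu.user"] + stats["cpu.system"]
    guest = stats["cpu.time"] - overhead
    return overhead, guest


# Output copied from the question.
SAMPLE = """\
Domain: 'vm1'
  cpu.time=6173016809079111
  cpu.user=26714880000000
  cpu.system=248540680000000
"""

overhead_ns, guest_ns = split_cpu_time(parse_domstats(SAMPLE))
print(f"host-side overhead: {overhead_ns / 1e9:.1f} s")
print(f"guest execution:    {guest_ns / 1e9:.1f} s")
```

For the numbers in the question, almost all of cpu_time ends up in the guest portion under this formula.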

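To calculate CPU usage, you probably just want to record cpu_time every N seconds and calculate the delta between two samples, e.g. usage % = 100 * (cpu_time_2 - cpu_time_1) / N, with cpu_time expressed in seconds. Since cpu_time is summed over all vCPU threads, a guest with several vCPUs can report more than 100.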
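A minimal sketch of that delta calculation in Python, assuming the two cpu_time samples are the raw nanosecond counters (as in `cpu.time` from `virsh domstats`). The function name `cpu_usage_percent` and the optional `vcpus` normalisation are additions for this example, not something libvirt provides.

```python
NS_PER_SEC = 1_000_000_000


def cpu_usage_percent(cpu_time_1_ns, cpu_time_2_ns, interval_s, vcpus=1):
    """CPU usage in percent between two cpu_time samples taken interval_s apart.

    With vcpus=1 the result is relative to a single host CPU and may exceed 100
    for a multi-vCPU guest; passing the vCPU count scales it to 0..100
    (an assumption about what "utilization" should mean here).
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    busy_s = (cpu_time_2_ns - cpu_time_1_ns) / NS_PER_SEC
    return 100.0 * busy_s / (interval_s * vcpus)


# Hypothetical samples: 5 s of CPU time consumed over a 10 s interval.
first = 1_000_000_000_000
second = first + 5 * NS_PER_SEC
print(cpu_usage_percent(first, second, 10))           # relative to one CPU
print(cpu_usage_percent(first, second, 10, vcpus=2))  # normalised for 2 vCPUs
```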

Hierology answered 6/1, 2017 at 15:0 Comment(2)
Fantastic answer in my opinion! - Cygnet
If anyone ends up here looking at the CPU time returned by the virNodeGetCPUStats function in libvirt, make sure you convert it to seconds (divide by 10^9), since the time there is in nanoseconds. :) - Dwinnell

As per master pulled 2018-07-10 from https://github.com/libvirt/libvirt/ and as far as QEMU/KVM is concerned, it comes down to:

  • cpu.time = cpuacct.usage cgroup metric
  • cpu.{user,system} = cpuacct.stat cgroup metrics

One problem you may encounter is that guest load = time load - system load - user load sometimes yields negative values (?!?). Here is an example for a running QEMU/KVM guest (values in seconds) on the Debian 9 stock kernel (4.9):

time                   system    user     total
2018-07-10T13:19:20Z 62308.67 9278.59 107968.33
2018-07-10T13:20:20Z 62316.08 9279.73 107970.73
               delta     7.41    1.14      2.40 (2.40 < 7.41+1.14 ?!?)

Kernel bug? (At least one other person has experienced something similar: https://lkml.org/lkml/2017/11/1/101)
One thing is certain: cpuacct.usage and cpuacct.stat use different logic to gather their metrics; this might explain the discrepancy (?).
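For reference, the arithmetic behind the table can be reproduced with a short Python sketch. The sample values are copied from the table above; the `deltas` helper is just a name chosen for this example, and `Decimal` is used only to keep the two-decimal figures exact.

```python
from decimal import Decimal

# (timestamp, system, user, total) in seconds, copied from the table above.
samples = [
    ("2018-07-10T13:19:20Z", Decimal("62308.67"), Decimal("9278.59"), Decimal("107968.33")),
    ("2018-07-10T13:20:20Z", Decimal("62316.08"), Decimal("9279.73"), Decimal("107970.73")),
]


def deltas(first, second):
    """Return (d_system, d_user, d_total, d_guest) between two samples."""
    _, sys1, user1, total1 = first
    _, sys2, user2, total2 = second
    d_system = sys2 - sys1
    d_user = user2 - user1
    d_total = total2 - total1
    # "guest load = time load - system load - user load"
    d_guest = d_total - d_system - d_user
    return d_system, d_user, d_total, d_guest


d_system, d_user, d_total, d_guest = deltas(*samples)
print(f"system={d_system} user={d_user} total={d_total} guest={d_guest}")
```

The negative guest delta is what flags the inconsistency between the two counters over this one-minute interval.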

Fianna answered 10/7, 2018 at 15:49 Comment(0)

Unfortunately, the above answers are NOT correct where the CPUACCT controller for a KVM guest is concerned:

cpu_time == user_time + system_time + guest_time (<-- wrong)

If you run a CPU-intensive benchmark compared to an I/O or network-intensive benchmark in the VM, you'll see that "guest time" does not match up in the formula.

Guest time (according to /proc/<pid>/stat) represents ONLY the time used by the VCPUs to run the guest virtual machine (while not exiting or handling I/O).

The CPUACCT controller's top-level parent directory for each KVM/libvirt guest includes both the time spent on the "emulator" and "vcpuX" sub-directories in their totality, including vhost kernel threads and non-VCPU pthreads running inside the QEMU main process, not just guest time or user/system time.

That makes the above formula wrong. The correct formula would be:

guest_time = sum(vcpuX)=>cpu.time - sum(vcpuX)=>(for each child: cpuacct.stat=>user + cpuacct.stat=>system)

You cannot simply use the top-level parent files to calculate guest time. That would be totally inaccurate under any I/O bound workload.
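One way to read that formula is sketched below in Python, under several assumptions: each "vcpuX" cgroup exposes a `cpuacct.usage` counter in nanoseconds and a `cpuacct.stat` file with `user` and `system` in USER_HZ ticks, and USER_HZ is 100 (on a real host it can be queried with `os.sysconf("SC_CLK_TCK")`). The function names and the sample numbers are invented for illustration; no particular cgroup path layout is assumed.

```python
NS_PER_SEC = 1_000_000_000


def parse_cpuacct_stat(text):
    """Parse cpuacct.stat content ('user N' / 'system N') into tick counts."""
    stat = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            stat[key] = int(value)
    return stat


def guest_time_ns(vcpus, user_hz=100):
    """Sum, over the vcpuX cgroups, of usage minus (user + system).

    vcpus: iterable of (cpuacct_usage_ns, cpuacct_stat_text) pairs,
           one per vcpuX sub-directory.
    user_hz: ticks per second used by cpuacct.stat (assumed 100 here).
    """
    total = 0
    for usage_ns, stat_text in vcpus:
        stat = parse_cpuacct_stat(stat_text)
        ticks = stat["user"] + stat["system"]
        total += usage_ns - ticks * NS_PER_SEC // user_hz
    return total


# Made-up readings for two vCPU cgroups.
example = [
    (10_000_000_000, "user 100\nsystem 200\n"),  # 10 s usage, 3 s user+system
    (5_000_000_000, "user 50\nsystem 50\n"),     # 5 s usage, 1 s user+system
]
print(guest_time_ns(example) / NS_PER_SEC, "seconds of guest time")
```

In practice the pairs would be read from the vcpuX sub-directories only, leaving out the "emulator" directory so that its time is not counted as guest time.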

Dragline answered 4/11, 2020 at 17:29 Comment(0)
