perf power consumption measure: How does it work?
Asked Answered
H

2

12

I noticed that perf list now has the option to measure power consumption. You can use it as follows:

$ perf stat -e power/energy-cores/ ./a.out 
Performance counter stats for 'system wide':

              8.55 Joules power/energy-cores/

       0.949871058 seconds time elapsed

How accurate is this measurement, and how does perf estimate the power consumption?

Hass answered 2/5, 2019 at 16:3 Comment(0)
B
13

The power/energy-cores/ perf counter is based on an MSR register called MSR_PP0_ENERGY_STATUS, which is part of the Intel RAPL interface (Intel seems to call each individual RAPL MSR a RAPL interface). A complicated model based on system activity events is used to estimate (static and dynamic) energy consumption. The MSR register name has PP0 in it, which refers to power plane 0, which is one of the RAPL domains that contains all the cores of the socket including the private caches of the cores. PP0, however, excludes the last-level cache, the interconnect, the memory controller, the graphics processor, and everything else that is in the uncore. It's impossible to measure the accuracy of MSR_PP0_ENERGY_STATUS because there is no other way to estimate the energy consumption of power plane 0 only.

It's possible to measure the accuracy of other RAPL domains though. These include the Package, DRAM, and PSys domains. For example, the accuracy of the Package domain energy estimation can be measured by comparing against the energy consumption of the whole system (which can be measured using a power meter) and running a workload that keeps the energy consumption of everything outside the package a known constant as much as possible. The accuracy of MSR_PKG_ENERGY_STATUS and MSR_DRAM_ENERGY_STATUS have been measured in different ways by different people on many different processors. You can refer to the recent paper entitled RAPL in Action: Experiences in Using RAPL for Power Measurements for more information, which also includes summaries of previous works. The paper covers Sandy Bridge, Ivy Bridge, Haswell, and Skylake. The conclusion is that MSR_PKG_ENERGY_STATUS and MSR_DRAM_ENERGY_STATUS appear to be accurate on Haswell and Skylake (the implementation has changed on Haswell, see : An Energy Efficiency Feature Survey of the Intel Haswell Processor). But this is not necessarily true on all kinds of workloads, P states, and processors. So the accuracy does not just depend on the microarchitecture.

The RAPL interface is discussed in Section 14.9 of the Intel Manual Volume 3. I noticed there are errors in the section. For example, it says client processors don't support the DRAM domain, which is not true. The client Haswell processor I'm using to write this answer supports the DRAM domain. The section is probably outdated and applies only Sandy Bridge and Ivy Bridge processors. I think it's better to read the datasheet of the processor on which you want to use RAPL.

The power/energy-pkg/ perf counter can be used to measure energy consumption of the package domain. This is the only domain that is known be supported on all Intel processors starting from Sandy Bridge.

Byrann answered 2/5, 2019 at 22:42 Comment(0)
B
11

On x86 systems, these values are based on RAPL (Running Average Power Limit) - an interface that provides built in CPU energy counters. While originally designed by Intel, AMD also provides a compatible interface on Zen systems.

The accuracy depends on the actual microarchitecture. Originally, RAPL was backed by a model with certain biases. On Intel CPUs since the Haswell architecture, it is based on measurements which are quite accurate. As far as I know there is no good understanding of the accuracy on AMD's Zen RAPL implementation.

One important thing you have to consider is the scope of the measurements. On most systems, only package and DRAM is covered1. So if you need to know how much power / energy your entire system consumes - you usually cannot easily answer that with RAPL.

Also note that RAPL is updated every 1 ms, so short workloads will have significant errors from the update rate.

1 - Skylake Desktop systems can implement a full-system RAPL. It's accuracy depends on the manufacturer.

Batsheva answered 2/5, 2019 at 21:23 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.