Hardware cache events and perf

When I run perf list I see a bunch of Hardware Cache Events, as follows:

$ perf list | grep 'cache event'
  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-stores                                   [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  LLC-load-misses                                    [Hardware cache event]
  LLC-loads                                          [Hardware cache event]
  LLC-store-misses                                   [Hardware cache event]
  LLC-stores                                         [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-loads                                         [Hardware cache event]
  dTLB-store-misses                                  [Hardware cache event]
  dTLB-stores                                        [Hardware cache event]
  iTLB-load-misses                                   [Hardware cache event]
  iTLB-loads                                         [Hardware cache event]
  node-load-misses                                   [Hardware cache event]
  node-loads                                         [Hardware cache event]
  node-store-misses                                  [Hardware cache event]
  node-stores                                        [Hardware cache event]

These events mostly seem to return reasonable values in my tests, but how do I determine which hardware performance counter events they map to on my system?

That is, these events are certainly implemented using one or more underlying x86 PMU counters on my Skylake CPU - but how do I know which ones?

You can look in /sys/devices/cpu/events for other hardware events, but not for "Hardware cache events".
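For reference, here is a minimal sketch of listing what that sysfs directory does expose. The function name `read_sysfs_events` is made up for illustration, and the sketch assumes each file in the directory holds a short encoding string; it returns an empty mapping if the directory is absent.

```python
from pathlib import Path


def read_sysfs_events(events_dir="/sys/devices/cpu/events"):
    """Return {event_name: encoding_string} for each file in events_dir.

    Hypothetical helper: it only reads whatever the kernel exposes there and
    does not interpret the encoding.
    """
    root = Path(events_dir)
    if not root.is_dir():
        return {}
    events = {}
    for entry in sorted(root.iterdir()):
        if entry.is_file():
            events[entry.name] = entry.read_text().strip()
    return events


if __name__ == "__main__":
    for name, encoding in read_sysfs_events().items():
        print(f"{name:30s} {encoding}")
```

The "Hardware cache events" do not show up in this listing, which is why a different lookup is needed for them.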

Shields answered 4/9, 2018 at 16:58 Comment(4)
Does this help? - Elbrus
@MargaretBloom, yes for someone with enough motivation for reading source :). I was trying not to make it an "answer your own question" type question but I guess it might be... - Shields
If you are patient I'll try to put an answer together as soon as I have some free time :) - Elbrus
@MargaretBloom Today I guess I am not, it seems the pointer to the right file was enough to get me started, as I had already written an answer by the time you made your offer! Of course, better answers may be possible. I suppose you might have some insight on this related question. - Shields

User @Margaret points towards a reasonable answer in the comments - read the kernel source to see the mapping for the PMU events.

We can check arch/x86/events/intel/core.c for the event definitions. I don't actually know if "core" here refers to the Core architecture, or just that this is the core file with most of the definitions, but in any case it's the file you want to look at.

The key part is this section, which defines skl_hw_cache_event_ids:

static __initconst const u64 skl_hw_cache_event_ids
                [PERF_COUNT_HW_CACHE_MAX]
                [PERF_COUNT_HW_CACHE_OP_MAX]
                [PERF_COUNT_HW_CACHE_RESULT_MAX] =
{
 [ C(L1D ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x81d0,  /* MEM_INST_RETIRED.ALL_LOADS */
        [ C(RESULT_MISS)   ] = 0x151,   /* L1D.REPLACEMENT */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0x82d0,  /* MEM_INST_RETIRED.ALL_STORES */
        [ C(RESULT_MISS)   ] = 0x0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x0,
        [ C(RESULT_MISS)   ] = 0x0,
    },
},
...

Decoding the nested initializers, you get that L1-dcache-loads corresponds to MEM_INST_RETIRED.ALL_LOADS and L1-dcache-load-misses to L1D.REPLACEMENT.
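The hex constants in the table can also be split into their fields by hand. The sketch below assumes the usual Intel raw-event layout, with the event select in bits 0-7 and the unit mask in bits 8-15, and ignores any higher bits (cmask, inv, edge). The helper names `decode_raw_event` and `perf_event_spec` are made up for illustration.

```python
def decode_raw_event(config):
    """Split a raw event code into event-select and umask fields.

    Assumes the common Intel layout: bits 0-7 are the event select and
    bits 8-15 are the unit mask. Higher bits are ignored in this sketch.
    """
    return {
        "event": config & 0xFF,
        "umask": (config >> 8) & 0xFF,
    }


def perf_event_spec(config):
    """Format a raw code as a perf 'cpu/event=...,umask=.../' string."""
    fields = decode_raw_event(config)
    return f"cpu/event={fields['event']:#04x},umask={fields['umask']:#04x}/"


# The three non-zero L1D codes from the table above.
for name, config in [
    ("L1-dcache-loads", 0x81D0),
    ("L1-dcache-load-misses", 0x151),
    ("L1-dcache-stores", 0x82D0),
]:
    print(f"{name:24s} r{config:x}  {perf_event_spec(config)}")
```

Either printed form (`r81d0` or `cpu/event=0xd0,umask=0x81/`) can be passed to `perf stat -e` to count the event without relying on the symbolic name.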

We can double check this with perf (here via ocperf from pmu-tools, which accepts the symbolic Intel event names):

$ ocperf stat -e mem_inst_retired.all_loads,L1-dcache-loads,l1d.replacement,L1-dcache-load-misses,L1-dcache-loads,mem_load_retired.l1_hit head -c100M /dev/zero > /dev/null

 Performance counter stats for 'head -c100M /dev/zero':

        11,587,793      mem_inst_retired_all_loads                                   
        11,587,793      L1-dcache-loads                                             
            20,233      l1d_replacement                                             
            20,233      L1-dcache-load-misses     #    0.17% of all L1-dcache hits  
        11,587,793      L1-dcache-loads                                             
        11,495,053      mem_load_retired_l1_hit                                     

       0.024322360 seconds time elapsed

The "Hardware Cache" events show exactly the same values as the underlying PMU events we identified by checking the source, which confirms the mapping.

Shields answered 4/9, 2018 at 19:39 Comment(4)
Great answer, thank you! What's going on, though, with the events that have a 0x0 value, such as L1-dcache-write-misses? Also, what about node-read-misses, node-read-accesses, node-write-misses and node-write-accesses, which all have the same value? - Greenway
@OrionPapadakis - good question, I am not sure. Maybe worth another question. Since this question was posted the way perf exposes some events has gotten better too. Another way to look stuff up is event-rmap from pmu-tools. - Shields
I'm not sure how to identify which Intel processor maps to which of the *_hw_cache_event_ids. Is there a reference for mapping e.g. "Skylake" -> skl_ somewhere? - Rooky
AMD seems somewhat easier, as there are only two, and if the CPU family is >= 0x17, then it uses the second. Some care must be taken with /proc/cpuinfo and lscpu, though, since it appears those print the family in decimal, and I believe my AMD processor that has CPU family: 23 would use the second table, since 17h is 23d. - Rooky
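The decimal/hex point in the last comment can be checked with a few lines. This is only a sketch: the `>= 0x17` threshold is taken from the comment above, not verified against the kernel source, and the function names and sample text are made up for illustration.

```python
import re


def parse_cpu_family(cpuinfo_text):
    """Return the first 'cpu family' value (printed in decimal) as an int.

    Returns None if the field is not present in the given text.
    """
    match = re.search(r"^cpu family\s*:\s*(\d+)", cpuinfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def is_family_17h_or_later(family):
    """Compare against 0x17 (23 decimal), the threshold named in the comment."""
    return family >= 0x17


# Sample text in the style of /proc/cpuinfo; not taken from a real machine.
sample = "vendor_id\t: AuthenticAMD\ncpu family\t: 23\nmodel\t\t: 1\n"
family = parse_cpu_family(sample)
print(f"family {family} decimal = {family:#x} hex; "
      f">= 0x17: {is_family_17h_or_later(family)}")
```

On a real system the same function can be fed the contents of /proc/cpuinfo instead of the sample string.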
