How to interpret perf iTLB-loads and iTLB-load-misses
I have a test case in which I observe the perf events iTLB-loads and iTLB-load-misses with:

perf stat -e dTLB-loads,dTLB-load-misses,iTLB-loads,iTLB-load-misses -p 22479

and I get this output:

Performance counter stats for process id '22479':

     1,262,817      dTLB-loads                                                  
        13,950      dTLB-load-misses          #    1.10% of all dTLB cache hits 
            75      iTLB-loads                                                  
         6,882      iTLB-load-misses          # 9176.00% of all iTLB cache hits 

   3.999720948 seconds time elapsed

I have no idea how to interpret this: iTLB-loads is only 75, but iTLB-load-misses is 6,882?!

lscpu shows: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz

Edit :

May I interpret it as follows:

Out of (75 + 6882) iTLB loads, 75 were hits and 6882 were misses?

Edit :

ocperf.py list | wc -l
Downloading https://download.01.org/perfmon/mapfile.csv to mapfile.csv

Traceback (most recent call last):
File "/home/marschen/tools/pmu-tools-master/ocperf.py", line 1012, in <module>
emap = find_emap()
File "/home/marschen/tools/pmu-tools-master/ocperf.py", line 831, in find_emap
event_download.download(el, toget)
File "/home/marschen/tools/pmu-tools-master/event_download.py", line 105, in download
getfile(modelpath, dir, "mapfile.csv")
File "/home/marschen/tools/pmu-tools-master/event_download.py", line 86, in getfile
f = urlopen(url)
File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib64/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/usr/lib64/python2.7/urllib2.py", line 1258, in https_open
context=self._context, check_hostname=self._check_hostname)
File "/usr/lib64/python2.7/urllib2.py", line 1211, in do_open
h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/usr/lib64/python2.7/httplib.py", line 1017, in request
self._send_request(method, url, body, headers)
File "/usr/lib64/python2.7/httplib.py", line 1051, in _send_request
self.endheaders(body)
File "/usr/lib64/python2.7/httplib.py", line 1013, in endheaders
self._send_output(message_body)
File "/usr/lib64/python2.7/httplib.py", line 864, in _send_output
self.send(msg)
File "/usr/lib64/python2.7/httplib.py", line 826, in send
self.connect()
File "/usr/lib64/python2.7/httplib.py", line 1227, in connect
HTTPConnection.connect(self)
File "/usr/lib64/python2.7/httplib.py", line 807, in connect
self.timeout, self.source_address)
File "/usr/lib64/python2.7/socket.py", line 562, in create_connection
sock.connect(sa)
File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
Pooka answered 20/4, 2018 at 3:8 Comment(19)
That is odd. I tried on Skylake and could repro the behaviour of iTLB misses > iTLB accesses. I'm not sure what actual counter iTLB-loads is mapped to. Skylake doesn't seem to have a counter for iTLB accesses, only for misses (frontend_retired.itlb_miss in ocperf.py). The uop cache is virtually addressed, so fetching uops from the uop cache (DSB) doesn't require TLB accesses if it hits.Chiquitachirico
@Peter, I googled several web pages for more information, but still could not find the correct way to interpret this data.Pooka
We need to figure out what hardware event perf is actually using for iTLB-loads, and find out what it means. I tried using perf --debug verbose=2, but I'm not sure if those numbers are the same event / mask numbers that you can find in documentation like oprofile.sourceforge.net/docs/intel-haswell-events.php, or like you can see with ocperf.py stat -e frontend_retired.itlb_missChiquitachirico
@Peter, thanks. ocperf.py from pmu-tools won't work for me; I am not familiar with Python, and ocperf.py printed some error messages.Pooka
IDK, looks like you installed it wrong, or maybe it's missing a dependency, despite github.com/andikleen/pmu-tools saying it only needs that standard stuff. Your error output doesn't include the actual exception, just the traceback. I haven't updated mine from git for a while.Chiquitachirico
@Peter, I downloaded pmu-tools from github.com/andikleen/pmu-tools just now, and it won't work on my server. Pity.Pooka
How old is your server's software? What are you running on it? Maybe you can manually download the required file, or do it on another computer, and put it in the right place. (If it's only the downloading libraries that are a problem; it might be that you'd run into more problems in other functions later.)Chiquitachirico
I just downloaded pmu-tools-master.zip and unzipped it, then ran ocperf.py with no luck.Pooka
There is a Makefile in the pmu-tools-master directory, but I did not run make.Pooka
No, how old is the Linux distro you're using? I forget if make helps. I don't think I had to let it install anything in /usr/local; I just run it from a symlink into the source directory (where I have a git clone of the repo)Chiquitachirico
Linux testhost 3.10.0-693.el7.x86_64. Is that what you mean?Pooka
Ok, that's your kernel version, so you're on RHEL7 I think. I'm not sure how old their Python is. I think the git repo said only RHEL5 was too old for perf, but IDK how up to date that readme is. Current Linux is 4.15 or so, but RHEL does have kernel patches...Chiquitachirico
That sounds right; my production server has the same kernel version, so maybe I will try another source. Thanks for your kind help.Pooka
@PeterCordes I think perf list pmu or perf list --long-desc pmu should print all aliases and the events they are mapped to.Uintathere
@HadiBrais: It doesn't, perf list pmu looks like what you get from ocperf.py list, but without the simple/generic event names like LLC-loads. perf list --details iTLB-loads sounds like it's supposed to be useful from the docs, but it isn't. It just says "[Hardware cache event]" and prints some generic stuff.Chiquitachirico
@PeterCordes According to the source code of perf, the alias names are obtained from the names of the files in /sys/bus/event_source/devices/cpu/events. The name of the file is itself the alias and each file contains the event code of the actual performance event. The alias names of other performance events for devices other than the CPU can be found in /sys/bus/event_source/devices/<dev>/events.Uintathere
@HadiBrais: Cool, thanks for digging that up. Unfortunately that doesn't include any TLB events on Linux 4.15 on Skylake. find -L /sys/bus/event_source/ -iname '*tlb*' doesn't find any tlb events anywhere. .../cpu/events has what perf calls "Hardware event", but not any of the "Hardware cache event" names.Chiquitachirico
@PeterCordes After a lot more digging, on Skylake, iTLB-loads is mapped to ITLB_MISSES.STLB_HIT and iTLB-load-misses is mapped to ITLB_MISSES.WALK_COMPLETED. The numbers make sense now.Uintathere
On Broadwell (the OP's processor), iTLB-loads is mapped to ITLB_MISSES.STLB_HIT and iTLB-load-misses is mapped to ITLB_MISSES.MISS_CAUSES_A_WALK.Uintathere
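
The sysfs lookup described in the comments above can be sketched in Python. This is a minimal sketch, assuming the directory layout mentioned there (/sys/bus/event_source/devices/cpu/events, one file per alias, file content being the event encoding); the function name read_event_aliases is made up for illustration. As noted in the comments, on at least one Skylake system this directory did not contain any TLB events, so an empty or TLB-free result is expected on some machines.

```python
from pathlib import Path


def read_event_aliases(events_dir="/sys/bus/event_source/devices/cpu/events"):
    """Return {alias: encoding} for each file in a PMU's sysfs events directory.

    The helper name and the default path are illustrative assumptions; pass
    another device's events directory to inspect a different PMU.
    """
    root = Path(events_dir)
    aliases = {}
    if not root.is_dir():
        # No such PMU directory on this system (or not running on Linux).
        return aliases
    for entry in sorted(root.iterdir()):
        if entry.is_file():
            # The file name is the alias; the content is the event encoding.
            aliases[entry.name] = entry.read_text().strip()
    return aliases


if __name__ == "__main__":
    for name, encoding in read_event_aliases().items():
        print(f"{name}: {encoding}")
```

Filtering the returned names for "tlb" would be the Python equivalent of the find command quoted above.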

On your Broadwell processor, perf maps iTLB-loads to ITLB_MISSES.STLB_HIT, which counts TLB lookups that miss the L1 ITLB but hit the unified second-level TLB (for all page sizes). It maps iTLB-load-misses to ITLB_MISSES.MISS_CAUSES_A_WALK, which counts TLB lookups that miss both the L1 ITLB and the unified TLB and therefore cause a page walk (for all page sizes). Therefore, iTLB-load-misses can be larger than, smaller than, or equal to iTLB-loads. The two events are independent, so the 9176.00% that perf prints is not a miss rate.
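
To make the arithmetic concrete, here is a small Python sketch using the two counter values from the question. It reproduces the percentage perf printed, and then computes a split that may be more meaningful under one assumption that is mine and not confirmed by the source: that every L1 ITLB miss is counted by exactly one of the two events, so their sum approximates the number of L1 ITLB misses. The variable names are illustrative only.

```python
# Counter values copied from the perf stat output in the question.
stlb_hits = 75        # printed as iTLB-loads (ITLB_MISSES.STLB_HIT per the answer)
page_walks = 6_882    # printed as iTLB-load-misses (ITLB_MISSES.MISS_CAUSES_A_WALK)


def percent(part, whole):
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


# What perf prints: "misses" divided by "loads", as if loads were all accesses.
perf_ratio = percent(page_walks, stlb_hits)

# ASSUMPTION: each L1 ITLB miss is counted by exactly one of the two events,
# so their sum approximates the total number of L1 ITLB misses.
l1_itlb_misses = stlb_hits + page_walks
stlb_hit_share = percent(stlb_hits, l1_itlb_misses)
walk_share = percent(page_walks, l1_itlb_misses)

print(f"perf's ratio:           {perf_ratio:.2f}%")
print(f"L1 ITLB misses (est.):  {l1_itlb_misses}")
print(f"  resolved by the STLB: {stlb_hit_share:.2f}%")
print(f"  causing a page walk:  {walk_share:.2f}%")
```

Under that assumption, the reading proposed in the question's edit is close, except that the 75 + 6882 events are L1 ITLB misses rather than all instruction TLB accesses. Neither counter says how many lookups hit the L1 ITLB.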

Uintathere answered 21/4, 2018 at 19:6 Comment(6)
Seems like a very odd design choice. Would have made more sense for perf to just say that the iTLB-loads event isn't available on those CPUs, instead of confusingly using it for hits in the 2nd-level unified TLB after an iTLB miss.Chiquitachirico
To be clear, this is a bug in perf (at the very least in the way they compare the two numbers), this isn't a "design choice"Breeze
@Breeze Is it a confirmed bug? BTW, this is not the only inconsistency in the perf events. See for example my answer to this other question. These inconsistencies seem to be by design.Uintathere
@HadiBrais well ITLB_MISSES.MISS_CAUSES_A_WALK / ITLB_MISSES.STLB_HIT is a pretty meaningless number right? It seems clear that iTLB-loads is mapped to the wrong underlying event. I don't think there's a plan to fix it or a ticket afaik. I think perf is a shitshow, but I appreciate people like you who dig into the details and document these weird quirksBreeze
How can I get the mapping for my processor?Entree
@PriyankPalod (& anyone else, like myself, who had the same question), see https://mcmap.net/q/429594/-hardware-cache-events-and-perf ; the linux kernel has a static mapping for each processor family, and it appears possible to read the event name -> raw event id out of the initializer (with sometimes a comment to make the "look up raw event ID in the processor's manual" step unnecessary). The trick seems to be knowing the three-letter abbreviation for your intel processor, i.e. Skylake -> skl_hw_cache... vs Golden Cove -> glc_hw_cache....Nombles
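
For the generic side of that mapping, the following Python sketch shows how a "Hardware cache event" name such as iTLB-load-misses is encoded as a perf_event_open config value, using the constants from the perf_event_open(2) ABI. As far as I understand it, these three IDs are also the indices into the per-family tables (skl_hw_cache... and similar) mentioned above, but treat that as an assumption to check against your kernel source. The dictionary and function names are made up for illustration.

```python
# Constants from the perf_event_open(2) ABI (linux/perf_event.h).
PERF_TYPE_HW_CACHE = 3

CACHE_ID = {"L1D": 0, "L1I": 1, "LL": 2, "DTLB": 3, "ITLB": 4, "BPU": 5, "NODE": 6}
OP_ID = {"READ": 0, "WRITE": 1, "PREFETCH": 2}
RESULT_ID = {"ACCESS": 0, "MISS": 1}


def hw_cache_config(cache, op, result):
    """Build the config value for a PERF_TYPE_HW_CACHE event.

    Layout: cache id in bits 0-7, op id in bits 8-15, result id in bits 16-23.
    """
    return CACHE_ID[cache] | (OP_ID[op] << 8) | (RESULT_ID[result] << 16)


# perf's generic names are combinations of (cache, op, result), for example:
generic_events = {
    "dTLB-loads": ("DTLB", "READ", "ACCESS"),
    "dTLB-load-misses": ("DTLB", "READ", "MISS"),
    "iTLB-loads": ("ITLB", "READ", "ACCESS"),
    "iTLB-load-misses": ("ITLB", "READ", "MISS"),
}

for name, key in generic_events.items():
    print(f"{name}: type={PERF_TYPE_HW_CACHE} config={hw_cache_config(*key):#x}")
```

These config values are generic, not raw hardware event codes. The kernel translates them to a model-specific event and umask, which is why verbose perf output alone does not reveal the underlying counter.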
