intel-pmu Questions
2
I am attempting to run Alex Ionescu's WinIPT interface in a virtual machine, without success. (The host is Windows 10 Pro running a Windows 10 VM, and both are on build 18363.)
I have suc...
Holophrastic asked 7/2, 2020 at 21:24
1
Solved
I am running a C++ benchmark test for a specific application. In this test, I open the performance counter file (__NR_perf_event_open syscall) before the critical section, proceed with the section ...
Ginnifer asked 30/9, 2021 at 16:36
2
Solved
Last Branch Record refers to a collection of register pairs (MSRs) that store the source and destination addresses related to recently executed branches. http://css.csail.mit.edu/6.858/2012/reading...
Rockefeller asked 3/2, 2013 at 8:7
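The register-pair description in the excerpt above can be pictured as a small ring buffer. The following is only a toy software model, not how the hardware is accessed: the class name `LastBranchRecordModel`, its methods, and the default depth of 16 are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List, Tuple


class LastBranchRecordModel:
    """Toy model of a Last Branch Record stack (hypothetical, illustration only)."""

    def __init__(self, depth: int = 16) -> None:
        # depth is an assumed value; the real number of register pairs
        # depends on the processor generation.
        self._entries: Deque[Tuple[int, int]] = deque(maxlen=depth)

    def record(self, source: int, destination: int) -> None:
        # Appending to a full deque silently drops the oldest pair,
        # which mimics a fixed set of registers being overwritten.
        self._entries.append((source, destination))

    def snapshot(self) -> List[Tuple[int, int]]:
        # Returns the retained (source, destination) pairs, oldest first.
        return list(self._entries)


if __name__ == "__main__":
    lbr = LastBranchRecordModel(depth=2)
    lbr.record(0x1000, 0x2000)
    lbr.record(0x2004, 0x3000)
    lbr.record(0x3008, 0x1000)
    for src, dst in lbr.snapshot():
        print(f"{src:#x} -> {dst:#x}")
```

The point of the model is only that the record keeps the most recent branches and forgets older ones once every pair is in use.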
5
Solved
Can the Intel PMU be used to measure per-core read/write memory bandwidth usage? Here "memory" means to DRAM (i.e., not hitting in any cache level).
Entresol asked 2/12, 2017 at 21:37
2
Is there a perf stat equivalent on macOS? I would like to do the same thing for a CLI command, and googling is not yielding anything.
Caston asked 6/4, 2020 at 21:15
2
The description of the RESOURCE_STALLS.RS hardware performance event for Intel Broadwell is the following:
This event counts stall cycles caused by absence of eligible entries
in the reservatio...
Manhandle asked 5/10, 2018 at 0:15
2
Solved
I'm trying to understand the rdpmc instruction. As such I have the following asm code:
segment .text
global _start
_start:
xor eax, eax
mov ebx, 10
.loop:
dec ebx
jnz .loop
mov ecx, 1<<...
Casebook asked 17/5, 2019 at 19:43
1
Solved
Summary
Consider the following loop:
loop:
movl $0x1,(%rax)
add $0x40,%rax
cmp %rdx,%rax
jne loop
where rax is initialized to the address of a buffer that is larger than the L3 cache size. Ever...
Planter asked 5/3, 2019 at 2:59
1
Solved
Some built-in perf events are mapped to offcore events. For example, LLC-loads and LLC-load-misses are mapped to OFFCORE_RESPONSE events. This can be easily determined as discussed here. Howeve...
Cornia asked 16/1, 2019 at 18:19
0
Consider the following simple code:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <err.h>
int cpu_ms() {
return (int)(clock(...
Ewers asked 29/9, 2018 at 5:9
2
Solved
Consider the following loop:
.loop:
add rsi, OFFSET
mov eax, dword [rsi]
dec ebp
jg .loop
where OFFSET is some non-negative integer and rsi contains a pointer to a buffer defined in the bss...
Swanhilda asked 26/9, 2018 at 23:25
2
Solved
I was experimenting with the AVX/AVX2 instruction sets to see the performance of streaming on consecutive arrays. I have the example below, where I do a basic memory read and store.
#include <iostrea...
Tiemroth asked 27/10, 2013 at 18:8
0
On Intel x86, Linux uses the event l1d.replacements to implement its L1-dcache-load-misses event.
This event is defined as follows:
Counts L1D data line replacements including opportunistic
re...
1
Solved
When I run perf list I see a bunch of Hardware Cache Events, as follows:
$ perf list | grep 'cache event'
L1-dcache-load-misses [Hardware cache event]
L1-dcache-loads [Hardware cache event]
L1-...
Shields asked 4/9, 2018 at 16:58
1
Solved
I was investigating the capabilities of the branch unit on port 0 of my Haswell, starting with a very simple loop:
BITS 64
GLOBAL _start
SECTION .text
_start:
mov ecx, 10000000
.loop:...
Slumberous asked 28/8, 2018 at 9:32
1
Solved
Newer Linux kernels have a sysctl tunable, /proc/sys/kernel/perf_event_paranoid, which allows the user to adjust the available functionality of perf_events for non-root users, with higher numbers bein...
Barbabra asked 18/8, 2018 at 18:8
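To check the current setting of the tunable named in the excerpt above, the file can simply be read as text. This is a minimal sketch, assuming the file holds a single integer; the function name `read_perf_event_paranoid` is hypothetical, and the function returns `None` on systems where the file is absent or unreadable.

```python
from pathlib import Path
from typing import Optional

# Path taken from the question excerpt; it only exists on Linux.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def read_perf_event_paranoid(path: Path = PARANOID_PATH) -> Optional[int]:
    """Return the current paranoid level, or None if it cannot be read."""
    try:
        # The file is assumed to contain one integer followed by a newline.
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Missing file (non-Linux), permission problem, or unexpected content.
        return None


if __name__ == "__main__":
    level = read_perf_event_paranoid()
    if level is None:
        print("perf_event_paranoid is not available on this system")
    else:
        print(f"perf_event_paranoid = {level}")
```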
2
Solved
I've profiled my code using Instruments' Time Profiler, and zooming in to the disassembly, here's a snippet of its results:
I wouldn't expect a mov instruction to take 23.3% of the time while a ...
Isoagglutinin asked 21/1, 2018 at 16:58
2
I am trying to understand the multiplexing and scaling of the "cycles" event in the "perf" output.
The following is the output of perf tool:
144094.487583 task-clock (msec) # 1.017 CPUs utilized
53991...
Ervin asked 24/1, 2018 at 4:24
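For the multiplexing question above, the usual way to think about scaling is that an event which was only scheduled on a counter for part of the measurement has its raw count extrapolated by the ratio of enabled time to running time. The sketch below shows that arithmetic; the function names and the sample numbers are hypothetical and are not taken from the truncated output in the question.

```python
def scale_count(raw_count: int, time_enabled_ns: int, time_running_ns: int) -> float:
    """Estimate the full-interval count of a multiplexed event."""
    if time_running_ns == 0:
        # The event never got a hardware counter, so there is nothing to scale.
        return float("nan")
    # Linear extrapolation: assumes the event rate while the counter was
    # scheduled is representative of the whole enabled interval.
    return raw_count * time_enabled_ns / time_running_ns


def running_percent(time_enabled_ns: int, time_running_ns: int) -> float:
    """Share of the enabled time during which the event was actually counted."""
    if time_enabled_ns == 0:
        return 0.0
    return 100.0 * time_running_ns / time_enabled_ns


if __name__ == "__main__":
    # Hypothetical values: counted for 1 s out of 4 s enabled.
    raw, enabled, running = 1_000_000, 4_000_000_000, 1_000_000_000
    print(scale_count(raw, enabled, running))
    print(running_percent(enabled, running))
```

Because the estimate is an extrapolation, it is only as good as the assumption that the workload behaves similarly while the event is not being counted.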
1
Solved
In a nutshell, I'm trying to achieve the following inside a userland benchmark process (pseudo-code, assuming x86_64 and a UNIX system):
results[] = ...
for (iteration = 0; iteration < num_iter...
Remonstrance asked 18/8, 2016 at 15:6
1