intel-pmu - McMap

2

How does one enable Intel Processor Tracing (IPT) in a virtualized environment?

I am attempting to run Alex Ionescu's WinIPT interface in a virtual machine, and am having no success. (This is a Windows 10 Pro host running a Windows 10 VM; both are on the 18363 update.) I have suc...

kernel intel trace virtualization intel-pmu

Holophrastic asked 7/2, 2020 at 21:24

1

Solved

PMU x86-64 performance counters not showing in perf under AWS

I am running a C++ benchmark test for a specific application. In this test, I open the performance counter file (__NR_perf_event_open syscall) before the critical section, proceed with the section ...

amazon-web-services amazon-ec2 linux-kernel perf intel-pmu

Ginnifer asked 30/9, 2021 at 16:36

2

Solved

What is the overhead of using Intel Last Branch Record?

Last Branch Record refers to a collection of register pairs (MSRs) that store the source and destination addresses related to recently executed branches. http://css.csail.mit.edu/6.858/2012/reading...

x86 intel trace branch-prediction intel-pmu

Rockefeller asked 3/2, 2013 at 8:07

5

Solved

Can the Intel performance monitor counters be used to measure memory bandwidth?

Can the Intel PMU be used to measure per-core read/write memory bandwidth usage? Here "memory" means to DRAM (i.e., not hitting in any cache level).

performance x86 intel-pmu memory-bandwidth

Entresol asked 2/12, 2017 at 21:37
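Whichever event ends up being used for the bandwidth question above, the final arithmetic is the same: take the counter delta, multiply by the bytes moved per event, and divide by the elapsed time. A minimal sketch follows, assuming a counter that increments once per 64-byte cache line transferred to or from DRAM. The function name and the sample readings are hypothetical, and which event actually has that property is exactly what the question asks.

```python
CACHE_LINE_BYTES = 64  # assumption: each counted event moves one 64-byte line


def bandwidth_bytes_per_sec(start_count, end_count, elapsed_seconds,
                            line_bytes=CACHE_LINE_BYTES):
    """Convert a delta of line-transfer events into bytes per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (end_count - start_count) * line_bytes / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical readings: 1,000,000 line transfers observed over 0.5 s.
    bw = bandwidth_bytes_per_sec(0, 1_000_000, 0.5)
    print(f"{bw / 1e6:.1f} MB/s")
```

Reads and writes would need separate counters (or separate event selections) to get the per-direction split the question mentions.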

2

Perf stat equivalent for Mac OS?

Is there a perf stat equivalent on Mac OS? I would like to do the same thing for a CLI command and googling is not yielding anything.

macos profiling performancecounter perf intel-pmu

Caston asked 6/4, 2020 at 21:15

2

Is it possible for the RESOURCE_STALLS.RS event to occur even when the RS is not completely full?

The description of the RESOURCE_STALLS.RS hardware performance event for Intel Broadwell is the following: This event counts stall cycles caused by absence of eligible entries in the reservatio...

performance x86 intel cpu-architecture intel-pmu

Manhandle asked 5/10, 2018 at 0:15

2

Solved

rdpmc: surprising behavior

I'm trying to understand the rdpmc instruction. As such I have the following asm code: segment .text global _start _start: xor eax, eax mov ebx, 10 .loop: dec ebx jnz .loop mov ecx, 1<<...

performance assembly x86 performancecounter intel-pmu

Casebook asked 17/5, 2019 at 19:43
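As background for the rdpmc question above: the instruction selects its counter through ECX, and on Intel cores setting bit 30 selects the fixed-function counter bank instead of the programmable counters. The sketch below only computes the ECX value. It does not execute rdpmc, and it is not a reconstruction of the truncated code in the excerpt. The helper name and dictionary are illustrative assumptions.

```python
FIXED_COUNTER_FLAG = 1 << 30  # ECX bit 30 selects the fixed-function counters

# Fixed-function counter indices on Intel cores (architectural perfmon v2+).
FIXED_COUNTERS = {
    "instructions_retired": 0,  # INST_RETIRED.ANY
    "core_cycles": 1,           # CPU_CLK_UNHALTED.THREAD
    "reference_cycles": 2,      # CPU_CLK_UNHALTED.REF_TSC
}


def rdpmc_ecx(index, fixed=False):
    """Return the ECX selector for reading counter `index` with rdpmc."""
    if index < 0:
        raise ValueError("counter index must be non-negative")
    return (FIXED_COUNTER_FLAG | index) if fixed else index


if __name__ == "__main__":
    ecx = rdpmc_ecx(FIXED_COUNTERS["core_cycles"], fixed=True)
    print(hex(ecx))
```

Note that the counter must already be enabled, and user-mode rdpmc must be permitted by the OS (CR4.PCE), before the instruction returns anything meaningful.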

1

Solved

Why are the user-mode L1 store miss events only counted when there is a store initialization loop?

Summary Consider the following loop: loop: movl $0x1,(%rax) add $0x40,%rax cmp %rdx,%rax jne loop where rax is initialized to the address of a buffer that is larger than the L3 cache size. Ever...

x86 intel performancecounter cpu-cache intel-pmu

Planter asked 5/3, 2019 at 2:59

1

Solved

How does perf use the offcore events?

Some built-in perf events are mapped to offcore events. For example, LLC-loads and LLC-load-misses are mapped to OFFCORE_RESPONSE events. This can be easily determined as discussed here. Howeve...

linux-kernel x86 perf intel-pmu

Cornia asked 16/1, 2019 at 18:19

0

On Skylake (SKL) why are there L2 writebacks in a read-only workload that exceeds the L3 size?

Consider the following simple code: #include <stdlib.h> #include <stdio.h> #include <string.h> #include <time.h> #include <err.h> int cpu_ms() { return (int)(clock(...

performance x86 cpu-cache perf intel-pmu

Ewers asked 29/9, 2018 at 5:09

2

Solved

Why does the number of uops per iteration increase with the stride of streaming loads?

Consider the following loop: .loop: add rsi, OFFSET mov eax, dword [rsi] dec ebp jg .loop where OFFSET is some non-negative integer and rsi contains a pointer to a buffer defined in the bss...

assembly x86 cpu-architecture intel-pmu

Swanhilda asked 26/9, 2018 at 23:25

2

Solved

Haswell memory access

I was experimenting with the AVX/AVX2 instruction sets to see the performance of streaming over consecutive arrays. So I have the example below, where I do a basic memory read and store. #include <iostrea...

performance x86 cpu-architecture avx2 intel-pmu

Tiemroth asked 27/10, 2013 at 18:08

0

Why does Linux perf use event l1d.replacement for "L1 dcache misses" on x86?

On Intel x86, Linux uses the event l1d.replacement to implement its L1-dcache-load-misses event. This event is defined as follows: Counts L1D data line replacements including opportunistic re...

linux x86 profiling perf intel-pmu

Preconscious asked 4/9, 2018 at 20:20

1

Solved

Hardware cache events and perf

When I run perf list I see a bunch of Hardware Cache Events, as follows: $ perf list | grep 'cache event' L1-dcache-load-misses [Hardware cache event] L1-dcache-loads [Hardware cache event] L1-...

linux performance x86 perf intel-pmu

Shields asked 4/9, 2018 at 16:58
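To work with the `perf list` output quoted in the question above programmatically, the lines can be split into an event name and its bracketed category. This is a rough sketch that assumes the simple `name [category]` layout shown in the excerpt. The function name and sample text are illustrative, and lines with aliases (for example `a OR b [...]`) are deliberately skipped.

```python
import re

# Matches lines of the form "<name>   [<category>]".
_LINE = re.compile(r"^\s*(?P<name>\S+)\s+\[(?P<kind>[^\]]+)\]\s*$")


def parse_perf_list(text):
    """Return (event_name, category) pairs for simple `perf list` lines."""
    events = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            events.append((match.group("name"), match.group("kind")))
    return events


SAMPLE = """\
  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-loads                                    [Hardware cache event]
"""

if __name__ == "__main__":
    for name, kind in parse_perf_list(SAMPLE):
        print(f"{name}: {kind}")
```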

1

Solved

Can the LSD issue uOPs from the next iteration of the detected loop?

I was investigating the capabilities of the branch unit on port 0 of my Haswell, starting with a very simple loop: BITS 64 GLOBAL _start SECTION .text _start: mov ecx, 10000000 .loop:...

assembly x86 cpu-architecture intel-pmu

Slumberous asked 28/8, 2018 at 9:32

1

Solved

What restriction is perf_event_paranoid == 1 actually putting on x86 perf?

Newer Linux kernels have a sysctl tunable /proc/sys/kernel/perf_event_paranoid which allows the user to adjust the available functionality of perf_events for non-root users, with higher numbers bein...

linux-kernel x86 profiling perf intel-pmu

Barbabra asked 18/8, 2018 at 18:08
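For orientation on the perf_event_paranoid question above, the mainline kernel documentation describes the levels as cumulative: each level keeps the restrictions of the lower ones for users without the relevant capability. The sketch below encodes that summary and reads the current value where the file exists. The function names and the wording of the descriptions are my own paraphrase, and distribution kernels may behave differently.

```python
from pathlib import Path

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")

# (threshold, what unprivileged users lose at or above this level)
_RESTRICTIONS = [
    (0, "raw and ftrace function tracepoint access"),
    (1, "CPU-wide (system-wide) event access"),
    (2, "kernel-space profiling"),
]


def restrictions_for(level):
    """List the restrictions that apply to unprivileged users at `level`."""
    return [desc for threshold, desc in _RESTRICTIONS if level >= threshold]


def read_level(path=PARANOID_PATH):
    """Return the current setting, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = read_level()
    if level is None:
        print("perf_event_paranoid not readable on this system")
    else:
        print(level, restrictions_for(level))
```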

2

Solved

Reliability of Xcode Instruments' disassembly time profiling

I've profiled my code using Instruments' Time Profiler, and zooming in to the disassembly, here's a snippet of its results: I wouldn't expect a mov instruction to take 23.3% of the time while a ...

xcode x86 profiling instruments intel-pmu

Isoagglutinin asked 21/1, 2018 at 16:58

2

Perf tool stat output: multiplex and scaling of "cycles"

I am trying to understand the multiplexing and scaling of the "cycles" event in the "perf" output. The following is the output of the perf tool: 144094.487583 task-clock (msec) # 1.017 CPUs utilized 53991...

linux linux-kernel intel perf intel-pmu

Ervin asked 24/1, 2018 at 4:24
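On the scaling asked about above: when more events are requested than there are hardware counters, the kernel time-multiplexes them, and perf estimates the full count by scaling the raw value by the ratio of time the event was enabled to time it was actually running on a counter. A minimal sketch of that arithmetic follows. The function names are illustrative and the numbers in the example are made up.

```python
def scale_count(raw_count, time_enabled, time_running):
    """Estimate the full count of a multiplexed event.

    Returns None when the event never ran, since no estimate is possible.
    """
    if time_running == 0:
        return None
    return raw_count * time_enabled / time_running


def running_percent(time_enabled, time_running):
    """Percentage of the enabled time the event spent on a hardware counter."""
    if time_enabled == 0:
        return 0.0
    return 100.0 * time_running / time_enabled


if __name__ == "__main__":
    # Hypothetical event that was on a counter for half of its enabled time.
    print(scale_count(1000, 200, 100), running_percent(200, 100))
```

The percentage that `perf stat` prints in parentheses next to a multiplexed event corresponds to the second function.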

1

Solved

How to Configure and Sample Intel Performance Counters In-Process

In a nutshell, I'm trying to achieve the following inside a userland benchmark process (pseudo-code, assuming x86_64 and a UNIX system): results[] = ... for (iteration = 0; iteration < num_iter...

linux x86 benchmarking perf intel-pmu

Remonstrance asked 18/8, 2016 at 15:06
