What is the difference between PAPI_L1_LDM in PAPI and L1-dcache-load-misses in perf?
I've used the same settings as in this post.
As a result, PAPI reports:
PAPI_L1_DCM: 515 <- L1 data cache misses (probably L1D_READ_MISSES_ALL + L1D_READ_MISSES_RETRIED?)
PAPI_L1_ICM: 300 <- L1 instruction cache misses
PAPI_L1_LDM: 441 <- L1 load misses
PAPI_L1_TCM: 815 <- L1 total cache misses
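A quick arithmetic check on the PAPI numbers above: the total is exactly the sum of the data and instruction counts, which fits PAPI_L1_TCM being derived as PAPI_L1_DCM + PAPI_L1_ICM on this machine. That is an inference from the values, not from PAPI's event tables. The 74 data misses that are not load misses would be the store share only under the assumption that PAPI_L1_DCM also counts store misses here; running `papi_avail -e PAPI_L1_DCM` shows which native events a preset actually maps to.

$$
\begin{aligned}
\mathrm{PAPI\_L1\_DCM} + \mathrm{PAPI\_L1\_ICM} &= 515 + 300 = 815 = \mathrm{PAPI\_L1\_TCM} \\
\mathrm{PAPI\_L1\_DCM} - \mathrm{PAPI\_L1\_LDM} &= 515 - 441 = 74
\end{aligned}
$$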
Unfortunately, PAPI_L1_DCA is not supported on this machine.
For perf, I count user-space events only, since PAPI also measures only user space and not kernel space:
Command: perf stat -B -e L1-dcache-load-misses:u,cache-misses:u ./perf
16,539 L1-dcache-load-misses
128 cache-misses:u
16,539 seems more reasonable for N=1000000. What is the difference between a load data miss (PAPI_L1_LDM in PAPI) and a data cache miss (PAPI_L1_DCM in PAPI), and why do these numbers differ between PAPI and perf? Is cache-misses:u in perf related to L2 cache misses?
Edit: Hardware is a Xeon E5-2600 v3 family CPU (Haswell-EP, 12 cores).