Different read and write count using cachegrind and callgrind

Asked 3/4, 2013 at 14:44 Answered 21/5, 2013 at 3:33

I am running some experiments with Cachegrind, Callgrind and gem5. I noticed that some accesses are counted as reads by Cachegrind, as writes by Callgrind, and as both a read and a write by gem5.

Let's take a very simple example:

int main() {
    int i, l = 0;   /* initialise l so the increments are well defined */

    for (i = 0; i < 1000; i++) {
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        /* ... l++ repeated, 100 times in total ... */
    }
}

I compile with:

gcc ex.c --static -o ex

So basically, according to the asm file, addl $1, -8(%rbp) is executed 100,000 times. Since it is both a read and a write, I was expecting 100k reads and 100k writes. However, cachegrind counts these accesses only as reads and callgrind only as writes.

 % valgrind --tool=cachegrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex
==15356== Cachegrind, a cache and branch-prediction profiler
==15356== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote et al.
==15356== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15356== Command: ./ex
==15356== 
--15356-- warning: L3 cache found, using its data for the LL simulation.
==15356== 
==15356== I   refs:      111,535
==15356== I1  misses:        475
==15356== LLi misses:        280
==15356== I1  miss rate:    0.42%
==15356== LLi miss rate:    0.25%
==15356== 
==15356== D   refs:      104,894  (103,791 rd   + 1,103 wr)
==15356== D1  misses:        557  (    414 rd   +   143 wr)
==15356== LLd misses:        172  (     89 rd   +    83 wr)
==15356== D1  miss rate:     0.5% (    0.3%     +  12.9%  )
==15356== LLd miss rate:     0.1% (    0.0%     +   7.5%  )
==15356== 
==15356== LL refs:         1,032  (    889 rd   +   143 wr)
==15356== LL misses:         452  (    369 rd   +    83 wr)
==15356== LL miss rate:      0.2% (    0.1%     +   7.5%  )

 % valgrind --tool=callgrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex
==15376== Callgrind, a call-graph generating cache profiler
==15376== Copyright (C) 2002-2012, and GNU GPL'd, by Josef Weidendorfer et al.
==15376== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15376== Command: ./ex
==15376== 
--15376-- warning: L3 cache found, using its data for the LL simulation.
==15376== For interactive control, run 'callgrind_control -h'.
==15376== 
==15376== Events    : Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
==15376== Collected : 111532 2777 102117 474 406 151 279 87 85
==15376== 
==15376== I   refs:      111,532
==15376== I1  misses:        474
==15376== LLi misses:        279
==15376== I1  miss rate:    0.42%
==15376== LLi miss rate:    0.25%
==15376== 
==15376== D   refs:      104,894  (2,777 rd + 102,117 wr)
==15376== D1  misses:        557  (  406 rd +     151 wr)
==15376== LLd misses:        172  (   87 rd +      85 wr)
==15376== D1  miss rate:     0.5% ( 14.6%   +     0.1%  )
==15376== LLd miss rate:     0.1% (  3.1%   +     0.0%  )
==15376== 
==15376== LL refs:         1,031  (  880 rd +     151 wr)
==15376== LL misses:         451  (  366 rd +      85 wr)
==15376== LL miss rate:      0.2% (  0.3%   +     0.0%  )

Could someone give me a reasonable explanation? Would I be correct to consider there are in fact ~100k reads and ~100k writes (i.e. 2 cache accesses for an addl)?

Clubman answered 3/4, 2013 at 14:44 Comment(0)

From the Cachegrind manual, section 5.7.1, Cache Simulation Specifics:

Instructions that modify a memory location (e.g. inc and dec) are counted as doing just a read, i.e. a single data reference. This may seem strange, but since the write can never cause a miss (the read guarantees the block is in the cache) it's not very interesting.

Thus it measures not the number of times the data cache is accessed, but the number of times a data cache miss could occur.

It would seem that callgrind's cache simulation logic is different from cachegrind's. I would expect callgrind to produce the same results as cachegrind, so maybe this is a bug?

Commissionaire answered 21/5, 2013 at 3:33 Comment(2)

Those are exactly my thoughts. Probably a bug, but it's surprising. Did they write a cache simulation twice? – Amhara 24/5, 2013 at 15:14

From what I can see, it seems they implemented at least part of the cache simulation twice. I don't fully understand the native->VEX IR conversion and instrumentation logic. – Commissionaire 25/5, 2013 at 2:43

-1

callgrind does not do full cache simulation by default. See here: http://valgrind.org/docs/manual/cl-manual.html#cl-manual.options.cachesimulation

To enable data read accesses you need to add --cache-sim=yes for callgrind. Having said this, why even use callgrind on this code? There is not a single function call (which is what callgrind is for).

Eyewitness answered 22/4, 2013 at 23:7 Comment(1)

Adding --cache-sim=yes doesn't change anything: specifying the cache sizes automatically activates the cache simulation. – Amhara 24/4, 2013 at 12:4
