I have a model code on which kcachegrind/callgrind reports strange results. It is kind of dispatcher function. The dispatcher is called from 4 places; each call says, which actual do_J
function to run (so the first2
will call only do_1
and do_2
and so on)
Source (this is a model of actual code)
#define N 1000000
int a[N];
int do_1(int *a) { int i; for(i=0;i<N/4;i++) a[i]+=1; }
int do_2(int *a) { int i; for(i=0;i<N/2;i++) a[i]+=2; }
int do_3(int *a) { int i; for(i=0;i<N*3/4;i++) a[i]+=3; }
int do_4(int *a) { int i; for(i=0;i<N;i++) a[i]+=4; }
int dispatcher(int *a, int j) {
if(j==1) do_1(a);
else if(j==2) do_2(a);
else if(j==3) do_3(a);
else do_4(a);
}
int first2(int *a) { dispatcher(a,1); dispatcher(a,2); }
int last2(int *a) { dispatcher(a,4); dispatcher(a,3); }
int inner2(int *a) { dispatcher(a,2); dispatcher(a,3); }
int outer2(int *a) { dispatcher(a,1); dispatcher(a,4); }
int main(){
first2(a);
last2(a);
inner2(a);
outer2(a);
}
Compiled with gcc -O0
; Callgrinded with valgrind --tool=callgrind
; kcachegrinded with kcachegrind
and qcachegrind-0.7
.
Here is a full callgraph of the application. All paths to do_J go through dispatcher and this is good (the do_1 is just hided as too fast, but it is here really, just left to do_2)
Lets focus on do_1
and check, who called it (this picture is incorrect):
And this is very strange, I think, only first2
and outer2
called do_1
but not all.
Is it a limitation of callgrind/kcachegrind? How can I get accurate callgraph with weights (proportional to running time of every function, with and without its childs)?
--separate-callers=N
option of callgrind to record N slots of callstack – Jibe