I'm trying to profile some C code but one of the most intuitively costly functions isn't showing up in the GProf output.
int main() {
    initialise...
    haloSwap();
    for(...) {
        functions...
        propagate();
        functions...
    }
}

void propagate() {
    for (x)
        for (y)
            for (z)
                grid[xNew][yNew][zNew] = grid[x][y][z];
    haloSwap();
}

void haloSwap() {
    // Horizontal swap
    create buffers...
    MPI_Sendrecv(buffers);
    recreate grid from buffers...
    // Vertical swap
    create buffers...
    MPI_Sendrecv(buffers);
    recreate grid from buffers...
}
Hopefully that pseudo-code goes some way to explaining the setup. haloSwap() involves a lot of communication between processes, and I suspect it's an expensive part of the algorithm. It's called during initialisation and then repeatedly during the main loop of the algorithm.
GProf shows only 1 call to haloSwap (during init), even though I know it's called 1000+ times from inside propagate().
propagate() is showing as the most expensive part of the code, but I'd like to know whether it's the xyz loop(s) or the MPI communication.
Does anyone know why the calls to haloSwap from propagate are seemingly ignored in both the number of calls and the time spent in the function?
haloSwap is defined within another .c file, which may be a factor?
If I move the call of haloSwap to the main loop after calling propagate (instead of inside it), GProf still only shows 1 call to it.
propagate() ends up being inlined. Tell your compiler to not inline functions. For example, with GCC the option is -fno-inline. – Max
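As a rough illustration of the suggestion above, inlining can also be switched off for individual functions rather than for the whole build. The sketch below assumes GCC or Clang, since `__attribute__((noinline))` is a compiler extension and not standard C. The function bodies are empty stand-ins for the ones in the question, and `halo_swap_calls` and `run_steps` are hypothetical names added only so that the call count can be cross-checked against what the profiler reports.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical manual counter, used to cross-check the profiler's call count. */
static long halo_swap_calls = 0;

/* GCC/Clang extension: ask the compiler not to inline this function,
   so it remains a separate function that a profiler can attribute calls to. */
__attribute__((noinline)) void haloSwap(void) {
    halo_swap_calls++;
    /* ... buffer packing, MPI_Sendrecv and unpacking would go here ... */
}

__attribute__((noinline)) void propagate(void) {
    /* ... the x/y/z grid update loop would go here ... */
    haloSwap();
}

/* Hypothetical driver mirroring the structure of main() in the question:
   one haloSwap() during initialisation, then one per propagate() call.
   Returns the number of haloSwap() calls observed. */
long run_steps(int steps) {
    assert(steps >= 0);
    halo_swap_calls = 0;
    haloSwap();                       /* initialisation call */
    for (int i = 0; i < steps; i++) {
        propagate();                  /* each step calls haloSwap() once */
    }
    return halo_swap_calls;
}
```

Calling `run_steps(1000)` from a `main()` and printing the result gives a count to compare with the profile. If the manual counter reports the expected number of calls while the profile built with `-pg` still shows only one, that points at how the build is instrumented, for example the other .c file not being compiled with `-pg`, rather than at the program logic.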