How exactly does gprof work?

Asked 12/2, 2011 at 22:46 Answered 18/2, 2011 at 20:11

This is something of a bloated question, so I apologize ahead of time. I'm curious about how gprof works on a low technical level. I understand it's done by timers, but then why does the executable need to be specially compiled to be profiled? Does the compilation cause space to be allocated for statistics?

Also, how is the timing done exactly?

Esteresterase answered 12/2, 2011 at 22:46 Comment(0)

Well, this gives a good explanation. Also this explains statistical profiling

Essentially gprof will change the executable of your program (this is called instrumenting the code) to store some book-keeping information, e.g. how many times a function is called.

The statistical profiling bit comes from snooping the program counter regularly to get a sample of what your code is doing.

Gprof does both. It instruments the code and collects samples from looking at the program counter.

Sup answered 12/2, 2011 at 23:3 Comment(2)

wouldn't it have to crawl up the stack to get reliable traces as well? – Esteresterase 12/2, 2011 at 23:9

@Alex: The fact that it didn't is one of its major problems. – Biquadrate 13/2, 2011 at 17:28

Having just read the paper (again), let me try to explain it.

Suppose it takes samples at 100Hz, except not when the process is blocked for IO or some other reason. Each sample records the PC, which is in some function. The count of samples in that function is incremented.

So, at the end of a run, if there were (say) 1000 samples, that means the total execution time (CPU-only) was 10 seconds. If routine B recorded, say, 500 of those samples, that means its total execution time is 1/2 of the total, or 5 seconds. This is its self time because the PC is in it. This does not tell how long it takes to execute, on average. To tell that, you need to know how many times it was called. Also it does not include time spent in callees.

When the code is compiled with the -pg flag, a special call is inserted in each routine's entry code. That notices that routine B is entered, and it notices that it is called from a call site in routine A. There is a table, indexed by that call site, where that call can be counted. So at the end, gprof can tell how many times B was called in total, and how many of those were from A.

To get the average self time of B, its total self time is divided by the number of times it is called.

In order to get the total cumulative time (self plus callees) of routine A, gprof needs the self time of A, plus the total number of times it calls each subordinate routine B, times the average cumulative time of B. That number is then divided by the total number of calls to A to get the average cumulative time of A.

Sounds good, until recursion enters the picture, where it gets even more confusing.

It's all very clever, and as the authors point out, full of caveats.

Biquadrate answered 18/2, 2011 at 20:11 Comment(2)

hi, what's the difference betwenn this option and -finstrument-functions option? – Fulbright 16/1, 2017 at 12:14

@elinx: -finstrument-functions is a gcc option (not gprof) that lets you put entry and exit functions that you write into every routine the compiler processes. It only gives you the addresses of the caller and callee, so you have to do some post-processing. If you want to do timing, you have to figure out how. It's a long way from diagnosing performance issues. I show a simple method of doing so in this short video. – Biquadrate 16/1, 2017 at 13:57