How does gperftools work under the hood?

I am looking for a simple explanation of how gperftools works. So far, this is what I have learned:

  • It runs a stop-the-world sampler. In other words, it periodically stops the program being profiled to collect information.
  • Golang's pprof library uses gperftools underneath.

Besides a general overview, here are some specific questions I would like answered:

  • Is gperftools an "event-based profiler" or an "instrumentation profiler"? From what I understand, these profilers modify the way a program runs and collect samples via those modifications.
  • At what 'level' in the OS does gperftools profile? Does it profile the kernel like SystemTap or perf?
  • Is gperftools safe to run on a high-traffic production server?

I am asking this question to reason about the overhead introduced by using pprof on a Go server.

Duchess answered 17/6, 2015 at 23:48 Comment(0)

It is a sampling profiler.

Basically, there are two types of profiling: either you keep track of everything the program does (keeping count of every call, wrapping every function in a timer, in other words, permeating the code with your instruments) or else you let it run itself but just briefly check up on it every now and then (taking samples).

The problem with instrumentation is that it changes the way the program performs. It slows down the program, in a way which also distorts the results. (For example, the production code may be spending too much time waiting for IO, but the instrumented code might not exhibit this.) It also collects far more data than is statistically necessary (if ultimately all you care about is identifying where most time is spent).
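To make the instrumentation approach concrete, here is a minimal Python sketch. It is only an illustration of the idea, not how any particular profiler is implemented, and the names `instrumented`, `call_counts` and `total_seconds` are made up for this example. Every call to a wrapped function pays for two clock reads and two dictionary updates, which is exactly the kind of overhead that distorts short, frequently called functions.

```python
import functools
import time
from collections import defaultdict

# Hypothetical bookkeeping tables for this sketch; not part of any real profiler API.
call_counts = defaultdict(int)
total_seconds = defaultdict(float)


def instrumented(func):
    """Wrap func so that every single call is counted and timed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # This bookkeeping runs on every call, even for trivial functions.
            call_counts[func.__name__] += 1
            total_seconds[func.__name__] += time.perf_counter() - start
    return wrapper


@instrumented
def square(x):
    return x * x


def run(n):
    """Call the instrumented function n times and return the sum of squares."""
    return sum(square(i) for i in range(n))
```

After `run(1000)`, `call_counts["square"]` holds an exact call count, but the time recorded in `total_seconds["square"]` is likely dominated by the wrapper itself rather than by the multiplication.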

By running strace, you can see that Google-perftools works using SIGPROF signals (as do HPCToolkit and Open|SpeedShop). Presumably it just sets up a signal handler and then lingers in memory, not consuming any CPU cycles, until the hardware/OS interrupts your program (which can be as infrequent as you like). At that point it presumably just saves a copy of the call stack (and schedules the next interrupt) before letting control return to your program. The call stack lists which function your program was executing, which parent function had invoked that one, and so on up the chain (this is the same information that "return" statements rely on to find their way back to the caller).
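The same mechanism can be sketched in a few lines of Python on a Unix system. This is an illustration under assumptions, not gperftools code: gperftools is C++ and records raw return addresses, whereas this sketch records Python function names, and `samples`, `on_sigprof`, `busy_work` and `profile` are names invented for the example. The real calls used are `signal.signal` and `signal.setitimer` with `ITIMER_PROF`, which delivers SIGPROF after the process has consumed the given amount of CPU time.

```python
import collections
import signal

# Hypothetical sample store for this sketch: call stack (tuple of names) -> hits.
samples = collections.Counter()


def on_sigprof(signum, frame):
    """SIGPROF handler: record the call stack that was interrupted."""
    stack = []
    while frame is not None:
        stack.append(frame.f_code.co_name)
        frame = frame.f_back  # walk from the current function up to its callers
    samples[tuple(reversed(stack))] += 1  # store outermost caller first


def busy_work(n):
    """CPU-bound loop standing in for the program being profiled."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile(func, *args, interval=0.01):
    """Run func(*args) while sampling its stack every `interval` seconds of CPU time."""
    signal.signal(signal.SIGPROF, on_sigprof)
    # First value: delay until the first signal; second value: repeat period.
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        return func(*args)
    finally:
        # Disarm the timer. The handler stays installed so that a signal
        # that is already pending is still handled harmlessly.
        signal.setitimer(signal.ITIMER_PROF, 0, 0)
```

Calling `profile(busy_work, 5_000_000)` and then `samples.most_common(3)` shows which stacks were hit most often. Between signals the sampler does no work at all, and the cost per sample is one stack walk, so the overhead is set by the sampling interval rather than by how many functions the program calls.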

Pouter answered 11/7, 2015 at 3:13 Comment(2)
+1 It is not very well understood just how informative stack samples actually are, especially if individual ones are examined. Sadly, this lack of understanding leads to the general assumption that a large number of them are needed, and therefore they must be summarized (into self-time, cumulative-time, call graphs, flame-graphs, etc.). It is very easy for big speedups to hide in those summaries, but they can't hide from a programmer just examining a small number of samples. Examples. – Loopy
And SIGPROF is generated by an interval timer, set up with setitimer(): github.com/gperftools/gperftools/blob/… setitimer(timer_type_, &timer, 0); – Kristofer
