Linux time sample based profiler

Asked 15/3, 2010 at 17:34 Answered 22/1, 2019 at 6:17

short version:

Is there a good time based sampling profiler for Linux?

long version:

I generally use OProfile to optimize my applications. I recently found a shortcoming that has me wondering.

The problem was a tight loop, spawning c++filt to demangle a c++ name. I only stumbled upon the code by accident while chasing down another bottleneck. The OProfile didn't show anything unusual about the code so I almost ignored it but my code sense told me to optimize the call and see what happened. I changed the popen of c++filt to abi::__cxa_demangle. The runtime went from more than a minute to a little over a second. About a x60 speed up.

Is there a way I could have configured OProfile to flag the popen call? As the profile data sits now OProfile thinks the bottle neck was the heap and std::string calls (which BTW once optimized dropped the runtime to less than a second, more than x2 speed up).

Here is my OProfile configuration:

$ sudo opcontrol --status
Daemon not running
Event 0: CPU_CLK_UNHALTED:90000:0:1:1
Separate options: library
vmlinux file: none
Image filter: /path/to/executable
Call-graph depth: 7
Buffer size: 65536

Is there another profiler for Linux that could have found the bottleneck?

I suspect the issue is that OProfile only logs its samples to the currently running process. I'd like it to always log its samples to the process I'm profiling. So if the process is currently switched out (blocking on IO or a popen call) OProfile would just place its sample at the blocked call.

If I can't fix this, OProfile will only be useful when the executable is pushing near 100% CPU. It can't help with executables that have inefficient blocking calls.

Snyder answered 15/3, 2010 at 17:34 Comment(0)

Glad you asked. I believe OProfile can be made to do what I consider the right thing, which is to take stack samples on wall-clock time when the program is being slow and, if it won't let you examine individual stack samples, at least summarize for each line of code that appears on samples, the percent of samples the line appears on. That is a direct measure of what would be saved if that line were not there. Here's one discussion. Here's another, and another. And, as Paul said, Zoom should do it.

If your time went from 60 sec to 1 sec, that implies every single stack sample would have had a 59/60 probability of showing you the problem.

Leper answered 15/3, 2010 at 18:55 Comment(4)

Mike, your point is very valid, I agree the technique with 100%. Any ideas on how to enable time based sampling via OProfile or in a more automated approach than just breaking in the debugger? – Snyder 15/3, 2010 at 20:3

@Caspin: I'm on Windows, and I'm not a user of OProfile, but this link (oprofile.sourceforge.net/doc/opreport.html) talks about its use and presentation of stack sample data. Also this link (oprofile.sourceforge.net/doc/detailed-parameters.html#timer) talks about timer interrupts. I can't tell if it will take samples during I/O or other blocking calls. – Leper 15/3, 2010 at 20:27

... note that the frequency of sampling does not need to be fast, but it does need to be able to sample during blocking calls unless you want to be blind to those. – Leper 15/3, 2010 at 20:29

... I forgot to mention, lsstack is an easy way to get stack samples manually, as is pstack (but without symbols). – Leper 15/3, 2010 at 20:43

Try Zoom - I believe it will let you profile all processes - it would be interesting to know if it highlights your problem in this case.

Exotoxin answered 15/3, 2010 at 18:57 Comment(1)

Zoom version 1.6.6 does not find the problem either. The next version of Zoom will supposedly have a sampling mode ("thread time profiling") that could find the problem. – Snyder 16/6, 2010 at 23:2

I wrote this a long time ago, only because I couldn't find anything better: https://github.com/dicej/profile

I just found this, too, though I haven't tried it: https://github.com/oliver/ptrace-sampler

Sartin answered 4/4, 2014 at 22:29 Comment(0)

After trying everything suggested here (except for the now-defunct Zoom, which is still available as huge file from dropbox), I found that NOTHING does what Mr. Dunlavey recommends. The "quick hacks" listed above in some of the answers wouldn't build for me, or didn't work for me either. Spent all day trying stuff... and nothing could find fseek as a hotspot in an otherwise simple test program that was I/O bound.

So I coded up yet another profiler, this time with no build dependencies, based on GDB, so it should "just work" for almost any debuggable code. A single CPP file.

https://github.com/jasonrohrer/wallClockProfiler

It automates the manual process suggested by Mr. Dunlavey, interrupting the target process with GDB periodically and harvesting a stack trace, and then printing a report at the end about which stack traces are the most common. Those are your true wall-clock hotspots. And it actually works.

Prat answered 22/1, 2019 at 6:17 Comment(0)

Quickly hacked up trivial sampling profiler for linux: http://vi-server.org/vi/simple_sampling_profiler.html

It appends backtrace(3) to a file on SIGUSR1, and then converts it to annotated source.

Gahnite answered 22/10, 2010 at 2:24 Comment(4)

Take a look at lsstack is implemented. It doesn't need special driver code of get the current backtrace. Also put your code out on bitbucket or google code. If you get a decently done project I'll contribute bug fixes as I use it. – Snyder 22/10, 2010 at 17:54

@caspin, OK, Now I'm looking how to make use of gdb (#3999964) to do the same thing. – Gahnite 22/10, 2010 at 18:37

I tried your code, but it hangs, because you are using fopen from a signal handler, which is known to hang (you can only use very few functions from a signal handler). – Paraphrastic 24/4, 2014 at 18:52

Should it work reliably with low-level "open/write/close" instead of fopen? – Gahnite 25/4, 2014 at 15:23

Recommended topics

Hot tags