Microsecond accurate (or better) process timing in Linux
Asked Answered
C

8

14

I need a very accurate way to time parts of my program. I could use the regular high-resolution clock for this, but that returns wall-clock time, which is not what I need: I need the time spent running only my process.

I distinctly remember seeing a Linux kernel patch that would allow me to time my processes to nanosecond accuracy, except I forgot to bookmark it and I forgot the name of the patch as well :(.

I remember how it works though:

On every context switch, it will read out the value of a high-resolution clock, and add the delta of the last two values to the process time of the running process. This produces a high-resolution accurate view of the process' actual process time.

The regular process time is kept using the regular clock, which I believe is millisecond accurate (1000 Hz), which is much too coarse for my purposes.

Does anyone know which kernel patch I'm talking about? I also remember the name was a short word with a letter prefixed or appended, something like 'rtimer', but I don't remember exactly.

(Other suggestions are welcome too)


The Completely Fair Scheduler suggested by Marko is not what I was looking for, but it looks promising. The problem I have with it is that the calls I can use to get process time still don't return values that are granular enough.

  • times() returns values like 21 and 22, in milliseconds.
  • clock() returns values like 21000 and 22000: the same granularity.
  • getrusage() returns values like 210002 and 22001 (and so on); they look slightly more accurate, but the values are suspiciously similar.

So now the problem I'm probably having is that the kernel has the information I need, I just don't know the system call that will return it.
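
For reference, here is roughly how I'm reading those values (a minimal sketch, with a busy loop standing in for the code I actually want to time):

    /* Sketch: reading process time via times(), clock() and getrusage(). */
    #include <stdio.h>
    #include <time.h>
    #include <sys/times.h>
    #include <sys/resource.h>
    #include <unistd.h>

    int main(void)
    {
        struct tms t;
        struct rusage ru;

        volatile double x = 0.0;            /* placeholder work */
        for (int i = 0; i < 1000000; i++)
            x += i * 0.5;

        times(&t);
        getrusage(RUSAGE_SELF, &ru);

        printf("times():     %ld ticks (%ld ticks/s)\n",
               (long)(t.tms_utime + t.tms_stime), sysconf(_SC_CLK_TCK));
        printf("clock():     %ld (CLOCKS_PER_SEC = %ld)\n",
               (long)clock(), (long)CLOCKS_PER_SEC);
        printf("getrusage(): %ld.%06ld s user time\n",
               (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
        return 0;
    }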

Candlefish answered 8/10, 2008 at 12:48 Comment(1)
Maybe this will help with CFS: kerneltrap.org/node/8059. There is an email from the author, containing rough instructions for configuration. – Hirudin
L
5

If you are looking for this level of timing resolution, you are probably trying to do some micro-optimization. If that's the case, you should look at PAPI. Not only does it provide both wall-clock and virtual (process only) timing information, it also provides access to CPU event counters, which can be indispensable when you are trying to improve performance.

http://icl.cs.utk.edu/papi/
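
For instance, a minimal sketch of PAPI's timing calls (assuming PAPI is installed and you link with -lpapi; PAPI_get_real_usec() returns wall-clock microseconds and PAPI_get_virt_usec() returns process-virtual microseconds):

    /* Sketch: wall-clock vs. virtual (process-only) time with PAPI. */
    #include <stdio.h>
    #include <papi.h>

    int main(void)
    {
        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
            fprintf(stderr, "PAPI init failed\n");
            return 1;
        }

        long long real0 = PAPI_get_real_usec();
        long long virt0 = PAPI_get_virt_usec();

        volatile double x = 0.0;            /* placeholder work */
        for (int i = 0; i < 1000000; i++)
            x += i * 0.5;

        printf("real: %lld us, virtual: %lld us\n",
               PAPI_get_real_usec() - real0,
               PAPI_get_virt_usec() - virt0);
        return 0;
    }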

Loaning answered 30/12, 2008 at 18:9 Comment(0)
J
5

See this question for some more info.

Something I've used for such things is gettimeofday(). It fills in a structure with seconds and microseconds. Call it before the code, and again after. Then subtract the two structs using timersub(), and you get the elapsed time, with seconds in the tv_sec field and microseconds in the tv_usec field.
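
A minimal sketch of that pattern (the busy loop is only a placeholder workload; note that gettimeofday() measures wall-clock time rather than CPU time):

    /* Sketch: timing a code section with gettimeofday() and timersub(). */
    #define _DEFAULT_SOURCE 1   /* for timersub() on glibc */
    #include <stdio.h>
    #include <sys/time.h>

    int main(void)
    {
        struct timeval start, end, elapsed;

        gettimeofday(&start, NULL);

        volatile double x = 0.0;            /* placeholder work */
        for (int i = 0; i < 1000000; i++)
            x += i * 0.5;

        gettimeofday(&end, NULL);
        timersub(&end, &start, &elapsed);   /* elapsed = end - start */

        printf("elapsed: %ld.%06ld s\n",
               (long)elapsed.tv_sec, (long)elapsed.tv_usec);
        return 0;
    }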

Juanjuana answered 30/12, 2008 at 16:11 Comment(0)
J
3

If you need very small time units for (I assume) testing the speed of your software, I would recommend just running the parts you want to time in a loop millions of times, taking the time before and after the loop, and calculating the average. A nice side effect of doing this (apart from not needing to figure out how to use nanoseconds) is that you get more consistent results, because the random overhead caused by the OS scheduler is averaged out.
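
A rough sketch of that approach, assuming gettimeofday() for the before/after timestamps and a hypothetical do_work() standing in for the code under test:

    /* Sketch: average per-iteration time over many runs. */
    #include <stdio.h>
    #include <sys/time.h>

    static void do_work(void)               /* stand-in for the code being measured */
    {
        volatile double x = 0.0;
        for (int i = 0; i < 100; i++)
            x += i * 0.5;
    }

    int main(void)
    {
        const long iterations = 1000000L;
        struct timeval start, end;

        gettimeofday(&start, NULL);
        for (long i = 0; i < iterations; i++)
            do_work();
        gettimeofday(&end, NULL);

        double total_us = (end.tv_sec - start.tv_sec) * 1e6
                        + (end.tv_usec - start.tv_usec);
        printf("average: %.3f microseconds per iteration\n",
               total_us / iterations);
        return 0;
    }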

Of course, unless your program actually needs to run millions of times per second, it's probably fast enough if its running time is too small to measure in milliseconds.

Jampan answered 8/10, 2008 at 14:49 Comment(1)
This is exactly what I do if I want to measure speed. You don't say what your goal is. If I want to find out what to optimize, that is a different goal from measurement, and needs different methods. For that, sampling the call stack is what I use. – Lues
H
1

I believe CFS (the Completely Fair Scheduler) is what you're looking for.

Hirudin answered 8/10, 2008 at 14:50 Comment(0)
N
1

You can use the High Precision Event Timer (HPET) if you have a fairly recent 2.6 kernel. Check out Documentation/hpet.txt on how to use it. This solution is platform dependent though and I believe it is only available on newer x86 systems. HPET has at least a 10MHz timer so it should fit your requirements easily.

I believe several PowerPC implementations from Freescale support a cycle exact instruction counter as well. I used this a number of years ago to profile highly optimized code but I can't remember what it is called. I believe Freescale has a kernel patch you have to apply in order to access it from user space.

Nummulite answered 8/10, 2008 at 14:57 Comment(0)
T
1

http://allmybrain.com/2008/06/10/timing-cc-code-on-linux/

might be of help to you (directly if you are working in C/C++, but I hope it gives you pointers even if you're not)... It claims to provide microsecond accuracy, which just meets your requirement. :)

Tallyman answered 8/10, 2008 at 16:53 Comment(0)
C
1

I think I found the kernel patch I was looking for. Posting it here so I don't forget the link:

http://user.it.uu.se/~mikpe/linux/perfctr/ http://sourceforge.net/projects/perfctr/

Edit: It works for my purposes, though not very user-friendly.

Candlefish answered 19/11, 2008 at 10:53 Comment(0)
P
1

Try the CPU's timestamp counter? Wikipedia seems to suggest using clock_gettime().
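
For example, a minimal sketch using clock_gettime() with CLOCK_PROCESS_CPUTIME_ID, which counts CPU time consumed by the calling process rather than wall-clock time (older glibc versions need -lrt at link time):

    /* Sketch: per-process CPU time via clock_gettime(CLOCK_PROCESS_CPUTIME_ID). */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec start, end;

        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);

        volatile double x = 0.0;            /* placeholder work */
        for (int i = 0; i < 1000000; i++)
            x += i * 0.5;

        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);

        long long ns = (end.tv_sec - start.tv_sec) * 1000000000LL
                     + (end.tv_nsec - start.tv_nsec);
        printf("process CPU time: %lld ns\n", ns);
        return 0;
    }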

Pedaiah answered 30/12, 2008 at 19:13 Comment(1)
You also need to synchronise TSC reads to prevent instruction re-ordering breaking the test period, alongside usual affinity requirements. – Anlage
