What is the accuracy of interval timers in Linux?

I am trying to characterize timer jitter on Linux. My task was to run 100ms timers and see how the numbers work out.

I'm working on a multicore machine. I used a standard user program with setitimer(), then the same run as root, then with processor affinity, and finally with processor affinity and elevated process priority. Then I ran the same tests on the PREEMPT_RT kernel, and also ran the examples using clock_nanosleep() as in the demo code on the PREEMPT_RT page. Across all the runs the timer performance was very similar, with no real difference despite the changes.

Our end goal is a steady timer. The best worst-case I could get regularly was about 200us. The histogram for all cases shows really odd behavior. For one, I wouldn't expect timers to fire early, but they do (edit: okay, they don't, but they appear to). And as you can see in the histogram, I get troughs on either side of the 0 offset. These are visible as three bands in the second graph. In the first graph, the X axis is in microseconds. In the second graph, the Y axis is in microseconds.

I ran a 30s test (that is, 300 timer events) 100 times to generate some numbers. You can see them in the following diagrams. There is a big drop-off at 200us. All 30,000 timer-event clock offsets are graphed in the second graph, where you can see some outliers.

[First graph: histogram of timer offsets, X axis in microseconds. Second graph: all timer-event offsets, Y axis in microseconds.]

So the question is: has anyone else done this kind of analysis before? Did you see the same sort of behavior? My assumption was that the RT kernel would help on systems with heavy loads, but in our case it didn't help eliminate timer jitter. Is that your experience?

Here's the code. As I said before, I modified the example code on the PREEMPT_RT site that uses the clock_nanosleep() function, so I won't include my minimal changes for that.

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <stdlib.h>
#include <unistd.h>   /* for sleep() */

#define US_PER_SEC 1000000
#define WAIT_TIME 100000
#define MAX_COUNTER 300

int counter = 0;
long long last_time = 0;
static long long times[MAX_COUNTER];
int i = 0;

struct sigaction sa;

void timer_handler(int signum)
{
    if (counter >= MAX_COUNTER)   /* >= avoids writing past the end of times[] */
    {
        sigaction(SIGALRM, &sa, NULL);
        for (i = 0; i < MAX_COUNTER; i++)
        {
            printf("%ld\n", times[i]);
        }
        exit(EXIT_SUCCESS);
    }

    struct timeval t;
    gettimeofday(&t, NULL);

    long long elapsed = ((long long)t.tv_sec * US_PER_SEC + t.tv_usec);   /* cast avoids overflow with 32-bit time_t */

    if (last_time != 0)
    {
        times[counter] = elapsed - last_time;
        ++counter;
    }

    last_time = elapsed; 
}

int main()
{
    struct itimerval timer;

    memset(&sa, 0, sizeof(sa));

    sa.sa_handler = &timer_handler;

    sigaction(SIGALRM, &sa, NULL);

    timer.it_value.tv_sec = 0;
    timer.it_value.tv_usec = 1;

    timer.it_interval.tv_sec = 0;
    timer.it_interval.tv_usec = WAIT_TIME;

    setitimer(ITIMER_REAL, &timer, NULL);

    while (1)
    {
        sleep(1);
    }
}

EDIT: this is on a Xeon E3-1220L at 2.2 GHz, running x86_64 Fedora Core 19.

Phosphoric answered 24/11, 2013 at 22:48 Comment(11)
This is on x86 right? What hardware? There are so many variables involved here, I think it's going to be incredibly hard to make any valid generalizations.Hardpressed
Added information about the CPU architecture. Thanks!Phosphoric
Please include the source of the timer_handler() function as well.Untraveled
Added code to timer_handler().Phosphoric
are you running ntpd or any other process that may be messing with the clock?Lituus
I installed base Fedora Core, with only developer tools. No GUI or anything. There may be something lurking. I'll have to look. From what I read, ntpd runs at 64s intervals, so that specifically shouldn't have such a profound, regular impact.Phosphoric
One may also point out that setitimer() is considered obsolescent and the whole signal-based event delivery route unreliable. Consider using timerfd_create() and friends; you may get better results out of the box (a minimal sketch follows these comments).Hadlock
@PaxtonSanders What kind of library did you use to generate these nice graphs?Khano
@Ian It's just GnuPlot. Old but works great once you figure out some of the basics. I can send you the scripts I used to generate the graphs if you'd like to have them.Phosphoric
That'd be great! [email protected] thanks!Khano
While Linux is awesome by itself, what PREEMPT-RT does is make sure your realtime task runs no matter what. If you run your test when there is absolutely nothing else going on, Linux already does a good job. However, try watching a video, copying files, and compiling the kernel while running your test at the same time; then you would see the difference between vanilla and PREEMPT-RT Linux.Sergius
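
To illustrate the timerfd_create() route suggested in the comments, here is a minimal, hypothetical sketch (not from the original post) of a 100ms periodic timer delivered through a file descriptor instead of SIGALRM; error handling is omitted:

#include <sys/timerfd.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);

    struct itimerspec its = {
        .it_value    = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 },  /* first expiry */
        .it_interval = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 },  /* 100ms period */
    };
    timerfd_settime(fd, 0, &its, NULL);

    for (int i = 0; i < 300; i++)
    {
        uint64_t expirations;
        /* read() blocks until the timer expires; a value > 1 means missed expirations. */
        read(fd, &expirations, sizeof(expirations));
        /* take a clock_gettime(CLOCK_MONOTONIC, ...) timestamp here */
    }
    close(fd);
    return 0;
}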

You're right not to expect timers to fire early, and they don't. The apparent early firing is because you're not measuring the time since the previous timer expired; you're measuring the time since the previous gettimeofday() call. If there is a delay between the timer expiring and the process actually getting scheduled, you will see that gettimeofday() running late, and the next one running early by the same amount.

Instead of logging the difference between subsequent gettimeofday() calls, try logging the absolute times returned, then compare the returned times against N * 100ms after the initial time.
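
A minimal sketch of that measurement, reusing the variables from the question's code (US_PER_SEC, WAIT_TIME, MAX_COUNTER, times, counter, i); the only change is storing absolute timestamps and post-processing them:

/* In the handler: store the absolute time of each event instead of a delta. */
struct timeval t;
gettimeofday(&t, NULL);
times[counter++] = (long long)t.tv_sec * US_PER_SEC + t.tv_usec;

/* After the run: offset of each event from its ideal firing time,
 * i.e. the first observed time plus N * 100ms. */
for (i = 1; i < MAX_COUNTER; i++)
{
    long long expected = times[0] + (long long)i * WAIT_TIME;
    printf("%lld\n", times[i] - expected);   /* > 0 means late; < 0 only if truly early */
}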

If you want PREEMPT_RT to help you, you will need to set a real-time scheduler policy for your test program (SCHED_FIFO or SCHED_RR), which requires root.
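
For reference, a minimal fragment (to drop into the test program's main() and run as root) that requests SCHED_FIFO; the priority of 80 is just an example:

#include <sched.h>
#include <sys/mman.h>
#include <stdio.h>

struct sched_param sp = { .sched_priority = 80 };   /* 1..99, higher preempts lower */
if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
    perror("sched_setscheduler");

/* Optionally lock memory so page faults don't add latency. */
mlockall(MCL_CURRENT | MCL_FUTURE);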

Untraveled answered 24/11, 2013 at 23:24 Comment(3)
For PREEMPT_RT, I did exactly that. I had suspected my gettimeofday() calls were the problem, and your explanation clears things up. The reason the PREEMPT_RT runs exhibited the same behavior is that I used the same piece of code to compare times. Thanks for the reply.Phosphoric
As an aside, instead of sleep(1); you can use pause(); in that loop.Untraveled
Also, I've heard that gettimeofday() is not realtime-safe under PREEMPT_RT; the correct alternative is clock_gettime()Salto

I made some changes to your code, mainly replacing the timer as follows, and ran the process as an RT process (SCHED_FIFO).

setitimer()      ->    timer_create()/timer_settime()
gettimeofday()   ->    clock_gettime()
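
A rough sketch of that replacement (my reconstruction, not the answerer's exact code): a periodic CLOCK_MONOTONIC timer created with timer_create(), still delivering SIGALRM, and clock_gettime() used for timestamps in the handler. Link with -lrt on older glibc.

#include <signal.h>
#include <time.h>

timer_t timerid;
struct sigevent sev = { 0 };
sev.sigev_notify = SIGEV_SIGNAL;
sev.sigev_signo  = SIGALRM;
timer_create(CLOCK_MONOTONIC, &sev, &timerid);

struct itimerspec its = {
    .it_value    = { .tv_sec = 0, .tv_nsec = 1000000 },   /* first expiry: 1ms */
    .it_interval = { .tv_sec = 0, .tv_nsec = 1000000 },   /* period: 1ms */
};
timer_settime(timerid, 0, &its, NULL);

/* In the signal handler, replace gettimeofday() with: */
struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
long long now_ns = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;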

My testbed is an i9-9900K CPU running a PREEMPT-RT-patched Linux 5.0.21 kernel. The timer interval is 1ms, and the program ran for about 10 hours to generate the following result.

[Latency histogram from the 10-hour, 1ms-interval run]

I also ran cyclictest (which is based on nanosleep()) on my machine, and it shows better latency control (maximum latency less than 15us). So, in my opinion, if you want to build a high-resolution timer yourself, a standalone RT thread running nanosleep on an isolated core may help, as sketched below. I am new to RT systems; comments are welcome.
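
A sketch of that pattern (cyclictest-style): an absolute-deadline loop using clock_nanosleep() on a SCHED_FIFO thread. The 1ms period and priority are illustrative, and pinning to an isolated core (isolcpus/taskset) is done outside the program.

#include <time.h>
#include <sched.h>

#define PERIOD_NS 1000000   /* 1ms */

void rt_loop(void)
{
    struct sched_param sp = { .sched_priority = 80 };
    sched_setscheduler(0, SCHED_FIFO, &sp);   /* needs root */

    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (;;)
    {
        next.tv_nsec += PERIOD_NS;
        if (next.tv_nsec >= 1000000000) { next.tv_nsec -= 1000000000; next.tv_sec++; }

        /* Sleep until the absolute deadline, so scheduling error doesn't accumulate. */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);

        /* Measure latency here: clock_gettime(CLOCK_MONOTONIC, ...) minus 'next'. */
    }
}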

Lunch answered 30/10, 2019 at 15:21 Comment(0)
