I'm trying to determine the finest granularity at which I can accurately schedule tasks in C/C++. At the moment I can reliably schedule tasks to run every 5 microseconds, but I'd like to see whether I can go lower.
Any advice on whether this is possible, and how to achieve it, would be greatly appreciated.
Since timer granularity is often OS-dependent, I should note that I'm currently running on Linux. I would switch to Windows if its timing granularity were better, although based on what I've found about QueryPerformanceCounter I don't believe it is.
I run all measurements on bare metal (no VM). /proc/timer_list confirms nanosecond timer resolution on my CPU, but I know that doesn't translate into nanosecond alarm resolution.
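Because clock resolution and wake-up (alarm) latency are different things, it can help to measure the latter directly. The following is a minimal sketch, assuming Linux/glibc; the function names are hypothetical and not part of the original code. It queries `clock_getres()` for `CLOCK_MONOTONIC`, then records the worst overshoot of `clock_nanosleep()` against a series of absolute deadlines. It is a rough measurement, not a rigorous benchmark.

```cpp
#include <cerrno>
#include <time.h>

// Resolution the kernel advertises for CLOCK_MONOTONIC, in ns, or -1 on error.
long long clock_resolution_ns()
{
    timespec res{};
    if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
        return -1;
    return static_cast<long long>(res.tv_sec) * 1000000000LL + res.tv_nsec;
}

static long long to_ns(const timespec &t)
{
    return static_cast<long long>(t.tv_sec) * 1000000000LL + t.tv_nsec;
}

// Sleeps until `iterations` absolute deadlines spaced `period_ns` apart and
// returns the worst observed lateness (actual wake-up time minus deadline).
long long worst_wakeup_overshoot_ns(long long period_ns, int iterations)
{
    timespec now{};
    clock_gettime(CLOCK_MONOTONIC, &now);
    long long deadline = to_ns(now);
    long long worst = 0;
    for (int i = 0; i < iterations; ++i) {
        deadline += period_ns;
        timespec target{};
        target.tv_sec = static_cast<time_t>(deadline / 1000000000LL);
        target.tv_nsec = static_cast<long>(deadline % 1000000000LL);
        // clock_nanosleep returns an error number instead of setting errno;
        // restart the sleep if a signal interrupted it.
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &target, nullptr) == EINTR) {
        }
        clock_gettime(CLOCK_MONOTONIC, &now);
        const long long late = to_ns(now) - deadline;
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Running the overshoot measurement once as a normal process and once after the priority and timer-slack changes below gives a direct before/after comparison on the same machine.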
Current
My current code can be found as a Gist here.
At the moment, I'm able to execute a request every 5 microseconds (5000 nanoseconds) with fewer than 1% late arrivals. When late arrivals do occur, they are typically only one cycle (5000 nanoseconds) behind.
I'm currently doing three things:
Setting the process to real-time priority (some of this was pointed out by @Spudd86 here)
```c
#include <sched.h>
#include <string.h>

struct sched_param schedparm;
memset(&schedparm, 0, sizeof(schedparm));
schedparm.sched_priority = 99; // highest real-time priority on Linux
sched_setscheduler(0, SCHED_FIFO, &schedparm);
```
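The snippet above hard-codes 99 and ignores the return value, so a missing privilege would go unnoticed and the process would silently stay on the normal scheduler. As a sketch, assuming Linux and a hypothetical helper name `use_max_fifo_priority`, the maximum can be queried with `sched_get_priority_max()` and failures reported. An unprivileged process usually gets `EPERM` unless it has `CAP_SYS_NICE` or its `RLIMIT_RTPRIO` permits the requested priority.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <sched.h>

// Switches the calling process to SCHED_FIFO at the highest priority the
// kernel reports. Returns 0 on success, or the errno value on failure.
int use_max_fifo_priority()
{
    const int max = sched_get_priority_max(SCHED_FIFO);
    if (max == -1)
        return errno;
    sched_param param{};
    param.sched_priority = max;
    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1) {
        const int err = errno;
        std::fprintf(stderr, "sched_setscheduler: %s\n", std::strerror(err));
        return err;
    }
    return 0;
}
```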
Minimizing the timer slack
```c
#include <sys/prctl.h>
prctl(PR_SET_TIMERSLACK, 1UL); // slack in ns; 1 is the minimum, since 0 resets to the default
```
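Timer slack is a per-thread setting, so it may be worth confirming what the kernel actually applied, especially if the timing work runs in a thread other than the one that called `prctl()`. A small sketch with a hypothetical helper name, reading the value back with `PR_GET_TIMERSLACK` (which returns the slack as the function result):

```cpp
#include <sys/prctl.h>

// Sets the calling thread's timer slack to 1 ns and returns the value the
// kernel now reports, or -1 on error. Threads created afterwards inherit the
// creator's current slack; threads that already exist keep their own value.
int set_min_timer_slack()
{
    // 1 ns is the smallest effective setting; passing 0 restores the
    // default slack rather than disabling it.
    if (prctl(PR_SET_TIMERSLACK, 1UL, 0UL, 0UL, 0UL) == -1)
        return -1;
    return prctl(PR_GET_TIMERSLACK, 0UL, 0UL, 0UL, 0UL);
}
```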
Using timerfd (available since Linux 2.6.25)
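```c
#include <string.h>
#include <sys/timerfd.h>
#include <time.h>

int timerfd = timerfd_create(CLOCK_MONOTONIC, 0);
struct itimerspec timspec;
memset(&timspec, 0, sizeof(timspec));
timspec.it_interval.tv_sec = 0;
timspec.it_interval.tv_nsec = nanosecondInterval; // period; must be below 1,000,000,000
timspec.it_value.tv_sec = 0;
timspec.it_value.tv_nsec = 1; // first expiration after 1 ns (an all-zero it_value would disarm the timer)
timerfd_settime(timerfd, 0, &timspec, NULL);
```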
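With a periodic timerfd, each successful `read()` returns an 8-byte count of expirations since the previous read. That gives a direct way to count late arrivals: any count above 1 means the loop woke up after one or more periods had already passed. A minimal blocking-loop sketch, assuming Linux; `run_periodic` and its parameters are hypothetical:

```cpp
#include <cstdint>
#include <cstring>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

// Arms a periodic CLOCK_MONOTONIC timerfd and blocks in read() until at least
// `ticks` expirations have elapsed. Extra expirations reported by a single
// read() are counted in `missed`. Returns total expirations, or -1 on error.
long long run_periodic(long interval_ns, long long ticks, long long &missed)
{
    // interval_ns must be in (0, 1e9): tv_nsec cannot exceed 999,999,999,
    // and a zero it_value would leave the timer disarmed.
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd == -1)
        return -1;
    itimerspec spec;
    std::memset(&spec, 0, sizeof(spec));
    spec.it_interval.tv_nsec = interval_ns;
    spec.it_value.tv_nsec = interval_ns; // first expiration one period from now
    if (timerfd_settime(fd, 0, &spec, nullptr) == -1) {
        close(fd);
        return -1;
    }
    long long total = 0;
    missed = 0;
    while (total < ticks) {
        std::uint64_t expirations = 0;
        if (read(fd, &expirations, sizeof(expirations)) != static_cast<ssize_t>(sizeof(expirations))) {
            close(fd);
            return -1;
        }
        total += static_cast<long long>(expirations);
        missed += static_cast<long long>(expirations) - 1; // 0 when on time
    }
    close(fd);
    return total;
}
```

Dividing `missed` by the total gives the late-arrival percentage, and the size of each individual count shows how many cycles behind a late wake-up was.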
Possible improvements
- Dedicate a processor to this process?
- Use a non-blocking timerfd and poll it in a tight loop instead of blocking (a tight loop wastes more CPU, but may respond to an expiration more quickly)
- Use an external embedded device for triggering (though I can't see why this would be better)
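The first two ideas can be sketched together: pin the thread to one CPU with `sched_setaffinity()`, then spin on a timerfd created with `TFD_NONBLOCK`, where `read()` fails with `EAGAIN` until the timer has expired. This is a sketch with hypothetical names, assuming Linux/glibc; `sched_getcpu()` and the `CPU_*` macros are GNU extensions (g++ defines `_GNU_SOURCE` by default). Actually isolating that CPU from other work (for example with the `isolcpus=` kernel parameter or cpusets) is a separate, system-level step not shown here.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <sched.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

// Pins the calling thread to the CPU it is currently running on.
int pin_to_current_cpu()
{
    const int cpu = sched_getcpu();
    if (cpu < 0)
        return -1;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

// Busy-polls a non-blocking timerfd so the thread never blocks and never has
// to be woken by the scheduler, at the cost of keeping a core fully busy.
// Returns total expirations, or -1 on error; interval_ns must be in (0, 1e9).
long long spin_periodic(long interval_ns, long long ticks, long long &missed)
{
    int fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK);
    if (fd == -1)
        return -1;
    itimerspec spec;
    std::memset(&spec, 0, sizeof(spec));
    spec.it_interval.tv_nsec = interval_ns;
    spec.it_value.tv_nsec = interval_ns;
    if (timerfd_settime(fd, 0, &spec, nullptr) == -1) {
        close(fd);
        return -1;
    }
    long long total = 0;
    missed = 0;
    while (total < ticks) {
        std::uint64_t expirations = 0;
        const ssize_t n = read(fd, &expirations, sizeof(expirations));
        if (n == -1 && errno == EAGAIN)
            continue; // not expired yet: keep spinning
        if (n != static_cast<ssize_t>(sizeof(expirations))) {
            close(fd);
            return -1;
        }
        total += static_cast<long long>(expirations);
        missed += static_cast<long long>(expirations) - 1;
    }
    close(fd);
    return total;
}
```

If this loop also runs under SCHED_FIFO, the kernel's real-time throttling (by default `/proc/sys/kernel/sched_rt_runtime_us` is 950000 out of a 1000000 µs period) may still preempt it periodically, so a spinning real-time thread is not guaranteed uninterrupted CPU time.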
Why
I'm building a workload generator for a benchmarking engine. The workload generator simulates an arrival rate (X requests per second, etc.) using a Poisson process, from which I can determine the relative times at which the benchmarking engine must issue requests.
For instance, at 10 requests per second, requests might be issued at t = 0.02, 0.04, 0.05, 0.056 and 0.09 seconds.
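Since the inter-arrival times of a Poisson process with rate λ are exponentially distributed with mean 1/λ, the whole schedule can be generated up front by accumulating exponential gaps. A sketch using the C++ standard library's `std::exponential_distribution`; the function name, the seed parameter and the choice of nanosecond offsets are assumptions for illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Returns the first `count` arrival times, as ns offsets from t = 0, of a
// Poisson process with `rate_per_sec` arrivals per second.
std::vector<std::int64_t> poisson_arrivals_ns(double rate_per_sec, std::size_t count, std::uint64_t seed)
{
    std::mt19937_64 rng(seed);
    // Exponential gaps in seconds, with mean 1 / rate_per_sec.
    std::exponential_distribution<double> gap_sec(rate_per_sec);
    std::vector<std::int64_t> times;
    times.reserve(count);
    double t = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        t += gap_sec(rng);
        times.push_back(static_cast<std::int64_t>(t * 1e9));
    }
    return times;
}
```

At thousands of requests per second the mean gap drops below a millisecond, and individual gaps are often much shorter than the mean, which is where the tight granularity requirement comes from.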
These requests need to be scheduled in advance and then executed. As the number of requests per second increases, the scheduling granularity must become finer (thousands of requests per second require sub-millisecond accuracy). As a result, I'm trying to figure out how to scale this system further.
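One way to execute such a precomputed schedule, swapped in here as an alternative to a fixed-period timer, is a hybrid wait: sleep with `clock_nanosleep(..., TIMER_ABSTIME, ...)` until shortly before each absolute deadline, then busy-wait on `CLOCK_MONOTONIC` for the last stretch. Measuring every deadline from a single start time also keeps one late request from shifting all later ones. This is a sketch with hypothetical names; `spin_ns` would need tuning against the wake-up latency measured on the target machine.

```cpp
#include <cerrno>
#include <cstdint>
#include <functional>
#include <time.h>
#include <vector>

static std::int64_t now_ns()
{
    timespec ts{};
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Calls `fire` at each offset in `offsets_ns` (relative to the start of the
// call). Sleeps until `spin_ns` before each deadline, then spins for the rest.
// Returns the worst lateness observed, in ns.
std::int64_t replay_schedule(const std::vector<std::int64_t> &offsets_ns,
                             std::int64_t spin_ns,
                             const std::function<void()> &fire)
{
    const std::int64_t start = now_ns();
    std::int64_t worst = 0;
    for (std::int64_t offset : offsets_ns) {
        const std::int64_t deadline = start + offset;
        const std::int64_t wake = deadline - spin_ns;
        if (wake > now_ns()) {
            timespec t{};
            t.tv_sec = static_cast<time_t>(wake / 1000000000LL);
            t.tv_nsec = static_cast<long>(wake % 1000000000LL);
            while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &t, nullptr) == EINTR) {
            }
        }
        std::int64_t now;
        while ((now = now_ns()) < deadline) {
            // busy-wait for the final spin_ns before the deadline
        }
        fire();
        if (now - deadline > worst)
            worst = now - deadline;
    }
    return worst;
}
```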