std::this_thread::sleep_for() and nanoseconds
If I put two calls side-by-side to determine the smallest measurable time duration:

// g++ -std=c++11 -O3 -Wall test.cpp
#include <chrono>
#include <iostream>
typedef std::chrono::high_resolution_clock hrc;

hrc::time_point start = hrc::now();
hrc::time_point end   = hrc::now();
std::chrono::nanoseconds duration = end - start;
std::cout << "duration: " << duration.count() << " ns" << std::endl;

I've run this thousands of times in a loop, and I consistently get 40 ns +/- 2 ns on my particular 3.40GHz desktop.

However, when I look to see what is the shortest time I can sleep:

#include <thread>

hrc::time_point start = hrc::now();
std::this_thread::sleep_for( std::chrono::nanoseconds(1) );
hrc::time_point end   = hrc::now();
std::chrono::nanoseconds duration = end - start;
std::cout << "slept for: " << duration.count() << " ns" << std::endl;

This tells me I slept for 55400 nanoseconds on average, or 55.4 microseconds, much longer than the time I expected.

Putting the above code into a for() loop, I tried sleeping for different amounts, and this is the result:

  • sleep_for( 4000 ns ) => slept for 58000 ns
  • sleep_for( 3000 ns ) => slept for 57000 ns
  • sleep_for( 2000 ns ) => slept for 56000 ns
  • sleep_for( 1000 ns ) => slept for 55000 ns
  • sleep_for( 0 ns ) => slept for 54000 ns
  • sleep_for( -1000 ns ) => slept for 313 ns
  • sleep_for( -2000 ns ) => slept for 203 ns
  • sleep_for( -3000 ns ) => slept for 215 ns
  • sleep_for( -4000 ns ) => slept for 221 ns

Some questions I have:

  • What could explain these numbers?
  • Why does sleeping for a negative amount of time return 200+ ns, while sleeping for 0+ nanoseconds results in 50,000+ nanoseconds?
  • Are negative sleep times a documented/supported feature, or did I accidentally stumble across some strange bug I cannot rely upon?
  • Is there a better C++ sleep call which would give me more consistent/predictable sleep times?
Draughtsman answered 6/8, 2013 at 4:9 Comment(5)
That's just how long nanosleep({0,1}, NULL) takes (if you have linux)Farrica
I can't replicate these results, sleep_for(1ns) gives me 0ns.Catalinacatalo
sleep_for sleeps for at least the specified duration. If you provide a negative value, it doesn't have to sleep at all. sleep is a niche tool, though. You should probably be using some sort of timer mechanism (sadly, standard C++ doesn't have any)Symbol
Note that your negative sleep_for is measuring the overhead of the sleep_for call: you can't expect it to take zero time, because it has to check whether it should take zero time, which takes more than zero time! Remember, however, that if you sleep for negative time, it is conforming to sleep for 7 years and 7 months. It is always conforming to sleep for longer than requested. If you need ridiculously small sleep times, you need to busy-loop, because CPU-saving sleeps end up waiting for an interrupt while other code runs...Antilog
In the Linux kernel, a userspace process calling nanosleep() will trigger on at the next scheduler wake interval, usually a 'jiffy', unless the process is flagged realtime priority. See elixir.free-electrons.com/linux/latest/source/kernel/time/…Robbirobbia

What could explain these numbers?

There's a pretty obvious pattern: all your results are consistently about 54000ns greater than the time you request to sleep. If you look at how GCC's this_thread::sleep_for() is implemented on GNU/Linux you'll see it just uses nanosleep, and as Cubbi's comment says, calling that function can take around 50000ns. I would guess some of that cost is making a system call, i.e. switching from user-space to the kernel and back.

Why does sleeping for a negative amount of time return 200+ ns, while sleeping for 0+ nanoseconds results in 50,000+ nanoseconds?

At a guess I'd say that the C library checks for the negative number and doesn't make a system call.

Are negative sleep times a documented/supported feature, or did I accidentally stumble across some strange bug I cannot rely upon?

The standard doesn't forbid passing negative arguments, so it is allowed, and the function should return "immediately" because the time specified by the relative timeout has already passed. You can't rely on negative arguments returning faster than non-negative arguments though, that's an artefact of your specific implementation.

Is there a better C++ sleep call which would give me more consistent/predictable sleep times?

I don't think so - if I knew of one then we'd be using it in GCC to implement this_thread::sleep_for().

Edit: In more recent versions of GCC's libstdc++ I added:

if (__rtime <= __rtime.zero())
  return;

so there will be no system call when a zero or negative duration is requested.

Botel answered 6/8, 2013 at 16:59 Comment(1)
libstdc++ implements this_thread::sleep_for with nanosleep, and nanosleep returns EINVAL (presumably immediately) if tv_sec is negative.Forsythe

Inspired by Straight Fast’s answer, I evaluated the effects of timer_slack_ns and of SCHED_FIFO. For timer_slack_ns you have to add

#include <sys/prctl.h> // prctl
⋮
prctl (PR_SET_TIMERSLACK, 10000U, 0, 0, 0);

meaning that for the current process the timer slack shall be set to 10µs instead of the default value of 50µs. The effect is better responsiveness at the expense of slightly higher energy consumption. The process can still be run by a non-privileged user. To change the scheduler policy to SCHED_FIFO you must be “root”. The code required is

#include <unistd.h>    // getpid
#include <sched.h>     // sched_setscheduler
⋮
    const pid_t pid {getpid ()};
    struct sched_param sp = {.sched_priority = 90};
    if (sched_setscheduler (pid, SCHED_FIFO, &sp) == -1) {
        perror ("sched_setscheduler");
        return 1;
    }

I ran Stéphane’s code snippets on a desktop system with GUI (Debian 9.11, kernel 4.9.189-3+deb9u2, g++ 9.2 -O3, Intel® Core™ i5-3470T CPU @ 2.90GHz). The results for the first case (two subsequent time measurements) are shown in the first graph (not reproduced here).

Because there is no system call in between, the delay is about 260ns and is not significantly affected by the process settings. For normally distributed timings the graphs are straight lines, with the abscissa value at the ordinate value of 0.5 being the mean and the slope representing the standard deviation. The measured values differ from that in that there are outliers towards higher delays.

In contrast to that, the second case (sleeping one nanosecond) differs between process setups because it makes a system call. Because the requested sleep time is so small, the sleeping itself does not add any measurable time; the graphs (not reproduced here) therefore show the overhead only:

As stated by Stéphane, the overhead defaults to about 64µs (it’s a bit bigger here). The time can be reduced to about 22µs by lowering timer_slack_ns to 10µs. And by invoking the privileged sched_setscheduler() the overhead can be cut down to about 12µs. But as the graph shows, even in this case the delay can be longer than 50µs (in 0.0001% of the runs).

The measurements show the basic dependence of the overhead on the process settings. Other measurements have shown that the fluctuations are lower by more than an order of magnitude on non-GUI Xeon server systems.

Lamented answered 10/2, 2020 at 15:20 Comment(0)

In the kernel source, init/init_task.c defines the default timer slack in struct task_struct init_task:

.timer_slack_ns = 50000, /* 50 usec default slack */

This slack is added to the wake-up time of non-RT processes in the kernel's hrtimer_nanosleep() path, so that timer hardirqs fire less often.

Eonism answered 16/4, 2019 at 4:32 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.