QueryPerformanceCounter and overflows

I'm using QueryPerformanceCounter to do some timing in my application. However, after running it for a few days the application seems to stop functioning properly. If I simply restart the application it starts working again. This makes me believe I have an overflow problem in my timing code.

// Author: Ryan M. Geiss
// http://www.geisswerks.com/ryan/FAQS/timing.html
class timer
{
public:
    timer()
    {
        QueryPerformanceFrequency(&freq_);
        QueryPerformanceCounter(&time_);
    }

    void tick(double interval)
    {       
        LARGE_INTEGER t;
        QueryPerformanceCounter(&t);

        if (time_.QuadPart != 0)
        {
            int ticks_to_wait = static_cast<int>(static_cast<double>(freq_.QuadPart) * interval);
            int done = 0;
            do
            {
                QueryPerformanceCounter(&t);

                int ticks_passed = static_cast<int>(static_cast<__int64>(t.QuadPart) - static_cast<__int64>(time_.QuadPart));
                int ticks_left = ticks_to_wait - ticks_passed;

                if (t.QuadPart < time_.QuadPart)    // time wrap
                    done = 1;
                if (ticks_passed >= ticks_to_wait)
                    done = 1;

                if (!done)
                {
                    // if > 0.002s left, do Sleep(1), which will actually sleep some 
                    //   steady amount, probably 1-2 ms,
                    //   and do so in a nice way (cpu meter drops; laptop battery spared).
                    // otherwise, do a few Sleep(0)'s, which just give up the timeslice,
                    //   but don't really save cpu or battery, but do pass a tiny
                    //   amount of time.
                    if (ticks_left > static_cast<int>((freq_.QuadPart*2)/1000))
                        Sleep(1);
                    else                        
                        for (int i = 0; i < 10; ++i) 
                            Sleep(0);  // causes thread to give up its timeslice
                }
            }
            while (!done);            
        }

        time_ = t;
    }       
private:    
    LARGE_INTEGER freq_;
    LARGE_INTEGER time_;
};

My question is: should the code above work deterministically for weeks of continuous running?

And if not, where is the problem? I thought the overflow was handled by

if (t.QuadPart < time_.QuadPart)    // time wrap
    done = 1;

But maybe that's not enough?

EDIT: Please note that I did not write the original code; Ryan M. Geiss did. The link to the original source is in the code.

Rombert answered 14/3, 2011 at 9:48
static_cast<int>((freq_.QuadPart*2)/1000)) - this will slice off anything beyond 32 bits. You'd rather expand ticks_left to 64 bits. — Luanneluanni
Do you think that's a problem though? Shouldn't QueryPerformanceFrequency always fit into a 32-bit integer? — Rombert
No idea, I just find this code potentially erroneous. — Luanneluanni
When do you think ticks_left would be larger than 32 bits? — Rombert
Yeah, that should probably be changed. I'm just trying to find out with some certainty whether it might be causing the problem or not, as it takes ~1 week of testing to verify. — Rombert
Why would you want to perform a (partially) busy-wait such as this? — Antons
To improve the accuracy of the wait. — Rombert
Because one of the requirements of my application is that it needs to "tick" (run a certain function) at precise intervals. This is due to a dependency on a third-party component. — Rombert
The problem with ticks_left being 32-bit is not that the number of ticks can be too large to fit into 32 bits; the problem is that the right side of the comparison forces a 64-bit value into 32 bits, which can yield a negative value, and then your comparison is evaluated wrong. — Luanneluanni

QueryPerformanceCounter is notorious for its unreliability. It's fine to use for individual short-interval timing if you're prepared to handle abnormal results. It is not exact - it's typically based on the PCI bus frequency, and a heavily loaded bus can lead to lost ticks.

GetTickCount is actually more stable, and can give you 1ms resolution if you've called timeBeginPeriod. It will eventually wrap, so you need to handle that.

__rdtsc should not be used, unless you're profiling and have control of which core you're running on and are prepared to handle variable CPU frequency.

GetSystemTime is decent for longer periods of measurement, but it will jump when the system time is adjusted.

Also, Sleep(0) does not do what you think it does. It will yield the CPU if another context wants it - otherwise it'll return immediately.

In short, timing on Windows is a mess. One would think that today it'd be possible to get accurate long-term timing from a computer without jumping through hoops - but this isn't the case. In our game framework we use several time sources plus corrections from the server to ensure all connected clients have the same game time, and there are a lot of bad clocks out there.

Your best bet would likely be to just use GetTickCount or GetSystemTime and wrap it in something that adjusts for time jumps/wrap-arounds.

Also, you should convert your double interval to an int64 number of milliseconds and then use only integer math - this avoids problems caused by floating-point types' accuracy varying with the magnitude of their contents.
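For example (my sketch, not the poster's code; it assumes timeBeginPeriod(1) is in effect so Sleep(1) really sleeps about a millisecond), a wrap-safe, integer-only tick built on GetTickCount might look roughly like this:

#include <windows.h>

// Sketch: the interval is converted to integer milliseconds once, and all
// further math is done in DWORDs. Unsigned subtraction makes the 49.7-day
// GetTickCount wrap a non-issue.
class msec_timer
{
public:
    explicit msec_timer(double interval_seconds)
        : interval_ms_(static_cast<DWORD>(interval_seconds * 1000.0 + 0.5)),
          last_(GetTickCount())
    {
    }

    void tick()
    {
        for (;;)
        {
            DWORD elapsed = GetTickCount() - last_;   // correct across wrap, modulo 2^32
            if (elapsed >= interval_ms_)
                break;
            if (interval_ms_ - elapsed > 2)
                Sleep(1);      // coarse wait, frees the CPU
            else
                Sleep(0);      // close to the deadline: just yield
        }
        last_ += interval_ms_;  // advance by the nominal period so error doesn't accumulate
    }

private:
    DWORD interval_ms_;
    DWORD last_;
};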

Lapidify answered 14/3, 2011 at 10:6
timeBeginPeriod() didn't seem to affect the resolution of GetTickCount() for me. I think you meant timeGetTime() instead of GetTickCount(). — Perfumery
timeBeginPeriod() does/did affect GetTickCount() resolution on XP, and I believe Vista. I haven't had a need for, nor tested, this since then. — Lapidify

Based on your comment, you probably should be using Waitable Timers instead.

See the following examples:
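As a rough sketch of the idea (do_work is a placeholder of mine), a periodic waitable timer can be driven like this - note that SetWaitableTimer takes the first due time in 100-nanosecond units, negative meaning relative, and the repeat period in milliseconds:

#include <windows.h>

void do_work();   // hypothetical per-tick function

void run_periodic(double interval_seconds)
{
    // Auto-reset timer: a successful wait consumes the signalled state.
    HANDLE timer = CreateWaitableTimer(NULL, FALSE, NULL);

    LARGE_INTEGER due;
    due.QuadPart = -static_cast<LONGLONG>(interval_seconds * 10000000.0); // relative, 100 ns units
    LONG period_ms = static_cast<LONG>(interval_seconds * 1000.0);

    SetWaitableTimer(timer, &due, period_ms, NULL, NULL, FALSE);

    for (;;)
    {
        if (WaitForSingleObject(timer, INFINITE) != WAIT_OBJECT_0)
            break;
        do_work();
    }

    CloseHandle(timer);
}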

Antons answered 14/3, 2011 at 12:27
From my understanding the accuracy of those is pretty bad? — Rombert
@ronag: Probably not worse than Sleep(1), which may sleep a couple of ms. — Antons

Performance counters are 64-bit, so they are large enough for years of running continuously. For example, if you assume the performance counter increments 2 billion times each second (some imaginary 2 GHz processor) it will overflow in about 290 years.
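Rough arithmetic: 2^64 ticks / (2 * 10^9 ticks per second) ≈ 9.2 * 10^9 seconds, i.e. roughly 290 years; even using only the positive half of a signed 64-bit value you still get well over a century.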

Luanneluanni answered 14/3, 2011 at 9:51
This is my understanding as well. However, since the problem always occurs after ~1 week of running and this is the only code doing timing, I suspected there was an overflow somewhere that I'd missed. — Rombert
I suspect there might be an implicit convert-to-double hidden somewhere in your code. A double has only 52 bits of signed precision, so unlike an int64_t (I say int64_t, not uint64_t, because LARGE_INTEGER is signed too, though funnily the tick counter is not) it would overflow every 8.6 days (~1 week) on a 3 GHz machine. — Goff

Using a nanosecond-scale timer to control something like Sleep(), which at best is precise to several milliseconds (and usually several dozen milliseconds), is somewhat controversial anyway.

A different approach you might consider would be to use WaitForSingleObject or a similar function. This burns fewer CPU cycles, causes a trillion fewer context switches over the day, and is more reliable than Sleep(0), too.

You could, for example, create a semaphore and never touch it in normal operation. The semaphore exists only so you have something to wait on, if you don't have anything better to wait on. Then you can specify a timeout in milliseconds, up to 49 days long, with a single syscall. And it will not only be less work, it will be much more accurate too.

The advantage is that if "something happens" and you want to break out earlier than that, you only need to signal the semaphore. The wait call will return instantly, and you will know from the WAIT_OBJECT_0 return value that it returned because it was signalled, not because time ran out. And all that without complicated logic and counting cycles.
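A minimal sketch of that idea (the class and its names are mine; error handling omitted):

#include <windows.h>

// Wait on a semaphore you normally never signal: the timeout does the
// timing, and signalling the semaphore lets another thread break the wait.
class interruptible_wait
{
public:
    interruptible_wait()
        : sem_(CreateSemaphore(NULL, 0, 1, NULL))
    {
    }

    ~interruptible_wait() { CloseHandle(sem_); }

    // Returns true if the full timeout elapsed, false if we were woken early.
    bool wait(DWORD milliseconds)
    {
        return WaitForSingleObject(sem_, milliseconds) == WAIT_TIMEOUT;
    }

    // Call from another thread to break the wait.
    void wake()
    {
        ReleaseSemaphore(sem_, 1, NULL);
    }

private:
    HANDLE sem_;
};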

Goff answered 14/3, 2011 at 12:55
Sleep is quite accurate if you use timeBeginPeriod(1). — Rombert
Sleep is at best accurate to 1 ms if, as you pointed out, you use timeBeginPeriod(1). However, this comes at the cost of a lot more context switches, in every application on the system. Sleep(0), on the other hand, is not accurate regardless of timeBeginPeriod. It will give up its time slice if there is another thread wanting some, and will return immediately otherwise. And sometimes it just won't return for 50, 100, or 200 milliseconds at all, for no obvious reason. WaitForSingleObject has no such quirks. Code like for(int i = 0; i < 10; ++i) Sleep(0); is totally unpredictable. — Goff
But even assuming that Sleep worked perfectly accurately at 1 ms resolution, my point remains. There is not much use in a nanosecond-scale timer if 1 ms is the best you can theoretically ever hope to aim at. In that case, it's better to just count in milliseconds. The numbers are smaller, it's easier, and it does not depend on CPU frequencies, power saving, and so on. — Goff
So what is the accuracy of WaitForSingleObject with timeBeginPeriod(1)? Is it around +/- 1 ms? Would it be deterministic when running for a long time? — Rombert
It will be the granularity of the scheduler, so yes... with timeBeginPeriod(1) it is 1 ms. Plus, it can be woken up by signalling the object you wait on, which is a big plus. It means that if you want to block 5 seconds in one go, you don't have 5,000 function calls, but just one. And you have an accuracy of roughly 5,001 +/- 1 ms, instead of 5000*(2 +/- 1) ms, and you have 2 context switches instead of 10,000. And you can always abort early in a perfectly clean way by signalling the object. — Goff

The problem you asked about most directly: if (t.QuadPart < time_.QuadPart) should instead be this: if (t.QuadPart - time_.QuadPart < 0)

The reason for that is that you want to look for wrapping in relative time, not absolute time. Relative time will wrap (1ull<<63) time units after the reference call to QPC. Absolute time might wrap (1ull<<63) time units after reboot, but it could wrap at any other time it feels like; that's undefined.

QPC is a little bugged on some systems (older RDTSC-based QPCs on early multicore CPUs, for instance) so it may be desirable to allow small negative time deltas like so: if (t.QuadPart - time_.QuadPart < -1000000) //time wrap

An actual wrap will produce a very large negative time delta, so that's safe. It shouldn't be necessary on modern systems, but trusting Microsoft is rarely a good idea.

... However, the bigger problem there with time wrapping is the fact that ticks_to_wait, ticks_passed, and ticks_left are all int, not LARGE_INTEGER or long long like they should be. This makes most of that code wrap if any significant time period is involved - and "significant" in this context is platform dependent; it can be on the order of 1 second in a few (rare these days) cases, or even less on some hypothetical future system.
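Putting those two points together, the corrected arithmetic might look like this (a sketch only; freq and last correspond to the question's freq_ and time_ members):

#include <windows.h>

// All tick math stays 64-bit, and the wrap test is done on the relative delta.
bool interval_elapsed(const LARGE_INTEGER& freq, const LARGE_INTEGER& last, double interval)
{
    LARGE_INTEGER now;
    QueryPerformanceCounter(&now);

    long long ticks_to_wait = static_cast<long long>(static_cast<double>(freq.QuadPart) * interval);
    long long ticks_passed  = now.QuadPart - last.QuadPart;

    if (ticks_passed < 0)             // relative time wrapped (or QPC stepped backwards)
        return true;
    return ticks_passed >= ticks_to_wait;
}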

Other issues:

if (time_.QuadPart != 0)

Zero is not a special value there, and should not be treated as such. My guess is that the code is conflating QPC returning a time of zero with QPC's return value being zero. The return value is not the 64-bit time passed back through the pointer; it's the BOOL that QPC actually returns.
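In other words, the value worth checking, if you check anything at all, is that BOOL itself - something like:

LARGE_INTEGER t;
if (!QueryPerformanceCounter(&t))   // the BOOL return value, not the time written to t
{
    // QPC failed (essentially never on XP and later); handle or fall back here
}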

Also, that loop of Sleep(0) is foolish - it appears to be tuned to behave correctly only at a particular level of contention and a particular per-thread CPU performance. If you need resolution, that's a horrible idea, and if you don't need resolution, then that entire function should have just been a single call to Sleep.

Barnstorm answered 15/4, 2014 at 12:4
