Here's an interesting article! It says not to use RDTSC, but to use QueryPerformanceCounter instead.
Conclusion:

Using regular old timeGetTime() to do timing is not reliable on many Windows-based operating systems because the granularity of the system timer can be as high as 10-15 milliseconds, meaning that timeGetTime() is only accurate to 10-15 milliseconds. [Note that the high granularities occur on NT-based operating systems like Windows NT, 2000, and XP. Windows 95 and 98 tend to have much better granularity, around 1-5 ms.]
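For instance, a minimal sketch of timing a block of work with timeGetTime() (this assumes you link against winmm.lib; the Sleep(50) is just stand-in work):

    #include <windows.h>
    #include <mmsystem.h>   // timeGetTime(); link with winmm.lib
    #include <stdio.h>

    int main()
    {
        DWORD start = timeGetTime();   // milliseconds since system start

        Sleep(50);                     // stand-in for real work

        DWORD elapsed = timeGetTime() - start;
        // The result is only as accurate as the system timer granularity.
        printf("elapsed: %lu ms\n", elapsed);
        return 0;
    }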
However, if you call timeBeginPeriod(1) at the beginning of your program (and timeEndPeriod(1) at the end), timeGetTime() will usually become accurate to 1-2 milliseconds, and will provide you with extremely accurate timing information. Sleep() behaves similarly; the length of time that Sleep() actually sleeps for goes hand-in-hand with the granularity of timeGetTime(), so after calling timeBeginPeriod(1) once, Sleep(1) will actually sleep for 1-2 milliseconds, Sleep(2) for 2-3, and so on (instead of sleeping in increments as high as 10-15 ms).
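As a rough sketch of that pattern (again assuming winmm.lib is linked), wrapping the program in timeBeginPeriod(1)/timeEndPeriod(1) looks something like this:

    #include <windows.h>
    #include <mmsystem.h>   // timeBeginPeriod/timeEndPeriod; link with winmm.lib
    #include <stdio.h>

    int main()
    {
        timeBeginPeriod(1);              // request 1 ms system timer granularity

        DWORD start = timeGetTime();
        Sleep(1);                        // should now wake after roughly 1-2 ms
        printf("Sleep(1) took about %lu ms\n", timeGetTime() - start);

        timeEndPeriod(1);                // always pair with timeBeginPeriod
        return 0;
    }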
For higher-precision timing (sub-millisecond accuracy), you'll probably want to avoid the assembly mnemonic RDTSC because it is hard to calibrate; instead, use QueryPerformanceFrequency and QueryPerformanceCounter, which are accurate to less than 10 microseconds (0.00001 seconds).
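A small sketch of how those two calls are typically combined to get elapsed time in seconds:

    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        LARGE_INTEGER freq, t0, t1;
        QueryPerformanceFrequency(&freq);   // counter ticks per second
        QueryPerformanceCounter(&t0);

        Sleep(5);                           // stand-in for the work being timed

        QueryPerformanceCounter(&t1);
        double seconds = (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
        printf("elapsed: %.6f s\n", seconds);
        return 0;
    }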
For simple timing, both timeGetTime and QueryPerformanceCounter work well, and QueryPerformanceCounter is obviously more accurate. However, if you need to do any kind of "timed pauses" (such as those necessary for framerate limiting), you need to be careful of sitting in a loop calling QueryPerformanceCounter, waiting for it to reach a certain value; this will eat up 100% of your processor. Instead, consider a hybrid scheme, where you call Sleep(1) (don't forget timeBeginPeriod(1) first!) whenever you need to pass more than 1 ms of time, and then only enter the QueryPerformanceCounter 100%-busy loop to finish off the last < 1/1000th of a second of the delay you need. This will give you ultra-accurate delays (accurate to 10 microseconds), with very minimal CPU usage. See the code above.
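Along those lines, here is a minimal sketch of such a hybrid delay; PreciseWait is just an illustrative name, and it assumes timeBeginPeriod(1) has already been called once at program startup:

    #include <windows.h>

    // Hybrid wait: Sleep(1) covers the bulk of the delay cheaply, and a
    // QueryPerformanceCounter busy loop finishes the last sub-millisecond part.
    // Assumes timeBeginPeriod(1) has already been called at program startup.
    void PreciseWait(double seconds)
    {
        LARGE_INTEGER freq, start, now;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&start);
        LONGLONG target = start.QuadPart + (LONGLONG)(seconds * freq.QuadPart);

        for (;;)
        {
            QueryPerformanceCounter(&now);
            LONGLONG remaining = target - now.QuadPart;
            if (remaining <= 0)
                break;
            if (remaining > freq.QuadPart / 1000)   // more than ~1 ms still to go
                Sleep(1);                           // coarse but cheap
            // otherwise spin on QueryPerformanceCounter until the target is hit
        }
    }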