Is gettimeofday() guaranteed to be of microsecond resolution?

I am porting a game that was originally written for the Win32 API to Linux (well, porting the OS X port of the Win32 port to Linux).

I have implemented QueryPerformanceCounter() by returning the microseconds (uSeconds) elapsed since process start-up:

#include <sys/time.h>

static struct timeval startTimeVal;   /* captured once at process start-up */

BOOL QueryPerformanceCounter(LARGE_INTEGER* performanceCount)
{
    struct timeval currentTimeVal;

    gettimeofday(&currentTimeVal, NULL);
    performanceCount->QuadPart = (currentTimeVal.tv_sec - startTimeVal.tv_sec);
    performanceCount->QuadPart *= (1000 * 1000);
    performanceCount->QuadPart += (currentTimeVal.tv_usec - startTimeVal.tv_usec);

    return true;
}

This, coupled with QueryPerformanceFrequency() giving a constant 1000000 as the frequency, works well on my machine, and gives me a 64-bit variable that contains uSeconds since the program's start-up.
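
A QueryPerformanceFrequency() matching this scheme is then just the constant; a minimal sketch (assuming the same Win32-compatibility BOOL/LARGE_INTEGER typedefs as the snippet above):

BOOL QueryPerformanceFrequency(LARGE_INTEGER* frequency)
{
    /* gettimeofday() reports microseconds, so the counter above
       ticks 1,000,000 times per second. */
    frequency->QuadPart = 1000 * 1000;
    return true;
}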

So is this portable? I don't want to discover it works differently if the kernel was compiled in a certain way or anything like that. I am fine with it being non-portable to something other than Linux, however.

Dol answered 1/8, 2008 at 14:36 Comment(0)

Maybe. But you have bigger problems. gettimeofday() can result in incorrect timings if there are processes on your system that change the timer (i.e., ntpd). On a "normal" Linux, though, I believe the resolution of gettimeofday() is 10us. It can consequently jump forward and backward in time, based on the processes running on your system. This effectively makes the answer to your question no.

You should look into clock_gettime(CLOCK_MONOTONIC) for timing intervals. It suffers from fewer issues due to things like multi-core systems and external clock settings.

Also, look into the clock_getres() function.
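
A minimal sketch of that approach (clock_getres() to query the granularity, CLOCK_MONOTONIC to time an interval; error handling omitted, and older glibc may need -lrt):

#include <stdio.h>
#include <time.h>

int main()
{
    struct timespec res, t0, t1;

    /* How fine-grained is the monotonic clock on this system? */
    clock_getres(CLOCK_MONOTONIC, &res);
    printf("clock resolution: %ld ns\n", (long)res.tv_nsec);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* ... code to be timed ... */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long long ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                 + (t1.tv_nsec - t0.tv_nsec);
    printf("elapsed: %lld ns\n", ns);
    return 0;
}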

Mantra answered 1/8, 2008 at 14:53 Comment(5)
clock_gettime is present only on the newest Linux; other systems have only gettimeofday()Taxiplane
@Taxiplane it's POSIX, so it's not Linux-only, and 'newest'? Even 'Enterprise' distros like Red Hat Enterprise Linux are based on 2.6.18, which has clock_gettime, so no, not very new (the manpage date in RHEL is 2004-March-12, so it's been around for a while). Unless you're talking about really old kernels, what do you mean?Hagberry
clock_gettime was included in POSIX in 2001. As far as I know, clock_gettime() is currently implemented in Linux 2.6 and QNX, but Linux 2.4 is still used in many production systems.Taxiplane
It was introduced in 2001, but not mandatory until POSIX 2008.Samples
From the Linux FAQ for clock_gettime (see David Schlosnagle's answer): "CLOCK_MONOTONIC...is frequency adjusted by NTP via adjtimex(). In the future (I'm still trying to get the patch in) there will be a CLOCK_MONOTONIC_RAW that will not be modified at all, and will have a linear correlation with the hardware counters." I don't think the _RAW clock ever made it into the kernel (unless it was renamed _HR, but my research suggests that effort was abandoned too).Naturopathy

High Resolution, Low Overhead Timing for Intel Processors

If you're on Intel hardware, here's how to read the CPU's real-time instruction counter (the time-stamp counter, TSC). It will tell you the number of CPU cycles executed since the processor was booted. This is probably the finest-grained counter you can get for performance measurement.

Note that this is the number of CPU cycles. On Linux you can get the CPU speed from /proc/cpuinfo and divide to get the number of seconds (a sketch of that conversion follows the program below). Converting this to a double is quite handy.

When I run this on my box, I get

11867927879484732
11867927879692217
it took this long to call printf: 207485

The Intel Software Developer's Manual gives tons of detail.

#include <stdio.h>
#include <stdint.h>

static inline uint64_t rdtsc(void) {
    uint32_t lo, hi;
    __asm__ __volatile__ (
      "xorl %%eax, %%eax\n"   /* select CPUID leaf 0 */
      "cpuid\n"               /* serializing instruction: waits for earlier instructions to retire */
      "rdtsc\n"               /* read the time-stamp counter into EDX:EAX */
      : "=a" (lo), "=d" (hi)
      :
      : "%ebx", "%ecx");
    return (uint64_t)hi << 32 | lo;
}

int main()
{
    uint64_t x, y;

    x = rdtsc();
    printf("%llu\n", (unsigned long long)x);
    y = rdtsc();
    printf("%llu\n", (unsigned long long)y);
    printf("it took this long to call printf: %llu\n",
           (unsigned long long)(y - x));
    return 0;
}
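
To turn those cycle counts into seconds, a rough sketch (assuming a constant-rate TSC; with frequency scaling this is only an approximation, and it reuses the <stdio.h> include from the program above) is to read the first "cpu MHz" line from /proc/cpuinfo:

static double cpu_hz()
{
    /* Return the first "cpu MHz" value from /proc/cpuinfo, converted to Hz
       (0.0 on failure). Assumes the TSC ticks at this constant rate. */
    FILE *f = fopen("/proc/cpuinfo", "r");
    char line[256];
    double mhz = 0.0;

    if (!f)
        return 0.0;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "cpu MHz : %lf", &mhz) == 1)
            break;
    }
    fclose(f);
    return mhz * 1e6;
}

With that, (double)(y - x) / cpu_hz() gives the elapsed time from the program above in (approximate) seconds.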
Lionfish answered 2/8, 2008 at 8:8 Comment(4)
Note that the TSC might not always be synchronized between cores, might stop or change its frequency when the processor enters lower power modes (and you have no way of knowing it did so), and in general is not always reliable. The kernel is able to detect when it is reliable, detect other alternatives like HPET and ACPI PM timer, and automatically select the best one. It's a good idea to always use the kernel for timing unless you are really sure the TSC is stable and monotonic.Liber
The TSC on Core and above Intel platforms is synchronized across multiple CPUs and increments at a constant frequency independent of power management states. See Intel Software Developer’s Manual, Vol. 3 Section 18.10. However the rate at which the counter increments is not the same as the CPU's frequency. The TSC increments at “the maximum resolved frequency of the platform, which is equal to the product of scalable bus frequency and maximum resolved bus ratio” Intel Software Developer’s Manual, Vol. 3 Section 18.18.5. You get those values from the CPU's model-specific registers (MSRs).Andri
You can obtain the scalable bus frequency and maximum resolved bus ratio by querying the CPU’s model-specific registers (MSRs) as follows: Scalable bus frequency == MSR_FSB_FREQ[2:0] id 0xCD, Maximum resolved bus ratio == MSR_PLATFORM_ID[12:8] id 0x17. Consult Intel SDM Vol.3 Appendix B.1 to interpret the register values. You can use the msr-tools on Linux to query the registers. kernel.org/pub/linux/utils/cpu/msr-toolsAndri
Shouldn't your code use CPUID again after the first RDTSC instruction and before executing the code being benchmarked? Otherwise, what's to stop the benchmarked code being executed before/in-parallel-with the first RDTSC, and consequently underrepresented in the RDTSC delta?Naturopathy

@Bernard:

I have to admit, most of your example went straight over my head. It does compile, and seems to work, though. Is this safe for SMP systems or SpeedStep?

That's a good question... I think the code's ok. From a practical standpoint, we use it in my company every day, and we run on a pretty wide array of boxes, everything from 2-8 cores. Of course, YMMV, etc, but it seems to be a reliable and low-overhead (because it doesn't make a context switch into system-space) method of timing.

Generally how it works is:

  • declare the block of code to be assembler (and volatile, so the optimizer will leave it alone).
  • execute the CPUID instruction. In addition to getting some CPU information (which we don't do anything with) it synchronizes the CPU's execution buffer so that the timings aren't affected by out-of-order execution.
  • execute the rdtsc (read time-stamp counter) instruction. This fetches the number of machine cycles executed since the processor was reset. This is a 64-bit value, so with current CPU speeds it will wrap around every 194 years or so. Interestingly, in the original Pentium reference, they note it wraps around every 5800 years or so.
  • the last couple of lines store the values from the registers into the variables hi and lo, and put that into the 64-bit return value.

Specific notes:

  • out-of-order execution can cause incorrect results, so we execute the "cpuid" instruction which in addition to giving you some information about the cpu also synchronizes any out-of-order instruction execution.

  • Most OSes synchronize the counters on the CPUs when they start, so the answer is good to within a couple of nanoseconds.

  • The hibernating comment is probably true, but in practice you probably don't care about timings across hibernation boundaries.

  • regarding SpeedStep: newer Intel CPUs compensate for the speed changes and return an adjusted count. I did a quick scan over some of the boxes on our network and found only one box that didn't have it: a Pentium 3 running some old database server. (These are Linux boxes, so I checked with: grep constant_tsc /proc/cpuinfo)

  • I'm not sure about the AMD CPUs, we're primarily an Intel shop, although I know some of our low-level systems gurus did an AMD evaluation.

Hope this satisfies your curiosity, it's an interesting and (IMHO) under-studied area of programming. You know when Jeff and Joel were talking about whether or not a programmer should know C? I was shouting at them, "hey forget that high-level C stuff... assembler is what you should learn if you want to know what the computer is doing!"

Lionfish answered 4/8, 2008 at 0:51 Comment(2)
... The kernel people have been trying to get people to stop using rdtsc for a while... and generally avoid using it in the kernel because it's just that unreliable.Hagberry
For reference, the question I asked (In a separate reply -- before comments) was: "I have to admit, most of your example went straight over my head. It does compile, and seems to work, though. Is this safe for SMP systems or SpeedStep?"Dol

You may be interested in the Linux FAQ for clock_gettime(CLOCK_REALTIME)

Quinquevalent answered 18/8, 2008 at 15:51 Comment(0)

Wine is actually using gettimeofday() to implement QueryPerformanceCounter() and it is known to make many Windows games work on Linux and Mac.

It starts at http://source.winehq.org/source/dlls/kernel32/cpu.c#L312 and leads to http://source.winehq.org/source/dlls/ntdll/time.c#L448

Merlenemerlin answered 4/8, 2008 at 14:44 Comment(0)

The actual resolution of gettimeofday() depends on the hardware architecture. Intel processors as well as SPARC machines offer high-resolution timers that measure microseconds. Other hardware architectures fall back to the system's timer, which is typically set to 100 Hz (i.e., a 10 ms tick). In such cases, the time resolution will be less accurate.

I obtained this answer from High Resolution Time Measurement and Timers, Part I

Infeld answered 1/8, 2008 at 14:55 Comment(0)

So it says microseconds explicitly, but says the resolution of the system clock is unspecified. I suppose resolution in this context means the smallest amount it will ever be incremented by?

The data structure is defined as having microseconds as a unit of measurement, but that doesn't mean that the clock or operating system is actually capable of measuring that finely.

Like other people have suggested, gettimeofday() is bad because setting the time can cause clock skew and throw off your calculation. clock_gettime(CLOCK_MONOTONIC) is what you want, and clock_getres() will tell you the precision of your clock.

Bronchitis answered 2/8, 2008 at 17:57 Comment(2)
So what happens in your code when gettimeofday() jumps forward or backward with daylight savings?Digraph
clock_gettime is present only on the newest Linux; other systems have only gettimeofday()Taxiplane

This answer mentions problems with the clock being adjusted. Both the problem of guaranteeing tick units and the problem of the time being adjusted are solved in C++11 with the <chrono> library.

The clock std::chrono::steady_clock is guaranteed not to be adjusted, and furthermore it will advance at a constant rate relative to real time, so technologies like SpeedStep must not affect it.

You can get typesafe units by converting to one of the std::chrono::duration specializations, such as std::chrono::microseconds. With this type there's no ambiguity about the units used by the tick value. However, keep in mind that the clock doesn't necessarily have this resolution. You can convert a duration to attoseconds without actually having a clock that accurate.
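
A minimal sketch (C++11 or later):

#include <chrono>
#include <cstdio>

int main()
{
    using clock = std::chrono::steady_clock;

    const auto start = clock::now();
    /* ... code to be timed ... */
    const auto stop = clock::now();

    // duration_cast makes the unit explicit: this count is in microseconds,
    // regardless of the clock's native tick period.
    const auto us =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    std::printf("elapsed: %lld us\n", static_cast<long long>(us.count()));
    return 0;
}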

Keven answered 26/6, 2012 at 15:57 Comment(0)

From my experience, and from what I've read across the internet, the answer is "No," it is not guaranteed. It depends on CPU speed, operating system, flavor of Linux, etc.

Infeld answered 1/8, 2008 at 14:46 Comment(0)

Reading the RDTSC is not reliable in SMP systems, since each CPU maintains its own counter and each counter is not guaranteed to be synchronized with respect to another CPU.

I might suggest trying clock_gettime(CLOCK_REALTIME). The POSIX manual indicates that this should be implemented on all compliant systems. It can provide a nanosecond count, but you will probably want to check clock_getres(CLOCK_REALTIME) on your system to see what the actual resolution is.

Schlep answered 18/8, 2008 at 15:40 Comment(1)
clock_getres(CLOCK_REALTIME) will not give the real resolution. It always returns "1 ns" (one nanosecond) when hrtimers are available; check the include/linux/hrtimer.h file for the define HIGH_RES_NSEC 1 (more at https://mcmap.net/q/20954/-clock_getres-and-kernel-2-6)Counterpart
