std::chrono::clock, hardware clock and cycle count

Asked 15/6, 2018 at 23:41 Answered 16/6, 2018 at 1:49

Tags: c++ time cpu benchmarking c++-chrono

std::chrono offers several clocks to measure time. At the same time, I guess the only way a CPU can evaluate time is by counting cycles.

Question 1: Does a CPU or a GPU have any other way to evaluate time than by counting cycles?

If that is the case, since the way a computer counts cycles will never be as precise as an atomic clock, it means that a "second" (period = std::ratio<1>) for a computer can actually be shorter or longer than a real second, causing differences in the long run between time measurements from the computer clock and, let's say, GPS.

Question 2: Is that correct?

Some hardware has varying frequencies (for example, idle and turbo modes). In that case, the number of cycles per second would vary.

Question 3: Does the "cycle count" measured by CPUs and GPUs vary depending on the hardware frequency? If yes, then how does std::chrono deal with it? If not, what does a cycle correspond to (i.e. what is the "fundamental" time)? Is there a way to access the conversion at compile-time? Is there a way to access the conversion at runtime?

Phthisic asked 15/6, 2018 at 23:41

superuser.com/questions/253471/… – Fertility 15/6, 2018 at 23:56

FWIW, all modern timepieces work by counting a regularly occurring event. This trend started in 1656 with the first pendulum clock, which "counted" the swings of an oscillating pendulum. Over time, what was counted changed to quartz crystal vibrations and ultimately to atomic vibrations, but the underlying "measure time by counting" methodology has remained constant for centuries. EXCEPT: the latest advance is to have one clock ask a group of other clocks what time it is, exchange information with them, and converge on a consensus of the correct time. NTP, for example, works this way. – Scrawly 16/6, 2018 at 3:1

Counting cycles, yes, but cycles of what?

On a modern x86, the time source used by the kernel (internally and for clock_gettime and other system calls) is typically a timer interrupt, or a hardware timer (e.g. HPET) that's read occasionally. (I actually don't know the details; when I wrote this I thought everything was just based on rdtsc, but I don't think that's correct.) On Linux you can see which clocksource the kernel is actually using in /sys/devices/system/clocksource/clocksource0/current_clocksource. If networking is available, NTP is often used to correct the scale factors to keep the system time correct.

Fine-grained timing comes from a fixed-frequency counter that counts "reference cycles" regardless of turbo, power-saving, or clock-stopped idle states. (This is the counter you get from rdtsc, or __rdtsc() in C/C++; see this for more details. For example, on older CPUs it actually did count core clock cycles and didn't tick during sleep states, so it was less useful for wall-clock time.)
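One rough way to see that the TSC ticks at a fixed reference rate rather than at the current core clock is to count TSC ticks across an interval timed by std::chrono::steady_clock. This is a minimal sketch, assuming an x86 target compiled with GCC, Clang, or MSVC; estimate_tsc_hz is a hypothetical name, and on other architectures it simply returns 0.0.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>        // __rdtsc on MSVC
#define HAVE_RDTSC 1
#elif defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>     // __rdtsc on GCC/Clang
#define HAVE_RDTSC 1
#endif

// Hypothetical helper: estimate how many TSC ticks elapse per second by
// timing a sleep with steady_clock. With an invariant TSC the result stays
// roughly constant regardless of turbo or power-saving states; on older CPUs
// it would track the core clock instead.
// Returns 0.0 when rdtsc is not available on this target.
inline double estimate_tsc_hz(std::chrono::milliseconds interval) {
#ifdef HAVE_RDTSC
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    const std::uint64_t c0 = __rdtsc();
    std::this_thread::sleep_for(interval);
    const std::uint64_t c1 = __rdtsc();
    const auto t1 = clock::now();
    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    return static_cast<double>(c1 - c0) / seconds;
#else
    (void)interval;
    return 0.0;
#endif
}
```

On a machine with an invariant TSC, calling estimate_tsc_hz(std::chrono::milliseconds(100)) typically gives a value close to the CPU's nominal frequency, even while the core is idling or boosting.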

Normal std::chrono implementations will use an OS-provided function like POSIX clock_gettime.
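As a rough illustration of what such an implementation does underneath, the sketch below reads POSIX CLOCK_MONOTONIC directly and wraps the result in a std::chrono::nanoseconds. It assumes a POSIX system; monotonic_now is a hypothetical name, and whether a particular standard library's steady_clock uses exactly this clock id is an assumption here.

```cpp
#include <chrono>
#include <time.h>   // clock_gettime, CLOCK_MONOTONIC, timespec (POSIX)

// Hypothetical helper: read CLOCK_MONOTONIC and convert it to std::chrono.
inline std::chrono::nanoseconds monotonic_now() {
    timespec ts{};
    clock_gettime(CLOCK_MONOTONIC, &ts);
    // seconds + nanoseconds yields a nanoseconds duration.
    return std::chrono::seconds(ts.tv_sec) + std::chrono::nanoseconds(ts.tv_nsec);
}
```

Two successive calls never go backwards, which is the property std::chrono::steady_clock promises.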

On Linux, this can run purely in user space. The kernel maps code + data in a vDSO page into every process's address space. The data includes coarse timestamps updated by timer interrupts (CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE just return these directly, I think), plus an offset and scale factor for using the TSC to get a fine-grained offset from the last tick of the system clock. Low-overhead time sources are nice: avoiding a user->kernel->user round trip helps a lot, even more so with Meltdown + Spectre mitigations enabled, because they make true system calls even more expensive.
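One way to see the difference between the coarse, tick-updated timestamps and the fine-grained ones is to ask clock_getres for each clock's resolution. This sketch assumes Linux (CLOCK_MONOTONIC_COARSE is Linux-specific); resolution_ns is a hypothetical helper name.

```cpp
#include <cstdint>
#include <time.h>   // clock_getres, clockid_t, CLOCK_MONOTONIC, CLOCK_MONOTONIC_COARSE

// Hypothetical helper: reported resolution of a clock in nanoseconds,
// or -1 if the clock id is not supported.
inline std::int64_t resolution_ns(clockid_t id) {
    timespec res{};
    if (clock_getres(id, &res) != 0) return -1;
    return static_cast<std::int64_t>(res.tv_sec) * 1000000000 + res.tv_nsec;
}
```

On a typical Linux x86 machine, resolution_ns(CLOCK_MONOTONIC) reports 1 ns, while resolution_ns(CLOCK_MONOTONIC_COARSE) reports the timer-interrupt period (a few milliseconds, depending on the kernel's HZ setting).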

Profiling a tight loop that's not memory bound might want to count actual core clock cycles instead, so the result is insensitive to the current core's clock speed (and you don't have to worry about ramping the CPU up to max turbo, etc.), e.g. using perf stat ./a.out or perf record ./a.out. For an example, see the question "Can x86's MOV really be "free"? Why can't I reproduce this at all?"

Some systems didn't / don't have a wall-clock-equivalent counter built right in to the CPU, so you only had a coarse time available, updated in RAM on timer interrupts. Or time-query functions would read the time from a separate chip, possibly with high precision.

(System call + hardware I/O = higher overhead, which is part of the reason that x86's rdtsc instruction morphed from a profiling thing into a clocksource thing.)

All of these clock frequencies are ultimately derived from a crystal oscillator on the motherboard. But the scale factors used to extrapolate time from cycle counts can be adjusted to keep the clock in sync with atomic time, typically using the Network Time Protocol (NTP), as @Tony points out.
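The "scale factor" idea can be sketched in the fixed-point form the Linux kernel uses for clocksources, ns = (cycles * mult) >> shift, where timekeeping adjustments (e.g. from NTP) nudge mult slightly. The struct and function names below are hypothetical, and the 2 GHz counter used in the usage note is an assumed example frequency.

```cpp
#include <cstdint>

// Hypothetical fixed-point converter from counter cycles to nanoseconds.
struct CycleScale {
    std::uint32_t mult;
    std::uint32_t shift;
};

// Choose mult so that mult / 2^shift ~= 1e9 / freq_hz (nanoseconds per cycle).
inline CycleScale make_scale(std::uint64_t freq_hz, std::uint32_t shift) {
    const std::uint64_t mult = (std::uint64_t{1000000000} << shift) / freq_hz;
    return CycleScale{static_cast<std::uint32_t>(mult), shift};
}

// Convert a cycle delta to nanoseconds. delta * mult must fit in 64 bits,
// which is one reason to scale only the offset since the last timer tick
// rather than the full counter value.
inline std::uint64_t cycles_to_ns(std::uint64_t delta, CycleScale s) {
    return (delta * s.mult) >> s.shift;
}

// Time = timestamp of the last tick + scaled cycle offset since that tick.
inline std::uint64_t now_ns(std::uint64_t base_ns, std::uint64_t base_cycles,
                            std::uint64_t cycles, CycleScale s) {
    return base_ns + cycles_to_ns(cycles - base_cycles, s);
}
```

With make_scale(2000000000, 24), mult is 8388608 (2^23), so 2000 cycles convert to exactly 1000 ns. Changing mult by one (about 0.12 ppm here) changes the clock's rate without stepping it, which is how a discipline loop can speed the clock up or slow it down smoothly.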

Genoese answered 16/6, 2018 at 0:1

Question 1: Does a CPU or a GPU have any other way to evaluate time than by counting cycles?

Different hardware may provide different facilities. For example, x86 PCs have employed several hardware facilities for timing: since the Pentium, x86 CPUs have had a Time Stamp Counter, which originally ticked at the core clock frequency and on more recent CPUs ticks at a fixed frequency (a "constant rate" aka "invariant" TSC); there may be a High Precision Event Timer (HPET); and going back further there were Programmable Interval Timers (https://en.wikipedia.org/wiki/Programmable_interval_timer).

If that is the case, since the way a computer counts cycles will never be as precise as an atomic clock, it means that a "second" (period = std::ratio<1>) for a computer can actually be shorter or longer than a real second, causing differences in the long run between time measurements from the computer clock and, let's say, GPS.

Yes, a computer without an atomic clock (they're now available on a chip) isn't going to be as accurate as an atomic clock. That said, services such as Network Time Protocol allow you to maintain tighter coherence across a bunch of computers. It is sometimes aided by use of Pulse Per Second (PPS) techniques. More modern and accurate variants include Precision Time Protocol (PTP) (which can often achieve sub-microsecond accuracy across a LAN).
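To put numbers on "not as accurate": an oscillator whose frequency is off by some parts per million (ppm) gains or loses that fraction of every second. The 20 ppm figure in the usage note is an assumed example tolerance, not a measurement, and drift_seconds_per_day is a hypothetical helper.

```cpp
// Hypothetical helper: seconds gained or lost per day by an oscillator whose
// frequency error is `ppm` parts per million (positive = runs fast).
// 86400 is the number of seconds in a day.
constexpr double drift_seconds_per_day(double ppm) {
    return ppm * 1e-6 * 86400.0;
}
```

drift_seconds_per_day(20.0) is about 1.728, i.e. roughly 52 seconds over a 30-day month if nothing like NTP corrects it.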

Question 3: Does the "cycle count" measured by CPUs and GPUs vary depending on the hardware frequency?

That depends. For the TSC, newer "constant rate" implementations don't vary with the core frequency; older ones do.
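On x86 the CPU advertises an invariant TSC through CPUID leaf 0x80000007, bit 8 of EDX (Linux also shows the related constant_tsc and nonstop_tsc flags in /proc/cpuinfo). A minimal sketch, assuming GCC or Clang and their <cpuid.h> helper __get_cpuid; invariant_tsc_status is a hypothetical name and returns -1 when it can't tell (including on MSVC or non-x86 targets).

```cpp
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   // __get_cpuid (GCC/Clang)
#endif

// Hypothetical helper: 1 if the CPU reports an invariant TSC, 0 if it reports
// it doesn't, -1 if CPUID can't be queried (non-x86 or leaf unsupported).
inline int invariant_tsc_status() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 if the requested leaf exceeds the CPU's maximum.
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx)) return -1;
    return ((edx >> 8) & 1u) != 0 ? 1 : 0;
#else
    return -1;
#endif
}
```

Note that hypervisors may hide this bit from guests, so a 0 inside a VM doesn't necessarily describe the host hardware.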

If yes, then how does std::chrono deal with it?

I'd expect most implementations to call an OS-provided time service, as the OS tends to have the best knowledge of and access to the hardware. There are a lot of factors to consider, e.g. whether the TSC readings are in sync across cores, what happens if the PC goes into some kind of sleep mode, and what kind of memory fences are desirable around the TSC sampling.
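To illustrate the fencing and cross-core concerns, here is a minimal sketch assuming x86-64 with GCC or Clang: it places LFENCE around RDTSC so the read isn't reordered with surrounding instructions, and uses RDTSCP, which also returns the IA32_TSC_AUX value (Linux stores the CPU number in its low 12 bits), so a caller can notice that two samples came from different cores. The function names are hypothetical; an OS time service normally handles these details for you.

```cpp
#include <cstdint>
#if defined(__x86_64__)
#include <x86intrin.h>   // __rdtsc, __rdtscp, _mm_lfence (GCC/Clang)
#endif

// Hypothetical helper: TSC read ordered with respect to surrounding code.
// LFENCE before: earlier instructions complete before the read.
// LFENCE after: later instructions don't start before the read.
inline std::uint64_t fenced_rdtsc() {
#if defined(__x86_64__)
    _mm_lfence();
    const std::uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
#else
    return 0;  // not x86-64: no TSC to read
#endif
}

// Hypothetical helper: read the TSC plus IA32_TSC_AUX, whose contents the OS
// chooses (on Linux, the low 12 bits hold the CPU number).
inline std::uint64_t rdtscp_with_aux(unsigned int* aux) {
#if defined(__x86_64__)
    return __rdtscp(aux);
#else
    *aux = 0;
    return 0;
#endif
}
```

Comparing the aux values of two rdtscp_with_aux samples is one way to detect a migration between cores, in which case the difference is only meaningful if the TSCs are synchronized.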

If not, what does a cycle correspond to (i.e. what is the "fundamental" time)?

For Intel CPUs, see this answer.

Is there a way to access the conversion at compile-time? Is there a way to access the conversion at runtime?

std::chrono::duration::count exposes the raw tick count in whatever units the clock's duration uses, and you can duration_cast to other units of time (e.g. seconds). C++20 adds further facilities such as clock_cast. Each clock's period is a compile-time std::ratio, but that only describes the unit of the tick counts; AFAIK there's no constexpr access to the underlying hardware rate (e.g. the TSC frequency), which would seem dubious anyway if a program might end up running on a machine with a different TSC rate than the machine it was compiled on.
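A short sketch of what is and isn't available at compile time: a clock's period is a std::ratio you can inspect in a static_assert or a constexpr function, and duration_cast converts between units at runtime, but the period only describes the unit of the clock's ticks, not the rate of the hardware counter underneath. The helper names nanoseconds_per_tick and to_ms are hypothetical.

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>

// Compile time: the tick unit of steady_clock is a std::ratio of seconds.
using steady_period = std::chrono::steady_clock::period;
static_assert(steady_period::num > 0 && steady_period::den > 0,
              "period is a positive ratio of seconds");

// Hypothetical helper: nanoseconds per tick of a clock's duration type,
// computed from Clock::period as a constant expression.
template <class Clock>
constexpr double nanoseconds_per_tick() {
    return 1e9 * static_cast<double>(Clock::period::num) /
           static_cast<double>(Clock::period::den);
}

// Hypothetical helper: convert a steady_clock duration to whole
// milliseconds at runtime (duration_cast truncates toward zero).
inline std::int64_t to_ms(std::chrono::steady_clock::duration d) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
}
```

On common standard libraries steady_clock::period is typically std::nano, so nanoseconds_per_tick<std::chrono::steady_clock>() is 1.0 whatever the machine's TSC frequency is; the hardware-to-nanoseconds scaling happens at runtime inside the OS.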

Seventeen answered 16/6, 2018 at 1:49
