Counting cycles, yes, but cycles of what?
On a modern x86, the timesource used by the kernel (internally and for clock_gettime and other system calls) is typically a timer interrupt, or a hardware timer (e.g. HPET) that's read occasionally. (I actually don't know the details; when I wrote this I thought everything was just based on rdtsc, but I don't think that's correct.) If networking is available, NTP is often used to correct the scale factors to keep the system time correct.
Fine-grained timing comes from a fixed-frequency counter that counts "reference cycles" regardless of turbo, power-saving, or clock-stopped idle. (This is the counter you get from rdtsc, or __rdtsc() in C/C++. On older CPUs it actually did count core clock cycles and didn't tick during sleep states, so it was less useful for wall-clock time.)
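As a rough illustration of the reference counter ticking at a fixed rate, here is a minimal sketch that compares __rdtsc() against std::chrono::steady_clock across a short sleep. It assumes an x86 CPU with an invariant TSC and a GCC/Clang toolchain; estimate_tsc_hz is a hypothetical helper name, not a library function.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <x86intrin.h>  // __rdtsc() on GCC/Clang; MSVC provides it in <intrin.h>

// Estimate how many TSC ticks elapse per second by comparing the TSC
// with std::chrono::steady_clock over a short sleep.
// Assumption: the TSC is invariant, so it keeps counting at a fixed rate
// while the thread sleeps and regardless of the core's current frequency.
double estimate_tsc_hz(std::chrono::milliseconds interval) {
    const auto t0 = std::chrono::steady_clock::now();
    const std::uint64_t c0 = __rdtsc();

    std::this_thread::sleep_for(interval);

    const std::uint64_t c1 = __rdtsc();
    const auto t1 = std::chrono::steady_clock::now();

    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    return static_cast<double>(c1 - c0) / seconds;
}
```

Calling estimate_tsc_hz(std::chrono::milliseconds(50)) a few times should give roughly the same number each time, even while the core frequency changes, if the assumption about an invariant TSC holds on your machine.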
Normal std::chrono implementations will use an OS-provided function like POSIX clock_gettime.
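For the portable route, a minimal sketch of timing a piece of work with std::chrono::steady_clock might look like this. time_ns is a hypothetical helper name; which OS function steady_clock::now() ends up calling depends on the standard library (on Linux it is typically clock_gettime with a monotonic clock).

```cpp
#include <chrono>
#include <cstdint>

// Run `work` once and return the elapsed time in nanoseconds,
// measured with std::chrono::steady_clock (monotonic, never jumps back).
template <typename F>
std::int64_t time_ns(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

This measures wall-clock time, not core clock cycles, so the result for CPU-bound work will vary with the core's current frequency.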
On Linux, this can run purely in user-space. Code + data in a VDSO page are mapped by the kernel into every process's address space. The data includes coarse timestamps updated by timer interrupts (CLOCK_REALTIME_COARSE or CLOCK_MONOTONIC_COARSE just return these directly, I think), plus an offset and scale factor for using the TSC to get a fine-grained offset from the last tick of the system clock. Low-overhead timesources are nice. Avoiding a user->kernel->user round trip helps a lot, even more so with Meltdown + Spectre mitigation enabled, because that makes true system calls even more expensive.
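To see the difference between the coarse and fine-grained clocks, a small sketch can ask the OS for each clock's reported resolution. This assumes Linux with glibc, where CLOCK_MONOTONIC_COARSE is available; resolution_ns and now_ns are hypothetical helper names.

```cpp
#include <cstdint>
#include <time.h>  // clock_gettime, clock_getres (POSIX); the _COARSE clock IDs are Linux-specific

// Resolution the OS reports for a clock, in nanoseconds, or -1 on failure.
std::int64_t resolution_ns(clockid_t id) {
    timespec ts{};
    if (clock_getres(id, &ts) != 0) return -1;
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Current reading of a clock, in nanoseconds, or -1 on failure.
std::int64_t now_ns(clockid_t id) {
    timespec ts{};
    if (clock_gettime(id, &ts) != 0) return -1;
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}
```

On a typical Linux system, resolution_ns(CLOCK_MONOTONIC_COARSE) reports the timer-tick period (on the order of milliseconds), while resolution_ns(CLOCK_MONOTONIC) reports something much finer; the exact values depend on the kernel configuration.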
When profiling a tight loop that's not memory bound, you might want to count actual core clock cycles, so the measurement is insensitive to the actual speed of the current core (and you don't have to worry about ramping up the CPU to max turbo, etc.), e.g. using perf stat ./a.out or perf record ./a.out. For an example, see Can x86's MOV really be "free"? Why can't I reproduce this at all?
Some systems didn't / don't have a wall-clock-equivalent counter built right in to the CPU, so you only had a coarse time available, updated in RAM on timer interrupts. Or time-query functions would read the time from a separate chip, possibly with high precision.
(System call + hardware I/O = higher overhead, which is part of the reason that x86's rdtsc instruction morphed from a profiling thing into a clocksource thing.)
All of these clock frequencies are ultimately derived from a crystal oscillator on the motherboard. But the scale factors used to extrapolate time from cycle counts can be adjusted to keep the clock in sync with atomic time, typically using the Network Time Protocol (NTP), as @Tony points out.
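The "offset plus scale factor" idea can be sketched with fixed-point arithmetic: time = base time + (ticks since base) * mult / 2^shift, where a time-sync daemon nudges mult. This is only an illustration of the arithmetic; ClockScale, ticks_to_ns, and adjust_ppm are hypothetical names, and a real kernel's data layout and update rules differ.

```cpp
#include <cstdint>

// Hypothetical snapshot of the data needed to turn a counter reading into time.
struct ClockScale {
    std::uint64_t base_ticks;  // counter value at the last update
    std::uint64_t base_ns;     // time (ns) at the last update
    std::uint32_t mult;        // nanoseconds per tick, scaled by 2^shift
    std::uint32_t shift;       // fixed-point shift for mult
};

// Extrapolate the current time from a counter reading.
// Note: delta * mult can overflow 64 bits if the base is very stale;
// this sketch assumes the base is refreshed often enough to avoid that.
std::uint64_t ticks_to_ns(const ClockScale& s, std::uint64_t ticks_now) {
    const std::uint64_t delta = ticks_now - s.base_ticks;
    return s.base_ns + ((delta * s.mult) >> s.shift);
}

// Apply a frequency correction in parts per million, the kind of small
// adjustment a time-sync daemon might request. Positive ppm makes the
// extrapolated clock run slightly faster.
ClockScale adjust_ppm(ClockScale s, std::int64_t ppm) {
    const std::int64_t m = static_cast<std::int64_t>(s.mult);
    s.mult = static_cast<std::uint32_t>(m + m * ppm / 1000000);
    return s;
}
```

For example, with mult = 1 << 23 and shift = 24 each tick counts as 0.5 ns (a 2 GHz counter), so 2000 ticks after the base the extrapolated time is base_ns + 1000.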