I want to insert some time measurement into my code. On x64 I use __rdtscp. Is there something similar for the Mac M1/M2? Specifically, something that isn't a system call and is high resolution.
Just use clock_gettime(CLOCK_MONOTONIC, ...).
It is a VDSO function. That means that the kernel injects code into the userspace program that "does the right thing", so the userspace program can access the time stamp counter without doing a syscall.
On x86, it will [usually] execute rdtsc [or read the HPET] and convert the counter value to nanoseconds.
On ARM, the time stamp counter is a control register, accessible only in kernel mode. But higher-end ARM architectures allow it to be mapped for read-only access by userspace. The kernel enables that mapping, and the VDSO snippet then knows how to read the values through it.
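As a minimal sketch of the "just use clock_gettime" advice (plain POSIX C, nothing macOS-specific assumed), a region can be timed by taking two CLOCK_MONOTONIC readings and subtracting them. The helper names timespec_diff_ns and time_call_ns are hypothetical, chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L /* expose clock_gettime under strict ISO C modes */
#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed from *start to *end (negative if the arguments are swapped). */
int64_t timespec_diff_ns(const struct timespec *start, const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (int64_t)(end->tv_nsec - start->tv_nsec);
}

/* Time a single call of fn(arg) using CLOCK_MONOTONIC.
   Returns the elapsed nanoseconds, or -1 if either clock read fails. */
int64_t time_call_ns(void (*fn)(void *), void *arg)
{
    struct timespec t0, t1;
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1;
    fn(arg); /* the code being measured */
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1;
    return timespec_diff_ns(&t0, &t1);
}
```

Usage would be something like int64_t ns = time_call_ns(work, &state); where work and state stand for whatever is being measured. CLOCK_MONOTONIC is used rather than CLOCK_REALTIME so that wall-clock adjustments cannot make an interval negative.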
Calls to clock_gettime are fast. So fast that it's not worth trying to access the counter register directly.
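One rough way to check the "fast" claim on a particular machine is to time many back-to-back calls and divide by the count. This is a hypothetical helper (clock_gettime_cost_ns is not a library function); the figure includes loop overhead and will vary with CPU, OS, and clock source, so it is only an order-of-magnitude estimate.

```c
#define _POSIX_C_SOURCE 200809L /* expose clock_gettime under strict ISO C modes */
#include <time.h>

/* Rough average cost, in nanoseconds, of one clock_gettime(CLOCK_MONOTONIC)
   call, measured over `iters` back-to-back calls. Returns -1.0 on error.
   Loop overhead is included, so this slightly overstates the per-call cost. */
double clock_gettime_cost_ns(long iters)
{
    struct timespec start, end, scratch;

    if (iters <= 0 || clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    for (long i = 0; i < iters; i++)
        clock_gettime(CLOCK_MONOTONIC, &scratch); /* the call being measured */
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    double total_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                    + (double)(end.tv_nsec - start.tv_nsec);
    return total_ns / (double)iters;
}
```

A per-call cost far below that of a real system call is consistent with the VDSO path being taken; a much higher figure would suggest the clock read is falling back to the kernel.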
Also, it's not terribly meaningful to access the counter directly, because we would still have to convert it to some standard unit (e.g. nanoseconds). The VDSO snippet does this for us.
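To illustrate that conversion step, here is a sketch that turns a raw tick count into nanoseconds given the counter frequency. The function name is hypothetical and plain integer division is used for clarity; the Linux VDSO instead uses a precomputed multiplier and shift to avoid dividing on every call. The 24 MHz value in the example below is only an example frequency.

```c
#include <stdint.h>

/* Convert a raw counter value to nanoseconds, given the counter frequency
   in Hz (must be nonzero). Whole seconds and leftover ticks are handled
   separately so that ticks * 1e9 is never formed in full; at 24 MHz that
   product would overflow 64 bits after under 13 minutes' worth of ticks.
   The remainder term assumes freq_hz is below about 18 GHz, so that
   rem * 1e9 still fits in 64 bits. */
uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    uint64_t secs = ticks / freq_hz; /* whole seconds */
    uint64_t rem  = ticks % freq_hz; /* ticks left over, < freq_hz */
    return secs * 1000000000ULL + (rem * 1000000000ULL) / freq_hz;
}
```

For example, 36,000,000 ticks at 24 MHz gives 1,500,000,000 ns (1.5 s), and a single tick at that rate is about 41.7 ns, which truncates to 41 here.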
UPDATE:
Is it a VDSO call on macOS, too? – fuz
My direct experience with ARM was on an NVIDIA Jetson [under Linux].
But, AFAIK, macOS provides [has to provide] clock_gettime.
On older kernels, it may have to issue a syscall equivalent.
But since the architecture provides a means for the OS/kernel to grant userspace direct access, there is every reason to believe the VDSO method is available under macOS as well. In fact, it is: https://www.unix.com/man-page/osx/7/vdso/
The way to see the specific mechanism is to build a program that uses clock_gettime and single step it a bit [using gdb]. Then, it is possible to have gdb disassemble the clock_gettime code.
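A minimal target for such a session might look like the following. The wrapper name probe_monotonic_ns is hypothetical, __attribute__((noinline)) assumes GCC or Clang, and the gdb commands in the comment are one plausible sequence, not the exact steps used above. Under lldb (the debugger Apple's toolchain ships) the commands differ.

```c
#define _POSIX_C_SOURCE 200809L /* expose clock_gettime under strict ISO C modes */
#include <time.h>

/* Small, non-inlined wrapper around one clock_gettime call, giving the
   debugger an obvious frame to stop in. noinline (a GCC/Clang attribute)
   keeps the function separate even in optimized builds.

   Illustrative session (build with: cc -g -O0 ..., and call this from main):
     (gdb) break probe_monotonic_ns
     (gdb) run
     (gdb) stepi        repeat until execution is inside clock_gettime;
                        the first call may pass through the dynamic linker
     (gdb) x/20i $pc    disassemble 20 instructions from the current pc
*/
__attribute__((noinline))
long long probe_monotonic_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```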
We have to use gdb [vs. objdump and/or readelf] for the disassembly because the snippet is loaded/injected by the kernel dynamically, so it's not easily accessible with static analysis.
Further, the injected code can be processor-model-specific. The kernel probes the CPU arch and its features during boot and crafts the snippet based on the features it finds.
Using gdb is how I examined clock_gettime [about 3 years ago for a commercial product], to verify that it would access the H/W without a syscall and that it provided the correct nanosecond values. In that particular case, I also looked at the arch-specific sections in the kernel source code.
clock_gettime(CLOCK_MONOTONIC, ...). –
I figured clock_gettime is the right solution. But it's not true that the time stamp counter is privileged. The CNTVCT_EL0 system register is, as its name suggests, readable from EL0. (It can be set to trap for virtualization, but based on speed tests, it doesn't look like either Linux or macOS does so.) This is how Linux implements its userspace clock_gettime, and I would not be surprised if macOS does the same. So you could read the register directly from inline asm (or maybe there's an intrinsic?) if you wanted a raw tick count without the arithmetic overhead. –
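If a raw tick count is really wanted, that suggestion can be sketched as follows. Assumptions: AArch64 with GCC/Clang extended inline asm, and EL0 access to the counter not trapped (as observed above). The isb before the read mirrors the barrier Linux places before its own counter reads, to keep the read from executing early. read_ticks and tick_freq_hz are hypothetical names, and the non-AArch64 branch is only a fallback so the sketch compiles elsewhere. On the intrinsic question, ACLE defines __arm_rsr64 in <arm_acle.h>, so __arm_rsr64("cntvct_el0") may work where the compiler supports it.

```c
#define _POSIX_C_SOURCE 200809L /* expose clock_gettime for the fallback path */
#include <stdint.h>
#include <time.h>

/* Raw tick count. On AArch64 this reads the virtual counter CNTVCT_EL0
   directly from userspace (EL0); elsewhere it falls back to
   CLOCK_MONOTONIC nanoseconds so the sketch still compiles and runs. */
uint64_t read_ticks(void)
{
#if defined(__aarch64__)
    uint64_t v;
    /* isb stops the counter read from being executed ahead of earlier code. */
    __asm__ __volatile__("isb\n\tmrs %0, cntvct_el0" : "=r"(v) : : "memory");
    return v;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}

/* Tick frequency in Hz: CNTFRQ_EL0 on AArch64, 1 GHz (i.e. ns) in the fallback. */
uint64_t tick_freq_hz(void)
{
#if defined(__aarch64__)
    uint64_t f;
    __asm__ __volatile__("mrs %0, cntfrq_el0" : "=r"(f));
    return f;
#else
    return 1000000000ULL;
#endif
}
```

A tick difference becomes time by dividing by the frequency, e.g. (double)(t1 - t0) * 1e9 / (double)tick_freq_hz() for nanoseconds. Whether this is actually cheaper than clock_gettime on a given system is worth measuring rather than assuming.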
Arriaga © 2022 - 2024 — McMap. All rights reserved.