__rdtsc/__rdtscp for ARM Mac M1/M2?

Just use clock_gettime(CLOCK_MONOTONIC,...)

On Linux, it is a vDSO function. That means that the kernel maps a small piece of code into the userspace program that "does the right thing", so the userspace program can read the time stamp counter without doing a syscall.
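As a minimal sketch of what "just use clock_gettime" looks like in practice, the helper below folds the timespec into a single nanosecond count. The name monotonic_ns is made up for this example; only clock_gettime and CLOCK_MONOTONIC come from the system headers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: CLOCK_MONOTONIC reading as one nanosecond count. */
static uint64_t monotonic_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    /* Only expected to fail for an invalid clock id or bad pointer. */
    assert(rc == 0);
    (void)rc;

    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

To time a region, take one reading before it and one after it, then subtract the two values.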

On x86, it will [usually] invoke rdtsc [or use the HPET], and adjust the counter value to represent nanoseconds.

On arm, the counterpart of the TSC is a system register that belongs to the generic timer [CNTVCT_EL0 on AArch64]. By default it is accessible only in kernel mode. But, higher end arm arches let the kernel grant R/O access to userspace. The kernel enables that access. Then, the vDSO snippet knows how to read the counter directly.
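For illustration only, direct access on AArch64 might look like the sketch below. It assumes the kernel has enabled userspace reads of the virtual counter CNTVCT_EL0 and of the frequency register CNTFRQ_EL0 [Linux on arm64 normally does]; if it has not, the mrs instruction traps. The names read_counter and read_counter_freq are made up here, and on other architectures the sketch simply falls back to clock_gettime so that it still compiles.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#if defined(__aarch64__)
/* Raw tick count from the generic timer (assumes EL0 access is enabled). */
static inline uint64_t read_counter(void)
{
    uint64_t v;
    /* isb keeps the counter read from being speculated ahead of earlier code */
    __asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(v) : : "memory");
    return v;
}

/* Counter frequency in Hz, as reported by the hardware. */
static inline uint64_t read_counter_freq(void)
{
    uint64_t f;
    __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(f));
    return f;
}
#else
/* Fallback for other architectures: pretend the counter ticks at 1 GHz. */
static inline uint64_t read_counter(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static inline uint64_t read_counter_freq(void)
{
    return 1000000000ull;
}
#endif
```

The raw value is in ticks, not nanoseconds, so it still has to be scaled by the frequency before it means anything.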

Calls to clock_gettime are fast. So fast that it's not worth trying to access the counter register directly.
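One rough way to check this on a given machine is to time a tight loop of calls. This is a sketch, not a rigorous benchmark: the function name clock_gettime_avg_cost_ns is hypothetical, and the result includes loop overhead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical helper: average cost of one clock_gettime call, in ns. */
static double clock_gettime_avg_cost_ns(unsigned long iterations)
{
    struct timespec start, end, scratch;

    assert(iterations > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (unsigned long i = 0; i < iterations; i++)
        clock_gettime(CLOCK_MONOTONIC, &scratch);
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* Total elapsed time for the loop, in nanoseconds. */
    double total_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                    + (double)(end.tv_nsec - start.tv_nsec);

    return total_ns / (double)iterations;
}
```

If the number comes out in the tens of nanoseconds, the call is most likely being served without a syscall; a figure in the hundreds of nanoseconds or more suggests a syscall path.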

Also, it's not terribly meaningful to access the counter directly, because we still have to convert it to some standard unit (e.g. nanoseconds). The VDSO snippet will do this.
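The conversion itself is just ticks * 1e9 / frequency, but done naively the multiplication overflows 64 bits after a few minutes of uptime. A sketch of one overflow-safe way to do it follows; ticks_to_ns is a made-up name, and the 24 MHz figure in the usage note is only an illustrative frequency.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert raw counter ticks to nanoseconds.
 * Splitting into whole seconds and a remainder keeps the intermediate
 * product small. Assumes freq_hz is below roughly 18 GHz so that
 * rem * 1e9 fits in 64 bits.
 */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    assert(freq_hz != 0);

    uint64_t secs = ticks / freq_hz;   /* whole seconds */
    uint64_t rem  = ticks % freq_hz;   /* leftover ticks, < freq_hz */

    return secs * 1000000000ull + (rem * 1000000000ull) / freq_hz;
}
```

For example, with a 24 MHz counter, ticks_to_ns(12000000, 24000000) gives 500000000, i.e. half a second.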

UPDATE:

Is it a VDSO call on macOS, too? – fuz

My direct experience with arm was on an nVidia Jetson [under Linux].

But, AFAIK, macOS provides [has to provide] clock_gettime.

On older kernels, it may have to issue a syscall equivalent.

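But, since the architecture provides the means for a given OS/kernel to give userspace direct access, there is every reason to believe a syscall-free path is available under macOS as well. A vdso(7) man page is listed for OS X here: https://www.unix.com/man-page/osx/7/vdso/ [although its text describes the Linux mechanism; AFAIK, the analogous kernel-provided userspace page on macOS is the commpage].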

The way to see the specific mechanism is to build a program that uses clock_gettime and [using gdb] single step it a bit. Then, it is possible to have gdb disassemble the clock_gettime code.

We have to use gdb [vs. objdump and/or readelf] for the disassembly because the snippet is loaded/injected by the kernel dynamically, so it's not easily accessible with static analysis.

Further, the injected code can be processor model specific. The kernel probes the CPU arch and its features during boot. It crafts the snippet based on the features it finds.

Using gdb is how I examined clock_gettime [about 3 years ago for a commercial product], to verify that it would access the H/W without a syscall and that it provided the correct nanosecond values. In that particular case, I also looked at the arch specific sections in the kernel source code.
