Why should I use 'rdtsc' differently on x86 and x86_64?

I know that rdtsc loads the current value of the processor's time-stamp counter into two registers, EDX and EAX. To read it on x86 I need to do this (assuming Linux):

    unsigned long lo, hi;
    asm( "rdtsc" : "=a" (lo), "=d" (hi));
    return lo;

and for x86_64:

    unsigned long lo, hi;
    asm( "rdtsc" : "=a" (lo), "=d" (hi) );
    return( lo | (hi << 32) );

Why is that? Can anybody explain it to me?

Fixed answered 1/7, 2013 at 10:10 Comment(2)
Those definitions are missing volatile on the asm; they're not safe for timing if the compiler can see the start and end. I wonder if that's intentional in Linux because they're never using it for microbenchmarking inside the kernel? But IDK where in Linux it could usefully CSE.Maugham
TL:DR: you shouldn't use it differently. See Get CPU cycle count? for asm that works on both 32 and 64-bit. (And my answer which shows how to use the __rdtsc() intrinsic instead).Maugham
S
13

RDTSC always writes its 64-bit result split into hi/lo halves in EDX and EAX, even in 64-bit mode (see the manual), unfortunately not packing the 64-bit TSC into just RAX. That's why extra work is needed after the asm statement.

To make a single 64-bit integer from it, you need to shift hi to the place it belongs as part of an unsigned long. lo is already in the right place, and writing those 32-bit registers zeroed the upper bits of both 64-bit registers, so we can just OR the (shifted) halves together without having to AND the low half first.
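
The shift-and-OR step can be sketched in plain C without any asm. This is only an illustration: `combine_halves` is a hypothetical helper name, and the fixed-width types from `<stdint.h>` are an assumption chosen so the shift happens in 64 bits even where `unsigned long` is only 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical helper: joins the two 32-bit halves that RDTSC leaves in
   EDX (hi) and EAX (lo) into one 64-bit value. */
static inline uint64_t combine_halves(uint32_t lo, uint32_t hi)
{
    /* Widen hi before shifting so the shift is done in 64 bits.
       lo needs no mask because it is already a 32-bit value. */
    return ((uint64_t)hi << 32) | lo;
}
```

For example, `combine_halves(0x89ABCDEF, 0x01234567)` yields `0x0123456789ABCDEF`.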

In x86-64 Linux, unsigned long is a 64-bit type so the kernel actually uses both halves of the RDTSC return value.

The only reason the 32-bit version is simpler is that the kernel is truncating the result to 32 bits by throwing away the high half. If you do want a 64-bit TSC in 32-bit mode, the same C source works there, too (with uint64_t or unsigned long long), although it wouldn't compile to shift and OR instructions. The compiler would just know that it has a 64-bit integer whose halves are in EDX and EAX.
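
A minimal sketch of one source that should build for both 32-bit and 64-bit x86 follows. It assumes GCC or Clang (GNU C inline asm) and an x86 target; `read_tsc` is a hypothetical name, not the kernel's function, and `volatile` is added here on purpose.

```c
#include <stdint.h>

/* Hypothetical helper; GNU C inline asm, x86 and x86-64 targets only. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    /* "=a" binds lo to EAX and "=d" binds hi to EDX.
       volatile keeps the compiler from merging or removing repeated reads. */
    __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    /* Widen hi first so the shift is valid in 32-bit mode as well. */
    return ((uint64_t)hi << 32) | lo;
}
```

Because both halves are 32-bit variables and the result is a `uint64_t`, the return type no longer depends on how wide `unsigned long` is on the target.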

See also How to get the CPU cycle count in x86_64 from C++? - and for real use, don't forget to make these asm volatile. Otherwise the compiler can assume that repeated executions of this produce the same output, e.g. end-start = 0 after optimization.
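
As a rough sketch of the start/end pattern, the example below uses the `__rdtsc()` intrinsic instead of inline asm. It assumes GCC or Clang on x86, where `<x86intrin.h>` declares the intrinsic; `ticks_for_loop` and the loop being timed are made up for illustration.

```c
#include <stdint.h>
#include <x86intrin.h>   /* GCC/Clang header that declares __rdtsc() */

/* Hypothetical helper: TSC ticks that elapse while running a simple loop. */
static uint64_t ticks_for_loop(unsigned iterations)
{
    volatile unsigned sink = 0;   /* volatile so the loop is not optimized away */
    uint64_t start = __rdtsc();
    for (unsigned i = 0; i < iterations; i++)
        sink += i;
    uint64_t end = __rdtsc();
    return end - start;           /* unsigned difference of the two reads */
}
```

Each `__rdtsc()` call is treated as a fresh read, so `end - start` is not folded to 0 the way it could be with a non-volatile asm statement.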

Spiritoso answered 1/7, 2013 at 10:36 Comment(2)
So am I right that rdtsc loads the current value of the processor's time-stamp counter into just two registers, EDX and EAX, and not into all the registers from EAX to EDX (EAX, EBX, ECX, EDX)?Fixed
rdtsc always returns a 64-bit value, so on a 32-bit machine it stores the halves in EDX and EAX. And yes, you are right.Spiritoso
A
10

The difference is not in rdtsc, but in what the Linux kernel wants to do with it.

In 32-bit mode, the kernel's function returns a 32-bit value, so the value in EAX is good enough.
In 64-bit mode, it returns a 64-bit value, so it needs to combine the values from both registers.

Ardine answered 1/7, 2013 at 11:10 Comment(6)
According to the gcc doc, even on 32-bit OSes a 64-bit value is returned for __asm__ __volatile__("rdtsc":"=A"(tick)). Search for rdtsc in the referenced link.Famine
@wlnirvana, this would be a good way to get a 64bit timestamp on 32bit. But Linux chose to use long, so only 32 bits are needed.Ardine
@Ardine could you please elaborate on "Linux chose to use long"?Famine
@wlnirvana, the function returns long, which is 32bit on a 32bit system. This is how the developers chose to define it.Ardine
You mean the function the OP used which returns lo? Yes that is of course 32bit. But if the style in the gcc doc is used, I think 64bit value would be returned?Famine
@wlnirvana: return( lo | ((uint64_t)hi << 32) ) works in both modes. No reason to mess around with "=A". See Mysticial's answer on Get CPU cycle count?.Maugham
