"cpuid" before "rdtsc"

Asked 27/5, 2010 at 2:29 Answered 16/1, 2019 at 9:49

Sometimes I encounter code that reads the TSC with the rdtsc instruction but calls cpuid right before it.

Why is calling cpuid necessary? I realize it may have something to do with different cores having different TSC values, but what exactly happens when you call those two instructions in sequence?

Osterman asked 27/5, 2010 at 2:29

In addition to paxdiablo's answer, note that even a single-core CPU like the Pentium Pro, II or III can do out-of-order execution. See chapter 6 of Agner Fog's microarchitecture guide. – Mixtec 15/1, 2019 at 14:21

It's to prevent out-of-order execution. From a link that has now disappeared from the web (but which was fortuitously copied here before it disappeared), this text is from an article entitled "Performance monitoring" by one John Eckerdal:

The Pentium Pro and Pentium II processors support out-of-order execution: instructions may be executed in a different order than you programmed them. This can be a source of errors if not taken care of.

To prevent this, the programmer must serialize the instruction queue. This can be done by inserting a serializing instruction like CPUID before the RDTSC instruction.
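
For illustration, here is roughly what "CPUID before RDTSC" looks like from C. This is a minimal sketch assuming GCC or Clang on x86-64 (GNU inline assembly plus the `__rdtsc` intrinsic from <x86intrin.h>); the names `cpuid_barrier` and `rdtsc_after_cpuid` are made up for this example. The inline `cpuid` is marked volatile with a "memory" clobber so the compiler neither deletes it nor moves memory accesses across it; the ordering inside the CPU comes from CPUID being a serializing instruction.

```c
#include <x86intrin.h>  /* __rdtsc (GCC/Clang) */

/* Execute CPUID (leaf 0, supported on every x86 CPU) purely for its
 * serializing side effect. CPUID overwrites EAX, EBX, ECX and EDX, so all
 * four are declared as outputs; "memory" stops the compiler from moving
 * loads/stores across the barrier. */
static inline void cpuid_barrier(void)
{
    unsigned int eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)eax; (void)ebx; (void)ecx; (void)edx;
}

/* CPUID before RDTSC: every instruction before the barrier has completed
 * before the timestamp is read. */
static inline unsigned long long rdtsc_after_cpuid(void)
{
    cpuid_barrier();
    return __rdtsc();
}
```

Two such readings bracket a region, but the second one also counts the cost of CPUID itself, which can be large (under virtualization CPUID usually traps to the hypervisor).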

Deuce answered 27/5, 2010 at 2:32

You shouldn't really remove all attribution from the quoted text; now it looks like you're plagiarizing. – Mope 15/1, 2019 at 20:22

@Ross, the link the text came from has disappeared from the net, so the choice is between a broken link and no attribution. If you have a suggestion, I'm willing to hear it. It seems counter-productive to attribute to something that doesn't exist. Hopefully the edit addresses the plagiarism issue. – Deuce 16/1, 2019 at 19:23

That's why links don't really count as attribution. A proper attribution would have included the author's name (John Eckerdal) and the title or description of whatever you copied the text from (apparently called "Performance monitoring", but also "gem0029"). Fortunately the Wayback Machine has archived the page you originally linked to, so I was able to find this information. web.archive.org/web/20160602175959/http://dflund.se:80/~john_e/… – Mope 16/1, 2019 at 21:3

Good find, Ross. I've added a reference to the article, though not the actual Wayback link. I'm hesitant to add another link, so I will just rely on the textual description. – Deuce 17/1, 2019 at 7:13

Two reasons:

As paxdiablo says, when the CPU sees a CPUID opcode it makes sure all previous instructions have executed, then executes CPUID itself, before any subsequent instructions execute. Without such an instruction, the CPU's execution pipeline may end up executing RDTSC before the instruction(s) you'd like to time.
A significant proportion of machines fail to synchronise the TSC registers across cores. If you want to read it from the horse's mouth, knock yourself out at http://msdn.microsoft.com/en-us/library/ee417693%28VS.85%29.aspx. So, when measuring an interval between TSC readings, unless they're taken on the same core you'll have an effectively random but possibly constant (see below) offset introduced; it can easily be several seconds (yes, seconds) even soon after bootup. This effectively reflects how long the BIOS was running on a single core before kicking off the others, plus, if you've any nasty power-saving options on, increasing drift caused by cores running at different frequencies or shutting down again. So, if you haven't nailed the threads reading TSC registers to the same core, you'll need to build some kind of cross-core delta table and know the core id (which is returned by CPUID) of each TSC sample in order to compensate for this offset. That's another reason you can see CPUID alongside RDTSC, and indeed a reason why, with the newer RDTSCP, many OSes store core id numbers in the extra TSC_AUX[31:0] data it returns. (RDTSCP, available from Core i7 and Athlon 64 X2, is a much better option in all respects: the OS normally gives you the core id as mentioned, read atomically with the TSC, and it waits for all earlier instructions to execute before reading the counter.)
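
To make the second point concrete, here is a sketch, assuming GCC/Clang on x86-64, that tags each TSC sample with the initial APIC ID that CPUID leaf 1 reports in bits 31:24 of EBX. The struct and function names are hypothetical. Two caveats: the APIC ID is the hardware's numbering, not necessarily the OS's CPU number, and the thread can migrate between the CPUID and the RDTSC, which is exactly the race RDTSCP's atomic TSC_AUX read avoids.

```c
#include <x86intrin.h>  /* __rdtsc (GCC/Clang) */

/* Hypothetical type: a TSC reading plus the core it was (probably) taken on. */
struct tsc_sample {
    unsigned long long tsc;
    unsigned int apic_id;   /* initial APIC ID, CPUID.01H:EBX[31:24] */
};

static struct tsc_sample tsc_sample_with_core(void)
{
    unsigned int eax = 1, ebx, ecx = 0, edx;
    struct tsc_sample s;

    /* CPUID leaf 1 both serializes and reports the initial APIC ID of the
     * logical processor it runs on. The field is 8 bits wide; machines with
     * more than 256 logical CPUs need the x2APIC ID from leaf 0x0B instead. */
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)eax; (void)ecx; (void)edx;
    s.apic_id = ebx >> 24;

    /* The thread may migrate between the CPUID above and this RDTSC, so the
     * tag is a best-effort hint rather than a guarantee. */
    s.tsc = __rdtsc();
    return s;
}
```

A cross-core delta table of the kind described above could then be indexed by `apic_id` when comparing samples from different cores.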

Walsh answered 8/6, 2012 at 6:53

CPUID is serializing, preventing out-of-order execution of RDTSC.

These days you can safely use LFENCE instead. It's documented as serializing the instruction stream (but not stores to memory) on Intel CPUs, and now also on AMD after their microcode update for Spectre.
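
A minimal sketch of the LFENCE variant, assuming GCC/Clang on x86-64 and the `_mm_lfence` and `__rdtsc` intrinsics from <x86intrin.h>; the helper name `rdtsc_lfenced` is hypothetical. Unlike CPUID, LFENCE clobbers no general-purpose registers and is usually much cheaper. On AMD this relies on the LFENCE behaviour described above being enabled.

```c
#include <x86intrin.h>  /* _mm_lfence, __rdtsc (GCC/Clang) */

/* LFENCE does not execute until all earlier instructions have completed
 * locally, and no later instruction starts until LFENCE completes, so RDTSC
 * cannot run ahead of the code before it. Earlier stores may still be
 * draining from the store buffer when the timestamp is taken. */
static inline unsigned long long rdtsc_lfenced(void)
{
    _mm_lfence();
    return __rdtsc();
}
```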

https://hadibrais.wordpress.com/2018/05/14/the-significance-of-the-x86-lfence-instruction/ explains more about LFENCE.

See also https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf for a way to use RDTSCP that keeps CPUID (or LFENCE) out of the timed region:

    LFENCE              ; (or CPUID) don't start the timed region until everything above has executed
    RDTSC               ; EDX:EAX = timestamp
    mov   ebx, eax      ; low 32 bits of start time (the code under test must preserve EBX)

    ; ... code under test ...

    RDTSCP              ; built-in one-way barrier: waits for the code under test before reading the TSC
    LFENCE              ; still use a barrier after to prevent anything weird. CPUID works here too,
                        ; but it overwrites EAX/EBX/ECX/EDX, so save both timestamps elsewhere first
    sub   eax, ebx      ; low 32 bits of end - start
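
The assembly pattern above can also be written with GCC/Clang intrinsics from <x86intrin.h>, which lets the compiler keep track of which registers RDTSC and RDTSCP overwrite. This is a sketch assuming x86-64; `time_cycles` and the `busy_loop` workload are hypothetical names. The result is in TSC ticks, which on CPUs with constant_tsc advance at a fixed reference rate rather than the current core clock.

```c
#include <x86intrin.h>  /* _mm_lfence, __rdtsc, __rdtscp (GCC/Clang) */

/* Hypothetical workload, used only to have something to time. */
static void busy_loop(void *arg)
{
    unsigned long n = *(const unsigned long *)arg;
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < n; i++)
        sink += i;
}

/* Time one call of fn(arg) in TSC ticks. */
static unsigned long long time_cycles(void (*fn)(void *), void *arg)
{
    unsigned int aux;               /* IA32_TSC_AUX, unused here */
    unsigned long long start, end;

    _mm_lfence();                   /* don't start until everything above has executed */
    start = __rdtsc();

    fn(arg);                        /* code under test */

    end = __rdtscp(&aux);           /* waits for earlier instructions before reading */
    _mm_lfence();                   /* keep later instructions out of the timed region */
    return end - start;
}
```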

See also "Get CPU cycle count?" for more about RDTSC caveats, like constant_tsc and nonstop_tsc.
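
Those Linux /proc/cpuinfo flags largely come from one CPUID bit, the "invariant TSC" bit (leaf 0x80000007, EDX bit 8); on Intel and AMD, Linux sets both constant_tsc and nonstop_tsc when it is present (it can also set constant_tsc on some older models by model number). A sketch that reads the bit, assuming GCC/Clang's `__get_cpuid` helper from <cpuid.h>; the function name is hypothetical, and a hypervisor may not expose this bit to guests.

```c
#include <cpuid.h>  /* __get_cpuid (GCC/Clang) */

/* Returns 1 if CPUID reports an invariant TSC (leaf 0x80000007, EDX bit 8),
 * 0 if it does not or if the leaf is unavailable. __get_cpuid returns 0 when
 * the requested leaf is above the CPU's maximum extended leaf. */
static int has_invariant_tsc(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 8) & 1u);
}
```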

As a bonus, RDTSCP also returns the IA32_TSC_AUX value, which the OS typically sets to a core ID. You could use RDTSCP for the start time as well if you want to check for core migration. But if your CPU has the constant_tsc feature, all cores in the package should have their TSCs synced, so you typically don't need this on modern x86.
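
A sketch of the "RDTSCP at both ends" migration check, assuming GCC/Clang's `__rdtscp` intrinsic on x86-64; the struct and function names are hypothetical. What TSC_AUX contains is up to the OS (Linux, for example, stores the CPU number there), and matching values at start and end do not rule out a migration away and back in between; they only catch the common case.

```c
#include <x86intrin.h>  /* __rdtscp (GCC/Clang) */

/* Hypothetical type: one timestamp plus the IA32_TSC_AUX value that RDTSCP
 * returned with it. What TSC_AUX holds is up to the OS. */
struct tsc_stamp {
    unsigned long long tsc;
    unsigned int aux;
};

static inline struct tsc_stamp tsc_stamp_now(void)
{
    struct tsc_stamp s;
    s.tsc = __rdtscp(&s.aux);   /* counter and TSC_AUX come from one instruction */
    return s;
}

/* If both stamps carry the same TSC_AUX value, store end - start in *ticks
 * and return 1; otherwise return 0 so the caller can discard or retry the
 * measurement. Matching values do not rule out a migration away and back. */
static inline int tsc_interval(struct tsc_stamp start, struct tsc_stamp end,
                               unsigned long long *ticks)
{
    if (start.aux != end.aux)
        return 0;
    *ticks = end.tsc - start.tsc;
    return 1;
}
```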

You could get the core ID from CPUID instead, as @Tony's answer points out.

Selfsacrifice answered 16/1, 2019 at 9:49
