I am writing a C code for measuring the number of clock cycles needed to acquire a semaphore. I am using rdtsc, and before doing the measurement on the semaphore, I call rdtsc two consecutive times, to measure the overhead. I repeat this many times, in a for-loop, and then I use the average value as rdtsc overhead.
Is this correct, to use the average value, first of all?
Nonetheless, the big problem here is that sometimes I get negative values for the overhead (not necessarily the averaged one,but at least the partial ones inside the for loop).
This also affects the consecutive calculation of the number of cpu cycles needed for the sem_wait()
operation, which sometimes also turns out to be negative. If what I wrote is not clear, here there's a part of the code I am working on.
Why am I getting such negative values?
(editor's note: see Get CPU cycle count? for a correct and portable way of getting the full 64-bit timestamp. An "=A"
asm constraint will only get the low or high 32 bits when compiled for x86-64, depending on whether register allocation happens to pick RAX or RDX for the uint64_t
output. It won't pick edx:eax
.)
(editor's 2nd note: oops, that's the answer to why we're getting negative results. Still worth leaving a note here as a warning not to copy this rdtsc
implementation.)
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>
static inline uint64_t get_cycles()
{
uint64_t t;
// editor's note: "=A" is unsafe for this in x86-64
__asm volatile ("rdtsc" : "=A"(t));
return t;
}
int num_measures = 10;
int main ()
{
int i, value, res1, res2;
uint64_t c1, c2;
int tsccost, tot, a;
tot=0;
for(i=0; i<num_measures; i++)
{
c1 = get_cycles();
c2 = get_cycles();
tsccost=(int)(c2-c1);
if(tsccost<0)
{
printf("#### ERROR!!! ");
printf("rdtsc took %d clock cycles\n", tsccost);
return 1;
}
tot = tot+tsccost;
}
tsccost=tot/num_measures;
printf("rdtsc takes on average: %d clock cycles\n", tsccost);
return EXIT_SUCCESS;
}
__asm volatile ("rdtsc" : "=A"(t));
is problematic (or surprising?) in GCC (gcc.gnu.org/bugzilla/show_bug.cgi?id=21249). The=A
constraint meansrax
in x86_64, notedx:eax
. SHLrdx
by 32 and OR it intorax
, or SHLDrdx
left while shifting in the bits ofrax
from the right. – Georgiardtsc
sort of gets you there but blows up in a multitasking OS on modern CPUs. Emulation software along the lines of Win3mu that simulates each and every opcode for an architecture could actually be extremely helpful for taking those measurements. Correctly emulating a modern CPU has its own major issues but maybe such software already exists somewhere? – Jinni