On a cpu with constant_tsc and nonstop_tsc, why does my time drift?

I am running this test on a cpu with constant_tsc and nonstop_tsc

$ grep -m 1 ^flags /proc/cpuinfo | sed 's/ /\n/g' | egrep "constant_tsc|nonstop_tsc"
constant_tsc
nonstop_tsc

Step 1: Calculate the tick rate of the tsc:

I calculate _ticks_per_ns as the median over a number of observations. I use rdtscp because it does not execute until all prior instructions have completed, so the readings aren't skewed by out-of-order execution. (Sketches of the helper functions used in these snippets follow this step.)

static const int trials = 13;
std::array<double, trials> rates;

for (int i = 0; i < trials; ++i)
{
    timespec beg_ts, end_ts;
    uint64_t beg_tsc, end_tsc;

    clock_gettime(CLOCK_MONOTONIC, &beg_ts);
    beg_tsc = rdtscp();

    uint64_t elapsed_ns;
    do
    {
        clock_gettime(CLOCK_MONOTONIC, &end_ts);
        end_tsc = rdtscp();

        elapsed_ns = to_ns(end_ts - beg_ts); // calculates ns between two timespecs
    }
    while (elapsed_ns < 10 * 1e6); // busy spin for 10ms

    rates[i] = (double)(end_tsc - beg_tsc) / (double)elapsed_ns;
}

std::nth_element(rates.begin(), rates.begin() + trials/2, rates.end());

_ticks_per_ns = rates[trials/2];
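
The snippets above and below assume a few helpers that are not shown. A minimal sketch of what they might look like with GCC/Clang on x86-64 follows; these exact definitions are assumptions, not the OP's code, and ns_to_str() (used in step 4) is just a formatting helper and is omitted:

#include <time.h>
#include <cstdint>
#include <x86intrin.h> // __rdtscp intrinsic (GCC/Clang)

// Read the TSC; rdtscp waits for all prior instructions to complete.
static inline uint64_t rdtscp()
{
    unsigned int aux;      // receives IA32_TSC_AUX (core/socket id); unused here
    return __rdtscp(&aux);
}

// Difference of two timespecs, normalizing a negative nanosecond field.
static inline timespec operator-(const timespec &a, const timespec &b)
{
    timespec r;
    r.tv_sec  = a.tv_sec - b.tv_sec;
    r.tv_nsec = a.tv_nsec - b.tv_nsec;
    if (r.tv_nsec < 0) { --r.tv_sec; r.tv_nsec += 1000000000L; }
    return r;
}

// timespec (absolute or difference) to nanoseconds.
static inline uint64_t to_ns(const timespec &ts)
{
    return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}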

Step 2: Calculate starting wall clock time and tsc

uint64_t beg, end;
timespec ts;

// loop to ensure we aren't interrupted between the two tsc reads
while (1)
{
    beg = rdtscp();
    clock_gettime(CLOCK_REALTIME, &ts);
    end = rdtscp();

    if ((end - beg) <= 2000) // max ticks per clock call
        break;
}

_start_tsc        = end;
_start_clock_time = to_ns(ts); // converts timespec to ns since epoch

Step 3: Create a function which can return wall clock time from the tsc

uint64_t tsc_to_ns(uint64_t tsc)
{
    int64_t diff = tsc - _start_tsc;
    return _start_clock_time + (diff / _ticks_per_ns);
}
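
A side note on this step: diff / _ticks_per_ns costs a floating-point division on every call. Since the stated goal is the fastest possible timestamp path, a common variant (a sketch, not from the question; _ns_per_tick is an assumed name) precomputes the reciprocal once so the hot path only multiplies:

double _ns_per_tick; // set once after calibration: _ns_per_tick = 1.0 / _ticks_per_ns;

uint64_t tsc_to_ns_fast(uint64_t tsc)
{
    int64_t diff = tsc - _start_tsc;                            // ticks since the anchor
    return _start_clock_time + (int64_t)(diff * _ns_per_tick);  // multiply, don't divide
}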

Step 4: Run in a loop, printing wallclock time from clock_gettime and from rdtscp

// lock the test to a single core
cpu_set_t mask;
CPU_ZERO(&mask);
CPU_SET(6, &mask);
sched_setaffinity(0, sizeof(cpu_set_t), &mask);

while (1)
{
    timespec utc_now;
    clock_gettime(CLOCK_REALTIME, &utc_now);
    uint64_t utc_ns = to_ns(utc_now);
    uint64_t tsc_ns = tsc_to_ns(rdtscp());

    int64_t ns_diff = (int64_t)(tsc_ns - utc_ns); // signed: the tsc time may lag utc

    std::cout << "clock_gettime " << ns_to_str(utc_ns) << '\n';
    std::cout << "tsc_time      " << ns_to_str(tsc_ns) << " diff=" << ns_diff << "ns\n";

    sleep(10);
}

Output:

clock_gettime 11:55:34.824419837
tsc_time      11:55:34.824419840 diff=3ns
clock_gettime 11:55:44.826260245
tsc_time      11:55:44.826260736 diff=491ns
clock_gettime 11:55:54.826516358
tsc_time      11:55:54.826517248 diff=890ns
clock_gettime 11:56:04.826683578
tsc_time      11:56:04.826684672 diff=1094ns
clock_gettime 11:56:14.826853056
tsc_time      11:56:14.826854656 diff=1600ns
clock_gettime 11:56:24.827013478
tsc_time      11:56:24.827015424 diff=1946ns

Questions:

It is quickly evident that the times calculated in these two ways rapidly drift apart.

I'm assuming that with constant_tsc and nonstop_tsc that the tsc rate is constant.

  • Is this the on board clock that is drifting? Surely it doesn't drift at this rate?

  • What is the cause of this drift?

  • Is there anything I can do to keep them in sync (other than very frequently recalculating _start_tsc and _start_clock_time in step 2)?

Pneumograph answered 25/8, 2016 at 17:7 Comment(19)
Depending on your env, you might get better luck with the __vdso_clock_gettime function (you might need to load the vDSO and dlsym it). – Suk
Is your program locked to a single core? It's usually impossible to synchronize the TSC across cores exactly. – Kiakiah
Based on your data, it looks like the mean "drift" rate between your wallclock time and TSC time is about 40 nanoseconds per second, or about 40 parts per billion. I suspect the main cause of this discrepancy is the limited accuracy of your ticks-per-nanosecond estimate. Actually, I'm pretty surprised that it's even that accurate. – Afra
@IlmariKaronen any tricks I can employ to increase the accuracy of my frequency calculation? – Pneumograph
@IwillnotexistIdonotexist I thought the whole point of constant_tsc was to keep the TSCs synchronized across all cores in a system? – Pneumograph
@Suk I'm running this test on Ubuntu 16.04 (kernel 4.4.0-34-generic, glibc 2.23). Aren't I using the vDSO already? – Pneumograph
No, constant_tsc means that the TSC ticks at a constant frequency independent of frequency scaling/TurboBoost/etc. For cores of the same make I'd imagine they'd tick at the same speed, but when each core actually starts its TSC ticking isn't synchronized, so between each core there will be an offset. Software can attempt to synchronize the two, but you usually can't make them match to nanosecond precision. – Kiakiah
@IwillnotexistIdonotexist According to this, that is nonstop_tsc; but then I'm reading in the Intel manual: "Constant TSC behavior ensures that the duration of each clock tick is uniform and supports the use of the TSC as a wall clock timer even if the processor core changes frequency. This is the architectural behavior moving forward." I will try my test with taskset. – Pneumograph
@IwillnotexistIdonotexist I've just used sched_setaffinity to lock it to a core; no difference. – Pneumograph
If sched_setaffinity would solve it, you'd probably have seen jitter up/down [before that] rather than steady drift. I have independent code for this with some 20 years' mileage on it. I'm experimenting now; if I find something, I'll post. [At present, my program confirms the drift.] BTW, the best value for CPU kHz is derived from bogomips / 2 rather than /sys/... – Haley
That link has constant and nonstop exactly inverted, and you should use Intel's terminology. As you found, constant TSC means constant period/frequency, and Intel's manual, immediately below in 17.14.1 Invariant TSC, describes the TSC as ticking regardless of sleep states, which is what that link should have called nonstop. But the asynchrony between the cores is apparently not the problem here. – Kiakiah
@IwillnotexistIdonotexist agreed, thanks for the input nonetheless! – Pneumograph
I have some ideas, but it really depends on what the actual problem you're trying to solve by using the TSC as a wallclock timer is. (That said, most of my ideas basically boil down to using the TSC only to interpolate or slightly extrapolate between regular actual wallclock timings.) – Afra
@IlmariKaronen how often do you resync your starting wall clock time? – Pneumograph
@IlmariKaronen the actual problem is to find the absolute most performant way to get the current wall clock time. It's for our internal logging library; we have measured, and it is a high contributor. – Pneumograph
The drift seems accurate. Why not just use tsc_time for all log timestamps? – Diet
A few other things that come to mind: 1) You should use a timebase much longer than 10*1e6 == 10e6 ns. Using a timebase of 10 ms, an error of just 1 ns gives you a skew of magnitude around 100 ns/s, and indeed your drift is of around that order, 40 ns/s. Bigger timebases correspond to smaller variance of the estimate. 2) The conversion factor should be computed by iterating for a fixed number of TSC ticks and computing the gap with clock_gettime, the reason being that clock_gettime is much more expensive than rdtscp. 3) You should subtract the overhead of rdtscp itself (a sketch follows these comments). – Kiakiah
@SteveLorimer Do you have a time synchronization daemon like ntpd enabled? What is the stratum of the NTP server used (and is it connected by a stable network with symmetrical latency, not your WiFi)? What is the timesource of your REALTIME clock? What is your motherboard model: does it have TCXO quartz or any kind of DO (disciplined oscillator) for the CPU BCLK, or just the cheapest 100 MHz part and a typical clock crystal, both having 20–50 ppm deviation? Is there "spread-spectrum clock generation" (SSCG) on the motherboard to limit EMI? Your 500 ns / 10 s is less than 1 ppm, but to do much better you'd need an atomic clock. – Scuffle
Another thing to consider: the TSC could be adjusted at some point in time, e.g. by SMM; cf. lkml.org/lkml/2016/11/19/146 – Inflation
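
A minimal sketch of point 3 from the comment above: measure the overhead of a TSC read itself so it can be subtracted from elapsed-tick counts. rdtscp_overhead is an assumed name, and the helper relies on the rdtscp() sketched in the question:

// Estimate the cost of rdtscp() from the minimum over many back-to-back
// reads; taking the minimum filters out interrupts and cache effects.
uint64_t rdtscp_overhead()
{
    uint64_t best = ~0ULL;
    for (int i = 0; i < 100000; ++i)
    {
        uint64_t t0 = rdtscp();
        uint64_t t1 = rdtscp();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}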

The reason for the drift seen in the OP, at least on my machine, is that the TSC's tick rate (ticks per ns) drifts away from its initial calibrated value _ticks_per_ns. The following results are from this machine:

don@HAL:~/UNIX/OS/3EZPcs/Ch06$ uname -a
Linux HAL 4.4.0-81-generic #104-Ubuntu SMP Wed Jun 14 08:17:06 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
don@HAL:~/UNIX/OS/3EZPcs/Ch06$  cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

cat /proc/cpuinfo shows constant_tsc and nonstop_tsc flags.

[Plot: sample TSC ticks per ns, measured against clock_gettime() CLOCK_REALTIME, vs. time.]

viewRates.cc can be run to see the current TSC Ticks per ns on a machine:

rdtscp.h:

static inline unsigned long rdtscp_start(void) {
  unsigned long var;
  unsigned int hi, lo;

  // cpuid serializes: no earlier instruction can still be in flight when
  // rdtsc reads the start timestamp.
  __asm volatile ("cpuid\n\t"
          "rdtsc\n\t" : "=a" (lo), "=d" (hi)
          :: "%rbx", "%rcx");

  var = ((unsigned long)hi << 32) | lo;
  return (var);
}

static inline unsigned long rdtscp_end(void) {
  unsigned long var;
  unsigned int hi, lo;

  // rdtscp waits for all earlier instructions to complete before reading the
  // TSC; the trailing cpuid stops later instructions from starting early.
  __asm volatile ("rdtscp\n\t"
          "mov %%edx, %1\n\t"
          "mov %%eax, %0\n\t"
          "cpuid\n\t"  : "=r" (lo), "=r" (hi)
          :: "%rax", "%rbx", "%rcx", "%rdx");

  var = ((unsigned long)hi << 32) | lo;
  return (var);
}

See Intel's white paper "How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures".

viewRates.cc:

#include <time.h>
#include <unistd.h>
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include "rdtscp.h"
using std::cout;  using std::cerr;  using std::endl;

#define CLOCK CLOCK_REALTIME

uint64_t to_ns(const timespec &ts);   // Converts a struct timespec to ns (since epoch); defined as in linearExtrapolator.cc below.
void view_ticks_per_ns(int runs =10, int sleep =10);

int main(int argc, char **argv) {
  int runs = 10, sleep = 10;
  if (argc != 1 && argc != 3) {
    cerr << "Usage: " << argv[0] << " [ RUNS SLEEP ] \n";
    exit(1);
  } else if (argc == 3) {
    runs = std::atoi(argv[1]);
    sleep = std::atoi(argv[2]);
  }

  view_ticks_per_ns(runs, sleep); 
}

void view_ticks_per_ns(int RUNS, int SLEEP) {
  // Prints a stream of RUNS tsc-ticks-per-ns values, each calculated over a SLEEP secs interval.
  timespec clock_start, clock_end;
  unsigned long tsc1, tsc2, tsc_start, tsc_end;
  unsigned long elapsed_ns, elapsed_ticks;
  double rate; // ticks per ns from each run.

  clock_getres(CLOCK, &clock_start);
  cout <<  "Clock resolution: " << to_ns(clock_start) << "ns\n\n";

  cout << " tsc ticks      " << "ns      " << " tsc ticks per ns\n";
  for (int i = 0; i < RUNS; ++i) {
    tsc1 = rdtscp_start();
    clock_gettime(CLOCK, &clock_start);
    tsc2 = rdtscp_end();                      
    tsc_start = (tsc1 + tsc2) / 2;

    sleep(SLEEP);

    tsc1 = rdtscp_start();
    clock_gettime(CLOCK, &clock_end);
    tsc2 = rdtscp_end();                     
    tsc_end = (tsc1 + tsc2) / 2;
    
    elapsed_ticks = tsc_end - tsc_start;
    elapsed_ns = to_ns(clock_end) - to_ns(clock_start);
    rate = static_cast<double>(elapsed_ticks) / elapsed_ns;

    cout << elapsed_ticks << " " << elapsed_ns << " " << std::setprecision(12) << rate << endl;
  } 
}

linearExtrapolator.cc can be run to re-create the experiment of the OP:

linearExtrapolator.cc:

#include <time.h>
#include <unistd.h>
#include <iostream>
#include <iomanip>
#include <algorithm>
#include <array>
#include "rdtscp.h"

using std::cout;  using std::endl;  using std::array;

#define CLOCK CLOCK_REALTIME

uint64_t to_ns(const timespec &ts);   // Converts a struct timespec to ns (since epoch).
void set_ticks_per_ns(bool set_rate); // Display or set tsc ticks per ns, _ticks_per_ns.
void get_start();             // Sets the 'start' time point: _start_tsc[in ticks] and _start_clock_time[in ns].
uint64_t tsc_to_ns(uint64_t tsc);     // Convert tsc ticks since _start_tsc to ns (since epoch) linearly using
                                      // _ticks_per_ns with origin(0) at the 'start' point set by get_start().

uint64_t _start_tsc, _start_clock_time; // The 'start' time point as both tsc tick number, start_tsc, and as
                                        // clock_gettime ns since epoch as _start_clock_time.
double _ticks_per_ns;                   // Calibrated in set_ticks_per_ns()

int main() {
  set_ticks_per_ns(true); // Set _ticks_per_ns as the initial TSC ticks per ns.

  uint64_t tsc1, tsc2, tsc_now, tsc_ns, utc_ns;
  int64_t ns_diff;
  bool first_pass{true};
  for (int i = 0; i < 10; ++i) {
    timespec utc_now;
    if (first_pass) {
      get_start(); //Get start time in both ns since epoch (_start_clock_time), and tsc tick number(_start_tsc)
      cout << "_start_clock_time: " <<  _start_clock_time << ", _start_tsc: " << _start_tsc << endl;
      utc_ns = _start_clock_time;
      tsc_ns = tsc_to_ns(_start_tsc);   // == _start_clock_time by definition.
      tsc_now = _start_tsc;
      first_pass = false;
    } else {
      tsc1 = rdtscp_start();
      clock_gettime(CLOCK, &utc_now);
      tsc2 = rdtscp_end();
      tsc_now = (tsc1 + tsc2) / 2;
      tsc_ns = tsc_to_ns(tsc_now);
      utc_ns = to_ns(utc_now);
    }

    ns_diff = tsc_ns - (int64_t)utc_ns;
    
    cout << "elapsed ns: " << utc_ns - _start_clock_time << ", elapsed ticks: " << tsc_now - _start_tsc 
     << ", ns_diff: " << ns_diff << '\n' << endl;
    
    set_ticks_per_ns(false);  // Display current TSC ticks per ns (does not alter original _ticks_per_ns).
  }
}

void set_ticks_per_ns(bool set_rate) {
  constexpr int RUNS {1}, SLEEP{10};
  timespec clock_start, clock_end;
  uint64_t tsc1, tsc2, tsc_start, tsc_end;
  uint64_t elapsed_ns[RUNS], elapsed_ticks[RUNS];
  array<double, RUNS> rates; // ticks per ns from each run.

  if (set_rate) {
    clock_getres(CLOCK, &clock_start);
    cout <<  "Clock resolution: " << to_ns(clock_start) << "ns\n";
  }

  for (int i = 0; i < RUNS; ++i) {
    tsc1 = rdtscp_start();
    clock_gettime(CLOCK, &clock_start);
    tsc2 = rdtscp_end();                      
    tsc_start = (tsc1 + tsc2) / 2;

    sleep(SLEEP);

    tsc1 = rdtscp_start();
    clock_gettime(CLOCK, &clock_end);
    tsc2 = rdtscp_end();                     
    tsc_end = (tsc1 + tsc2) / 2;
    
    elapsed_ticks[i] = tsc_end - tsc_start;
    elapsed_ns[i] = to_ns(clock_end) - to_ns(clock_start);
    rates[i] = static_cast<double>(elapsed_ticks[i]) / elapsed_ns[i];
  }
  
  cout << " tsc ticks      " << "ns     " << "tsc ticks per ns" << endl;
  for (int i = 0; i < RUNS; ++i)
    cout << elapsed_ticks[i] << " " << elapsed_ns[i] << " " << std::setprecision(12) << rates[i] << endl;

  if (set_rate)
    _ticks_per_ns = rates[RUNS-1];
}

constexpr uint64_t BILLION {1000000000};

uint64_t to_ns(const timespec &ts) {
  return ts.tv_sec * BILLION + ts.tv_nsec;
}

void get_start() { // Get start time both in tsc ticks as _start_tsc, and in ns since epoch as _start_clock_time
  timespec ts;
  uint64_t beg, end;

// loop to ensure we aren't interrupted between the two tsc reads
  while (1) {
    beg = rdtscp_start();
    clock_gettime(CLOCK, &ts);
    end = rdtscp_end();   
    if ((end - beg) <= 2000) // max ticks per clock call
      break;
  }

  _start_tsc = (end + beg) / 2;
  _start_clock_time = to_ns(ts); // converts timespec to ns since epoch
}

uint64_t tsc_to_ns(uint64_t tsc) { // Convert tsc ticks into absolute ns:
  // Absolute ns is defined by this linear extrapolation from the start point where
  //_start_tsc[in ticks] corresponds to _start_clock_time[in ns].
  uint64_t diff = tsc - _start_tsc;
  return _start_clock_time + static_cast<uint64_t>(diff / _ticks_per_ns);
}

Here is output from a run of viewRates immediately followed by linearExtrapolator:

# ./viewRates 

Clock resolution: 1ns

 tsc ticks      ns       tsc ticks per ns
28070466526 10000176697 2.8069970538
28070500272 10000194599 2.80699540335
28070489661 10000196097 2.80699392179
28070404159 10000170879 2.80699245029
28070464811 10000197285 2.80699110338
28070445753 10000195177 2.80698978932
28070430538 10000194298 2.80698851457
28070427907 10000197673 2.80698730414
28070409903 10000195492 2.80698611597
28070398177 10000195328 2.80698498942

# ./linearExtrapolator

Clock resolution: 1ns
 tsc ticks      ns     tsc ticks per ns
28070385587 10000197480 2.8069831264
_start_clock_time: 1497966724156422794, _start_tsc: 4758879747559
elapsed ns: 0, elapsed ticks: 0, ns_diff: 0

 tsc ticks      ns     tsc ticks per ns
28070364084 10000193633 2.80698205596
elapsed ns: 10000247486, elapsed ticks: 28070516229, ns_diff: -3465

 tsc ticks      ns     tsc ticks per ns
28070358445 10000195130 2.80698107188
elapsed ns: 20000496849, elapsed ticks: 56141027929, ns_diff: -10419

 tsc ticks      ns     tsc ticks per ns
28070350693 10000195646 2.80698015186
elapsed ns: 30000747550, elapsed ticks: 84211534141, ns_diff: -20667

 tsc ticks      ns     tsc ticks per ns
28070324772 10000189692 2.80697923105
elapsed ns: 40000982325, elapsed ticks: 112281986547, ns_diff: -34158

 tsc ticks      ns     tsc ticks per ns
28070340494 10000198352 2.80697837242
elapsed ns: 50001225563, elapsed ticks: 140352454025, ns_diff: -50742

 tsc ticks      ns     tsc ticks per ns
28070325598 10000196057 2.80697752704
elapsed ns: 60001465937, elapsed ticks: 168422905017, ns_diff: -70335

# ^C

The viewRates output shows that the TSC ticks per ns is decreasing fairly rapidly with time, corresponding to one of the steep drops in the plot above. The linearExtrapolator output shows, as in the OP, the difference between the elapsed ns reported by clock_gettime() and the elapsed ns obtained by converting the elapsed TSC ticks to ns using the _ticks_per_ns == 2.8069831264 obtained at start time. Rather than a sleep(10); between each printout of elapsed ns, elapsed ticks, ns_diff, I re-run the TSC ticks-per-ns calculation over a 10 s window; this prints the current TSC ticks-per-ns ratio. The trend of decreasing TSC ticks per ns seen in the viewRates output continues throughout the run of linearExtrapolator.

Dividing an elapsed-ticks value by _ticks_per_ns and subtracting the corresponding elapsed ns gives ns_diff, e.g.: (84211534141 / 2.8069831264) - 30000747550 = -20667. This is not 0 mainly due to the drift in TSC ticks per ns. If we had used the value of 2.80698015186 ticks per ns obtained from the last 10 s interval, the result would be: (84211534141 / 2.80698015186) - 30000747550 = 11125. The additional error accumulated during that last 10 s interval, -20667 - -10419 = -10248, nearly disappears when the correct TSC ticks-per-ns value is used for that interval: (84211534141 - 56141027929) / 2.80698015186 - (30000747550 - 20000496849) = 349.

If linearExtrapolator had been run at a time when the TSC ticks per ns was constant, the accuracy would be limited only by how well the (constant) _ticks_per_ns had been determined, and then it would pay to take, e.g., a median of several estimates. If _ticks_per_ns were off by a fixed 40 parts per billion, a constant drift of about 400 ns every 10 seconds would be expected, so ns_diff would grow or shrink by about 400 every 10 seconds.
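
That suggests the practical fix: rather than extrapolating forever from one initial calibration, periodically re-anchor the start point and adopt the rate observed over the most recent interval. A minimal sketch reusing get_start() and the globals from linearExtrapolator.cc above (the recalibrate name is mine, not from the code):

// Re-anchor the linear extrapolation at 'now' and adopt the rate observed
// over the interval since the previous anchor, so tsc_to_ns() tracks the
// current TSC ticks-per-ns rather than the initial one.
void recalibrate()
{
    uint64_t prev_tsc   = _start_tsc;
    uint64_t prev_clock = _start_clock_time;

    get_start(); // updates _start_tsc and _start_clock_time

    _ticks_per_ns = (double)(_start_tsc - prev_tsc)
                  / (double)(_start_clock_time - prev_clock);
}

A production version would slew the rate gradually rather than stepping it, so that successive timestamps stay monotonic.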

genTimeSeriesofRates.cc can be used to generate data for a plot like the one above:

genTimeSeriesofRates.cc:

#include <time.h>
#include <unistd.h>
#include <iostream>
#include <iomanip>
#include <algorithm>
#include <array>
#include "rdtscp.h"

using std::cout;  using std::cerr;  using std::endl;  using std::array;

double get_ticks_per_ns(long &ticks, long &ns); // Get median tsc ticks per ns, ticks and ns.
long ts_to_ns(const timespec &ts);

#define CLOCK CLOCK_REALTIME            // clock_gettime() clock to use.
#define TIMESTEP 10
#define NSTEPS  10000
#define RUNS 5            // Number of RUNS and SLEEP interval used for each sample in get_ticks_per_ns().
#define SLEEP 1

int main() {
  timespec ts;
  clock_getres(CLOCK, &ts);
  cerr << "CLOCK resolution: " << ts_to_ns(ts) << "ns\n";
  
  clock_gettime(CLOCK, &ts);
  int start_time = ts.tv_sec;

  double ticks_per_ns;
  int running_elapsed_time = 0; //approx secs since start_time to center of the sampling done by get_ticks_per_ns()
  long ticks, ns;
  for (int timestep = 0; timestep < NSTEPS; ++timestep) {
    clock_gettime(CLOCK, &ts);
    ticks_per_ns = get_ticks_per_ns(ticks, ns);
    running_elapsed_time = ts.tv_sec - start_time + RUNS * SLEEP / 2;
    
    cout << running_elapsed_time << ' ' << ticks << ' ' << ns << ' ' 
     << std::setprecision(12) << ticks_per_ns << endl;
    
    sleep(10);
  }
}

double get_ticks_per_ns(long &ticks, long &ns) {
  // get the median over RUNS runs of elapsed tsc ticks, CLOCK ns, and their ratio over a SLEEP secs time interval 
  timespec clock_start, clock_end;
  long tsc_start, tsc_end;
  array<long, RUNS> elapsed_ns, elapsed_ticks;
  array<double, RUNS> rates; // arrays from each run from which to get medians.

  for (int i = 0; i < RUNS; ++i) {
    clock_gettime(CLOCK, &clock_start);
    tsc_start = rdtscp_end(); // minimizes time between clock_start and tsc_start.
    sleep(SLEEP);
    clock_gettime(CLOCK, &clock_end);
    tsc_end = rdtscp_end();
    
    elapsed_ticks[i] = tsc_end - tsc_start;
    elapsed_ns[i] = ts_to_ns(clock_end) - ts_to_ns(clock_start);
    rates[i] = static_cast<double>(elapsed_ticks[i]) / elapsed_ns[i];
  }

  // get medians:
  std::nth_element(elapsed_ns.begin(), elapsed_ns.begin() + RUNS/2, elapsed_ns.end());
  std::nth_element(elapsed_ticks.begin(), elapsed_ticks.begin() + RUNS/2, elapsed_ticks.end());
  std::nth_element(rates.begin(), rates.begin() + RUNS/2, rates.end());
  ticks = elapsed_ticks[RUNS/2];
  ns = elapsed_ns[RUNS/2];

  return rates[RUNS/2];
}

constexpr long BILLION {1000000000};

long ts_to_ns(const timespec &ts) {
  return ts.tv_sec * BILLION + ts.tv_nsec;
}
Superpatriot answered 20/6, 2017 at 18:48 Comment(1)
Very interesting, but I would like to understand: (a) Why are you using separate inline functions (in rdtscp.h) for reading the start vs. end timestamps? (b) Why do you use rdtsc in one and rdtscp in the other? (c) What software did you use to create the plot? – Persecute

The relationship between the TSC and something like CLOCK_MONOTONIC will not stay exactly fixed. Even though you "calibrate" the TSC against CLOCK_MONOTONIC, the calibration will be out of date almost as soon as it is finished!

The reasons they won't stay in sync long term:

  1. CLOCK_MONOTONIC is affected by NTP clock rate adjustments. NTP constantly checks network time and subtly slows down or speeds up the system clock to match it. This results in a kind of oscillating pattern in the true CLOCK_MONOTONIC frequency, so your calibration will always be slightly off, especially the next time NTP applies a rate adjustment. You could compare against CLOCK_MONOTONIC_RAW to eliminate this effect (see the sketch below).
  2. CLOCK_MONOTONIC and the TSC are almost certainly based on totally different underlying oscillators. It is often said that modern OSes use the TSC for timekeeping, but the TSC is typically used only to apply a small "local" offset on top of some other, slower-running underlying clock to provide a very precise time (e.g., the "slow time" might be updated every timer tick, with the TSC used to interpolate between timer ticks). It is the slow underlying clock (something like the HPET or APIC timer) that determines the longer-term behavior of CLOCK_MONOTONIC. The TSC itself, however, is an independent free-running clock, deriving its frequency from a different oscillator in a different place on the chipset/motherboard, with different natural fluctuations (in particular, a different response to temperature changes).

Of the two, (2) is the more fundamental: it means that even without any NTP adjustments (or if you use a clock that is not subject to them), you will see drift over time whenever the underlying clocks are based on different physical oscillators.
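
One way to tell the two effects apart is to run the same rate measurement against both CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW: if the measured rate wanders against the former but stays flat against the latter, NTP slewing (point 1) dominates. A minimal self-contained sketch, not from the answer, assuming GCC/Clang's __rdtscp intrinsic:

#include <time.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <x86intrin.h> // __rdtscp (GCC/Clang)

// Measure TSC ticks per ns against the given clock over a one-second window.
static double ticks_per_ns(clockid_t clk)
{
    timespec a, b;
    unsigned int aux;
    clock_gettime(clk, &a);
    uint64_t t0 = __rdtscp(&aux);
    sleep(1);
    clock_gettime(clk, &b);
    uint64_t t1 = __rdtscp(&aux);
    int64_t ns = (b.tv_sec - a.tv_sec) * 1000000000LL + (b.tv_nsec - a.tv_nsec);
    return (double)(t1 - t0) / (double)ns;
}

int main()
{
    // If the rate against CLOCK_MONOTONIC wanders while the rate against
    // CLOCK_MONOTONIC_RAW stays flat, NTP slewing is the dominant effect.
    for (int i = 0; i < 10; ++i)
        std::printf("mono %.10f  raw %.10f\n",
                    ticks_per_ns(CLOCK_MONOTONIC),
                    ticks_per_ns(CLOCK_MONOTONIC_RAW));
}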

Hammerhead answered 19/8, 2018 at 15:25 Comment(0)

Is this the on board clock that is drifting? Surely it doesn't drift at this rate?

No, the clocks themselves shouldn't drift at that rate.

What is the cause of this drift?

An NTP service (or similar) run by your OS. It adjusts what clock_gettime(CLOCK_REALTIME, ...) returns.

Is there anything I can do to keep them in sync (other than very frequently recalculating _start_tsc and _start_clock_time in step 2)?

Yes, you can ease the problem:

1. You can try to use CLOCK_MONOTONIC instead of CLOCK_REALTIME.

2. You can model the difference as a linear function of time and apply it to compensate for the drift (a sketch follows this list). It will not be completely reliable, because time services don't adjust the time as a linear function, but it will give you some more accuracy; you can readjust periodically.
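
A minimal sketch of point 2, reusing the question's rdtscp(), to_ns(), tsc_to_ns() and globals; the readjust and _drift_per_tick names are assumptions, not the OP's code:

// Hypothetical linear compensation: at each readjustment, measure the
// accumulated error of tsc_to_ns() against CLOCK_REALTIME and derive an
// error rate per elapsed TSC tick.
double _drift_per_tick = 0.0; // ns of error accumulated per elapsed TSC tick

void readjust()
{
    timespec ts;
    uint64_t tsc = rdtscp();
    clock_gettime(CLOCK_REALTIME, &ts);

    int64_t error    = (int64_t)(tsc_to_ns(tsc) - to_ns(ts));
    uint64_t elapsed = tsc - _start_tsc;
    if (elapsed != 0)
        _drift_per_tick = (double)error / (double)elapsed;
}

// Like tsc_to_ns(), but subtracts the linearly modelled drift.
uint64_t tsc_to_ns_corrected(uint64_t tsc)
{
    int64_t diff = tsc - _start_tsc;
    return tsc_to_ns(tsc) - (int64_t)(diff * _drift_per_tick);
}

Calling readjust() periodically (say, once a second) keeps the correction current between full recalibrations.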


Some of the drift may come from calculating _ticks_per_ns inaccurately. You can check this by running your program several times: if the results are not reproducible, your _ticks_per_ns estimate is noisy. It is better to use a statistical method (such as the median) than just an average value.


Also note that you calculate _ticks_per_ns using CLOCK_MONOTONIC, which is related to the TSC.

Next, you are using CLOCK_REALTIME. It provides the system time: if your system has NTP or a similar service, the time will be adjusted.

Your difference is around 2 microseconds per minute, i.e. 0.002 ms × 60 × 24 ≈ 2.9 milliseconds a day. That is great accuracy for a CPU clock: 3 ms a day is about 1 second a year.


Thorathoracic answered 29/3, 2017 at 22:11 Comment(3)
BayK, is the TSC clock affected/modulated by "spread-spectrum clock generation" (SSCG) or not? Where is the Linux kernel interface (in /proc or /sys) to see the current adjustment parameters, if they are set by NTP or another time daemon? – Scuffle
(There is SSCG in Xeon Phi's mic, etc.: books.google.com/books?id=KJORYTHOxbEC&pg=PA380, Intel Xeon Phi Coprocessor High Performance Programming, 9780124104945, page 380. Also kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt: "... very large systems may deliberately slew the clocks of individual cores. This technique, known as spread-spectrum clocking, reduces EMI at the clock frequency and harmonics of it". There was SSC in the FSB: serverfault.com/questions/129112.) – Scuffle
Osgx, NTP or another time daemon can't adjust the TSC; it affects what clock_gettime(CLOCK_REALTIME) returns. In this question, the code shows the difference between the clock_gettime(CLOCK_REALTIME) result and the TSC clock. – Thorathoracic
