How can I use mach_absolute_time without overflowing?
Asked Answered
M

3

19

On Darwin, the POSIX standard clock_gettime(CLOCK_MONOTONIC) timer is not available. Instead, the highest resolution monotonic timer is obtained through the mach_absolute_time function from mach/mach_time.h.

The result returned may be an unadjusted tick count from the processor, in which case the time units could be a strange multiple. For example, on a CPU with a 33MHz tick count, Darwin returns 1000000000/33333335 as the exact units of the returned result (ie, multiply the mach_absolute_time by that fraction to obtain a nanosecond value).

We usually wish to convert from exact ticks to "standard" (decimal) units, but unfortunately, naively multiplying the absolute time by the fraction will overflow even in 64-bit arithmetic. This is an error that Apple's sole piece of documentation on mach_absolute_time falls into (Technical Q&A QA1398).1

How should I write a function that correctly uses mach_absolute_time?


  1. Note that this is not a theoretical problem: the sample code in QA1398 completely fails to work on PowerPC-based Macs. On Intel Macs, mach_timebase_info always returns 1/1 as the scaling factor because the CPU's raw tick count is unreliable (dynamic speed-stepping), so the API does the scaling for you. On PowerPC Macs, mach_timebase_info returns either 1000000000/33333335 or 1000000000/25000000, so Apple's provided code definitely overflows every few minutes. Oops.
Multicolored answered 30/4, 2014 at 1:18 Comment(1)
I have CLOCK_MONOTONIC on MacOS 10.13.6, and it fills timespec, returning 0; not sure if it's really monotonic, however.Guilford
M
25

Most-precise (best) answer

Perform the arithmetic at 128-bit precision to avoid the overflow!

// Returns monotonic time in nanos, measured from the first time the function
// is called in the process.
uint64_t monotonicTimeNanos() {
  uint64_t now = mach_absolute_time();
  static struct Data {
    Data(uint64_t bias_) : bias(bias_) {
      kern_return_t mtiStatus = mach_timebase_info(&tb);
      assert(mtiStatus == KERN_SUCCESS);
    }
    uint64_t scale(uint64_t i) {
      return scaleHighPrecision(i - bias, tb.numer, tb.denom);
    }
    static uint64_t scaleHighPrecision(uint64_t i, uint32_t numer,
                                       uint32_t denom) {
      U64 high = (i >> 32) * numer;
      U64 low = (i & 0xffffffffull) * numer / denom;
      U64 highRem = ((high % denom) << 32) / denom;
      high /= denom;
      return (high << 32) + highRem + low;
    }
    mach_timebase_info_data_t tb;
    uint64_t bias;
  } data(now);
  return data.scale(now);
}

A simple low-resolution answer

// Returns monotonic time in nanos, measured from the first time the function
// is called in the process.  The clock may run up to 0.1% faster or slower
// than the "exact" tick count.
uint64_t monotonicTimeNanos() {
  uint64_t now = mach_absolute_time();
  static struct Data {
    Data(uint64_t bias_) : bias(bias_) {
      kern_return_t mtiStatus = mach_timebase_info(&tb);
      assert(mtiStatus == KERN_SUCCESS);
      if (tb.denom > 1024) {
        double frac = (double)tb.numer/tb.denom;
        tb.denom = 1024;
        tb.numer = tb.denom * frac + 0.5;
        assert(tb.numer > 0);
      }
    }
    mach_timebase_info_data_t tb;
    uint64_t bias;
  } data(now);
  return (now - data.bias) * data.tb.numer / data.tb.denom;
}

A fiddly solution using low-precision arithmetic but using continued fractions to avoid loss of accuracy

// This function returns the rational number inside the given interval with
// the smallest denominator (and smallest numerator breaks ties; correctness
// proof neglects floating-point errors).
static mach_timebase_info_data_t bestFrac(double a, double b) {
  if (floor(a) < floor(b))
  { mach_timebase_info_data_t rv = {(int)ceil(a), 1}; return rv; }
  double m = floor(a);
  mach_timebase_info_data_t next = bestFrac(1/(b-m), 1/(a-m));
  mach_timebase_info_data_t rv = {(int)m*next.numer + next.denum, next.numer};
  return rv;
}

// Returns monotonic time in nanos, measured from the first time the function
// is called in the process.  The clock may run up to 0.1% faster or slower
// than the "exact" tick count. However, although the bound on the error is
// the same as for the pragmatic answer, the error is actually minimized over
// the given accuracy bound.
uint64_t monotonicTimeNanos() {
  uint64_t now = mach_absolute_time();
  static struct Data {
    Data(uint64_t bias_) : bias(bias_) {
      kern_return_t mtiStatus = mach_timebase_info(&tb);
      assert(mtiStatus == KERN_SUCCESS);
      double frac = (double)tb.numer/tb.denom;
      uint64_t spanTarget = 315360000000000000llu; // 10 years
      if (getExpressibleSpan(tb.numer, tb.denom) >= spanTarget)
        return;
      for (double errorTarget = 1/1024.0; errorTarget > 0.000001;) {
        mach_timebase_info_data_t newFrac =
            bestFrac((1-errorTarget)*frac, (1+errorTarget)*frac);
        if (getExpressibleSpan(newFrac.numer, newFrac.denom) < spanTarget)
          break;
        tb = newFrac;
        errorTarget = fabs((double)tb.numer/tb.denom - frac) / frac / 8;
      }
      assert(getExpressibleSpan(tb.numer, tb.denom) >= spanTarget);
    }
    mach_timebase_info_data_t tb;
    uint64_t bias;
  } data(now);
  return (now - data.bias) * data.tb.numer / data.tb.denom;
}

The derivation

We aim to reduce the fraction returned by mach_timebase_info to one that is essentially the same, but with a small denominator. The size of the timespan that we can handle is limited only by the size of the denominator, not the numerator of the fraction we shall multiply by:

uint64_t getExpressibleSpan(uint32_t numer, uint32_t denom) {
  // This is just less than the smallest thing we can multiply numer by without
  // overflowing. ceilLog2(numer) = 64 - number of leading zeros of numer
  uint64_t maxDiffWithoutOverflow = ((uint64_t)1 << (64 - ceilLog2(numer))) - 1;
  return maxDiffWithoutOverflow * numer / denom;
}

If denom=33333335 as returned by mach_timebase_info, we can handle differences of up to 18 seconds only before the multiplication by numer overflows. As getExpressibleSpan shows, by calculating a rough lower bound for this, the size of numer doesn't matter: halving numer doubles maxDiffWithoutOverflow. The only goal therefore is to produce a fraction close to numer/denom that has a smaller denominator. The simplest method to do this is using continued fractions.

The continued fractions method is rather handy. bestFrac clearly works correctly if the provided interval contains an integer: it returns the least integer in the interval over 1. Otherwise, it calls itself recursively with a strictly larger interval and returns m+1/next. The final result is a continued fraction that can be shown by induction to have the correct property: it's optimal, the fraction inside the given interval with the least denominator.

Finally, we reduce the fraction Darwin passes us to a smaller one to use when rescaling the mach_absolute_time to nanoseconds. We may introduce an error here because we can't reduce the fraction in general without losing accuracy. We set ourselves the target of 0.1% error, and check that we've reduced the fraction enough for common timespans (up to ten years) to be handled correctly.

Arguably the method is over-complicated for what it does, but it handles correctly anything the API can throw at it, and the resulting code is still short and extremely fast (bestFrac typically recurses only three or four iterations deep before returning a denominator less than 1000 for random intervals [a,a*1.002]).

Multicolored answered 30/4, 2014 at 1:18 Comment(6)
That looks promising, unfortunately I can't seem to be able to make my compiler work with those (using a simple clang foo.c) (probably the static struct Data {Data(uint64_t bias_) : bias(bias_) { part which I've never seen) could you give me any clue?Sverige
It's C++ (ObjC++), not C. Having said that, I should actually update this answer because I use a different technique now - just do the arithmetic at high-precision :)Multicolored
Thanks :) I've came up with a different solution too (but have no PowerPC to test it actually works :p)Sverige
I've updated with my more-accurate solution now. We dropped support for PPC at my company a year ago, so it's all moot now that we're pretty-much guaranteed never to see anything other than 1/1 returned from mach_timebase_info().Multicolored
In the one with title "Perform the arithmetic at 128-bit precision to avoid the overflow!" now is passed to the ctor of Data which set bias to the value of now, then now is passed to scale and there bias is subtracted from i. Doesn't that mean that now is subtracted from now? ... I can't see how this isn't always zero?Asarum
You've missed that it's static struct Data ... so it's only zero the first time the function is called. After that the bias is remembered from previous runs.Multicolored
S
5

You're worrying about overflow when multiplying/dividing with values from the mach_timebase_info struct, which is used for conversion to nanoseconds. So, while it may not fit your exact needs, there are easier ways to get a count in nanoseconds or seconds.

All solutions below are using mach_absolute_time internally (and NOT the wall clock).


Use double instead of uint64_t

(supported in Objective-C and Swift)

double tbInSeconds = 0;
mach_timebase_info_data_t tb;
kern_return_t kError = mach_timebase_info(&tb);
if (kError == 0) {
    tbInSeconds = 1e-9 * (double)tb.numer / (double)tb.denom;
}

(remove the 1e-9 if you want nanoseconds)

Usage:

uint64_t start = mach_absolute_time();
// do something
uint64_t stop = mach_absolute_time();
double durationInSeconds = tbInSeconds * (stop - start);

Use ProcessInfo.processInfo.systemUptime

(supported in Objective-C and Swift)

It does the job in double seconds directly:

CFTimeInterval start = NSProcessInfo.processInfo.systemUptime;
// do something
CFTimeInterval stop = NSProcessInfo.processInfo.systemUptime;
NSTimeInterval durationInSeconds = stop - start;

For reference, source code of systemUptime just does something similar as previous solution:

struct mach_timebase_info info;
mach_timebase_info(&info);
__CFTSRRate = (1.0E9 / (double)info.numer) * (double)info.denom;
__CF1_TSRRate = 1.0 / __CFTSRRate;
uint64_t tsr = mach_absolute_time();
return (CFTimeInterval)((double)tsr * __CF1_TSRRate);

Use QuartzCore.CACurrentMediaTime()

(supported in Objective-C and Swift)

Same as systemUptime, but without being open source.


Use Dispatch.DispatchTime.now()

(supported in Swift only)

Another wrapper around mach_absolute_time(). Base precision is nanoseconds, backed with UInt64.

DispatchTime start = DispatchTime.now()
// do something
DispatchTime stop = DispatchTime.now()
TimeInterval durationInSeconds = Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1_000_000_000

For reference, source code of DispatchTime.now() says it basically simply returns a struct DispatchTime(rawValue: mach_absolute_time()). And the calculation for uptimeNanoseconds is:

(result, overflow) = result.multipliedReportingOverflow(by: UInt64(DispatchTime.timebaseInfo.numer))
result = overflow ? UInt64.max : result / UInt64(DispatchTime.timebaseInfo.denom)

So it just discards results if the multiplication can't be stored in an UInt64.

Survance answered 29/1, 2019 at 7:53 Comment(1)
This should be the right answer. Most practical approach is to cache a double conversion coefficient for conversion to seconds. If you need max accuracy, stay in ticks domain and convert only for display/interop purposes.Messina
C
-1

If mach_absolute_time() sets the uint64 back to 0 then reset the time calculations if less than the last check.

That's the problem, they don't document what happens when the uint64 reaches all ones (binary).

read it. https://developer.apple.com/documentation/kernel/1462446-mach_absolute_time

Colostomy answered 15/5, 2022 at 4:33 Comment(2)
It will take more than 580 years for it to wrap around on Intel and more than 24000 years on Apple Silicon.Messina
It will also get you ranked in the broad classification of fails eventually if not fixed and discussed as such with those not knowing of your stack overflow response.Colostomy

© 2022 - 2024 — McMap. All rights reserved.