How to get microsecond timings in JavaScript since Spectre and Meltdown

The situation

When writing high-performance JavaScript code, the standard profiling tools offered by Chrome et al. are not always sufficient. They only seem to offer function-level granularity, and it can be quite time-consuming to drill down and find the information I need.

In .NET the Stopwatch class gives me exactly what I need: sub-microsecond resolution timing of arbitrary pieces of code.

For JavaScript, performance.now() used to be a pretty good way to measure performance, but in response to Spectre and Meltdown all major browsers have reduced its resolution to around a millisecond or coarser.

To quote MDN on performance.now():

The timestamp is not actually high-resolution. To mitigate security threats such as Spectre, browsers currently round the result to varying degrees. (Firefox started rounding to 2 milliseconds in Firefox 59.) Some browsers may also slightly randomize the timestamp. The precision may improve again in future releases; browser developers are still investigating these timing attacks and how best to mitigate them.
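For comparison with .NET's Stopwatch, the closest JavaScript equivalent is a thin wrapper around performance.now(). The sketch below is hypothetical (the class and method names only loosely mirror the .NET API), and it inherits whatever resolution its clock provides, so in a browser the clamping quoted above still applies.

```javascript
// Hypothetical Stopwatch-like helper, loosely modelled on .NET's Stopwatch.
// Precision is limited by the injected clock; the default is performance.now(),
// which browsers clamp. Pass an arrow wrapper, not performance.now itself,
// because calling an unbound performance.now throws in browsers.
class Stopwatch {
  constructor(now = () => performance.now()) {
    this.now = now;        // clock function returning milliseconds
    this.elapsedMs = 0;    // time accumulated from completed start/stop intervals
    this.startedAt = null; // start of the running interval, or null when stopped
  }
  start() {
    if (this.startedAt === null) this.startedAt = this.now();
    return this;
  }
  stop() {
    if (this.startedAt !== null) {
      this.elapsedMs += this.now() - this.startedAt;
      this.startedAt = null;
    }
    return this;
  }
  reset() {
    this.elapsedMs = 0;
    this.startedAt = null;
    return this;
  }
  // Elapsed milliseconds, including a still-running interval.
  get elapsed() {
    return this.startedAt === null
      ? this.elapsedMs
      : this.elapsedMs + (this.now() - this.startedAt);
  }
  static startNew(now) {
    return new Stopwatch(now).start();
  }
}
```

Typical use is `const sw = Stopwatch.startNew();` before the code under test and `sw.stop();` after it, then reading `sw.elapsed` in milliseconds. The clock is injectable so that a finer time source, where one is available, can be swapped in.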

The problem

I need microsecond-precision timings. At the time of writing, browsers don't seem to offer any options or flags to disable these security measures. Maybe I'm googling the wrong terms, but the only articles I come across are explanations of the security problems and how these mitigations address them.

I'm not interested in the security aspect here: I'm benchmarking performance-critical pieces of JavaScript code on my own machine, and the only thing I care about is getting measurements that are as accurate as possible with as little effort as possible.

Existing workarounds

Two options come to mind:

  1. Install an older version of a browser that doesn't have these mitigations implemented

I'd have to dedicate an old version of Firefox to benchmarking and a new version of Chrome to browsing, for example. That's not practical, since I need to test in all browsers (and preferably also benchmark in all browsers). Also, new optimizations aren't present in old browsers, so the benchmarks would potentially be useless.

  2. Implement a custom timer using WebWorkers

I've seen various older blog posts on this but none of them seem to achieve the high precision that I need (after all, there used to be performance.now() for that).
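For reference, the counting-worker idea usually looks like this: one thread spins incrementing a shared counter, and the code under test reads it before and after. This sketch assumes Node.js and uses `worker_threads` as a stand-in for a browser Web Worker; in a browser the same pattern needs a Worker plus a SharedArrayBuffer, which current browsers generally only expose to cross-origin isolated pages. The result is in ticks, not time, and the tick rate varies with CPU frequency, scheduling and core availability. `startTickCounter` is a hypothetical name.

```javascript
// Sketch of a "counting worker" timer (Node.js worker_threads as a Web Worker stand-in).
// Assumption: a spare CPU core; the counter measures loop iterations, not wall-clock time.
const { Worker } = require("worker_threads");

function startTickCounter() {
  // One 64-bit slot shared between threads; BigInt64Array avoids 32-bit wrap-around.
  const shared = new SharedArrayBuffer(BigInt64Array.BYTES_PER_ELEMENT);
  const ticks = new BigInt64Array(shared);
  const worker = new Worker(
    `
    const { workerData } = require("worker_threads");
    const ticks = new BigInt64Array(workerData);
    // Spin forever, publishing the running count to the main thread.
    for (;;) Atomics.add(ticks, 0, 1n);
    `,
    { eval: true, workerData: shared }
  );
  return {
    read: () => Atomics.load(ticks, 0), // current tick count as a BigInt
    stop: () => worker.terminate(),     // end the busy loop and free the core
  };
}
```

Read `counter.read()` before and after the code under test and compare the BigInt differences between variants; call `counter.stop()` afterwards so the spinning worker does not keep a core busy.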

The question

How do I get an effectively pre-Spectre performance.now() without having to resort to older browser versions, virtual machines and such?

Are there any coding techniques or libraries in JavaScript that achieve microsecond precision?

Are there any options or flags for the three major browsers (Chrome, Firefox and Edge) that disable these security measures?

I'm ultimately looking for a way to accurately measure the relative performance of different pieces of code compared to one another - so if there is a solution that gives me ticks, rather than microseconds, that would be acceptable too as long as it's accurate and works across browsers.
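Since relative performance is the goal, a coarse clock can partly be compensated for by timing batches of calls and comparing medians. This sketch assumes synchronous functions and a batch long enough to span many clock ticks; `benchmark` and `compare` are illustrative names, not a library. If the baseline's median rounds to 0 ms, the ratio is meaningless and `batchSize` has to be raised.

```javascript
// Sketch of a relative micro-benchmark: time many calls per sample so that even a
// clamped clock sees many ticks per sample, then compare medians rather than single runs.
function benchmark(fn, { samples = 31, batchSize = 1000, now = () => performance.now() } = {}) {
  // Keeping the last result reachable makes it less likely the JIT drops the calls as dead code.
  let sink;
  // Warm-up batch, giving the JIT a chance to optimize fn before measuring.
  for (let i = 0; i < batchSize; i++) sink = fn(i);
  const perCallMs = [];
  for (let s = 0; s < samples; s++) {
    const t0 = now();
    for (let i = 0; i < batchSize; i++) sink = fn(i);
    perCallMs.push((now() - t0) / batchSize);
  }
  perCallMs.sort((a, b) => a - b);
  return { medianMs: perCallMs[perCallMs.length >> 1], minMs: perCallMs[0], sink };
}

// Ratio of median per-call times: a value above 1 means `b` is slower than `a`.
function compare(a, b, options) {
  return benchmark(b, options).medianMs / benchmark(a, options).medianMs;
}
```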

Skidway answered 1/5, 2018 at 13:39 Comment(23)
In order to give a good recommendation: what do you need this precision timing for? And do you need precision timestamps, or precision intervals? – Maddocks
Given that what you're describing sounds like quite low-level timings, presumably it's not totally dependent on running in a browser (where accessing the DOM, performing Ajax etc. will be orders of magnitude slower than any code that's running) - could you instead work within the Node.js environment? You have process.hrtime there. – Raynaraynah
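For reference, a minimal Node.js timing helper built on `process.hrtime.bigint()`, which returns a monotonic timestamp in nanoseconds as a BigInt. The helper name `timeNs` is hypothetical; note that this only measures V8, the engine Chrome also uses.

```javascript
// Node.js only: time one call with nanosecond-resolution process.hrtime.bigint().
function timeNs(fn, ...args) {
  const start = process.hrtime.bigint();
  const result = fn(...args);
  const elapsedNs = process.hrtime.bigint() - start; // BigInt nanoseconds
  return { result, elapsedNs };
}
```

For example, `timeNs(JSON.parse, text).elapsedNs` gives the parse time in nanoseconds; convert with `Number(...)` before doing floating-point arithmetic on it.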
@Mike'Pomax'Kamermans Intervals. Quite simply, I need a "stopwatch" that can tell me how long a certain piece of code took to execute, and after trying to optimize that piece of code, compare the two timings to see if my optimization has improved the performance or not. – Skidway
Can you run the code in a loop without the JIT optimizer doing something that destroys your microbenchmark? – Sitka
@JamesThorpe Well, yes and no. It is pure JavaScript that I'm looking to measure the performance of, so it's not dependent on running in a browser, but I do need to know how it performs in a browser. NodeJS would only give me the metrics for Chrome, presumably. – Skidway
@PeterCordes Doing thousands of iterations in a loop is how I'm currently doing my benchmarks, but the JIT optimizations do give some uncertainty here. Also, the looping itself makes it quite tedious to do benchmarks, since I have to refactor and isolate code and effectively make it different just to be able to measure it. Ideally I just want to drop some measurement points into existing code without having to rewire everything. – Skidway
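One way to drop measurement points into existing code without restructuring it is to accumulate per-label totals over many executions. This sketch is hypothetical (`createProbe`, `begin`, `end` and `report` are made-up names). Its premise is an assumption: with a floor-clamped clock each short interval reads as zero or one tick, but if intervals start at effectively random points relative to the ticks, the mean over many runs approaches the true duration. Timing that stays in step with the clock's ticks breaks that assumption.

```javascript
// Hypothetical probe: begin/end markers around existing code, totals accumulated per label.
// Recursive regions with the same label are not supported.
function createProbe(now = () => performance.now()) {
  const totals = new Map(); // label -> { totalMs, count }
  const open = new Map();   // label -> start time of the region currently being measured
  return {
    begin(label) {
      open.set(label, now());
    },
    end(label) {
      const t0 = open.get(label);
      if (t0 === undefined) return; // unmatched end(): ignore rather than throw
      open.delete(label);
      const entry = totals.get(label) || { totalMs: 0, count: 0 };
      entry.totalMs += now() - t0;
      entry.count += 1;
      totals.set(label, entry);
    },
    // One row per label, including the mean time per execution.
    report() {
      return [...totals].map(([label, { totalMs, count }]) => ({
        label, count, totalMs, meanMs: totalMs / count,
      }));
    },
  };
}
```

For example, create one probe up front, wrap a region in `probe.begin("tokenize")` / `probe.end("tokenize")`, run the normal workload many times and then inspect `console.table(probe.report())`.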
Are there any coding techniques or libraries in JavaScript that achieve microsecond precision? If there were, it would be a vulnerability in the current defence mechanism against Spectre. (AFAIK, you can't easily exploit Meltdown from JS, just Spectre. And kernel Meltdown patches totally block it, while kernel Spectre patches only defend the kernel, and maybe processes from other processes; they still leave sandbox VMs vulnerable to guest code in the same process. Making the timing-based side channels unusable is basically necessary to defend the browser internals from JS guest code.) – Sitka
I think your best bet is to look for a knob somewhere that re-enables high precision timing in modern browsers; presumably they made it possible to enable for use-cases like yours. I don't know how, but I'd guess it's possible. You may have to build your own Firefox and Chromium from source with that option disabled, though. You mainly care about using this in your own browser on one or a few machines, because this is for benchmarking use. – Sitka
@PeterCordes Fair point. But say, for example, you had an infinite loop running in a web worker, just counting (and fully utilizing a CPU core in the process), and you asked it for the current count at two different moments; that would potentially still be useful information. I just don't know exactly how to do that (and whether it's viable to begin with; I'm not sure about the overhead of postMessage, etc.). – Skidway
@PeterCordes Yeah, I'm kind of looking for a knob; I'm just saying I'd be open to other options as well if they existed. Building Firefox and Chromium from source is actually a pretty clever idea, I'll look into that. – Skidway
Interesting; that loop hack hopefully would get you better than ms precision, but likely with many microseconds of overhead. Probably not good enough for reliable Spectre attacks. Does JS have shared atomic variables that you can read from one thread while another atomically increments them? Or would the looping thread have to receive and reply to a message? (I don't really know JavaScript, just CPUs / low-level stuff; I'm here for the [benchmarking] and [performance] tags :P) – Sitka
This also feels a bit like optimizing the wrong thing: timing individual pieces of code when the full code path already runs in 2 ms or less suggests you already have code for which optimization is going to be effectively irrelevant. To optimize those pieces of code, you're really looking more at complexity analysis than runtime analysis, so that longer input doesn't go from 2 ms to 2 seconds to 2000 years over two orders of magnitude. And if you have to optimize them, you have code for which Node.js would be fine. – Maddocks
@PeterCordes JavaScript doesn't do threading; everything conceptually happens in a message loop in a single thread. While you have the likes of service workers now (which can be thought of as threads), you have to post messages between them, and there's no guarantee as to how fast that message will be seen. – Raynaraynah
@Mike'Pomax'Kamermans: 2 ms is about 8 million CPU clock cycles on a 4GHz CPU. That's many thousands of times longer than the out-of-order window (224 uops) on Skylake, even running code with low instructions per clock. It's also long enough that you can probably usefully microbenchmark something that you might use as part of a loop without having things totally distorted by it being so small that it normally gets folded into part of a larger operation. (i.e. we're not talking about ++x vs. x++ here or other things that are meaningless without looking at the JIT asm output.) – Sitka
@Mike'Pomax'Kamermans One example of what I want to benchmark is specific hot paths in a parser: testing whether a switch statement with 100 cases is perhaps a few % faster than a switch statement with only 20 cases and 2-3 if-elses. The optimizations are very different in Chrome vs Firefox vs Edge. Sometimes in Chrome it cannot be optimized further, but in Firefox it can. Apart from this low-level example, there are also higher-level benchmarks I'm looking for which are still sub-millisecond. – Skidway
@FredKleuver: Yup, as an asm guru, that sounds like a reasonable thing to try to profile / microbenchmark this way, except that microbenchmarking branches that might mispredict is hard. Modern branch predictors learn easily, and creating the same amount of unpredictability as in your real use-case is hard. (And just putting it in a loop distorts that.) Maybe a bit on the small side, especially if it turns into a jump table. (Err, if the cases aren't integers like they have to be in C, but instead string matches or something, that's a bit bigger and an even better fit for microbenching.) – Sitka
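One way to approximate the unpredictability of real input when microbenchmarking a switch, as discussed above, is to pre-generate the inputs and shuffle them with a seeded PRNG, so every run sees the same non-repeating sequence. This is a sketch: mulberry32 is a small, widely used non-cryptographic PRNG chosen for brevity, `shuffledInputs` is a hypothetical helper, and whether the value mix matches the real input distribution is an assumption this code cannot check.

```javascript
// mulberry32: tiny seeded 32-bit PRNG returning floats in [0, 1). Not cryptographically secure.
function mulberry32(seed) {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Repeat each case value `copies` times, then Fisher-Yates shuffle with the seeded PRNG,
// so the branch predictor cannot simply learn a short repeating pattern.
function shuffledInputs(values, copies, seed) {
  const random = mulberry32(seed);
  const out = [];
  for (const v of values) for (let i = 0; i < copies; i++) out.push(v);
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1)); // 0 <= j <= i
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}
```

The benchmarked code then walks the array, e.g. `for (const token of inputs) sink = classify(token);`, where `classify` stands for the switch under test (a hypothetical name).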
@PeterCordes Interesting. With that in mind, the results will also vary from CPU to CPU, and I can only go so far. There will always be some rate of error, and that's perfectly acceptable. Surely microsecond vs millisecond precision timing is the low-hanging fruit here, hence that's what I'm focusing on. – Skidway
For a single switch statement you might well need nanosecond precision to get useful results. Some messy multi-threading stuff you hack up is not going to be worth the trouble vs. getting the JS implementation to give you high-res time directly. For open-source browsers it's definitely possible; it just might take a bit of work. The relative performance of the same x86 asm can be different on different hardware, too, so you don't want a timing method that also varies with CPUs, making it impossible to tell what's what. (But beware of turbo / powersave CPU frequency variation either way.) – Sitka
@PeterCordes Thanks for the useful insights. I'm currently looking at how to build Chromium :) If I somehow figure this out for Chrome and FF I'll post it as an answer. Hopefully someone else comes along in the meantime with an existing solution, but that's probably wishful thinking, hah. – Skidway
I'd look for the commits that introduced this change and see if they included an option to disable it via a Firefox config setting that has no GUI menu entry, just buried in about:config somewhere. Or actually, I'd just go looking in those settings first, and maybe google how to disable the precision restriction. You probably aren't the first person to want this. – Sitka
Yeah, you'd say so, but I've googled all over for disabling the precision restriction; the utter lack of results is what led me to ask this question. I think I found the commit in Chromium: github.com/chromium/chromium/commit/… - looks pretty hard-coded to me. – Skidway
@FredKleuver: Yeah, no sign of a time API that bypasses TimeClamper::ClampTimeResolution. But if there is one, you wouldn't necessarily see it in that patch, because there was already a 5us limit before that. Anyway, it looks easy to disable in the source. Changing static constexpr double kResolutionSeconds = 0.0001 to 1.0 would work, or, for lower overhead, make TimeClamper::ClampTimeResolution a no-op or leave out calls to it. (And BTW, floor(time_seconds / kResolutionSeconds) would be more efficient as floor(time_seconds * (1.0/kResolutionSeconds)) to avoid runtime division.) – Sitka
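To make the clamping concrete, here is a JavaScript model of the floor-to-resolution logic described in the comment (a model of the idea, not the Chromium C++ source; the function names are hypothetical). It also shows one caveat of the reciprocal-multiplication variant: because 1/resolution is usually not exact in binary floating point, the two forms can round differently right at a tick boundary, for example at 0.3 s with a 0.1 s resolution.

```javascript
// Floor-based clamping: round a timestamp (in seconds) down to a multiple of the
// resolution, so all readings inside one tick compare equal.
function clampDivide(timeSeconds, resolutionSeconds) {
  return Math.floor(timeSeconds / resolutionSeconds) * resolutionSeconds;
}

// Reciprocal variant suggested in the comment: one multiplication per call instead of a division.
function makeClampMultiply(resolutionSeconds) {
  const ticksPerSecond = 1.0 / resolutionSeconds; // computed once, outside the hot path
  return (timeSeconds) => Math.floor(timeSeconds * ticksPerSecond) * resolutionSeconds;
}
```

With the 0.0001 s (100 µs) resolution quoted in the comment, `makeClampMultiply(0.0001)` maps 1.00001 s and 1.00002 s to the same value, which is why a measurement loop has to spin until the reported time changes.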
You really got nerd-sniped by this, huh :) Thanks a ton though, very helpful info! I haven't been able to build Chromium yet (no large open-source product ever manages to build on the first try on Windows), so no good news from my end yet. – Skidway

Since Firefox 79, you can use high resolution timers if you make your server send two headers with your page:

Starting with Firefox 79, high resolution timers can be used if you cross-origin isolate your document using the Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy headers:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

These headers ensure a top-level document does not share a browsing context group with cross-origin documents. COOP process-isolates your document and potential attackers can't access your global object if they were opening it in a popup, preventing a set of cross-origin attacks dubbed XS-Leaks.

Ref: https://developer.mozilla.org/en-US/docs/Web/API/Performance/now

It's not mentioned on that page as of this moment, but after a little experimentation I've concluded that the timer's resolution is 20µs with those headers present, which is 50x better than the resolution you get by default.

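(() => {
  // Spin until performance.now() reports a different value; the difference
  // is the smallest step the timer exposes on this page.
  const start = performance.now();
  let diff;
  while ((diff = performance.now() - start) === 0);
  return diff;
})();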

This returns 0.02 or a value very close to 0.02 with those headers, and 1 without.
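A minimal Node.js sketch of a local server that sends these two headers with a test page. The port, page and function names are illustrative. In the page, `self.crossOriginIsolated` reports whether isolation actually took effect; with `require-corp`, cross-origin subresources that don't opt in (via CORS or a Cross-Origin-Resource-Policy header) are blocked, which matters if the benchmark page loads scripts from a CDN.

```javascript
const http = require("http");

// The two headers from this answer; together they make the page cross-origin isolated.
const ISOLATION_HEADERS = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Illustrative test page: reports whether isolation is active and the smallest
// step performance.now() exposes (same spin loop as in this answer).
const PAGE = `<!doctype html>
<meta charset="utf-8">
<script>
  const start = performance.now();
  let diff;
  while ((diff = performance.now() - start) === 0);
  document.write("crossOriginIsolated=" + self.crossOriginIsolated + ", step=" + diff + " ms");
</script>`;

function handler(req, res) {
  for (const [name, value] of Object.entries(ISOLATION_HEADERS)) res.setHeader(name, value);
  res.setHeader("Content-Type", "text/html; charset=utf-8");
  res.end(PAGE);
}

// Not started automatically; call startServer() to serve the page locally.
function startServer(port = 8080) {
  return http.createServer(handler).listen(port, () => console.log(`http://localhost:${port}/`));
}
```

Calling `startServer()` and opening http://localhost:8080/ in each browser shows the step that browser exposes with the headers present.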

Maxinemaxiskirt answered 29/1, 2021 at 18:20 Comment(2)
These headers now also give improved timer precision in Chrome; see web.dev/coop-coep – Imelda
Also Safari 15.2 (both iOS and Mac) gets 0.02ms intervals with those headers, and 1ms without. Chrome gets down to 0.005ms intervals with those headers enabled (Chrome 97 on M1 Mac seems to have a bug resulting in 0.125ms intervals, but this seems fixed in Chrome Canary). – Imelda
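These comments report different steps per browser and header configuration. To compare them, it may help to repeat the one-shot measurement from this answer and look at the spread, since a single reading can catch an irregular step in browsers that add jitter. A sketch; `estimateTimerStep` is a hypothetical name and the clock is injectable for testing.

```javascript
// Repeats the "spin until the timer changes" measurement and summarizes the observed steps.
function estimateTimerStep(now = () => performance.now(), samples = 50) {
  const steps = [];
  for (let i = 0; i < samples; i++) {
    const start = now();
    let t;
    while ((t = now()) === start); // busy-wait until the reported time moves
    steps.push(t - start);
  }
  steps.sort((a, b) => a - b);
  return {
    min: steps[0],
    median: steps[Math.floor(steps.length / 2)],
    max: steps[steps.length - 1],
  };
}
```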

Firefox does have a config setting called privacy.reduceTimerPrecision that disables the Spectre mitigation. You can switch it to false using Firefox's about:config page (type about:config into the address bar).

Figured this out via a hint on MDN.
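For a profile dedicated to benchmarking, the same preference can presumably also be set in that profile's user.js file, which Firefox reads at startup. Going by the asker's follow-up comment, this still leaves a 20µs clamp rather than full resolution.

```javascript
// user.js in the profile directory of a Firefox profile used only for benchmarking.
// Reduces the deliberate timer clamping; not a profile to use for normal browsing.
user_pref("privacy.reduceTimerPrecision", false);
```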

Kennard answered 26/6, 2018 at 23:1 Comment(1)
I read about that option in Firefox back when I posted this question, but it didn't seem to be implemented yet at the time. It's there now, though. I should note that toggling it doesn't actually disable the Spectre mitigation entirely; it just clamps to 20µs, which is what the mitigation was in v59. Still a lot better than the default clamping to 1-2ms, but not quite the 5µs that it used to be. – Skidway
