Do Go testing.B benchmarks prevent unwanted optimizations?

I've recently started learning Go and I'm trying to implement a map that can be used concurrently by multiple goroutines. I want to be able to compare my implementation to a simple sync.Mutex-protected map, or to something like this: https://github.com/streamrail/concurrent-map/blob/master/concurrent_map.go

From my experience with Google Caliper, I assume that a naive approach to benchmarking would let unwanted compiler optimizations distort the actual results. Do benchmarks that use testing.B employ techniques to avoid that (after all, both Go and Caliper are Google projects)? If yes, are they documented? If not, what's the best way to write a microbenchmark in Go?

Volume asked 1/5, 2016 at 12:59 Comment(3)
I'm not sure if this is what you're looking for. Read the section 'A note on compiler optimisations' from this link. – Announcement
@JohnSPerayil, that's exactly what I'm looking for! I just wonder whether it's exhaustive (I assume it isn't), and whether it's still valid (I'm not sure). – Volume
It's probably valid but not exhaustive. – Announcement

Converting my comment to an answer.

To be completely accurate, any benchmark should be careful to avoid compiler optimisations eliminating the function under test and artificially lowering the run time of the benchmark.

package fib

import "testing"

// Fib is the function under test.
func Fib(n int) int {
        if n < 2 {
                return n
        }
        return Fib(n-1) + Fib(n-2)
}

var result int

func BenchmarkFibComplete(b *testing.B) {
        var r int
        for n := 0; n < b.N; n++ {
                // always record the result of Fib to prevent
                // the compiler eliminating the function call.
                r = Fib(10)
        }
        // always store the result to a package level variable
        // so the compiler cannot eliminate the Benchmark itself.
        result = r
}

Source
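
Applied to the question's use case, a minimal sketch of the same sink pattern for a sync.Mutex-protected map might look like the following. The lockedMap type, the key, and the package name are invented here for illustration; they are not taken from any of the linked libraries.

package mapbench

import (
        "sync"
        "testing"
)

var sink int

// lockedMap is a toy mutex-protected map, just for the benchmark.
type lockedMap struct {
        mu sync.Mutex
        m  map[int]int
}

func (lm *lockedMap) Get(k int) int {
        lm.mu.Lock()
        defer lm.mu.Unlock()
        return lm.m[k]
}

func BenchmarkLockedMapGet(b *testing.B) {
        lm := &lockedMap{m: map[int]int{42: 1}}
        var r int
        for n := 0; n < b.N; n++ {
                r = lm.Get(42) // record the result so the call stays live
        }
        sink = r // package-level store, as in the Fib example
}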

The following page can also be useful.

Compiler And Runtime Optimizations

Another interesting read is:

One other interesting flag is -N, which will disable the optimisation pass in the compiler.

Source1 Source2

I'm not 100% sure, but the following should disable optimisations. Someone with more experience needs to confirm it.

go test -gcflags=-N -bench=.
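
If you want to see what the optimiser is actually doing rather than switching it off, the compiler can print its inlining and escape-analysis decisions with -m. As far as I know, -N alone leaves inlining enabled, so -l is usually added to disable that as well:

go test -gcflags=-m -bench=.
go test -gcflags='-N -l' -bench=.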
Announcement answered 2/5, 2016 at 5:27 Comment(1)
While I cannot confirm it for all cases (or find release notes to explain it), the above example no longer performs as specified: the version without the side effect is not optimized away, and both versions report the same result. – Cacophony

In Java, microbenchmarks are harder to do because of how the HotSpot compiler works. If you simply run the same code over and over, you will often find it gets faster, which throws off your averages. To compensate, Caliper has to do warm-up runs and other tricks to try to get a stable benchmark.

In Go, code is statically compiled ahead of time. There is no HotSpot-like runtime system, so the test framework doesn't have to do any tricks to get good timings.

The testing.B functionality should have no impact on your code's performance, so you shouldn't have to do anything special.
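
For illustration, here is a minimal sketch of what that looks like in practice: there is no warm-up phase to write, the framework grows b.N until the timing stabilises, and b.ResetTimer excludes setup cost. The package name and map contents are made up, and the sink follows the pattern from the answer above, just to be safe.

package mapbench

import "testing"

var lookupSink string

func BenchmarkMapLookup(b *testing.B) {
        m := map[string]string{"k": "v"} // plain setup, no warm-up needed
        b.ResetTimer()                   // exclude setup from the measurement
        var r string
        for n := 0; n < b.N; n++ {
                r = m["k"]
        }
        lookupSink = r // defensive sink
}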

Sinhalese answered 1/5, 2016 at 14:47 Comment(1)
Thanks, that sounds encouraging, although the link that John S Perayil provided in his comment on my question shows that at least some optimizations need to be fended off manually in a testing.B benchmark. The link is more than two years old, though, so things may have changed by now. – Volume

@David Budworth gives a lot of good info, and I agree regarding Go vs Java, but there still are many things you have to consider in microbenchmarking. Most of them boil down to "how closely does this match your use case?" For example, different concurrency patterns perform very differently under contention. Do you expect multiple simultaneous writers to be common? Single writer, many readers? Many readers, rare writing? Single-access? Different producers/consumers accessing different parts of the map? A scheme that performs beautifully in your benchmark may be rubbish for other use cases.

Similarly you may discover that your scheme is or isn't very dependent on locality of reference. Some approaches perform very differently if the same values are being read over and over again (because they stay in the on-CPU caches). This is very common in microbenchmarks, but may not be very indicative of your intended use case.

This isn't to say microbenchmarks are useless, only that they are very often almost useless :D … at least for arriving at general conclusions. If you're building this for a particular project, just make sure that you're testing against realistic data and patterns that match your use case (and ideally just turn this into a real benchmark for your program, rather than a "microbenchmark" of the data structure). If you're building this for general use, you'll need to make sure you're benchmarking against a wide range of use cases before coming to too many conclusions on whether it is substantially better.
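
As a sketch of what that can look like with testing.B itself: b.RunParallel runs the loop body from multiple goroutines, so a read-heavy mix with occasional writes can be expressed directly. The 1% write ratio and the package name here are invented for illustration; running with go test -bench=. -cpu=1,4,8 then varies the parallelism to show how contention scales.

package mapbench

import (
        "sync"
        "sync/atomic"
        "testing"
)

var parSink int

func BenchmarkReadHeavyMap(b *testing.B) {
        var mu sync.Mutex
        m := map[int]int{0: 1}
        var ops int64
        b.RunParallel(func(pb *testing.PB) {
                r := 0
                for pb.Next() {
                        if n := atomic.AddInt64(&ops, 1); n%100 == 0 {
                                mu.Lock()
                                m[int(n)] = 1 // occasional write (~1%)
                                mu.Unlock()
                        } else {
                                mu.Lock()
                                r += m[0] // common case: read
                                mu.Unlock()
                        }
                }
                mu.Lock()
                parSink += r // keep the reads observable
                mu.Unlock()
        })
}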

And if it's just educational, awesome. Learning why a particular scheme works better or worse in various situations is great experience. Just don't push your findings past your evidence.

Wigeon answered 1/5, 2016 at 15:19 Comment(2)
Thank you, that's a very nice write-up and I agree with practically all of it, but I don't think it answers my question :) – Volume
To distill down to the root question of how to best microbenchmark in Go, the answer is "use real data in a test that matches your use case." If the only question is "how to avoid Java-like hotspot optimization distortions," see David's answer (which is exactly right). There aren't any in Go, so there is no problem. John's link (and the links from there) is also excellent and probably is spot on for your question. – Wigeon
