What is microbenchmarking?

I've heard this term used, but I'm not entirely sure what it means, so:

  • What DOES it mean and what DOESN'T it mean?
  • What are some examples of what IS and ISN'T microbenchmarking?
  • What are the dangers of microbenchmarking and how do you avoid it?
    • (or is it a good thing?)
Rhotacism answered 16/5, 2010 at 5:41 Comment(3)
It's bedtime for me, so here's just a dumb comment with a link to get you started with reading the material: java.sun.com/docs/hotspot/HotSpotFAQ.html (check the "Benchmarking" chapters at bottom of TOC).Burrstone
Only 1 millionth as useful as benchmarking :-)Kiker
code.google.com/p/caliper/wiki/JavaMicrobenchmarksRhotacism

It means exactly what it says on the tin can - it's measuring the performance of something "small", like a system call to the kernel of an operating system.

The danger is that people may use whatever results they obtain from microbenchmarking to dictate optimizations. And as we all know:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." -- Donald Knuth

Many factors can skew the results of microbenchmarks. Compiler optimization is one of them. If the operation being measured takes so little time that whatever you use to measure it takes longer than the operation itself, your microbenchmark will be skewed as well.
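
As a rough illustration of the measurement-overhead problem (a minimal Java sketch, since the question is Java-tagged; the idea carries over to any language), you can estimate how long the act of measuring itself takes by calling the clock back to back:

public class TimerOverhead {
    public static void main(String[] args) {
        long minDelta = Long.MAX_VALUE;

        // Call the clock twice in a row; the smallest observed difference
        // approximates the overhead/resolution of the timer itself.
        for (int i = 0; i < 1000000; ++i) {
            long t1 = System.nanoTime();
            long t2 = System.nanoTime();
            minDelta = Math.min(minDelta, t2 - t1);
        }

        System.out.println("Approximate timer overhead: " + minDelta + " ns");
    }
}

Anything you try to time that is not comfortably larger than this number is mostly measuring the timer, not the operation.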

For example, someone might take a microbenchmark of the overhead of for loops:

void TestForLoop()
{
    time start = GetTime();

    for(int i = 0; i < 1000000000; ++i)
    {
    }

    time elapsed = GetTime() - start;
    time elapsedPerIteration = elapsed / 1000000000;
    printf("Time elapsed for each iteration: %d\n", elapsedPerIteration);
}

Obviously compilers can see that the loop does absolutely nothing and may not generate any code for the loop at all. So the values of elapsed and elapsedPerIteration are pretty much useless.

Even if the loop does something:

void TestForLoop()
{
    int sum = 0;
    time start = GetTime();

    for(int i = 0; i < 1000000000; ++i)
    {
        ++sum;
    }

    time elapsed = GetTime() - start;
    time elapsedPerIteration = elapsed / 1000000000;
    printf("Time elapsed for each iteration: %d\n", elapsedPerIteration);
}

The compiler may see that the variable sum isn't going to be used for anything and optimize it away, and optimize away the for loop as well. But wait! What if we do this:

void TestForLoop()
{
    int sum = 0;
    time start = GetTime();

    for(int i = 0; i < 1000000000; ++i)
    {
        ++sum;
    }

    time elapsed = GetTime() - start;
    time elapsedPerIteration = elapsed / 1000000000;
    printf("Time elapsed for each iteration: %d\n", elapsedPerIteration);
    printf("Sum: %d\n", sum); // Added
}

The compiler might be smart enough to realize that sum will always be a constant value, and optimize all that away as well. Many would be surprised at the optimizing capabilities of compilers these days.

But what about things that compilers can't optimize away?

void TestFileOpenPerformance()
{
    FILE* file = NULL;
    time start = GetTime();

    for(int i = 0; i < 1000000000; ++i)
    {
        file = fopen("testfile.dat", "r");
        fclose(file);
    }

    time elapsed = GetTime() - start;
    time elapsedPerIteration = elapsed / 1000000000;
    printf("Time elapsed for each file open: %d\n", elapsedPerIteration);
}

Even this is not a useful test! The operating system may see that the file is being opened very frequently, so it may preload it in memory to improve performance. Pretty much all operating systems do this. The same thing happens when you open applications - operating systems may figure out the top ~5 applications you open the most and preload the application code in memory when you boot up the computer!

In fact, there are countless variables that come into play: locality of reference (e.g. arrays vs. linked lists), effects of caches and memory bandwidth, compiler inlining, compiler implementation, compiler switches, number of processor cores, optimizations at the processor level, operating system schedulers, operating system background processes, etc.
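
To pick just one of those variables: here is a hedged Java sketch (class name and sizes are arbitrary) of how locality of reference alone can dominate such a measurement; summing the same values from an int[] and from a LinkedList<Integer> differs dramatically, largely because of cache behaviour and boxing rather than the arithmetic being "measured":

import java.util.LinkedList;
import java.util.List;

public class LocalityDemo {
    public static void main(String[] args) {
        final int n = 1000000;
        int[] array = new int[n];
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < n; i++) {
            array[i] = i;
            list.add(i);
        }

        long t0 = System.nanoTime();
        long arraySum = 0;
        for (int v : array) {      // contiguous memory, cache-friendly
            arraySum += v;
        }
        long t1 = System.nanoTime();

        long listSum = 0;
        for (int v : list) {       // pointer chasing plus unboxing
            listSum += v;
        }
        long t2 = System.nanoTime();

        System.out.println("array: " + arraySum + " in " + (t1 - t0) + " ns");
        System.out.println("list:  " + listSum + " in " + (t2 - t1) + " ns");
    }
}

The exact ratio you see depends on heap layout, cache sizes and GC activity, which is precisely the point.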

So microbenchmarking isn't exactly a useful metric in a lot of cases. It definitely does not replace whole-program benchmarks with well-defined test cases (profiling). Write readable code first, then profile to see what, if anything, needs to be optimized.

I would like to emphasize that microbenchmarks are not evil per se, but one has to use them carefully (which is true for lots of other things related to computers).

Negotiation answered 16/5, 2010 at 5:41 Comment(6)
Good comment, though Knuth meant that premature consideration of optimizations should not affect DESIGN (rather than "dictate optimizations"). Catering the design to the result of early benchmarks often results in inflexible design. en.wikipedia.org/wiki/Program_optimizationOlly
Correct, but I may add that how someone goes about optimizing a program can affect its design. The point I'm trying to get across is that microbenchmarking rarely gives you useful information.Negotiation
Should these programs really print "overhead", when what is printed is not the overhead but the entire time per iteration?Burghley
I changed it to Time elapsed for <whatever>, which I suppose is the more accurate term for what we're measuring. But with microbenchmarks, what you're measuring may have nothing to do with the actual code itself!Negotiation
Actually Knuth was referring to performance optimization done with very little real understanding of the software execution.Cayuga
"see that the file is being opened very frequently, so it may preload it". A file's metadata will still be hot in the VFS cache right after opening / closing it, for any real real OS. It doesn't have to detect any special pattern or anything, just work like a normal pseudo-LRU cache for filesystem metadata (e.g. Linux VFS cache) and data (pagecache). There's nothing amazing or special about it. It's a well-known fact that e.g. time find /usr/lib > /dev/null runs way faster the second time.Nympholepsy

Note: This is a Java-centric answer, given the tags that the OP used.

There is no definition of micro-benchmarking, but when I use it I mean a small artificial benchmark designed to test the performance of some specific hardware [1] or language feature. By contrast, a better (Java) benchmark is a real program designed to perform a real task. (Drawing a hard line between the two cases is pointless, IMO, and I won't try.)

The danger of micro benchmarking is that it is easy to write a benchmark that gives results that are totally misleading. Some common traps in Java micro-benchmarks are:

  • writing code that the compiler can deduce does no useful work, and will therefore optimize away entirely,
  • not taking account of the "lumpy" nature of Java memory management (i.e. garbage collection), and
  • not taking account of JVM startup effects; e.g. the time taken to load and JIT-compile classes, and (conversely) the execution speedup that happens once the methods have been JIT-compiled. (A crude hand-rolled sketch that tries to work around the first and third of these follows below.)
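
As a very rough illustration only (a hand-rolled sketch with made-up names, not a substitute for a proper benchmarking harness), a micro-benchmark that at least tries to sidestep the first and third traps might look like this:

public class SyncCallBenchmark {
    private static long counter;

    // The method whose call overhead we (naively) want to measure.
    private static synchronized void syncIncrement() {
        counter++;
    }

    public static void main(String[] args) {
        // Warm-up: give the JIT a chance to compile syncIncrement()
        // before the timed run (the startup-effects trap).
        for (int i = 0; i < 1000000; i++) {
            syncIncrement();
        }

        long start = System.nanoTime();
        for (int i = 0; i < 10000000; i++) {
            syncIncrement();
        }
        long elapsed = System.nanoTime() - start;

        // Printing counter keeps the work observable, so the JIT cannot
        // simply discard the loop (the dead-code trap).
        System.out.println("counter = " + counter);
        System.out.println("approx. " + (elapsed / 10000000.0) + " ns per call");
    }
}

Even this says nothing about GC pauses, or about how a synchronized call behaves under real contention, which is part of the systemic problem described next.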

However, even once you have addressed the issues above, there is a systemic issue with micro-benchmarking that is impossible to address. The code and behavior of a micro-benchmark frequently bears little relation to what you really care about; i.e. how your application is going to perform. There are far too many "hidden variables" for you to be able to generalize from a micro-benchmark to typical Java programs, let alone to your program.

For these reasons, we regularly advise people not to waste their time with micro-benchmarks. Instead, it is better to write simple and natural code, and use a profiler to identify areas that need to be hand-optimized. It usually turns out [2] that the most significant performance problems in typical real-world applications are due to bad design of data structures and algorithms (including networking, database and threading-related bottlenecks) rather than the kind of fine-grained things that typical micro-benchmarks are trying to test.

@BalusC has provided an excellent link to material on this topic in the Hotspot FAQ page. And here is a link to an IBM whitepaper by Brian Goetz.

See also: How do I write a correct micro-benchmark in Java?


[1] - Experts would not even try to do hardware benchmarking in Java. There are too many "complex things" happening between the bytecodes and the hardware to draw valid / useful conclusions about hardware from the raw results. You would be better off using a language that is closer to the hardware; e.g. C or even assembly code.
[2] - I assert this without providing any supporting evidence. And there are exceptions, of course.

Kiker answered 16/5, 2010 at 5:46 Comment(0)
  • What DOES it mean and what DOESN'T it mean?

I would say micro-benchmarking simply means measuring something tiny. "Tiny" is probably context dependent, but typically on the level of a single system call or something similar. Benchmarking refers to everything above that.

  • What are some examples of what IS and ISN'T microbenchmarking?

This (archived) article lists measuring time of a getpid() system call and measuring the time to copy memory using memcpy() as examples of micro-benchmarking.

Any measurement of an algorithm implementation, etc., would not count as micro-benchmarking. In particular, result reports that list tasks in order of decreasing execution time (the sort of output a profiler gives) would seldom count as micro-benchmarking.

  • What are the dangers of microbenchmarking and how do you avoid it?

The obvious danger is that it tempts developers to optimize the wrong parts of a program. Another danger is that it is notoriously difficult to do measurements of something small accurately. The easiest way to avoid it is probably just to get a good picture of where the most time is spent in the program.

People usually say "don't do micro-benchmarking" but what they probably mean is "don't make optimization decisions based on micro-benchmarks".

  • (or is it a good thing?)

It's not at all the bad thing per se that others here, and many webpages, seem to suggest. It has its place. I work with program rewriting and runtime aspect weaving etc. We usually publish micro-benchmarks of our added instructions, not to guide any optimizations, but to make sure that our extra code has close to no impact on the execution of the rewritten program.
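
As a hedged sketch of that kind of use (every class and method name here is invented for illustration), one can time the same workload with and without the injected code and compare:

public class OverheadCheck {
    // Stand-in for the woven-in instrumentation.
    static class InjectedProbe {
        static volatile long hits;
        static void hit() { hits++; }
    }

    static long plainWork(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i;
        return sum;
    }

    static long instrumentedWork(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
            InjectedProbe.hit();
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 50000000;
        plainWork(n); instrumentedWork(n);   // warm up both paths first

        long t0 = System.nanoTime();
        long a = plainWork(n);
        long t1 = System.nanoTime();
        long b = instrumentedWork(n);
        long t2 = System.nanoTime();

        System.out.println(a + " " + b + " " + InjectedProbe.hits); // keep results live
        System.out.println("plain:        " + (t1 - t0) / 1000000 + " ms");
        System.out.println("instrumented: " + (t2 - t1) / 1000000 + " ms");
    }
}

The goal is not an absolute number, just evidence that the instrumented variant is not dramatically slower than the baseline.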

It's an art, however, especially in the context of a VM that has JIT compilation, warm-up times, etc. A well-described approach for Java can be found here (archived).

Lordship answered 16/5, 2010 at 6:30 Comment(1)
Re: warm-up and so on: see Idiomatic way of performance evaluation? for some of the pitfalls of failing to do that on modern CPUs and OSes.Nympholepsy

The book 'Java Performance: The Definitive Guide' has this definition and example about microbenchmarks:

  1. Microbenchmarks

    A microbenchmark is a test designed to measure a very small unit of performance: the time to call a synchronized method versus a non-synchronized method; the overhead in creating a thread versus using a thread pool; the time to execute one arithmetic algorithm versus an alternate implementation; and so on.

    Microbenchmarks may seem like a good idea, but they are very difficult to write correctly. Consider the following code, which is an attempt to write a microbenchmark that tests the performance of different implementations of a method to compute the 50th Fibonacci number:

public void doTest() {
    double l;
    long then = System.currentTimeMillis();

    // nLoops is an instance field in the book's example
    for (int i = 0; i < nLoops; i++) {
        l = fibImpl1(50);
    }

    long now = System.currentTimeMillis();
    System.out.println("Elapsed time: " + (now - then));
}

...

private double fibImpl1(int n) {
    if (n < 0) throw new IllegalArgumentException("Must be > 0");
    if (n == 0) return 0d;
    if (n == 1) return 1d;
    double d = fibImpl1(n - 2) + fibImpl1(n - 1);
    if (Double.isInfinite(d)) throw new ArithmeticException("Overflow");
    return d;
}

Microbenchmarks must use their results.

The biggest problem with this code is that it never actually changes any program state. Because the result of the Fibonacci calculation is never used, the compiler is free to discard that calculation. A smart compiler (including current Java 7 and 8 compilers) will end up executing this code:

long then = System.currentTimeMillis();
long now = System.currentTimeMillis();
System.out.println("Elapsed time: " + (now - then));

As a result, the elapsed time will be only a few milliseconds, regardless of the implementation of the Fibonacci method, or the number of times the loop is supposed to be executed.

There is a way around that particular issue: ensure that each result is read, not simply written. In practice, changing the definition of l from a local variable to an instance variable (declared with the volatile keyword) will allow the performance of the method to be measured, as sketched below.
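
A sketch of what that change looks like (fibImpl1 stays exactly as in the excerpt above; the nLoops value shown here is an assumption, since the book excerpt does not show where it is declared):

private volatile double l;   // was a local variable; now a volatile instance field
private int nLoops = 100;    // assumed field and value, not shown in the excerpt

public void doTest() {
    long then = System.currentTimeMillis();
    for (int i = 0; i < nLoops; i++) {
        l = fibImpl1(50);    // each result is now stored to a volatile field,
                             // so the compiler cannot discard the calculation
    }
    long now = System.currentTimeMillis();
    System.out.println("Elapsed time: " + (now - then));
}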

Laid answered 3/9, 2016 at 15:57 Comment(1)
You pretty much always need to look at the assembly language output of an optimizing compiler to make sure your microbenchmark is really measuring what you intended. It's really easy for something you didn't intend to get optimized away. I definitely agree that they're hard to write correctly. So many perf questions on SO get comments like "why not measure it yourself?", as if it were easy for someone to measure something they don't even fully understand.Nympholepsy

Here are some good articles by Brian Goetz that explain why (micro)benchmarking is especially hard in Java:

Israelite answered 16/5, 2010 at 15:28 Comment(0)

Microbenchmarking is benchmarking I don't think is worthwhile. Effective benchmarking is benchmarking I think is worth the time.

Generally speaking, microbenchmarking is (as in silico says) attempting to measure the performance of some very granular task, which is both hard to do well and usually pointless in the context of actual performance headaches.

Schorl answered 16/5, 2010 at 6:4 Comment(2)
so you are operating under the definition that microbenchmarking serves no good at all, right? That's the impression that I get too, but I just didn't want to rule anything out, and it may actually be "useful" in some scenarios I would need to care about.Rhotacism
Micro-benchmarking has its place in a performance engineer's toolset. Unfortunately most engineers are not performance engineers, which means you get flawed tests and results. A good micro-benchmark can reveal unit costs for various operations, which can better serve analysis when complete benchmarks are not representative of your application's software and system execution model.Cayuga
