JMH: Using the same static object in all Benchmark tests

I have a class that constructs some complicated data (imagine a large XML or JSON structure - that sort of thing). Constructing it takes time. So I want to construct it once, and then use that same data in all tests. Currently I basically have a public static object instance defined in a class that defines main, and then refer to it explicitly in the tests (code is a very simplified example):

public class Data 
{
    // This class constructs some complicated data 
}

public class TestSet 
{
    public static final Data PARSE_ME = new Data(...);

    public static void main(String[] args) throws RunnerException 
    {
        Options opt = new OptionsBuilder()
                .include(".*ParserTest") // several tests
                .forks(1)
                .build();

        new Runner(opt).run();
    }
}

@State(Scope.Thread)
public class SomeParserTest
{
    // Field rather than a local variable, so the benchmark method can use it
    private Parser parser;

    @Setup(Level.Iteration)
    public void setup()
    {
        parser = new Parser(TestSet.PARSE_ME);
    }

    @Benchmark
    public void getId() 
    {
        parser.getId(123);
    }
}

And this is awful of course... An equally evil option would be creating a separate class just so that it can hold a single static object. It would be nice to use something like

Options opt = new OptionsBuilder()
    ...
    .param(/*my Data object comes here*/)

but param only accepts Strings, so I'm not sure how I would pass an object (and more importantly: the same instance of the object!) to it.

So is there anything more elegant than the global object I described above?

Humblebee asked 18/11, 2015 at 21:31 Comment(2)
From what I understand a @State-annotated class is the canonical way to do this sort of thing with JMH. - Surrogate
@ach: that would be great too (namely @State(Scope.Benchmark)), as long as multiple tests can share it. The question is how do I do that? (On top of everything, Java does not pass annotations to children, so even if all tests inherited from a class with a @State annotation, it still would not help.) - Humblebee

Unfortunately, JMH provides no way to share data between the benchmarks.

For one thing, sharing would break benchmark isolation: one benchmark could silently modify the input data for another benchmark, rendering the comparison incorrect. This is why you are required to @Setup the @State object for every benchmark.

But more importantly, whatever trick you build to share the data between the benchmarks (e.g. a static field accessible from both) would break in the default "forked" mode, in which JMH executes each test in its own VM. Notably, the initializer for the static final Data TestSet.PARSE_ME you are suggesting would actually run for each @Benchmark, since every new VM instance has to initialize TestSet anyhow ;) Granted, you may disable forking, but that introduces more problems than it solves.

Therefore, it may be a better idea to invest time into making the setup costs more tolerable, so that it is not excruciatingly painful. E.g., deserialize the data from disk instead of computing it. Or, just come up with a faster way to compute.
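
As a rough sketch of the "deserialize the data from disk" idea, assuming the data class can be made Serializable (the question does not say whether it can): the helper below builds the data once, writes it to a cache file, and reads it back on later calls, so each forked VM pays only the deserialization cost. DataCache, loadOrBuild, and the cache file are hypothetical names for illustration, not part of JMH, and whether reading back is actually faster than recomputing depends on the data.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical helper: caches an expensive-to-build object on disk.
public final class DataCache {
    private DataCache() {}

    // Returns the object stored in cacheFile if the file exists;
    // otherwise builds it, stores it for later runs, and returns it.
    public static <T extends Serializable> T loadOrBuild(Path cacheFile, Supplier<T> builder) {
        if (Files.exists(cacheFile)) {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(cacheFile))) {
                @SuppressWarnings("unchecked")
                T cached = (T) in.readObject();
                return cached;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException("Cached class is not on the classpath", e);
            }
        }
        // First run: pay the construction cost once and persist the result.
        T data = builder.get();
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(cacheFile))) {
            out.writeObject(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return data;
    }
}
```

A @Setup method in each @State class could then call loadOrBuild with the same path, so every benchmark still sets up its own state but works from identical input.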

Workaday answered 19/11, 2015 at 15:47 Comment(10)
Thanks, especially for the forking argument, I didn't think about that aspect. - Humblebee
Can't you share state between benchmarks by making them all part of the same group and using Group state scope? - Surrogate
@ach: That will run benchmarks concurrently, not one after another. - Workaday
@AlekseyShipilev I'm not seeing anything in the question that would indicate this would be a problem -- might be acceptable to the OP in this scenario. Of course AFAIK the state would be initialized once for each fork which still might be too expensive. - Surrogate
@ach: I think OP means exactly that: running the different tests one after another with a single dataset. Thinking that concurrent mode is something OP wants is a stretch. - Workaday
@AlekseyShipilev I was just suggesting it. I recently had a requirement for multiple benchmarks to share the same randomized data set for each run, and the Group state scope accomplished that nicely -- the fact that the tests were asymmetric wasn't a real issue for me. - Surrogate
@ach: I think that is a questionable (at best) suggestion. Running tests concurrently gives you a different picture from running tests back-to-back, especially if the original tests were supposed to be single-threaded. "Sharing same randomized data" is accomplished by statically-seeded PRNGs for all tests, not by @State(Group) abuse. - Workaday
@AlekseyShipilev what if you need to share data? Let me try to explain: you want to build some pseudo-random String (for example) that is shared by all @Benchmark methods via some @State(Scope.Thread) class, BUT it has to be the same for each of them. So if I do @Setup {// ThreadLocalRandom.current... build String from this} - each calling Thread will get its own instance of the String for each of its Benchmarks, right? What if I want each benchmark to get identical data as input all the time? Do I make any sense? - Saldivar
@Eugene: You can use java.util.Random with the same seed to generate data per thread. ThreadLocalRandom would start from different seeds in each thread. But if that String is immutable, it would not really hurt to share it with @State(Benchmark), would it? - Workaday
@AlekseyShipilev that is correct, agreed (ThreadLocalRandom was probably a bad example). Let's say I have two methods I want to benchmark. method1 and method2 would first get pseudoRandomString1 and measure that, then get pseudoRandomString2 and measure that, and so on - the data has to be the same for each test. This is important because different data for each test would mean the results are skewed pretty badly. - Saldivar
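
The seeded-PRNG suggestion from the comments can be sketched roughly as follows. Two java.util.Random instances created with the same seed produce the same sequence for the same calls, so every benchmark (and every fork) that regenerates its input this way sees identical data. SeededData, randomString, and the seed value used in the usage note are hypothetical choices for illustration.

```java
import java.util.Random;

// Hypothetical helper: regenerates the same pseudo-random input from a fixed seed.
public final class SeededData {
    private SeededData() {}

    // Builds a lowercase ASCII string of the given length.
    // The same seed and length always yield the same string.
    public static String randomString(long seed, int length) {
        Random random = new Random(seed);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }
}
```

Each @Setup method could call something like SeededData.randomString(42L, 1024), with one fixed seed per input it needs, instead of drawing from ThreadLocalRandom.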

I would just move all of your benchmark methods into one class, define your state object as an inner class, and inject the state to each of them:

public class ParserBenchmarks {

  @State(Scope.Thread)
  public static class StateHolder {
    Parser parser = null;

    @Setup(Level.Iteration)
    public void setup()
    {
      parser = new Parser(TestSet.PARSE_ME);
    }

    public Parser getParser() {
      return parser;
    }
  }

  @Benchmark
  public int getId_123(StateHolder stateHolder) {
    return stateHolder.getParser().getId(123);
  }

  @Benchmark
  public int getId_456(StateHolder stateHolder) {
    return stateHolder.getParser().getId(456);
  }

}

Note that all of your benchmark methods should return values; otherwise the compiler could eliminate the computation as dead code.

Surrogate answered 19/11, 2015 at 14:17 Comment(2)
Thanks. Unfortunately this solution is also prone to the forking problem, which is mentioned in the second answer. - Humblebee
Even here the setup() gets executed before every benchmark (within a single fork, even with @Setup(Level.Trial)). - Sudatory
