Testing thread safety fails with Spock
Asked Answered
G

1

9

The subject

I have some code that is decidedly not thread safe:

public class ExampleLoader
{
    private List<String> strings;

    protected List<String> loadStrings()
    {
        return Arrays.asList("Hello", "World", "Sup");
    }

    public List<String> getStrings()
    {
        if (strings == null)
        {
            strings = loadStrings();
        }

        return strings;
    }
}

Multiple threads accessing getStrings() simultaneously are expected to see strings as null, and thus loadStrings() (which is an expensive operation) is triggered multiple times.

The problem

I wanted to make the code thread safe, and as a good citizen of the world I wrote a failing Spock spec first:

def "getStrings is thread safe"() {
    given:
    def loader = Spy(ExampleLoader)
    def threads = (0..<10).collect { new Thread({ loader.getStrings() })}

    when:
    threads.each { it.start() }
    threads.each { it.join() }

    then:
    1 * loader.loadStrings()
}

The above code creates and starts 10 threads that each calls getStrings(). It then asserts that loadStrings() was called only once when all threads are done.

I expected this to fail. However, it consistently passes. What?

After a debugging session involving System.out.println and other boring things, I found that the threads are indeed asynchronous: their run() methods printed in a seemingly random order. However, the first thread to access getStrings() would always be the only thread to call loadStrings().

The weird part

Frustrated after quite some time spent debugging, I wrote the same test again with JUnit 4 and Mockito:

@Test
public void getStringsIsThreadSafe() throws Exception
{
    // given
    ExampleLoader loader = Mockito.spy(ExampleLoader.class);
    List<Thread> threads = IntStream.range(0, 10)
            .mapToObj(index -> new Thread(loader::getStrings))
            .collect(Collectors.toList());

    // when
    threads.forEach(Thread::start);
    threads.forEach(thread -> {
        try {
            thread.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    });

    // then
    Mockito.verify(loader, Mockito.times(1))
            .loadStrings();
}

This test consistently fails due to multiple calls to loadStrings(), as was expected.

The question

Why does the Spock test consistently pass, and how would I go about testing this with Spock?

Grundyism answered 17/9, 2016 at 23:35 Comment(6)
Why would one ever want to use anything than the original language of the code to write a test for the code? As you yourself found out this makes debugging a nightmare. Apart from the fact that the language may go ahead and the so beloved framework stay behind.Shell
@OlegSklyar I've found Spock/Groovy to be a great tool for testing most Java code in a terse and quick way. I think the problem here is with the implementation of the Spock framework itself, as Groovy compiles to Java bytecode and it's fairly easy to use the Java debugger with it. Also, this is a dumbed-down example of a problem I encountered in a work situation—I can neither ditch Spock/Groovy nor migrate the main code base from Java.Grundyism
Unfortunately I do not have an answer to your original question, so for me it is a philosophical discussion... But an introduction of any further framework increases complexity. That complexity has bitten you now. I believe the beautiful framework will also refactor all the tests when you decide to refactor the code. Surely it won't. Albeit self-made I had to deal with a similar framework in one of Java projects: it was nightmare to fix or amend tests until we have written a Java test-base library. Now all tests are comprehensible and refactorable.Shell
@OlegSklyar thanks for sharing, I'll keep that in mind next time we have a greenfield project :)Grundyism
For us this was not a greenfield project, it was a natural evolution of a beast... :) In fact we also became massively more productive and rocketed the test coverage by going back to basics: before nobody wanted to touch those tests unless failed, now everybody is happy to write an extra test for any new problem found. And yes, those were integration tests, for unit tests I do not see any point at all to go away from the original code...Shell
@OlegSklyar It's entirely ordinary to mix and match JVM languages; I routinely switch back and forth between Java and Groovy files in the same package. Using Spock to test pure Java is thoroughly mundane--even more so if something like Geb is involved. Your problem was rolling your own, not picking the best tool for the job.Freddie
B
11

The cause of your problem is that Spock makes methods it spies on synchronized. Particularly, the method MockController.handle(), through which all such calls go, is synchronized. You'll easily notice it if you add a pause and some output to your getStrings() method.

public List<String> getStrings() throws InterruptedException {
    System.out.println(Thread.currentThread().getId() + " goes to sleep");
    Thread.sleep(1000);
    System.out.println(Thread.currentThread().getId() + " awoke");
    if (strings == null) {
        strings = loadStrings();
    }
    return strings;
}

This way Spock inadvertently fixes your concurrency problem. Mockito obviously uses another approach.

A couple of other thoughts on your tests:

First, you don't do much to ensure that all your threads have come to the getStrings() call at the same moment, thus decreasing the probability of collisions. Long time may pass between threads start (long enough for the first one to complete the call before others start it). A better approach would be to use some synchronization primitive to remove the influence of threads startup time. For instance, a CountDownLatch may be of use here:

given:
def final CountDownLatch latch = new CountDownLatch(10)
def loader = Spy(ExampleLoader)
def threads = (0..<10).collect { new Thread({
    latch.countDown()
    latch.await()
    loader.getStrings()
})}

Of course, within Spock it will make no difference, it's just an example on how to do it in general.

Second, the problem with concurrency tests is that they never guarantee that your program is thread safe. The best you can hope for is that such test will show you that your program is broken. But even if the test passes, it doesn't prove thread safety. To increase chances of finding concurrency bugs you may want to run the test many times and gather statistics. Sometimes, such tests only fail once in several thousands or even once in several hundreds thousands of runs. Your class is simple enough to make guesses about its thread safety, but such approach will not work with more complicated cases.

Bot answered 18/9, 2016 at 7:39 Comment(1)
You, sir, are completely right. Thank you very much for this explanation! As for my tests, I'm aware that they're extremely naive and might accidentally fail (or pass) "randomly". Thank you for your suggestion!Grundyism

© 2022 - 2024 — McMap. All rights reserved.