dbunit best practices for performance

What are some best practices/principles to follow, beyond those recommended on the actual DbUnit site, that can greatly speed up tests as well as keep them maintainable? I long for a library like Factory Girl for Java, but it doesn't look like that's possible because of the static typing.

My current thinking is to have 1 XML dataset per test class at this point - maybe I share some of these, and maybe I don't. While some test data might be duplicated across datasets, I'm finding it too hard to maintain shared datasets across 3000 unit/integration tests - and I've got a lot more to go.

Would appreciate any principles to follow that lead to tests that perform well and are easy to maintain.

Calves answered 10/12, 2011 at 5:45 Comment(3)
The question is about performance, but the real concern seems to be 'making them maintainable'. IMHO, you should focus entirely on maintainability and improve performance by adding more computing power.Pluviometer
Be warned that if you use multiple small datasets with DBUnit, you can run into a nasty problem of random failures. I wrote a blog post explaining why and how to work around it.Weinstein
If you long for a Factory Girl for Java, take a look at github.com/mguymon/model-citizenBaldwin

In one of my previous assignments we had hundreds of integration tests involving data sets, though not in DBUnit — the test environment was written from scratch, as it was A Very Big Company That Can Afford This Kind Of Stuff.

The data sets were organized hierarchically. The system under test consisted of a few (5-10) modules and the test data followed that pattern. A unit test script looked like this:

 include(../../masterDataSet.txt)
 include(../moduleDataSet.txt)

 # unit-specific test data
 someProperty=someData

The property names were mapped directly to DB records by some bizarre tool I can't remember.

The same pattern may be applied to DBUnit tests. In the master data set you'd place records that always need to be present — dictionaries, the initial load of the database, as if it were being installed from scratch.

In the module data set you'd put records covering the test cases of the majority of tests in a module; I don't suppose an average test of yours involves all of your 70 database tables, does it? You surely must have some functionality groups that could constitute a module, even if the application is monolithic. Try to organize module-level test data around them.

Finally, at the test level, you'd only amend your data set with the minimal number of records needed for that particular test.
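To make that concrete, the three layers might translate into DBUnit flat XML files along the following lines (the table and column names are invented for illustration, not taken from the original project):

    <!-- masterDataSet.xml: dictionary data every test relies on -->
    <dataset>
        <country id="1" code="US"/>
        <currency id="1" code="USD"/>
    </dataset>

    <!-- moduleDataSet.xml: rows shared by most tests of one module, e.g. billing -->
    <dataset>
        <customer id="100" name="Acme" country_id="1"/>
        <account id="200" customer_id="100" currency_id="1"/>
    </dataset>

    <!-- SomeITest.xml: the handful of rows this particular test adds on top -->
    <dataset>
        <invoice id="300" account_id="200" amount="42.00" paid="false"/>
    </dataset>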

This approach has the enormous benefit of being learnable: because there are only a few data files, in time you actually begin to memorize them. Instead of seeing hundreds of big data sets that differ only in unnoticeable details (which you have to work out each time you come back to a test after a while), you can easily tell how different any two data sets are.

A word on performance at the end. On my 2.4 GHz 2-core WinXP machine a DBUnit test involving:

  • dropping 14 tables,
  • creating 14 tables,
  • inserting ca. 100 records,
  • performing the test logic,

takes 1-3 seconds. Logs show that the first three operations take less than a second; most of the test time is consumed by Spring. This logic is performed for each test to avoid test order dependencies. Everything runs in one VM with embedded Derby, which is probably why it's so fast.
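For reference, wiring DBUnit to such an embedded, in-memory Derby instance can be as small as the sketch below (the database name is arbitrary and the Spring plumbing from the original setup is left out):

import java.sql.Connection;
import java.sql.DriverManager;

import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;

public class DerbyTestDatabase {

    /** Opens an in-memory Derby database and wraps it for DBUnit. */
    public static IDatabaseConnection open() throws Exception {
        // ";create=true" creates the in-memory database on first use;
        // it disappears when the JVM (and thus the test run) ends
        Connection jdbc =
                DriverManager.getConnection("jdbc:derby:memory:testdb;create=true");
        return new DatabaseConnection(jdbc);
    }
}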

EDIT: I don't think DBUnit XML data sets support inclusion of other files, but this can be overcome by using a base class for all integration tests, e.g.:

public abstract class AbstractITest {

    // wraps the JDBC connection to the test database (e.g. obtained from
    // a Spring-managed DataSource) for use by DBUnit
    protected IDatabaseConnection dbUnitConnection;

    @Before
    public void setUp() throws Exception {
        //
        // drop and recreate tables here if needed; we use 
        // Spring's SimpleJdbcTemplate executing drop/create SQL
        //
        IDataSet masterDataSet =
                new FlatXmlDataSetBuilder().build(new File("masterDataSet.xml"));
        DatabaseOperation.CLEAN_INSERT.execute(dbUnitConnection, masterDataSet);
    }
}

public abstract class AbstractModuleITest extends AbstractITest {

    @Before
    public void setUp() throws Exception {
        super.setUp();
        IDataSet moduleDataSet =
                new FlatXmlDataSetBuilder().build(new File("moduleDataSet.xml"));
        DatabaseOperation.CLEAN_INSERT.execute(dbUnitConnection, moduleDataSet);
    }
}

public class SomeITest extends AbstractModuleITest {
    // Override setUp() here only if needed, and remember to call super.setUp().

    @Test
    public void someTest() { ... }
}
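Another option, if you'd rather not issue several separate CLEAN_INSERTs (each one deletes every row of any table that appears in its own data set, including rows a previous layer inserted there), is to merge the layers in memory with DBUnit's CompositeDataSet and seed the database in a single operation. A sketch reusing the file names above; the helper class and the test-specific file name are illustrative, not part of the original answer:

import java.io.File;

import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.CompositeDataSet;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;
import org.dbunit.operation.DatabaseOperation;

public final class LayeredDataSetLoader {

    /**
     * Merges the master, module and test-level flat XML files into one
     * data set and loads it with a single CLEAN_INSERT.
     */
    public static void load(IDatabaseConnection dbUnitConnection,
                            String testDataSetFile) throws Exception {
        FlatXmlDataSetBuilder builder = new FlatXmlDataSetBuilder();
        IDataSet merged = new CompositeDataSet(new IDataSet[] {
                builder.build(new File("masterDataSet.xml")),
                builder.build(new File("moduleDataSet.xml")),
                builder.build(new File(testDataSetFile))   // e.g. "SomeITest.xml"
        });
        // one operation seeds everything, so no layer can wipe out rows
        // inserted by an earlier layer into a table they both use
        DatabaseOperation.CLEAN_INSERT.execute(dbUnitConnection, merged);
    }
}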
Springlet answered 15/12, 2011 at 9:24 Comment(3)
I love this idea. I will go find out if dbunit supports this sort of thing.Calves
@Calves DBUnit XML data sets probably don't support this directly; see my edit.Springlet
I like your idea a lot. I think creating a master dataset full of dictionary information is awesome. However, I'm not such a fan of the module datasets. I think it's easy to group tests into modules based on common data today, but that may change in the future. Creating a dependency on a module dataset seems like it would be a huge headache if you ever had to change your design, and not worth the benefit. I could see where it would work, though. I guess it really depends on your data model.Summertree

The recommendation in JUnit in Action, 2nd edition, is actually not to create too many datasets (like one per test class), but only as many as you consider maintainable. Except for a few special cases, I found it possible to use a master dataset for most unit tests and individual datasets for integration tests. Limiting the usage of ExpectedDataSets is also an option.

Also, I used Unitils in combination with DBUnit to simplify some of the setup and loading of test data, so you might want to consider it where appropriate.
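For context, Unitils' DbUnit integration mostly reduces the setup to an annotation; a minimal sketch, assuming the Unitils DbUnit module is configured via unitils.properties (the class and data set file names here are made up):

import org.junit.Test;
import org.unitils.UnitilsJUnit4;
import org.unitils.dbunit.annotation.DataSet;

// Unitils loads the named flat XML data set into the configured test
// database before each test method runs.
@DataSet("CustomerDaoTest-data.xml")
public class CustomerDaoTest extends UnitilsJUnit4 {

    @Test
    public void findsActiveCustomers() {
        // ... exercise the DAO against the pre-loaded data ...
    }
}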

Bergson answered 10/12, 2011 at 6:58 Comment(5)
I think the book's wrong if that is its recommendation. When you have a database with 70 tables, so many different boolean flag fields, tons of search criteria you need to test for, and on and on... changing one row can cause a massive number of tests to break. This is bad, and I think it's a waste of time. Maybe using a builder pattern in code, and then saving the objects to the database with cascades, is the answer. I'm not sure.Calves
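As an illustration of that builder idea (a sketch only; the entity and its flags are invented): defaults live in one place, and each test overrides just the fields it actually cares about before persisting the object graph.

// Stand-in entity; a real project would use its own JPA entities.
class Customer {
    final String name;
    final boolean active;
    final boolean newsletter;

    Customer(String name, boolean active, boolean newsletter) {
        this.name = name;
        this.active = active;
        this.newsletter = newsletter;
    }
}

class CustomerBuilder {
    // sensible defaults for every flag, maintained in exactly one place
    private String name = "Acme";
    private boolean active = true;
    private boolean newsletter = false;

    CustomerBuilder name(String name) { this.name = name; return this; }
    CustomerBuilder inactive() { this.active = false; return this; }
    CustomerBuilder subscribed() { this.newsletter = true; return this; }

    Customer build() { return new Customer(name, active, newsletter); }
}

// In a test, only the detail under test is spelled out:
//   Customer customer = new CustomerBuilder().inactive().build();
//   customerDao.save(customer);   // cascades persist the related rows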
Clarification: using a single master dataset for most unit tests is what I chose to do in that specific situation. The book only recommends that you maintain as many datasets as you are prepared to maintain, so instead of one for each class/table, you can consider a less rigorous ratio.Bergson
My current thinking is that it wouldn't be 1:1, but at least 50% of the tests would be 1:1 and the rest would share datasets. As my application gets more and more complex, I need less coupling with the datasets. I currently have 5 datasets that build one database, and every test uses 1 to 5 of them. It is designed to act as one database. I think this was a mistake; it is leading me down a path of test dependency hell.Calves
So currently you are using no more than 5 unique datasets for 3000 different tests? That's definitely not recommended, since datasets should be as specific to the test as possible (to a reasonable and maintainable extent). You can continue to use them for certain integration tests, but for unit tests I would try to have table-specific datasets (even if that means on the order of 70 of them).Bergson
I think I'm looking for some good principles to follow that have been tried and succeeded for large projects. Like when to make dataset, where to put it, how much data it should contain, and so on.Calves
