JUnit Testing Cassandra with embedded server
A

6

19

What is the best approach to write unit tests for code that persists data to nosql data store, in our case cassandra?

=> We are using the embedded server approach with a utility from GitHub (https://github.com/hector-client/hector/blob/master/test/src/main/java/me/prettyprint/hector/testutils/EmbeddedServerHelper.java). However, I have been seeing some issues with it. 1) It persists data across test cases, which makes it hard to ensure that each test case in a test class starts with its own data. I tried calling cleanUp in an @After method after each test case, but that doesn't seem to clean up the data. 2) We are running out of memory as we add more tests. This could be caused by 1), but I am not sure yet. I currently run my build with a 1G heap.

=> The other approach I have been considering is to mock the Cassandra storage. But that might let schema issues slip through, as the embedded approach has often caught problems with the way data is stored in Cassandra.

Please let me know your thoughts on this, and whether anyone has used EmbeddedServerHelper and is familiar with the issues I have mentioned.


Just an update. I was able to resolve 2), the Java heap space issue when running builds, by changing the in_memory_compaction_limit_in_mb parameter to 32 in the cassandra.yaml used by the embedded test server (this link helped me: http://www.datastax.com/docs/0.7/configuration/storage_configuration#in-memory-compaction-limit-in-mb). The value was 64, and builds had started to fail consistently during compaction.

Axel answered 7/7, 2011 at 14:20 Comment(1)
I'll be very interested to hear your experience with the recent in-memory feature for testing: datastax.com/2014/02/why-we-added-in-memory-to-cassandra - Counterclockwise
C
10

We use an embedded Cassandra server, and I think that is the best approach when testing Cassandra. Mocking the Cassandra API is too error-prone.

EmbeddedServerHelper.cleanup() just removes files from the file system, but data may still exist in memory.

There is a teardown() method in EmbeddedServerHelper, but I am not sure how effective it is, as Cassandra has a lot of static singletons whose state is not cleaned up by teardown().

What we do is call truncate on each column family between tests. That removes all data.
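The per-test truncation can be sketched roughly as follows. This is only an illustration: `ColumnFamilyTruncator` is a hypothetical wrapper around whatever truncate call your client exposes, `RecordingTruncator` is an in-memory stand-in so the sketch runs without Cassandra, and the keyspace and column family names are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TruncateBetweenTests {

    /** Hypothetical wrapper around the real client's truncate call (an assumption, not a library API). */
    interface ColumnFamilyTruncator {
        void truncate(String keyspace, String columnFamily);
    }

    /** Truncates every listed column family; meant to be called from an @After method. */
    static void truncateAll(ColumnFamilyTruncator truncator, String keyspace, List<String> columnFamilies) {
        for (String columnFamily : columnFamilies) {
            truncator.truncate(keyspace, columnFamily);
        }
    }

    /** Stand-in that only records the calls it receives. */
    static class RecordingTruncator implements ColumnFamilyTruncator {
        final List<String> calls = new ArrayList<>();

        @Override
        public void truncate(String keyspace, String columnFamily) {
            calls.add(keyspace + "." + columnFamily);
        }
    }

    public static void main(String[] args) {
        RecordingTruncator recorder = new RecordingTruncator();
        // Placeholder names; substitute the keyspace and column families your tests create.
        truncateAll(recorder, "ExampleTestSpace", Arrays.asList("ExampleCF", "OtherCF"));
        System.out.println(recorder.calls);
    }
}
```

In a real test class, the implementation of `ColumnFamilyTruncator` would delegate to the client connected to the embedded server, and `truncateAll` would run after every test so each one starts from empty column families.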

Cacilie answered 9/7, 2011 at 14:31 Comment(1)
Yes, we were thinking along the same lines, i.e. truncating column families after each test. Thank you. - Axel
F
7

I think you can take a look at cassandra-unit : https://github.com/jsevellec/cassandra-unit/wiki

Franfranc answered 13/10, 2011 at 18:39 Comment(5)
Please disclose your affiliation in your answer. See the FAQ for the policy on this. - Huzzah
Perfect - thanks for writing and sharing this. Cassandra-unit is working well so far. - Melodeemelodeon
Take into account that cassandra-unit is licenced under GPLv3. - Lawler
@Marcin, FYI I asked the developer to confirm the license (there was some inconsistency across various project files), and they confirmed it is LGPLv3. - Vermicelli
What is the equivalent for .NET? - Dalury
I
3

I use the Mojo Cassandra maven plugin.

Here's an example plugin configuration that I use to spin up a Cassandra server for use by my unit tests:

 <build>
    <plugins>
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>cassandra-maven-plugin</artifactId>
            <version>1.1.0-1</version>
            <executions>
                <execution>
                    <goals>
                        <goal>start</goal>
                        <goal>flush</goal>
                        <goal>cleanup</goal>
                    </goals>
                    <phase>compile</phase>
                </execution>
            </executions>
        </plugin>
    </plugins>
 </build>

I did manage to get Hector's embedded server helper class working, which can be very useful. However, I ran into classloader conflicts due to this bug.

Impeachable answered 1/12, 2012 at 16:10 Comment(0)
G
2

You cannot restart a Cassandra instance within one VM: Cassandra has a "shutdown per kill" policy because of the singletons it uses.

You also do not need to restart Cassandra; just remove all column families (CFs). To remove a CF, you first need to flush its data, then compact it, and only then can you drop it.

This code connects to the embedded Cassandra and executes the required cleanup:

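import java.lang.management.ManagementFactory;
import java.util.List;
import javax.management.JMX;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import org.apache.cassandra.service.StorageServiceMBean;

// Looks up the StorageService MBean in this JVM and cleans every non-system keyspace.
private void cleanAndCompact() throws Exception {
    MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
    ObjectName ssn = new ObjectName("org.apache.cassandra.db:type=StorageService");
    StorageServiceMBean ssmb = JMX.newMBeanProxy(mbs, ssn, StorageServiceMBean.class);

    List<String> keyspaces = ssmb.getKeyspaces();
    if (keyspaces == null) {
        LOG.info("No keyspaces to clean up");
        return;
    }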

    for (String keyspace : keyspaces) {
        if (keyspace.equalsIgnoreCase("system")) {
            continue;
        }
        execCleanup(ssmb, keyspace);
    }

}

private void execCleanup(StorageServiceMBean ssmb, String keyspace) throws Exception {
    LOG.info("Cleaning up keyspace: " + keyspace);

    ssmb.invalidateKeyCaches(keyspace, new String[0]);
    ssmb.invalidateRowCaches(keyspace, new String[0]);
    ssmb.forceTableFlush(keyspace, new String[0]);
    ssmb.forceTableCompaction(keyspace, new String[0]);
    ssmb.forceTableCleanup(keyspace, new String[0]);
}

Now execute the CLI script that drops the CF:

CliMain.main(new String[] { "-host", host, "-port", Integer.toString(rpcPort), "-f", "/my/script/path/script.txt","-username", "myUser", "-password", "123456" });

and script.txt could have:

use ExampleTestSpace;
drop column family ExampleCF;
Gardas answered 30/11, 2011 at 15:26 Comment(0)
P
0

By "doesn't seem to clean up data" what exactly do you mean? That you still see your data in the database?

That problem might be because Cassandra doesn't delete values instantly. It marks them for deletion and removes them only after gc_grace_seconds has passed (this usually defaults to 10 days).
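To make the timing concrete, here is a small sketch of that grace period. It is a simplified model, not Cassandra's code: the constant 864000 is the 10-day default expressed in seconds, and `isPurgeable` is a hypothetical helper that only captures the idea that a deletion marker can be dropped once the grace period has elapsed.

```java
import java.time.Duration;

public class TombstoneGrace {

    /** The 10-day default for gc_grace_seconds, expressed in seconds. */
    static final long DEFAULT_GC_GRACE_SECONDS = 864_000L;

    /**
     * Simplified model (an assumption for illustration): a deletion marker becomes
     * eligible for removal once more than gcGraceSeconds have passed since the delete.
     */
    static boolean isPurgeable(long deletedAtSeconds, long nowSeconds, long gcGraceSeconds) {
        return nowSeconds - deletedAtSeconds > gcGraceSeconds;
    }

    public static void main(String[] args) {
        System.out.println(Duration.ofSeconds(DEFAULT_GC_GRACE_SECONDS).toDays()); // prints 10

        long deletedAt = 0L;
        long oneDayLater = Duration.ofDays(1).getSeconds();
        long elevenDaysLater = Duration.ofDays(11).getSeconds();
        System.out.println(isPurgeable(deletedAt, oneDayLater, DEFAULT_GC_GRACE_SECONDS));     // prints false
        System.out.println(isPurgeable(deletedAt, elevenDaysLater, DEFAULT_GC_GRACE_SECONDS)); // prints true
    }
}
```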

Pelecypod answered 8/7, 2011 at 12:55 Comment(2)
I misinterpreted cleanup as deleting the data that was created in the test cases. But cleanup is only meant to do some housekeeping and remove the commit logs and data directories created by the embedded Cassandra process. - Axel
Without cleanup you will not be able to drop your CF: the drop request will simply do nothing, and the create request will throw an exception saying that the CF already exists. - Gardas
A
0

In addition to what's been posted, there are cases when you want to test error handling: how does your app behave when a Cassandra query fails?

There are a few libraries that can help you with this:

I'm the author of cassandra-spy and wrote it to help me test these cases.
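Independently of any particular library, the idea of testing the failure path can be sketched with a hand-written stub. Everything here is hypothetical: `UserStore` stands for your own data-access interface, `GreetingService` for the application code under test, and the fallback value is chosen only for illustration. It does not show the cassandra-spy API.

```java
public class QueryFailureHandling {

    /** Hypothetical data-access interface that the application codes against. */
    interface UserStore {
        String findName(String userId);
    }

    /** Hypothetical application service that should degrade gracefully when the store fails. */
    static class GreetingService {
        private final UserStore store;

        GreetingService(UserStore store) {
            this.store = store;
        }

        String greet(String userId) {
            try {
                return "Hello, " + store.findName(userId);
            } catch (RuntimeException e) {
                // Fallback chosen for illustration only.
                return "Hello, guest";
            }
        }
    }

    public static void main(String[] args) {
        // Stub that simulates a failing query instead of talking to Cassandra.
        UserStore failing = id -> {
            throw new IllegalStateException("simulated query failure");
        };
        System.out.println(new GreetingService(failing).greet("42")); // prints Hello, guest
    }
}
```

A stub like this covers the application's reaction to a failure, while the embedded server tests continue to cover the schema and the real queries.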

Athos answered 2/8, 2017 at 10:50 Comment(0)
