Pentaho Kettle: how to set up tests for transformations/jobs?

I've been using Pentaho Kettle for quite a while, and previously the transformations and jobs I've made (using Spoon) have been quite simple: load from a db, rename fields, pass the data on to another db, and so on. But now I'm writing transformations that do somewhat more complex calculations, which I would like to test somehow.

So what I would like to do is:

  1. Set up some test data
  2. Run the transformation
  3. Verify the result data

One option would probably be to make a Kettle test job that would test the transformation. But as my transformations relate to a Java project, I would prefer to run the tests from JUnit. So I've considered making a JUnit test (sketched below) that would:

  1. Set up test data (using DbUnit)
  2. Run the transformation (using pan.sh from the command line, or kitchen.sh if the transformation is wrapped in a job)
  3. Verify the result data (using DbUnit)
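
A minimal sketch of what such a test could look like. All file paths, the table name, and the H2 file database standing in for the real one are hypothetical; the transformation's own database connection would have to point at the same database the test sets up:

```java
import java.io.File;

import org.dbunit.Assertion;
import org.dbunit.JdbcDatabaseTester;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;
import org.junit.Test;

public class TransformationIT {

    @Test
    public void transformationProducesExpectedRows() throws Exception {
        // Hypothetical test database; AUTO_SERVER lets the separately
        // spawned pan.sh process connect to the same H2 database file.
        JdbcDatabaseTester tester = new JdbcDatabaseTester(
                "org.h2.Driver", "jdbc:h2:./target/testdb;AUTO_SERVER=TRUE", "sa", "");

        // 1. Set up test data with DbUnit
        IDataSet input = new FlatXmlDataSetBuilder()
                .build(new File("src/test/data/input.xml"));
        tester.setDataSet(input);
        tester.onSetup();

        // 2. Run the transformation from the command line
        Process pan = new ProcessBuilder(
                "/opt/pentaho/data-integration/pan.sh",
                "-file=src/test/ktr/my_transformation.ktr")
                .inheritIO().start();
        if (pan.waitFor() != 0) {
            throw new AssertionError("Transformation failed");
        }

        // 3. Verify the result data with DbUnit
        IDataSet expected = new FlatXmlDataSetBuilder()
                .build(new File("src/test/data/expected.xml"));
        IDataSet actual = tester.getConnection().createDataSet();
        Assertion.assertEquals(
                expected.getTable("RESULT_TABLE"),
                actual.getTable("RESULT_TABLE"));
    }
}
```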

This approach would, however, require test databases, which are not always available (Oracle and other expensive or legacy DBs). What I would prefer is to somehow mock or pass some stub test data to my input steps.

Any other ideas on how to test Pentaho Kettle transformations?

Vestigial answered 3/4, 2012 at 12:49 Comment(2)
I don't understand what you mean by "This however would limit my tests to those databases that i have available on our test server." Aren't you always limited to those databases, given you are running on the test server? (Theis)
I edited the question a bit to clarify. But anyhoo, what I meant was that I don't always have access to my input step databases (besides read access to a real production db), so I cannot load any test data into them via DbUnit etc. That's why I would prefer mocking my input step data, if somehow possible. (Vestigial)

There is a JIRA issue somewhere on jira.pentaho.com (I don't have it to hand) that requests exactly this, but alas it is not yet implemented.

So you do have the right solution in mind. I'd also add Jenkins and an Ant script to tie it all together. I've done a similar thing with report testing: I had a Pentaho job load the data, then execute the report, then compare the output with known output and report pass/failure.

Cresa answered 3/4, 2012 at 15:21 Comment(0)

If you separate your Kettle jobs into two phases:

  • load data to stream
  • process and update data

You can use a Copy rows to result step at the end of your "load data to stream" phase, and a Get rows from result step at the start of your "process" phase.

If you do this, you can then use any means to load the data (a Kettle transform, DbUnit called from an Ant script) and can mock up any database tables you want.

I use this for testing some ETL scripts I've written and it works just fine.
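
If you drive the "process" transformation from Java rather than the shell, the Kettle Java API also lets you feed stub rows straight into the transformation, which gets close to the mocking asked about in the question. This is a variant of the pattern above: it assumes the process transformation starts with an Injector step rather than Get rows from result, and a reasonably recent PDI version (5.x or later, where ValueMetaString exists). The file name process.ktr, the step name "input", and the field are hypothetical:

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.row.RowMeta;
import org.pentaho.di.core.row.RowMetaInterface;
import org.pentaho.di.core.row.value.ValueMetaString;
import org.pentaho.di.trans.RowProducer;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class ProcessTransformationTest {

    public void runWithStubRows() throws Exception {
        KettleEnvironment.init();

        TransMeta meta = new TransMeta("src/test/ktr/process.ktr");
        Trans trans = new Trans(meta);
        trans.prepareExecution(null);

        // Attach a producer to the Injector step so the test can push rows in
        RowProducer producer = trans.addRowProducer("input", 0);
        trans.startThreads();

        // Stub input rows instead of a real database read
        RowMetaInterface rowMeta = new RowMeta();
        rowMeta.addValueMeta(new ValueMetaString("customer_name"));
        producer.putRow(rowMeta, new Object[] { "Test Customer" });
        producer.finished();

        trans.waitUntilFinished();
        if (trans.getErrors() > 0) {
            throw new AssertionError("Transformation reported errors");
        }
    }
}
```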

Theis answered 5/4, 2013 at 15:25 Comment(3)
This is an OK solution which I've also used. In addition to separating the load and process parts, I also separated the output into an output stream part, so in a test situation I can redirect the output to a results file which can be asserted. But again, it would be better if one could somehow mock the actual input/output transformations with test data, because there are often problems in the input and output transformations which do not get tested in this approach. (Vestigial)
hannesh, I try to make input/output transforms as dumb as possible, which minimizes the problems that pop up there. Can you give an example of such a problem? (Theis)
I wrote a series of blog posts about how to test Kettle ETL transforms: mooreds.com/wordpress/archives/1061 (Theis)
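
The results-file assertion mentioned in the first comment above can stay very small; a sketch with hypothetical file paths, comparing the redirected output line by line against a checked-in expected file:

```java
import static org.junit.Assert.assertEquals;

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ResultFileAssert {

    // Compares the transformation's redirected output against an expected file
    public static void assertSameLines(String expectedPath, String actualPath)
            throws Exception {
        List<String> expected = Files.readAllLines(Paths.get(expectedPath));
        List<String> actual = Files.readAllLines(Paths.get(actualPath));
        assertEquals("Result file differs from expected output", expected, actual);
    }
}
```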

You can use the Data Validator step. Of course it is not a full unit test suite, but I think it is sometimes useful for checking data integrity in a quick way, and you can run several tests at once.

For a more "serious" test I would recommend @codek's answer: execute your Kettle jobs under Jenkins.

[Screenshot: Data Validator step]

Bricebriceno answered 14/7, 2015 at 18:10 Comment(0)
