Pentaho: How to dynamically add Field (= Column) to OutputRow?

I would like to dynamically add fields (i.e. new columns) to the resulting output row in Kettle.

After spending hours reading through forum posts and the not-so-well-done scripting documentation, I wondered if Stack Overflow would be of any help.

Anatomical answered 27/2, 2013 at 14:31 Comment(0)

You can use the following steps to generate columns dynamically:

  1. Calculator
  2. Add Constants
  3. Select the required fields in a Table Input step, set their values with a Set Variables step, and read them back in a second transformation through a Get Variables step (a code sketch follows this list).
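
For step 3, the same idea can also be done in code: a User Defined Java Class step in the second transformation can read the variable and append it to the row. This is a minimal sketch only, assuming the Kettle 4.x UDJC API (including the getVariable(name, default) method that BaseStep exposes) and an upstream Set Variables step that defined MY_NEW_COLUMN; the variable and field names are illustrative, and the extra String field still has to be declared in the step's Fields tab:

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
  Object[] r = getRow();
  if (r == null) {
    setOutputDone();
    return false;
  }

  // Grow the row to the declared output size, then append the variable's
  // value as the new column after the existing input fields.
  Object[] outputRow = RowDataUtil.resizeArray(r, data.outputRowMeta.size());
  outputRow[getInputRowMeta().size()] = getVariable("MY_NEW_COLUMN", "");

  putRow(data.outputRowMeta, outputRow);
  return true;
}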
Clodhopping answered 3/5, 2013 at 10:45 Comment(1)
I just voted for this answer as the solution because it sounds like the most usable one. After evaluating Pentaho for the given use case, at the given time, for the given client, we moved on to evaluate other "products", and afterwards our developer-driven team went on to build something of our own. – Anatomical

How are your input values passed to the SQL query? If they are variables, then just connect the Table Input step to a "Get Variables" step and get your new columns that way.

Alternatively, you can add columns using either a Calculator or an Add Constants step.

Or you could even use the "Get System Info" step to get command-line arguments, dates, etc.

Certie answered 28/2, 2013 at 7:08 Comment(0)

First, let me give you a code snippet of what I have in a User Defined Java Class step:

private int fieldToHashGeoIndex;
private int fieldToHashHeadIndex;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
  Object[] r = getRow();
  if (r == null) {
    setOutputDone();
    return false;
  }

  if (first) {
    // On the first row, resolve the input field indexes from the step parameters.
    fieldToHashGeoIndex = getInputRowMeta().indexOfValue(getParameter("FIELD_TO_HASH_GEO"));
    if (fieldToHashGeoIndex < 0) {
      throw new KettleException("Field to hash not found in the input row, check parameter 'FIELD_TO_HASH_GEO'!");
    }
    fieldToHashHeadIndex = getInputRowMeta().indexOfValue(getParameter("FIELD_TO_HASH_HEAD"));
    if (fieldToHashHeadIndex < 0) {
      throw new KettleException("Field to hash not found in the input row, check parameter 'FIELD_TO_HASH_HEAD'!");
    }

    first = false;
  }

  // Grow the row array to the output size and append the two hash values
  // after the existing input fields.
  Object[] outputRowData = RowDataUtil.resizeArray(r, data.outputRowMeta.size());
  int outputIndex = getInputRowMeta().size();

  String fieldToHashGeo = getInputRowMeta().getString(r, fieldToHashGeoIndex);
  String fieldToHashHead = getInputRowMeta().getString(r, fieldToHashHeadIndex);
  outputRowData[outputIndex++] = MurmurHash.hash64(fieldToHashGeo);
  outputRowData[outputIndex++] = MurmurHash.hash64(fieldToHashHead);

  putRow(data.outputRowMeta, outputRowData);

  return true;
}

Now, normally you configure outputRowMeta from the step's Fields tab, but you may be able to modify it in the code instead. That should let you specify additional fields dynamically, as sketched below.
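
If you want to try that, here is a minimal sketch of what the first-row block could look like. It assumes the Kettle 4.x API (the ValueMeta constructor was superseded by ValueMetaFactory in later versions), that import lines are allowed at the top of the UDJC class code, and the new field names are illustrative:

import org.pentaho.di.core.row.RowMetaInterface;
import org.pentaho.di.core.row.ValueMeta;
import org.pentaho.di.core.row.ValueMetaInterface;

// Inside processRow(), in the first-row block: derive the output layout from
// the input layout and append the two hash fields. TYPE_INTEGER maps to a
// Java long, which matches what MurmurHash.hash64() returns.
if (first) {
  data.outputRowMeta = (RowMetaInterface) getInputRowMeta().clone();
  data.outputRowMeta.addValueMeta(new ValueMeta("geo_hash", ValueMetaInterface.TYPE_INTEGER));
  data.outputRowMeta.addValueMeta(new ValueMeta("head_hash", ValueMetaInterface.TYPE_INTEGER));
  first = false;
}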

As an alternative, you could handle variable fields by defining fixed output fields on the step, like 'field1', 'field2', etc., and tracking the actual field names elsewhere. You'd probably have to make all the fields of type String and then do your own type adjustments later (see the sketch below).
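
For those later type adjustments, a downstream step could convert the generic String fields itself. A tiny sketch only, inside that step's processRow(), assuming RowMetaInterface's getString(row, name, default) convenience method and the illustrative field name 'field1':

// Read the generic String field by name, then parse it into the real type.
String raw = getInputRowMeta().getString(r, "field1", null);
Long value = (raw == null) ? null : Long.valueOf(raw);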

Now that I think of it, though, variable output fields may lead to trouble: you have to be very careful with what you do in later steps to avoid errors due to type mismatches or missing fields.

Debutante answered 28/2, 2013 at 14:28 Comment(0)
