Use JSON Input step to process uneven data
Asked Answered
R

2

11

I'm trying to process the following with an JSON Input step:

{"address":[
  {"AddressId":"1_1","Street":"A Street"},
  {"AddressId":"1_101","Street":"Another Street"},
  {"AddressId":"1_102","Street":"One more street", "Locality":"Buenos Aires"},
  {"AddressId":"1_102","Locality":"New York"}
]}

However this seems not to be possible:

Json Input.0 - ERROR (version 4.2.1-stable, build 15952 from 2011-10-25 15.27.10 by buildguy) : 
The data structure is not the same inside the resource! 
We found 1 values for json path [$..Locality], which is different that the number retourned for path [$..Street] (3509 values). 
We MUST have the same number of values for all paths.

The step provides Ignore Missing Path flag but it only works if all the rows misses the same path. In that case that step acts as as expected an fills the missing values with null.

This limits the power of this step to read uneven data, which was really one of my priorities.

My step Fields are defined as follows:

JSON Input Fields definition

Am I missing something? Is this the correct behavior?

Rotarian answered 22/2, 2012 at 15:37 Comment(0)
R
12

What I have done is use JSON Input using $.address[*] to read to a jsonRow field the full map of each element p.e:

{"address":[
    {"AddressId":"1_1","Street":"A Street"},  
    {"AddressId":"1_101","Street":"Another Street"},  
    {"AddressId":"1_102","Street":"One more street", "Locality":"Buenos Aires"},   
    {"AddressId":"1_102","Locality":"New York"} 
]}

This results in 4 jsonRows one for each element, p.e. jsonRow = {"AddressId":"1_101","Street":"Another Street"}. Then using a Javascript step I map my values using this:

var AddressId = getFromMap('AddressId', jsonRow);
var Street = getFromMap('Street', jsonRow);
var Locality = getFromMap('Locality', jsonRow);

In a second script tab I inserted minified JSON parse code from https://github.com/douglascrockford/JSON-js and the getFromMap function:

function getFromMap(key,jsonRow){
  try{
   var map = JSON.parse(jsonRow);
  }
  catch(e){
   var message = "Unparsable JSON: "+jsonRow+" Desc: "+e.message;
   var nr_errors = 1;
   var field = "jsonRow";
   var errcode = "JSON_PARSE";
   _step_.putError(getInputRowMeta(), row, nr_errors, message, field, errcode);
   trans_Status = SKIP_TRANSFORMATION;
   return null;
  }

  if(map[key] == undefined){
   return null;
  }
  trans_Status = CONTINUE_TRANSFORMATION;
  return map[key]
}
Rotarian answered 14/3, 2012 at 16:17 Comment(4)
it would be nice if you could copy answer elements here from your link :) (and accept your own answer btw)Easterling
Done! Was the answer useful to you?Rotarian
Four years late, but still - very nice idea. I'm still a bit disappointed that the JSON input step doesn't support this by design, because it's kind of expensive to first break down the object into elements and then compile a JS script. +1 for the resourcefulness.Introspection
10 years later. Pentaho still hasn't fixed this issue.Coagulate
D
3

You can solve this by changing the JSONPath and splitting up the steps in two JSON input steps. The following website explains a lot about JSONPath: http://goessner.net/articles/JsonPath/

$..AddressId

Does in fact return all the AddressId's in the address array, BUT since Pentaho is using grid rows for input and output [4 rows x 3 columns], it can't handle a missing value aka null value when you want as results return all the Streets (3 rows) and return all the Locality (2 rows), simply because there are no null values in the array itself as in you can't drive out of your garage with 3 wheels on your car instead of the usual 4.

I guess your script returns null (where X is zero) values like:

A S X
A S X
A S L
A X L

The scripting step can be avoided same by changing the Fields path of the first JSONinput step into:

$.address[*]

This is to retrieve all the 4 address lines. Create a next JSONinput step based on the new source field which contains the address line(s) to retrieve the address details per line:

$.AddressId
$.Street
$.Locality

This yields the null values on the four address lines when a address details is not available in an address line.

Degenerate answered 12/7, 2013 at 14:19 Comment(3)
I tested your suggestion. No luck, still failing. Check it out here pastebin.com/8Ez68rpVRotarian
Hi rsilva, you were right. I just made this work, by using a first json input step to get all the address lines and another one for the obtaining the Street / AddressId and Locality if available: pastebin.com/iRqNJMPFDegenerate
This worked well for me and seems more elegant than the accepted answer. Accepted one should work too, but it's nicer to avoid Javascripting your way to the solution when possible.Eno

© 2022 - 2024 — McMap. All rights reserved.