Unmarshalling XML to three lists of different objects using STAX Parser

Is there a way to use a StAX parser to efficiently parse an XML document that contains multiple lists of objects of different classes (POJOs)? The exact structure of my XML is as follows (the class names are not real):

<?xml version="1.0" encoding="utf-8"?>
<root>
    <notes />
    <category_alpha>
        <list_a>
            <class_a_object></class_a_object>
            <class_a_object></class_a_object>
            <class_a_object></class_a_object>
            <class_a_object></class_a_object>
            .
            .
            .
        </list_a>
        <list_b>
            <class_b_object></class_b_object>
            <class_b_object></class_b_object>
            <class_b_object></class_b_object>
            <class_b_object></class_b_object>
            .
            .
            .
        </list_b>
    </category_alpha>
    <category_beta>
        <class_c_object></class_c_object>
        <class_c_object></class_c_object>
        <class_c_object></class_c_object>
        <class_c_object></class_c_object>
        <class_c_object></class_c_object>
        .
        .
        .
        .
        .
    </category_beta>
</root>

I have been using a StAX parser, namely the XStream library.

It works absolutely fine as long as my XML contains a list of objects of a single class, but I don't know how to handle XML that contains lists of objects of different classes.

Any help would be really appreciated and please let me know if I have not provided enough information or I haven't phrased the question properly.

Felon answered 11/5, 2019 at 6:49 Comment(0)
B
9

You can use the Declarative Stream Mapping (DSM) stream parsing library to easily convert complex XML into Java classes. It uses StAX to parse XML.

For demonstration, I skip the notes tag and add a field inside each class_x_object tag.

Here is the XML:

<?xml version="1.0" encoding="utf-8"?>
<root>
    <notes />
    <category_alpha>
        <list_a>
            <class_a_object>
                <fieldA>A1</fieldA>
            </class_a_object>
            <class_a_object>
                <fieldA>A2</fieldA>
            </class_a_object>
            <class_a_object>
                <fieldA>A3</fieldA>
            </class_a_object>

        </list_a>
        <list_b>
            <class_b_object>
                <fieldB>B1</fieldB>
            </class_b_object>
            <class_b_object>
                <fieldB>B2</fieldB>
            </class_b_object>
            <class_b_object>
                <fieldB>B3</fieldB>
            </class_b_object>
        </list_b>
    </category_alpha>
    <category_beta>
        <class_c_object>
          <fieldC>C1</fieldC>
        </class_c_object>
        <class_c_object>
          <fieldC>C2</fieldC>
        </class_c_object>
        <class_c_object>
          <fieldC>C3</fieldC>
        </class_c_object>
    </category_beta>
</root>

First of all, you must define the mapping between the XML data and your class fields in YAML or JSON format.

Here are the mapping definitions:

result:     
   type: object
   path: /root   
   fields:
     listOfA:
       type: array
       path: .*class_a_object  # path is regex
       fields:
          fieldOfA:
            path: fieldA
     listOfB:
       type: array
       path: .*class_b_object
       fields:
          fieldOfB:
            path: fieldB 
     listOfC:
       type: array
       path: .*class_c_object
       fields:
          fieldOfC:
            path: fieldC 
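
Each list path above starts with .* so that, presumably, the element is matched wherever it is nested under /root. As a rough illustration of how such a pattern behaves, the sketch below applies it with plain java.util.regex to hypothetical absolute element paths. This is not DSM code: PathRegexSketch and pathMatches are made-up names, and the idea that DSM tests the pattern against the full element path in exactly this way is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of regex path matching; not part of DSM.
class PathRegexSketch {

    // Returns true when the whole element path matches the regex.
    static boolean pathMatches(String regex, String elementPath) {
        return Pattern.compile(regex).matcher(elementPath).matches();
    }

    public static void main(String[] args) {
        String regex = ".*class_a_object";
        // Assumed absolute paths of elements in the sample XML.
        System.out.println(pathMatches(regex, "/root/category_alpha/list_a/class_a_object")); // true
        System.out.println(pathMatches(regex, "/root/category_alpha/list_b/class_b_object")); // false
        System.out.println(pathMatches(regex, "/root/category_beta/class_c_object"));         // false
    }
}
```

Because the pattern ends with the full tag name, it selects only class_a_object elements, regardless of how many wrapper tags (category_alpha, list_a) sit above them.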

Java class that you want to deserialize:

public class Root {
    public List<A> listOfA;
    public List<B> listOfB;
    public List<C> listOfC;

    public static class A{
        public String fieldOfA;
    }
    public static class B{
        public String fieldOfB;
    }
    public static class C{
        public String fieldOfC;
    }

}   

Java Code to parse XML:

DSM dsm = new DSMBuilder(new File("path/to/mapping.yaml")).setType(DSMBuilder.TYPE.XML).create(Root.class);
Root root = (Root) dsm.toObject(xmlFileContent);
// write the root object as JSON
dsm.getObjectMapper().writerWithDefaultPrettyPrinter().writeValue(System.out, root);

Here is the output:

{
  "listOfA" : [ {"fieldOfA" : "A1"}, {"fieldOfA" : "A2"}, {"fieldOfA" : "A3"} ],
  "listOfB" : [ {"fieldOfB" : "B1"}, {"fieldOfB" : "B2"}, {"fieldOfB" : "B3"} ],
  "listOfC" : [ {"fieldOfC" : "C1"}, {"fieldOfC" : "C2"}, {"fieldOfC" : "C3"} ]
}

UPDATE:

As I understand from your comment, you want to read a big XML file as a stream and process the data while reading the file.

DSM allows you to process data while the XML is being read.

Declare three different functions to process the partial data:

FunctionExecutor processA = new FunctionExecutor() {
    @Override
    public void execute(Params params) {
        Root.A object = params.getCurrentNode().toObject(Root.A.class);

        // process the A object: save to DB, call a service, etc.
    }
};

FunctionExecutor processB = new FunctionExecutor() {
    @Override
    public void execute(Params params) {
        Root.B object = params.getCurrentNode().toObject(Root.B.class);

        // process the B object: save to DB, call a service, etc.
    }
};

FunctionExecutor processC = new FunctionExecutor() {
    @Override
    public void execute(Params params) {
        Root.C object = params.getCurrentNode().toObject(Root.C.class);

        // process the C object: save to DB, call a service, etc.
    }
};

Register the functions with DSM:

DSMBuilder builder = new DSMBuilder(new File("path/to/mapping.yaml")).setType(DSMBuilder.TYPE.XML);

// register the functions
builder.registerFunction("processA", processA);
builder.registerFunction("processB", processB);
builder.registerFunction("processC", processC);

DSM dsm = builder.create();
Object object = dsm.toObject(xmlContent);

Change the mapping file to call the registered functions:

result:     
   type: object
   path: /root   
   fields:
     listOfA:
       type: object
       function: processA  # processA is executed when the 'class_a_object' tag is closed.
       path: .*class_a_object  # path is regex
       fields:
          fieldOfA:
            path: fieldA
     listOfB:
       type: object
       path: .*class_b_object
       function: processB  # registered function
       fields:
          fieldOfB:
            path: fieldB 
     listOfC:
       type: object
       path: .*class_c_object
       function: processC  # registered function
       fields:
          fieldOfC:
            path: fieldC 
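
For comparison, here is a minimal sketch of the same process-while-reading idea using only the JDK's StAX API (javax.xml.stream), without DSM. The class StaxStreamingSketch, its nested A, B and C classes, the SAMPLE string and the parse method are hypothetical names made up for this illustration; they assume the shape of the sample XML above, where each object carries a single text field. Each object is handed to its callback as soon as its closing tag is read, so nothing accumulates in memory unless the callback keeps it.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration; not part of DSM or XStream.
class StaxStreamingSketch {

    // Stand-in POJOs for the three (not real) classes in the question.
    static class A { String fieldA; }
    static class B { String fieldB; }
    static class C { String fieldC; }

    // Small sample in the shape of the XML above.
    static final String SAMPLE =
        "<root><notes/>"
        + "<category_alpha>"
        + "<list_a><class_a_object><fieldA>A1</fieldA></class_a_object></list_a>"
        + "<list_b><class_b_object><fieldB>B1</fieldB></class_b_object></list_b>"
        + "</category_alpha>"
        + "<category_beta>"
        + "<class_c_object><fieldC>C1</fieldC></class_c_object>"
        + "<class_c_object><fieldC>C2</fieldC></class_c_object>"
        + "</category_beta></root>";

    // Reads the XML once and calls the matching callback per finished object.
    static void parse(Reader xml, Consumer<A> onA, Consumer<B> onB, Consumer<C> onC) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Turn off DTDs and external entities; the sample XML needs neither.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        A a = null;
        B b = null;
        C c = null;
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(xml);
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        switch (reader.getLocalName()) {
                            case "class_a_object": a = new A(); break;
                            case "class_b_object": b = new B(); break;
                            case "class_c_object": c = new C(); break;
                            // getElementText() consumes up to the field's end tag.
                            case "fieldA": if (a != null) a.fieldA = reader.getElementText(); break;
                            case "fieldB": if (b != null) b.fieldB = reader.getElementText(); break;
                            case "fieldC": if (c != null) c.fieldC = reader.getElementText(); break;
                            default: break;
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        // The object is complete: hand it over and drop the reference.
                        switch (reader.getLocalName()) {
                            case "class_a_object": onA.accept(a); a = null; break;
                            case "class_b_object": onB.accept(b); b = null; break;
                            case "class_c_object": onC.accept(c); c = null; break;
                            default: break;
                        }
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Failed to parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Here the callbacks just collect; they could save to a DB instead.
        List<A> listA = new ArrayList<>();
        List<B> listB = new ArrayList<>();
        List<C> listC = new ArrayList<>();
        parse(new StringReader(SAMPLE), listA::add, listB::add, listC::add);
        System.out.println(listA.size() + " " + listB.size() + " " + listC.size());
    }
}
```

For a real file, a FileReader or InputStream-based reader would replace the StringReader. The trade-off against DSM is that the element-to-field mapping lives in the switch statements rather than in a mapping file.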
Bootery answered 14/5, 2019 at 10:7 Comment(5)
Thanks a lot for the solution, but isn't it still going to wait for the entire XML to be parsed to create one object? The XML is really, really big, and I was wondering if there is a better way to parse the XML and start processing objects before the end of the file is reached. On a side note, do you know any library that would convert a Java POJO to YAML? - Felon
DSM allows you to process parsed data while reading. You don't need to wait until the end of the file. DSM also supports scripting and expressions, so you can manipulate and filter data while reading the file. Here is an example using DSM for large XML: #9390868. SnakeYAML and Jackson are libraries that convert Java to YAML. - Bootery
Awesome. Let me try this out. - Felon
I am not able to parse inline variables, e.g. <root generated_date ="2019-01-01">. Is there some other kind of declaration in the mapping file for this? - Felon
mfatihercik.github.io/dsm/build/html/specification/…. attribute: true - Bootery
M
3

You could use the Java Architecture for XML Binding (JAXB) and unmarshal into POJO classes as described below.

First create the POJO classes. (I have taken a few nodes from your XML file and created POJOs for them; you can do the same for the rest.) Below is the XML I considered.

<?xml version="1.0" encoding="utf-8"?>
<root>
    <category_alpha>
        <list_a>
            <class_a_object></class_a_object>
            <class_a_object></class_a_object>
            <class_a_object></class_a_object>
            <class_a_object></class_a_object>
        </list_a>
        <list_b>
            <class_b_object></class_b_object>
            <class_b_object></class_b_object>
            <class_b_object></class_b_object>
            <class_b_object></class_b_object>
        </list_b>
    </category_alpha>
</root>

Below are the POJO classes for Root, category_alpha, list_a, list_b, class_a_object and class_b_object

import java.util.List;

import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;


@XmlRootElement(name = "root")
@XmlAccessorType (XmlAccessType.FIELD)
public class Root {

    @XmlElement(name = "category_alpha")
    private List<CategoryAlpha> categoryAlphaList = null;

    public List<CategoryAlpha> getCategoryAlphaList() {
        return categoryAlphaList;
    }

    public void setCategoryAlphaList(List<CategoryAlpha> categoryAlphaList) {
        this.categoryAlphaList = categoryAlphaList;
    }
}

The following classes need the same imports as the class above.

@XmlRootElement(name = "category_alpha")
@XmlAccessorType (XmlAccessType.FIELD)
public class CategoryAlpha {

    @XmlElement(name = "list_a")
    private List<ListAClass> list_a_collectionlist = null;

    @XmlElement(name = "list_b")
    private List<ListBClass> list_b_collectionlist = null;


    public List<ListAClass> getList_a_collectionlist() {
        return list_a_collectionlist;
    }


    public void setList_a_collectionlist(List<ListAClass> list_a_collectionlist) {
        this.list_a_collectionlist = list_a_collectionlist;
    }


    public List<ListBClass> getList_b_collectionlist() {
        return list_b_collectionlist;
    }


    public void setList_b_collectionlist(List<ListBClass> list_b_collectionlist) {
        this.list_b_collectionlist = list_b_collectionlist;
    }
}

@XmlRootElement(name = "list_a")
@XmlAccessorType (XmlAccessType.FIELD)
public class ListAClass {

    @XmlElement(name = "class_a_object")
    private List<ClassAObject> classAObjectList = null;

    public List<ClassAObject> getClassAObjectList() {
        return classAObjectList;
    }

    public void setClassAObjectList(List<ClassAObject> classAObjectList) {
        this.classAObjectList = classAObjectList;
    }
}

@XmlRootElement(name = "list_b")
@XmlAccessorType (XmlAccessType.FIELD)
public class ListBClass {

    @XmlElement(name = "class_b_object")
    private List<ClassBObject> classBObjectList = null;

    public List<ClassBObject> getClassBObjectList() {
        return classBObjectList;
    }

    public void setClassBObjectList(List<ClassBObject> classBObjectList) {
        this.classBObjectList = classBObjectList;
    }
}

@XmlRootElement(name = "class_a_object")
@XmlAccessorType (XmlAccessType.FIELD)
public class ClassAObject {

}

@XmlRootElement(name = "class_b_object")
@XmlAccessorType (XmlAccessType.FIELD)
public class ClassBObject {

}

Here is the Main class

import java.io.File;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;

public class UnmarshallMainClass {

    public static void main(String[] args) throws JAXBException {
        JAXBContext jaxbContext = JAXBContext.newInstance(Root.class);
        Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();

        // This root object contains all the list of objects you are looking for
        Root emps = (Root) jaxbUnmarshaller.unmarshal( new File("sample.xml") );
    }

}

Using the getters on the root object and the other objects, you can retrieve all the lists of objects inside the root, for example:

List<CategoryAlpha> categoryAlphaList = emps.getCategoryAlphaList();
Mathison answered 19/5, 2019 at 15:1 Comment(2)
Thanks a lot Subash. Your solution will definitely work, but I think I will stick to the DSM stream parser as it lets me manipulate objects before the end of the XML is reached. - Felon
Hi, thanks for this response. I am doing something similar but am unable to marshal a few events to my class. If you get a chance, can you please have a look at this question and provide your observations? #67668016 - Atingle
P
0

I have created a parser for the provided example: https://github.com/sbzDev/stackoverflow/tree/master/question56087924

import com.thoughtworks.xstream.annotations.XStreamAlias;

import java.util.List;

@XStreamAlias("root")
public class Root {

    String notes;

    @XStreamAlias("category_alpha")
    CategoryAlpha categoryAlpha;


    @XStreamAlias("category_beta")
    List<C> listC;

    static class CategoryAlpha {

        @XStreamAlias("list_a")
        List<A> listA;

        @XStreamAlias("list_b")
        List<B> listB;
    }

    @XStreamAlias("class_a_object")
    static class A {
    }

    @XStreamAlias("class_b_object")
    static class B {
    }

    @XStreamAlias("class_c_object")
    static class C {
    }
}

Parser:

import com.thoughtworks.xstream.XStream;

public class SampleRootParser {

    public Root parse(String xmlContent){
        XStream xstream = new XStream();
        xstream.processAnnotations(Root.class);
        return  (Root)xstream.fromXML(xmlContent);
    }
}

Maybe you can provide the actual XML and the expected result?

Preliminary answered 13/5, 2019 at 11:55 Comment(1)
That is almost the same as what I am using, unfortunately, and it's not at all efficient because we are parsing the entire XML in one read, so XStream loses its benefit here. - Felon
