Efficient XSLT pipeline in Java (or redirecting Results to Sources)

I have a series of XSLT 2.0 stylesheets that feed into each other, i.e. the output of stylesheet A feeds B, which feeds C.

What is the most efficient way of doing this? Rephrased: how can one efficiently route the output of one transformation into another?

Here's my first attempt:

@Override
public void transform(Source data, Result out) throws TransformerException{
    for(Transformer autobot : autobots){
        if(autobots.indexOf(autobot) != (autobots.size()-1)){
            log.debug("Transforming prelim stylesheet...");
            data = transform(autobot,data);
        }else{
            log.debug("Transforming final stylesheet...");
            autobot.transform(data, out);
        }
    }
}

private Source transform(Transformer autobot, Source data) throws TransformerException{
    DOMResult result = new DOMResult();
    autobot.transform(data, result);
    Node node = result.getNode();
    return new DOMSource(node);
}

As you can see, I'm using a DOM to sit between transformations, and although that is convenient, it is suboptimal performance-wise.

Is there an easy way to route, say, a SAXResult to a SAXSource? A StAX solution would be another option.

I'm aware of projects like XProc, which is very cool if you haven't taken a look at it yet, but I didn't want to invest in a whole framework.

Cardialgia answered 21/8, 2009 at 14:47 Comment(1)
"for(Transformer autobot : autobots){" Priceless :-) - Aleshia

I found this: "#3. Chaining Transformations", which shows two ways to use the TransformerFactory to chain transformations, feeding the result of one transform into the next and finally writing to System.out. This avoids the need for an intermediate serialization to a String, file, etc. between transforms.

When multiple, successive transformations are required to the same XML document, be sure to avoid unnecessary parsing operations. I frequently run into code that transforms a String to another String, then transforms that String to yet another String. Not only is this slow, but it can consume a significant amount of memory as well, especially if the intermediate Strings aren't allowed to be garbage collected.

Most transformations are based on a series of SAX events. A SAX parser will typically parse an InputStream or another InputSource into SAX events, which can then be fed to a Transformer. Rather than having the Transformer output to a File, String, or another such Result, a SAXResult can be used instead. A SAXResult accepts a ContentHandler, which can pass these SAX events directly to another Transformer, etc.

Here is one approach, and the one I usually prefer as it provides more flexibility for various input and output sources. It also makes it fairly easy to create a transformation chain dynamically and with a variable number of transformations.

SAXTransformerFactory stf = (SAXTransformerFactory)TransformerFactory.newInstance();

// These templates objects could be reused and obtained from elsewhere.
Templates templates1 = stf.newTemplates(new StreamSource(
  getClass().getResourceAsStream("MyStylesheet1.xslt")));
Templates templates2 = stf.newTemplates(new StreamSource(
  getClass().getResourceAsStream("MyStylesheet2.xslt")));

TransformerHandler th1 = stf.newTransformerHandler(templates1);
TransformerHandler th2 = stf.newTransformerHandler(templates2);

th1.setResult(new SAXResult(th2));
th2.setResult(new StreamResult(System.out));

Transformer t = stf.newTransformer();
t.transform(new StreamSource(System.in), new SAXResult(th1));

// th1 feeds th2, which in turn feeds System.out.
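
The remark above about building a chain dynamically, with a variable number of transformations, can be sketched as follows. This is a minimal sketch and not code from the quoted article: the class name `XsltChain`, the helper `runOnStrings`, and the inline demo stylesheets are all made up for illustration. It only relies on the standard JAXP calls already used above (`newTransformerHandler`, `setResult`, `SAXResult`).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, named here only for illustration.
public class XsltChain {

    /** Runs input through every stylesheet in order, passing SAX events between stages. */
    public static void transform(SAXTransformerFactory stf, List<Templates> stages,
                                 Source input, Result output) throws TransformerException {
        // Build the chain back to front: the last stage writes to the real output,
        // and every earlier stage writes into the handler of the stage after it.
        Result next = output;
        for (int i = stages.size() - 1; i >= 0; i--) {
            TransformerHandler handler = stf.newTransformerHandler(stages.get(i));
            handler.setResult(next);
            next = new SAXResult(handler);
        }
        // An identity transformer parses the input and pushes its events into the first stage.
        stf.newTransformer().transform(input, next);
    }

    /** Convenience wrapper for experiments: stylesheets and input are given as strings. */
    public static String runOnStrings(List<String> stylesheets, String xml)
            throws TransformerException {
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        List<Templates> stages = new ArrayList<>();
        for (String xslt : stylesheets) {
            stages.add(stf.newTemplates(new StreamSource(new StringReader(xslt))));
        }
        StringWriter out = new StringWriter();
        transform(stf, stages, new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String head = "<xsl:stylesheet version='1.0' "
                + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";
        // Two toy stages: the first rewrites <a/> to <b/>, the second rewrites <b/> to <c/>.
        String aToB = head + "<xsl:template match='a'><b/></xsl:template></xsl:stylesheet>";
        String bToC = head + "<xsl:template match='b'><c/></xsl:template></xsl:stylesheet>";
        System.out.println(runOnStrings(List.of(aToB, bToC), "<a/>"));
    }
}
```

Building the chain from the last stage backwards means each handler already has its result set before anything is pushed into it. With an empty list the method degenerates to a plain identity copy of the input.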
Reeva answered 23/8, 2009 at 22:38 Comment(4)
Great, that looks like exactly what I'm looking for. Just curious - what did you search for to find that? My Google-fu must be rusty. - Cardialgia
Actually, your question reminded me of some code that I had seen implemented a while back. I knew that it used a SAXTransformerFactory, so I googled: "saxtransformerfactory chain transformations". It does seem oddly hard to find, considering how much code/logic/trouble it saves when you want to pipeline transforms. - Reeva
According to onjava.com/pub/a/onjava/excerpt/java_xslt_ch5/?page=6, one can check transFact.getFeature(SAXTransformerFactory.FEATURE) before casting, to make sure the cast to SAXTransformerFactory is safe. - Wendt
Shouldn't you close the streams that you get from getResourceAsStream? To me this looks like a resource leak. - Moravian
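
The two comments above (checking the feature before casting, and closing the stylesheet streams) could be combined roughly like this. It is a sketch under assumptions, not part of the original answer: the class name `SafeTemplates` and both method names are made up, and it assumes that `newTemplates` has fully read the stream by the time it returns, since `Templates` is a compiled representation of the stylesheet.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, named here only for illustration.
public class SafeTemplates {

    /** Returns the default factory as a SAXTransformerFactory, or fails with a clear message. */
    public static SAXTransformerFactory saxFactory() {
        TransformerFactory tf = TransformerFactory.newInstance();
        // The feature flag says whether the cast below is supported by this implementation.
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException(
                    tf.getClass().getName() + " does not support SAXTransformerFactory");
        }
        return (SAXTransformerFactory) tf;
    }

    /** Compiles a classpath stylesheet and closes the underlying stream afterwards. */
    public static Templates load(SAXTransformerFactory stf, Class<?> owner, String resource)
            throws IOException, TransformerConfigurationException {
        // try-with-resources closes the stream even if compilation fails.
        try (InputStream in = owner.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("Stylesheet not found on classpath: " + resource);
            }
            return stf.newTemplates(new StreamSource(in));
        }
    }
}
```

A call such as `SafeTemplates.load(SafeTemplates.saxFactory(), getClass(), "MyStylesheet1.xslt")` would then replace the inline `newTemplates(...)` expressions. Note that a `StreamSource` built from a bare stream carries no system ID, so relative `xsl:import` or `xsl:include` references may not resolve without one.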

The related question "Efficient XSLT pipeline, with params, in Java" clarifies how to pass parameters correctly to such a transformer chain.

It also hints at a slightly shorter solution that does not need a third transformer:

SAXTransformerFactory stf = (SAXTransformerFactory)TransformerFactory.newInstance();

Templates templates1 = stf.newTemplates(new StreamSource(
        getClass().getResourceAsStream("MyStylesheet1.xslt")));
Templates templates2 = stf.newTemplates(new StreamSource(
        getClass().getResourceAsStream("MyStylesheet2.xslt")));

TransformerHandler th1 = stf.newTransformerHandler(templates1);
TransformerHandler th2 = stf.newTransformerHandler(templates2);

th2.setResult(new StreamResult(System.out));

// Note that indent, etc. should be applied to the last transformer in the chain:
th2.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");

th1.getTransformer().transform(new StreamSource(System.in), new SAXResult(th2));
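
For parameters, one plausible reading of the above is that each stage's parameters are set on the `Transformer` returned by that stage's `TransformerHandler`. The following is a minimal sketch under that assumption, not code taken from the related question: the class name `ChainWithParams`, the parameter names `label` and `suffix`, and the inline stylesheets are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, named here only for illustration.
public class ChainWithParams {

    private static final String HEAD =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";

    // Stage 1 wraps the value of $label in <out>.
    private static final String STAGE1 = HEAD
            + "<xsl:param name='label'/>"
            + "<xsl:template match='/'><out><xsl:value-of select='$label'/></out></xsl:template>"
            + "</xsl:stylesheet>";

    // Stage 2 appends $suffix to the text of <out> and emits <final>.
    private static final String STAGE2 = HEAD
            + "<xsl:param name='suffix'/>"
            + "<xsl:template match='out'>"
            + "<final><xsl:value-of select='concat(., $suffix)'/></final>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    public static String run(String xml, String label, String suffix)
            throws TransformerException {
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler th1 = stf.newTransformerHandler(new StreamSource(new StringReader(STAGE1)));
        TransformerHandler th2 = stf.newTransformerHandler(new StreamSource(new StringReader(STAGE2)));

        // Each stage receives its own parameters through its own Transformer.
        th1.getTransformer().setParameter("label", label);
        th2.getTransformer().setParameter("suffix", suffix);

        StringWriter out = new StringWriter();
        th2.setResult(new StreamResult(out));

        // Same shortcut as above: stage 1 reads the source directly and feeds stage 2.
        th1.getTransformer().transform(new StreamSource(new StringReader(xml)), new SAXResult(th2));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(run("<in/>", "hello", "-world"));
    }
}
```

Parameters must be set before the transformation starts, and a parameter set on one stage is not visible to the other stages.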
Uncrowned answered 1/3, 2013 at 16:10 Comment(0)

Your best bet is to stick with DOM, as you're doing, because an XSLT processor has to build a tree anyway. Streaming is only an option for a very limited category of transforms, and few if any processors can detect such cases automatically and switch to a streaming-only implementation; otherwise they simply read the input and build the tree.

Sailfish answered 21/8, 2009 at 17:22 Comment(1)
This is not correct. Behavior is implementation dependent. The Java W3C DOM implementation is very inefficient, and most implementations use a more efficient internal representation in place of this DOM. So the accepted answer does improve performance over "stick to DOM". - Orgel
