XSLT 3.0 streaming (Saxon)
I have a large XML file (6 GB) with this structure:

<Report>
   <Document>
      <documentType>E</documentType>
      <person>
         <firstname>John</firstname>
         <lastname>Smith</lastname>
      </person>
   </Document>
   <Document>
      [...]
   </Document>
   <Document>
      [...]
   </Document>
   [...]
</Report>

When I apply an XSLT stylesheet to it, I get this error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

So I wanted to try the new XSLT 3.0 streaming feature with Saxon 9.6 EE. I don't want the streaming constraints to apply once I am inside a Document. I think what I want to do is very close to the "burst mode" described here: http://saxonica.com/documentation/html/sourcedocs/streaming/burst-mode-streaming.html

Here is my Saxon command line:

java -cp saxon9ee.jar net.sf.saxon.Transform -t -s:input.xml -xsl:stylesheet.xsl -o:output/output.html

Here is my XSLT stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
<xsl:mode streamable="yes" />

<xsl:template match="/">
    GLOBAL HEADER
        <xsl:iterate select="copy-of()/Report/Document" >
           DOC HEADER
           documentType: <xsl:value-of select="documentType"/>
           person/firstname: <xsl:value-of select="person/firstname"/>
           DOC FOOTER
           <xsl:next-iteration/>
        </xsl:iterate>
    GLOBAL FOOTER
</xsl:template>

</xsl:stylesheet>

But I still have the same out of memory error.

Thank you for your help!

Deeprooted asked 6/10, 2014 at 22:0

Your copy-of() is copying the context item, which here is the document node, so the entire document is copied into memory. You want

copy-of(/Report/Document)

which copies each Document in turn. Or I tend to write it

/Report/Document/copy-of()

because I think it makes it clearer what is going on.

Incidentally you don't need xsl:iterate here: xsl:for-each will do the job perfectly well, because processing of one Document doesn't depend on the processing of any previous documents.
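For reference, here is a sketch of the question's stylesheet with both suggestions applied: each Document is selected with /Report/Document/copy-of(), and xsl:for-each replaces xsl:iterate (so xsl:next-iteration is no longer needed). It assumes Saxon-EE with XSLT 3.0 streaming support and the same command line as in the question; treat it as a sketch rather than a tested solution for the 6 GB input.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <!-- The default mode is streamable, so the input is read as a stream
       instead of being built as one large tree in memory -->
  <xsl:mode streamable="yes"/>

  <xsl:template match="/">
    GLOBAL HEADER
    <!-- copy-of() is applied to each Document in turn, so only one
         small Document copy is held in memory at a time -->
    <xsl:for-each select="/Report/Document/copy-of()">
      DOC HEADER
      documentType: <xsl:value-of select="documentType"/>
      person/firstname: <xsl:value-of select="person/firstname"/>
      DOC FOOTER
    </xsl:for-each>
    GLOBAL FOOTER
  </xsl:template>
</xsl:stylesheet>
```

Because the body of the loop works on an in-memory copy of a single Document, it can use ordinary path expressions such as person/firstname without being subject to the streamability rules.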

Cordes answered 7/10, 2014 at 8:28
Thank you! "/Report/Document/copy-of()" is working well. The tricky thing is that "copy-of(/Report/Document)" gives this error: XPTY0004: A sequence of more than one item is not allowed as the first argument of copy-of() - Deeprooted
The copy-of function was changed in the 2 Oct 2014 XSLT 3.0 Working Draft to accept a sequence of nodes, and Saxon 9.6 has implemented this change. Are you sure you are using 9.6? I would strongly recommend moving to Saxon 9.6 if you are doing streaming. - Cordes
Oh, you are right, I clicked the wrong download link. I was using Saxon-EE 9.5.1.6J; now, with Saxon-EE 9.6.0.1J, it works! I have another issue related to this burst mode: if I use xsl:call-template inside xsl:iterate, I get this error: XTSE3430: Template rule is declared streamable but it does not satisfy the streamability rules. * xsl:call-template is not streamable in this Saxon release. Any ideas for getting around the streaming constraints after the copy? - Deeprooted
Please submit a new question for that one, and quote the code of the template that is deemed non-streamable. - Cordes
Here it is: #26259639 - Deeprooted
