While this is a pretty old question, there might be another angle on the answer that hasn't been touched on yet.
TL;DR: it matters what flavor of Result the Transformer is feeding into. (If you're using xalan through Java code you didn't write/can't change, this might not be what you want to hear.)
For demonstrations in this answer, I'll be using PostgreSQL PL/Java, because it comes with a set of example functions including preparexmltransform and transformxml that use Java's xalan-based XSLT 1.0 stuff, and have some extra arguments for test purposes. There's an important behavior effect here that I wouldn't have seen without those extra arguments.
I'll start by preparing a transform named indent:
SELECT
preparexmltransform(
'indent',
'<xsl:transform version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:transform>',
how => 5);
It should be clear enough that the first argument there is a name for the transform and the second is the XSLT defining it. I'll get to that "how" argument in a bit.
So anyway, let's use that transform on some XML and see what happens:
SELECT
transformxml(
'indent',
'<a b="c" d="e"><f><g/><h/></f></a>',
howin => 5, howout => 4);
transformxml
----------------
<a b="c" d="e">
    <f>
        <g/>
        <h/>
    </f>
</a>
Cool, that did what was wanted right away, and shows that the short transform above is enough; notably, it doesn't need an xalan:indent-amount property (unless you like a different indent width), so it doesn't need an xalan namespace defined, and there doesn't have to be a strip-space element for it to work (if you try with spaces in the input document, the indent spaces are just added to them, which can look goofy, so you might choose to use strip-space, but the indenting happens either way).
I still haven't said what those extra arguments do (two of 'em now, "howin" and "howout"!), but that's coming, because look what happens changing nothing but "howout" from 4 to 5:
SELECT
transformxml(
'indent',
'<a b="c" d="e"><f><g/><h/></f></a>',
howin => 5, howout => 5);
transformxml
------------------------------------
<a b="c" d="e"><f><g/><h/></f></a>
So the "howout" matters for whether the indenting happens. What are these hows?
Well, Java doesn't have just one API for working with XML. It has several, including DOM, StAX, and SAX, not to mention you might just want to handle the XML as a String, or a character stream via Reader/Writer, or an encoded byte stream via InputStream/OutputStream.
The JDBC spec says if you're writing Java code to work with XML in a database, the SQLXML API has to give you your choice of any of those ways to work with the data, whichever is convenient for your task. And the JAXP Transformations API says you have to be able to hand a Transformer pretty much any flavor of Source and any flavor of Result, and have it do the right thing.
So that's why those PL/Java example functions have "how" arguments: there needs to be a way to test all of the required ways the same XML content can be passed to the Transformer and all the ways the Transformer's result can come back. The "how"s are arranged (arbitrarily) like this:
code | form | howin | howout
------+---------------------+--------------+--------------
1 | binary stream | InputStream | OutputStream
2 | character stream | Reader | Writer
3 | String | String | String
4 | binary or character | StreamSource | StreamResult
5 | SAX | SAXSource | SAXResult
6 | StAX | StAXSource | StAXResult
7 | DOM | DOMSource | DOMResult
So what does the same xalan indenting transform do, when it is called with different ways of producing its result?
SELECT
i, transformxml(
'indent',
'<a b="c" d="e"><f><g/><h/></f></a>',
howin => 5, howout => i)
FROM
generate_series(1,7) AS i;
i | transformxml
---+------------------------------------------
1 | <a b="c" d="e">
  |     <f>
  |         <g/>
  |         <h/>
  |     </f>
  | </a>
  |
2 | <a b="c" d="e">
  |     <f>
  |         <g/>
  |         <h/>
  |     </f>
  | </a>
  |
3 | <a b="c" d="e">
  |     <f>
  |         <g/>
  |         <h/>
  |     </f>
  | </a>
  |
4 | <a b="c" d="e">
  |     <f>
  |         <g/>
  |         <h/>
  |     </f>
  | </a>
  |
5 | <a b="c" d="e"><f><g/><h/></f></a>
6 | <a b="c" d="e"><f><g></g><h></h></f></a>
7 | <a b="c" d="e"><f><g/><h/></f></a>
Well, there's the pattern. For all of the APIs where the Transformer actually has to directly produce a serialized stream of characters or bytes, it adds the indentation as requested.
When it is given a SAXResult, StAXResult, or DOMResult to write into, it doesn't add indentation, because those are all structural XML APIs; it's as if xalan treats indenting as strictly a serialization issue, and it technically isn't serializing when it is producing SAX, StAX, or DOM.
(The table above also shows that the StAX API doesn't always render an empty element as self-closed when the other APIs do. Side issue, but interesting.)
So, if you find yourself trying to get an xalan transform to do indenting and it isn't, double check which form of Result you are asking the Transformer to produce.
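To see the same pattern outside of PL/Java, here is a minimal sketch in plain Java. It assumes the TransformerFactory you get is the JDK's built-in xalan-derived one; the class and method names (ResultFlavorDemo, toStream, whitespaceTextNodesInDom) are made up for illustration. It runs the same indenting identity transform twice, once into a StreamResult and once into a DOMResult, and then counts whitespace-only text nodes in the DOM, which is where indentation would have to show up if the transformer had added any.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Node;

public class ResultFlavorDemo {
    static final String XML = "<a b=\"c\" d=\"e\"><f><g/><h/></f></a>";

    // Identity transform with indent=yes, the same settings for both runs.
    static Transformer indenting() throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        return t;
    }

    // Serialized result: the transformer itself writes the characters.
    static String toStream() {
        try {
            StringWriter w = new StringWriter();
            indenting().transform(
                new StreamSource(new StringReader(XML)), new StreamResult(w));
            return w.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Structural result: count whitespace-only text nodes in the DOM tree.
    static int whitespaceTextNodesInDom() {
        try {
            DOMResult r = new DOMResult();
            indenting().transform(new StreamSource(new StringReader(XML)), r);
            return countWhitespaceText(r.getNode());
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static int countWhitespaceText(Node n) {
        int count = 0;
        if (n.getNodeType() == Node.TEXT_NODE
                && n.getNodeValue().trim().isEmpty()) {
            count++;
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += countWhitespaceText(c);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("StreamResult output:");
        System.out.println(toStream());
        System.out.println("whitespace text nodes in DOMResult: "
            + whitespaceTextNodesInDom());
    }
}
```

If the behavior matches the table above, the StreamResult output comes back broken across lines, while the DOM tree contains no whitespace text nodes at all. The exact indent width in the stream case can differ between JDK versions, so the sketch doesn't depend on it.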
Edit: One final point: if you are coding this directly in Java, there really isn't any need at all to write those seven-ish lines of XSLT just to get what's nothing more than an identity-transform with the indent output property set.
If you call the no-argument TransformerFactory.newTransformer(), it straight-up gives you a plain-vanilla identity transform. Then all you need to do is set its output properties, and you're in business:
var tf = javax.xml.transform.TransformerFactory.newInstance();
var t = tf.newTransformer();
t.setOutputProperty("indent", "yes");
t.setOutputProperty("{http://xml.apache.org/xalan}indent-amount", "1"); // if you don't like the default 4
t.transform(source, result);
Doesn't get much simpler than that. Again, it's critical that result be a StreamResult, so that the transformer will do serialization.
method="xml" only; method="html" has different problems/behaviors. The most important being: com.sun.org.apache.xalan.internal.xsltc.runtime.AbstractTranslet#transferOutputSettings very simply ignores indent-amount for method="html" in the JDK (checked 8, 9 and 11). Java 11 supports indentation, because the default indent-number is 4 there, but not configurable. – Rasheedarasher