Why is Apache Xerces/Xalan adding additional carriage returns to my serialized output?
P

3

7

I'm using Apache Xerces 2.11.0 and Apache Xalan 2.7.1 and I'm having problems with additional carriage return characters in the serialized XML.

I have this (pseudo) code:

String myString = ...;
Document doc = ...;

Element item = doc.createElement("item");
item.appendChild(doc.createCDATASection(myString));

Transformer transformer = ...;
ByteArrayOutputStream stream = new ByteArrayOutputStream();
Result result = new StreamResult(stream);
transformer.transform(new DOMSource(doc), result);

myString contains \r\n line breaks (it is actually Base64-encoded data), but when I look at the serialized output, there is an additional \r character before each line break.

Input:

Line 1 \r\n
Line 2 \r\n
Line 3 \r\n

Output:

Line 1 \r\r\n
Line 2 \r\r\n
Line 3 \r\r\n

If I use createTextNode instead of createCDATASection the output becomes even more interesting:

Line 1 
\r\n
Line 2 
\r\n
Line 3 
\r\n

The additional character seems to be introduced during serialization; the DOM tree appears to be correct (according to getTextContent()).

Why is this happening? What can I do to fix this?

Perfect answered 11/6, 2011 at 16:59 Comment(3)
Result is just an output tree. How are you serializing Result to a String or output stream? - Utile
I ran into the same problem. Did you find a solution to this problem? - Homer
No, unfortunately I never did. I'm manually removing the line breaks now. - Perfect
T
11

I guess you are having this problem on Windows and not on Linux/Solaris/Mac. The Xalan serializer (org.apache.xml.serializer.ToStream) gets the line separator using System.getProperty("line.separator"), which is \r\n on Windows. When the serializer writes \r\n, it treats the \n as the end-of-line sequence and replaces it with the line separator, so it actually writes \r + lineSeparator = \r\r\n. Although this sounds strange, it is not a bug; see [1]. But since it was frequently reported as one, a Xalan extension property was added [2]. So you may set it programmatically:

transformer.setOutputProperty("{http://xml.apache.org/xalan}line-separator","\n");

or

<xsl:output xalan:line-separator="&#10;" />

where xalan is a prefix bound to the namespace URI "http://xml.apache.org/xalan".
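For completeness, here is a minimal sketch of the programmatic variant applied to a DOM like the one in the question. The class name LineSeparatorDemo and the helper serialize are made up for illustration. It also assumes that Apache Xalan is the transformer implementation picked up by TransformerFactory.newInstance(); the transformer bundled with the JDK may accept the property and silently ignore it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of Xalan or Xerces.
class LineSeparatorDemo {

    // Xalan extension property key quoted in the answer.
    static final String LINE_SEPARATOR_KEY =
            "{http://xml.apache.org/xalan}line-separator";

    // Puts the text into <item><![CDATA[...]]></item> and serializes the
    // document, asking the serializer to use "\n" as its line separator.
    static String serialize(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element item = doc.createElement("item");
            item.appendChild(doc.createCDATASection(text));
            doc.appendChild(item);

            // Identity transformer: copies the DOM to the stream unchanged.
            Transformer transformer =
                    TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(LINE_SEPARATOR_KEY, "\n");

            ByteArrayOutputStream stream = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(stream));
            // The default output encoding is UTF-8.
            return new String(stream.toByteArray(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = serialize("Line 1 \r\nLine 2 \r\nLine 3 \r\n");
        // Make the control characters visible before printing.
        System.out.println(xml.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

With the property taking effect, each \n in the data is written as a plain \n, so the \r\n pairs in the input should come out as \r\n rather than \r\r\n.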

[1] https://issues.apache.org/jira/browse/XALANJ-1660

[2] https://issues.apache.org/jira/browse/XALANJ-2093
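To see where the doubled \r comes from, the substitution described above can be modelled in a few lines. This is only a toy model of the behaviour, not Xalan's actual ToStream code; the class LineSepModel and its methods are invented for the illustration.

```java
// Toy model of the end-of-line handling described in this answer.
class LineSepModel {

    // Copies the text, emitting the given line separator for every '\n'.
    // A '\r' in the data is not special here and is copied through as-is.
    static String writeWithSeparator(String text, String lineSeparator) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\n') {
                out.append(lineSeparator);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Renders control characters as visible escape sequences.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String input = "Line 1 \r\n";
        // Windows default separator: prints Line 1 \r\r\n
        System.out.println(visible(writeWithSeparator(input, "\r\n")));
        // Separator overridden to "\n": prints Line 1 \r\n
        System.out.println(visible(writeWithSeparator(input, "\n")));
    }
}
```

The \r that is already in the data survives, and the \n after it expands to the platform separator, which is why the problem only shows up where the separator itself contains a \r.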

Tied answered 5/9, 2012 at 9:37 Comment(1)
Thank you! Generating CSV files that Excel can process requires changing this: new lines inside cells are LF, while new rows use CRLF. I have not been able to find this information easily anywhere else on the internet. - Newfangled
F
1

Odd, but try doing transformer.setOutputProperty(javax.xml.transform.OutputKeys.INDENT, "no"); immediately after creating the transformer and see what happens.

Feer answered 11/6, 2011 at 17:30 Comment(1)
Odd. What is the code to create the Result result = .. entry? Are you using a Writer or a Stream? - Feer
S
0

Try using Xerces 2.9.0, which is tested with Xalan 2.7.1 (2.9.0 ships with the Xalan package).

I did the same after running into problems with Xerces 2.11.0.

Sperling answered 11/6, 2012 at 14:28 Comment(0)
