XMLStreamWriter - Java 8 - writeCharacters

The behavior of this method seems to have changed in Java 8. I need a quick fix for my problem.

The problem is that I have some code which writes CR and LF before each XML element named <row>. Now (since we migrated to Java 8), instead of CR and LF the characters &#xd; are written out.

Again, I need a quick fix; I cannot change the StAX implementation or do anything big like that.

    while (reader.hasNext()) {
        event = reader.next();
        if (event == XMLStreamConstants.START_ELEMENT) {
            if ("row".equals(reader.getLocalName())) {
                // Write a line break before each <row> element.
                writer.writeCharacters("\r\n"); // this is my problem now!
                writer.writeStartElement(reader.getLocalName());

                // Walk over the attributes of the current <row>.
                n = reader.getAttributeCount();
                for (int i = 0; i < n; i++) {
                    name = reader.getAttributeName(i).getLocalPart();
                    value = reader.getAttributeValue(i);
                    ...
                }
            }
        }
    }
Tuchun answered 16/4, 2015 at 13:0 Comment(8)
And, uhm, what about some code?Tribute
The CR+LF (Windows newline) can be substituted by LF (Unix newline). It might work. Maybe the line endings are handled differently. Look at the API. Maybe the system property "line.separator" was defined as a project parameter.Sago
@JoopEggen I can use System.getProperty("line.separator") but would that help?!Tuchun
@Tribute Yes, added some code.Tuchun
I meant more that the migration to Java 8 may have lost a definition of a line separator. Setting it yourself is maybe not such good style. But instead of "\r\n" above, that might be suitable, or not. Try "\n". If it is a BufferedWriter, writer.newLine(); is possible for the OS default.Sago
@JoopEggen It's not a BufferedWriter, it's an XMLStreamWriter. The migration to Java 8 didn't lose anything, it's just a hardcoded "\r\n" there.Tuchun
" it's just a hardcoded "\r\n" there " I mean code hasn't changed, only the behavior is now different. I think I will just comment out that line for now as a quick dirty fix. That will work for my case but I am interested how I can write a normal newline now under Java 8.Tuchun
It turns out I am having my problem on Linux. On Windows, it's working fine.Tuchun

You need either to get access to the underlying writer, that is, the writer you decorated with the XMLStreamWriter (hopefully, if there is one, it is the writer you passed into createXMLStreamWriter()), or to temporarily disable escaping, which is implementation dependent.

The reason you're getting the weird characters is that the XMLStreamWriter has no idea where you are writing these characters, so it defaults to XML attribute escaping, which is stricter than element (content) escaping. The escaping is also generally based on the character encoder. My guess is that older versions of Java either defaulted to XML element escaping, which does not escape whitespace such as newlines, or used a different character encoding. I can see why they fixed this, as attribute escaping is clearly the correct way to do it. I also have no idea which XMLStreamWriter or character encoder you're actually using. What more likely happened is that the XMLStreamWriter implementation or character encoding picked by default changed (you should check in the debugger which one is getting picked).

Regardless, if you get access to the underlying writer you can just write the characters directly and they will not be escaped. However, make sure the writer you use is the one that is decorated and not one deeper (i.e. if you have a BufferedWriter decorating a FileWriter, use the BufferedWriter).
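
As a rough sketch of the underlying-writer approach: flush the XMLStreamWriter first so its buffered markup comes out in order, then write the line break straight to the decorated writer. The class name RawNewlineDemo is made up for illustration, and a StringWriter stands in for whatever you passed to createXMLStreamWriter(). The empty writeCharacters("") call is a trick that makes the JDK's bundled implementation close the pending start tag before the raw write; treat that as an assumption to verify against your own implementation.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class: writes two <row> elements, each preceded by a raw CR LF.
class RawNewlineDemo {

    static String render() {
        // Stand-in for the writer you decorated with the XMLStreamWriter.
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("rows");
            // Assumption: an empty text write makes the writer emit the pending '>'.
            xml.writeCharacters("");
            for (int i = 0; i < 2; i++) {
                xml.flush();        // push buffered markup out before bypassing the XML writer
                out.write("\r\n");  // raw write: no XML escaping is applied here
                xml.writeStartElement("row");
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Make the line breaks visible when printing.
        System.out.println(render().replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

Skipping the flush() risks the raw characters landing ahead of markup the XML writer is still holding back.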

For those who don't think writeCharacters does escaping, you can look at the code.

EDIT

Apparently, after looking at the code, you can just call writer.setEscapeCharacters(false) on the default Sun implementation (unfortunately you probably have to do some casting) before you call writeCharacters, which is probably better than getting the original writer. I did not know about this flag.
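
A way to reach the same flag without casting to an internal class may be the factory property. The sketch below assumes the JDK's bundled StAX implementation exposes it under the name "escapeCharacters"; other implementations may not, so the code checks isPropertySupported() and otherwise leaves escaping alone. The class name EscapeToggleDemo is made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class: turns off escaping through a factory property when available.
class EscapeToggleDemo {

    // Assumption: property name understood by the JDK's bundled StAX implementation.
    static final String ESCAPE_PROP = "escapeCharacters";

    static String render() {
        StringWriter out = new StringWriter();
        try {
            XMLOutputFactory factory = XMLOutputFactory.newInstance();
            if (factory.isPropertySupported(ESCAPE_PROP)) {
                // Writers created from this factory stop escaping text content.
                factory.setProperty(ESCAPE_PROP, Boolean.FALSE);
            }
            XMLStreamWriter xml = factory.createXMLStreamWriter(out);
            xml.writeStartElement("rows");
            xml.writeCharacters("\r\n");
            xml.writeStartElement("row");
            xml.writeEndElement();
            xml.writeEndElement();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render().replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

Note that this is blunter than toggling the flag around a single call: with escaping off, characters such as < and & in text content are written unescaped as well, so it only suits output whose text you fully control.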

EDIT 2

Another possible quick fix, if you're using the Sun StAX implementation, is to change your system-level character encoding, picking an encoding in which the CRLF does not get escaped, ideally whatever it was before the JDK upgrade. This assumes the problem is that your character encoding changed from a Windows or ISO encoding to UTF-8 with the Java upgrade, but I can't be sure since you didn't specify your operating system. If it didn't change on upgrade (i.e. you have always defaulted to UTF-8), then disregard this option.

EDIT 3

After doing some testing, I'm pretty positive your StAX implementation is not the default Java Sun implementation but probably Woodstox. I haven't tested Woodstox, but it appears the library cares quite a bit about whitespace for performance reasons and appears to have different rules depending on whether the encoding is UTF-8 or ISO (again, character encoding).
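
If a debugger is not handy on the production box, a small diagnostic along these lines can report which StAX implementation, default charset and line separator are actually in effect. The class name StaxDiagnostics is made up for illustration; run it with the same classpath as the real application, since the classpath determines which factory is found.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: reports the environment details discussed in this thread.
class StaxDiagnostics {

    static String describe() {
        try {
            XMLOutputFactory factory = XMLOutputFactory.newInstance();
            XMLStreamWriter writer =
                    factory.createXMLStreamWriter(new ByteArrayOutputStream(), "UTF-8");
            // Show the separator with visible escapes instead of raw control characters.
            String sep = System.lineSeparator().replace("\r", "\\r").replace("\n", "\\n");
            return "factory=" + factory.getClass().getName()
                    + "\nwriter=" + writer.getClass().getName()
                    + "\ndefaultCharset=" + Charset.defaultCharset().name()
                    + "\nlineSeparator=" + sep;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

A factory class name starting with com.ctc.wstx would point to Woodstox, while one starting with com.sun.xml.internal.stream would point to the implementation bundled with the JDK.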

Huntingdonshire answered 16/4, 2015 at 14:10 Comment(5)
I don't think this is correct. There is no "startAttribute" method in XMLStreamWriter, so writeCharacters is never writing an attribute value.Gerenuk
It has nothing to do with writing attribute values; it is about escaping. If you look at any of the StAX writers, they don't just pump the characters out unescaped for writeCharacters. Here is a random StAX stream writer: grepcode.com/file/repo1.maven.org/maven2/com.sun.xml.stream/… ... See writeXMLContent; it is clearly escaping.Huntingdonshire
Windows/Linux has nothing to do with the upgrade. I use Windows on my dev box/laptop (it's always been like that). We use Linux in production (again, it's always been like that). We only upgraded the JDK/JRE versions. But even so, with the new JDK I have no issues on my Windows dev box.Tuchun
There are so many things that can vary based on the operating system and even a JDK upgrade. Without you telling me exactly which XMLStreamWriter you're using, exactly what character encoding you have, and exactly what line separator you would like, and that these are consistent across environments, I have no idea what will happen, but I do know that you can expect differences. The order of JAR loading alone can change, which could cause a different XMLStreamWriter to get loaded. You need to use a debugger with some breakpoints to see what is happening.Huntingdonshire
I've seen the same problem with IBM's Java SDK 8. writer.writeCharacters("\r\n") outputs &#xD; followed by \r\n under Windows. Writing the line separator to the underlying stream works (after you've flushed your XMLStreamWriter).Detention

What I did as a fix was to just call the following method.

    writer.writeCharacters(System.lineSeparator());

That works fine and produced raw (and not XML-escaped) CR/LF data.
Also, it turned out I was having my problem on Linux, while on Windows it was working OK.
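
One likely reason this works on Linux: System.lineSeparator() returns "\n" there, so no carriage return ever reaches the writer, and the Linux output then contains LF rather than CR LF. If identical output on every OS matters, a sketch along these lines hardcodes LF and UTF-8 instead of depending on platform defaults. The class name FixedNewlineDemo is made up for illustration, and an in-memory stream stands in for the real output stream.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class: same line break and encoding regardless of the OS.
class FixedNewlineDemo {

    static String render() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            // Explicit encoding, so the platform default charset plays no role.
            XMLStreamWriter xml =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(bytes, "UTF-8");
            xml.writeStartElement("rows");
            for (int i = 0; i < 2; i++) {
                xml.writeCharacters("\n"); // explicit LF instead of System.lineSeparator()
                xml.writeStartElement("row");
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(render().replace("\n", "\\n"));
    }
}
```

This does not produce CR LF; getting a literal carriage return into the output still needs one of the approaches from the other answer.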

Tuchun answered 16/4, 2015 at 16:19 Comment(7)
If you only had said in your original question that you were switching operating systems and that you didn't care what kind of newline it was I could have kept my answer simpler :).... I'm glad you figured it out though.Huntingdonshire
BTW you still might get escaped characters with this method if your system's character encoding is different from the output encoding of the XML. That is, if you write ASCII XML on Windows with System.lineSeparator() you might still have the problem. I am also uncomfortable that you produce different XML for different operating systems. Better to explicitly keep the line separator (i.e. LF) and character encoding (i.e. UTF-8) the same on all OSes if you can, unless this is a desktop application.Huntingdonshire
@AdamGent Thank you, but I haven't figured anything out yet. I will re-read all this and re-think it, including your answer. I just found a patch (quick fix/solution) which works for now and I shared it via my answer. I don't generate different data for different OSes; I use UTF-8 explicitly.Tuchun
@AdamGent XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(fis, "UTF-8"); XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(fos, "UTF-8"); // this is how I create the reader and the writer which I am using.Tuchun
@AdamGent Ideally I want "\r\n" on both Linux and Windows and I want them generated by the same consistent code. I don't see why same Java code on the same JVM would XML-escape things on Linux and not XML-escape them on Windows.Tuchun
I just told you your character encoding is probably different on Windows, which will cause different behavior, let alone that a different JAR loading order can happen on different operating systems. I won't believe you till you set a breakpoint and prove that the line separator, character encoding, and stream writer are all the same. My guess is one of those is different on your dev box. Otherwise you may have found a bug in how the character encoders check for System.lineSeparator() and allow it to go unencoded, which I doubt.Huntingdonshire
OK, I did some testing, and it looks like the only way this could happen is if your StAX implementation is not the default Java Sun one but rather Woodstox or some other StAX implementation.Huntingdonshire
