Output JSoup without added spaces and line breaks around the elements

I am parsing an XML file with JSoup, modifying some of its elements, and writing it back out.

The output file has extra spaces and line breaks. Is there a way to print it in the original format?

Original:

  <attributes>
        <divisions>4</divisions>
        <key>
          <fifths>0</fifths>
          <mode>major</mode>
          </key>
...

New:

<attributes> 
    <divisions>
     4
    </divisions> 
    <key> 
     <fifths>
      0
     </fifths> 
     <mode>
      major
     </mode> 
    </key> 
...

Any idea how to remove the spaces and line breaks around the elements?

I currently read in and print the document like this:

doc = Jsoup.parse(is, "UTF-8", "", Parser.xmlParser());

// try-with-resources flushes and closes the writer when the block ends
try (BufferedWriter htmlWriter = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("output.xml"), "UTF-8"))) {
    htmlWriter.write(doc.toString());
}
Mcpherson asked 5/3, 2015 at 11:46
Comments:
Have you seen this: https://mcmap.net/q/861530/-jsoup-line-feed/1700321 ? - Bonni
Interesting, but isn't that the opposite, since it adds the \n? - Mcpherson
I was referring more to the prettyPrint and OutputSettings options. - Bonni
Great, doc.outputSettings().indentAmount(0).prettyPrint(false); did it. Will you post it as an answer? - Mcpherson
Well, you are the one who found it; I just pointed you in the right direction. :) You can answer your own question. - Bonni

With some help from Aleksandr M, I solved it as follows:

doc.outputSettings().indentAmount(0).prettyPrint(false);
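A minimal, self-contained sketch of this fix, assuming jsoup is on the classpath; the class name NoPrettyPrintDemo and the roundTrip helper are hypothetical. It parses a fragment of the question's XML with Parser.xmlParser(), turns pretty-printing off, and serializes the document with outerHtml(). With prettyPrint(false), the whitespace text nodes read from the input are written back unchanged, so the output should match the input. indentAmount only applies to pretty-printed output, so it is omitted here. (Recent jsoup releases may already disable pretty-printing for the XML parser by default; setting it explicitly does no harm.)

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class NoPrettyPrintDemo {

    // Parses XML with jsoup's XML parser and serializes it back without
    // pretty-printing, so the original whitespace text nodes are kept as-is.
    static String roundTrip(String xml) {
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        // indentAmount is ignored when pretty-printing is off, so this is enough.
        doc.outputSettings().prettyPrint(false);
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        String xml = "<attributes>\n"
                + "  <divisions>4</divisions>\n"
                + "  <key>\n"
                + "    <fifths>0</fifths>\n"
                + "    <mode>major</mode>\n"
                + "  </key>\n"
                + "</attributes>";
        // Expected to print the XML exactly as it was given.
        System.out.println(roundTrip(xml));
    }
}
```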

Less elegant, but this also seemed to do the trick:

htmlWriter.write(doc.toString().replaceAll(">\\s+",">").replaceAll("\\s+<","<"));
Mcpherson answered 5/3, 2015 at 12:42
Comments:
Thanks! outputSettings() is great. replaceAll() is problematic, though, because it can turn e.g. A <b>doozy</b> dog into the text content Adoozydog, right? - Prewitt
The Javadoc for indentAmount(int indentAmount) says "Set the indent amount for pretty printing". I believe you don't need to set the indent to 0 if you're setting prettyPrint to false. - Teacup
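To make the replaceAll() concern concrete, here is a plain-Java sketch (no jsoup needed; the class RegexWhitespaceDemo and its helpers are hypothetical) that applies the same two replaceAll calls from the answer. It cleans up pretty-printed output like the question's, but on mixed content it also removes spaces that belong to the text. textOf is only a crude tag stripper for illustration, not a real XML text extractor.

```java
public class RegexWhitespaceDemo {

    // The regex workaround from the answer: drop whitespace right after
    // any '>' and right before any '<'.
    static String collapse(String markup) {
        return markup.replaceAll(">\\s+", ">").replaceAll("\\s+<", "<");
    }

    // Crude text extraction for illustration only: removes anything tag-shaped.
    static String textOf(String markup) {
        return markup.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        // Pretty-printed output as shown in the question: the regex repairs it.
        String pretty = "<divisions>\n     4\n    </divisions>";
        System.out.println(collapse(pretty));         // <divisions>4</divisions>

        // Mixed content from the comment: meaningful spaces are lost as well.
        String mixed = "A <b>doozy</b> dog";
        System.out.println(collapse(mixed));          // A<b>doozy</b>dog
        System.out.println(textOf(collapse(mixed)));  // Adoozydog
    }
}
```

Because the regex cannot tell indentation from whitespace that is part of the content, the outputSettings() approach is the safer choice for documents with mixed content.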

Try this:

doc = Jsoup.parse(is, "UTF-8", "", Parser.xmlParser());
doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
..
..

Hope this helps

Feliks answered 5/3, 2015 at 11:50
Comments:
Tried this, but it made no difference. - Mcpherson
