Pretty HTML snippet output

I have a snippet of HTML:

<div><p>text1</p></div><div><p>text1</p></div>

I want to pretty-print it like this:

<div>
  <p>text1</p>
</div>
<div>
  <p>text1</p>
</div>

What would be the simplest way to do this? I've looked at transform and jsoup, but I'm not sure which would be the smart choice. Thanks!

Envy answered 22/3, 2015 at 16:12 Comment(1)
Thanks, somehow I missed this one :) - Envy
L
2

jTidy could fit this task: http://jtidy.sourceforge.net/howto.html

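import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

public String prettyPrintHTML(String rawHTML)
{
    Tidy tidy = new Tidy();
    tidy.setXHTML(true);            // emit XHTML
    tidy.setIndentContent(true);    // indent nested content
    tidy.setPrintBodyOnly(true);    // print only the contents of <body>
    tidy.setTidyMark(false);        // do not add the Tidy "generator" meta tag

    // HTML to DOM (getBytes() uses the platform default charset)
    Document htmlDOM = tidy.parseDOM(new ByteArrayInputStream(rawHTML.getBytes()), null);

    // Pretty print
    OutputStream out = new ByteArrayOutputStream();
    tidy.pprint(htmlDOM, out);

    return out.toString();
}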
Labuan answered 22/3, 2015 at 16:18 Comment(5)
Is it possible to avoid auto-adding head and body to my result? I only need to parse a snippet. - Envy
I have set tidy.setPrintBodyOnly(true); which should do it. If the output is still wrapped, simply fetch the body node from htmlDOM: Node body = htmlDOM.getElementsByTagName("body").item(0); - Labuan
What if I need to pretty-print only the [header part] and not the body? - Envy
Remove tidy.setPrintBodyOnly(true); and you get the full document. Then work with htmlDOM again and extract the head node: Node head = htmlDOM.getElementsByTagName("head").item(0); - Labuan
It also replaces some new HTML5 tags with div :( so I can't really use it. - Envy
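
The "fetch the body node" suggestion in the comments could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the JDK's own XML parser and javax.xml.transform in place of jTidy, so the input must be well-formed XHTML, and a DOM produced by jTidy may not behave the same way with a Transformer. The class name BodyContent and the method bodyContent are made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BodyContent {
    // Hypothetical helper: serializes only the children of <body>,
    // so the html/head/body wrapper does not appear in the result.
    static String bodyContent(Document htmlDOM) throws Exception {
        Node body = htmlDOM.getElementsByTagName("body").item(0);
        if (body == null) {
            return ""; // no <body> element in this document
        }
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");

        StringWriter out = new StringWriter();
        NodeList children = body.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            // Each child subtree is written separately to the same writer.
            transformer.transform(new DOMSource(children.item(i)), new StreamResult(out));
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the input is well-formed XHTML, since the JDK parser is strict.
        String xhtml = "<html><head><title>t</title></head><body>"
                + "<div><p>text1</p></div><div><p>text1</p></div></body></html>";
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        System.out.println(bodyContent(dom));
    }
}
```

Passing the head node instead of the body node would give the header part only. The exact indentation produced by the JDK's Transformer differs between Java versions, so treat the layout as approximate.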
R
21

You can use jsoup like this:

String html = "<div><p>text1</p></div><div><p>text1</p></div>";
Document doc = Jsoup.parseBodyFragment(html);

But this will wrap your markup in

<html>
  <head></head>
  <body>
    ..
  </body>
</html>

To get rid of the wrapper, take just the contents of <body>:

System.out.println(doc.body().html());

which prints

<div>
 <p>text1</p>
</div>
<div>
 <p>text1</p>
</div>

If you want to increase the indentation, you can set it beforehand (before calling html()) with

doc.outputSettings().indentAmount(4);

Now the result will look like

<div>
    <p>text1</p>
</div>
<div>
    <p>text1</p>
</div>
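
Putting the steps above together, a minimal self-contained sketch might look like this. It assumes jsoup is on the classpath; the class name PrettySnippet and the method prettyPrint are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class PrettySnippet {
    // Hypothetical helper: pretty-prints an HTML fragment and returns it
    // without the <html>/<head>/<body> wrapper that jsoup adds while parsing.
    static String prettyPrint(String html, int indent) {
        Document doc = Jsoup.parseBodyFragment(html);
        doc.outputSettings().indentAmount(indent); // spaces per nesting level
        return doc.body().html();                  // contents of <body> only
    }

    public static void main(String[] args) {
        String html = "<div><p>text1</p></div><div><p>text1</p></div>";
        System.out.println(prettyPrint(html, 4));
    }
}
```

Keeping the indent as a parameter lets the caller choose between jsoup's default single-space indentation and a wider one.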
Rhynchocephalian answered 22/3, 2015 at 16:38 Comment(0)
C
2

I would use HTML Tidy; there is also an online version.

Many text editors have plugins or built-in functionality for this:

Sublime Text

BBEdit

Coda

Charwoman answered 22/3, 2015 at 16:17 Comment(3)
How can I avoid adding body and html? I only want my snippet pretty-printed. - Envy
It depends on when you are trying to do this. Do you want to do it on the fly, so your code is sent to the page already formatted, or while you are editing it? - Charwoman
I need to display the HTML snippets that our page is assembled from. Some customers want to see the snippets separately instead of viewing the HTML on the page. - Envy
