Programmatic HTMLDocument generation using Java
V

9

12

Does anyone know how to generate an HTMLDocument object programmatically in Java without resorting to generating a String externally and then using HTMLEditorKit#read to parse it? Two reasons I ask:

Firstly, my HTML generation routine needs to be very fast, and I assume that parsing a string into an internal model is more costly than constructing that model directly.

Secondly, an object-oriented approach would likely result in cleaner code.

I should also mention that, for licensing reasons, I can't resort to using any libraries other than those shipped with the JVM.

Veratridine answered 5/6, 2009 at 14:19 Comment(5)
Why would you need to parse the HTML that you are generating? Are you going to need to be able to insert inline HTML that could invalidate?Scowl
Thanks for your questions: mmyers: HTML Oliver: Sorry, I didn't make that clear. If I understand your question correctly, I am generating an HTMLDocument (using HTMLEditorKit#read) from the HTML, to be rendered by a JTextPane.Veratridine
again, that doesn't explain why you would need to generate then parse.Freya
It is not me doing the parsing. However, I assume that Swing must parse the HTML under the covers in order to render it in a JTextArea (otherwise it would be unnecessarily reparsing the HTML every time it wanted to redraw the pane). I want to find some way of skipping this step by generating the object directly, rather than generating a String (which Swing presumably parses into the object).Veratridine
Since you have a defined target object (HTMLDocument) that you need to generate, the code will only be cleaner if the API is designed for it. If the API was written with only a string in mind as the source, coercing your code into its parsing methods may or may not be faster, but it almost certainly won't be cleaner.Sombrero
M
9

One object-oriented approach is to use a library called ECS.

It is quite a simple library, and has not changed for ages. Then again, the HTML 4.01 spec has not changed either ;) I've used ECS and consider it far better than generating large HTML fragments with just Strings or StringBuffers/StringBuilders.

Small example:

Option optionElement = new Option();
optionElement.setTagText("bar");
optionElement.setValue("foo");
optionElement.setSelected(false);   

optionElement.toString() would now yield:

<option value='foo'>bar</option>

The library supports both HTML 4.0 and XHTML. The only thing that initially bothered me a lot was that names of classes related to the XHTML version started with a lowercase letter: option, input, a, tr, and so on, which goes against the most basic Java conventions. But that's something you can get used to if you want to use XHTML; at least I did, surprisingly fast.

Mazur answered 5/6, 2009 at 14:29 Comment(2)
Tom can't use the library directly (although why someone would have issues with the Apache license is a mystery to me), but he can look at the API for ideas.Cultus
Hmm, I'm pretty sure that "I can't resort to using any libraries other than those shipped with the JVM" was not in the original version of the question! :) With that restriction, JeeBee's (utility classes like TableBuilder) and Adam Paynter's (XMLStreamWriter) solutions seem reasonable.Mazur
O
7

I'd look into how JSPs work, i.e., they compile down into a servlet that is basically one huge long set of StringBuffer appends. The tags also compile down into Java code snippets. This is messy, but very fast, and you never see this code unless you delve into Tomcat's work directory. Maybe what you want is to actually code your HTML generation from an HTML-centric view like a JSP, with added tags for loops, etc., and use a similar code generation engine and compiler internally within your project.

Alternatively, just deal with the StringBuilder yourself in a utility class that has methods for "openTag", "closeTag", "openTagWithAttributes", "startTable", and so on... it could use a Builder pattern, and your code would look like:

public static void main(String[] args) {
    TableBuilder t = new TableBuilder();
    t.start().border(3).cellpadding(4).cellspacing(0).width("70%")
      .startHead().style("font-weight: bold;")
        .newRow().style("border: 2px 0px solid grey;")
          .newHeaderCell().content("Header 1")
          .newHeaderCell().colspan(2).content("Header 2")
      .end()
      .startBody()
        .newRow()
          .newCell().content("One/One")
          .newCell().rowspan(2).content("One/Two")
          .newCell().content("One/Three")
        .newRow()
          .newCell().content("Two/One")
          .newCell().content("Two/Three")
      .end()
    .end();
    System.out.println(t.toHTML());
}
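The TableBuilder above is not shown in full, so here is a minimal sketch of the same idea for generic tags. The class name `HtmlBuilder` and its methods `open`, `attr`, `text`, `close`, and `toHtml` are hypothetical, invented for this illustration. It keeps a stack of open tags so that `close()` always emits the right end tag, and it escapes text and attribute values.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical fluent builder (not a JDK class); tracks open tags on a stack. */
class HtmlBuilder {
    private final StringBuilder out = new StringBuilder();
    private final Deque<String> openTags = new ArrayDeque<>();
    private boolean startTagPending = false; // true while "<tag" still lacks its ">"

    /** Starts a new element; attributes may follow until text or another tag is written. */
    HtmlBuilder open(String tag) {
        finishStartTag();
        out.append('<').append(tag);
        openTags.push(tag);
        startTagPending = true;
        return this;
    }

    /** Adds an attribute to the element that was just opened. */
    HtmlBuilder attr(String name, String value) {
        if (!startTagPending) {
            throw new IllegalStateException("attr() must directly follow open()");
        }
        out.append(' ').append(name).append("=\"").append(escape(value)).append('"');
        return this;
    }

    /** Appends escaped character data inside the current element. */
    HtmlBuilder text(String content) {
        finishStartTag();
        out.append(escape(content));
        return this;
    }

    /** Closes the most recently opened element. */
    HtmlBuilder close() {
        finishStartTag();
        out.append("</").append(openTags.pop()).append('>');
        return this;
    }

    /** Returns the markup, refusing to do so while elements are still open. */
    String toHtml() {
        if (!openTags.isEmpty()) {
            throw new IllegalStateException("Unclosed tags: " + openTags);
        }
        return out.toString();
    }

    private void finishStartTag() {
        if (startTagPending) {
            out.append('>');
            startTagPending = false;
        }
    }

    private static String escape(String s) {
        // '&' must be replaced first so the other entities are not double-escaped.
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String html = new HtmlBuilder()
            .open("table").attr("border", "3")
              .open("tr")
                .open("td").text("One & Two").close()
              .close()
            .close()
            .toHtml();
        System.out.println(html);
        // prints <table border="3"><tr><td>One &amp; Two</td></tr></table>
    }
}
```

A table-specific builder like the one above could be layered on top of this, with `newRow()` and `newCell()` closing any open row or cell before opening the next.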
Orthodontia answered 5/6, 2009 at 14:33 Comment(1)
I used a TableBuilder here because I already had the code: we needed to embed an HTML table into HTML emails in a project. It's not that hard to write, but you need to keep track of open tags and your current state.Orthodontia
I
4

When dealing with XHTML, I have had much success using Java 6's XMLStreamWriter interface.

OutputStream destination = ...;
XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
XMLStreamWriter xml = outputFactory.createXMLStreamWriter(destination);

xml.writeStartDocument();
xml.writeStartElement("html");
xml.writeDefaultNamespace("http://www.w3.org/1999/xhtml");

xml.writeStartElement("head");
xml.writeStartElement("title");
xml.writeCharacters("The title of the page");
xml.writeEndElement();
xml.writeEndElement();

xml.writeEndElement();
xml.writeEndDocument();
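For anyone who wants to run this as-is, the following is a self-contained sketch of the same approach that writes to a StringWriter instead of an OutputStream. The class name `XhtmlStreamDemo`, the `render` helper, and the added body paragraph are assumptions made for the example. It also shows that `writeCharacters` takes care of escaping.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical demo class; only the javax.xml.stream calls come from the JDK. */
class XhtmlStreamDemo {

    /** Builds a tiny XHTML page in memory and returns it as a String. */
    static String render(String title, String paragraph) {
        StringWriter buffer = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);

            xml.writeStartDocument();
            xml.writeStartElement("html");
            xml.writeDefaultNamespace("http://www.w3.org/1999/xhtml");

            xml.writeStartElement("head");
            xml.writeStartElement("title");
            xml.writeCharacters(title);      // special characters are escaped here
            xml.writeEndElement();           // </title>
            xml.writeEndElement();           // </head>

            xml.writeStartElement("body");
            xml.writeStartElement("p");
            xml.writeCharacters(paragraph);
            xml.writeEndElement();           // </p>
            xml.writeEndElement();           // </body>

            xml.writeEndElement();           // </html>
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException e) {
            // Wrapped so callers do not have to declare the checked exception.
            throw new IllegalStateException("Could not write XHTML", e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("The title of the page", "1 < 2 & 3 > 2"));
    }
}
```

The resulting string could then be handed to whatever consumes the markup; note that this produces XHTML text and does not build an HTMLDocument object directly.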
Inulin answered 5/6, 2009 at 14:39 Comment(0)
H
3

I think manually generating your HTML via something like a StringBuilder (or directly to a stream) is going to be your best option, especially if you cannot use any external libraries.

Not being able to use any external libraries, you will suffer more in terms of development speed than in runtime performance.

Hampson answered 5/6, 2009 at 14:31 Comment(1)
+1 it's fastest, and the grimy aspects can be hidden away in utility builder classes, as per my reply also on this question.Orthodontia
D
2

javax.swing.text.html has the HTMLWriter and HTMLDocument classes, among others. I have not used them. I have used the HtmlWriter in .NET and it does exactly what you want, but the Java version may not work out to be the same.

Here is the doc: http://java.sun.com/j2se/1.5.0/docs/api/javax/swing/text/html/HTMLWriter.html

Also, I can't imagine a StringBuilder being slower than building with an object layer. It seems to me that any object oriented approach would have to build the object graph AND then produce the string. The main reason not to use raw strings for this stuff is that you are sure to get encoding errors as well as other mistakes that produce malformed documents.

Option 2: You could use your favorite XML APIs and produce XHTML.

Drainpipe answered 5/6, 2009 at 14:34 Comment(0)
E
1

You may want to build some Element objects with a render() method and assemble them in a tree structure; with a visiting algorithm you can then set the values and render the whole thing.

PS: have you considered a templating engine like FreeMarker?
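As a rough illustration of the element tree with a render() method suggested in this answer, here is a minimal sketch. All type names (`ElementTreeDemo`, `Node`, `Text`, `Tag`) are hypothetical, and rendering is a plain recursive walk rather than a full visitor.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a tiny tree of renderable nodes. */
class ElementTreeDemo {

    /** Anything that can append its own markup to a buffer. */
    interface Node {
        void render(StringBuilder out);
    }

    /** Leaf node holding character data, escaped on output. */
    static final class Text implements Node {
        private final String value;

        Text(String value) { this.value = value; }

        @Override
        public void render(StringBuilder out) { out.append(escape(value)); }
    }

    /** Element node with ordered attributes and children. */
    static final class Tag implements Node {
        private final String name;
        private final Map<String, String> attributes = new LinkedHashMap<>();
        private final List<Node> children = new ArrayList<>();

        Tag(String name) { this.name = name; }

        Tag attr(String key, String value) { attributes.put(key, value); return this; }

        Tag add(Node child) { children.add(child); return this; }

        Tag add(String text) { return add(new Text(text)); }

        @Override
        public void render(StringBuilder out) {
            out.append('<').append(name);
            for (Map.Entry<String, String> e : attributes.entrySet()) {
                out.append(' ').append(e.getKey())
                   .append("=\"").append(escape(e.getValue())).append('"');
            }
            out.append('>');
            for (Node child : children) {
                child.render(out); // depth-first walk of the tree
            }
            out.append("</").append(name).append('>');
        }
    }

    static String escape(String s) {
        // '&' first, so later entities are not escaped twice.
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String render(Node root) {
        StringBuilder out = new StringBuilder();
        root.render(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Tag list = new Tag("ul").attr("class", "menu")
            .add(new Tag("li").add("Tea & coffee"))
            .add(new Tag("li").add("Cake"));
        System.out.println(render(list));
        // prints <ul class="menu"><li>Tea &amp; coffee</li><li>Cake</li></ul>
    }
}
```

Because the tree is kept in memory until render() is called, values can be changed after assembly, at the cost of building the object graph before producing any output.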

Escamilla answered 5/6, 2009 at 14:30 Comment(0)
F
1

It appears that you can accomplish what you are attempting by directly constructing HTMLDocument.BlockElement and HTMLDocument.RunElement objects. These constructors have signatures that suggest direct use is possible, at least.

I would suggest examining the Swing sources in OpenJDK to see how the parser handles this, and derive your logic from there.

I would also suggest that this optimization may be premature, and perhaps this should be a speed-optimized replacement for a simpler approach (i.e. generating the HTML text) only introduced if this really does become a performance hotspot in the application.

Freya answered 5/6, 2009 at 15:36 Comment(0)
S
0

You can use any decent XML library like JDOM, XOM, or XStream. HTML is just a special case of XML.

Or, you can use one of the existing templating engines for server-side Java, like JSP or Velocity.

Swashbuckler answered 5/6, 2009 at 14:30 Comment(4)
Technically, HTML is not a special case of XML. XHTML is.Slumgullion
XHTML (emphasis on the X) is XML. HTML is SGML. They are similar but not actually the same thing. Most valid XHTML is also valid HTML, but not all.Pirbhai
Since it is being used in a JEditorPane, all it needs to be is a form of HTML that the pane can read.Cultus
Yes, that is the crux of the problem Kathy :)Veratridine
B
0

Basically, you can insert HTML into your HTMLDocument using one of the insert methods: insertBeforeEnd(), insertAfterEnd(), insertBeforeStart(), or insertAfterStart(). You supply the method with the HTML you want to insert and the element in the document tree relative to which it should be inserted.

e.g.

doc.insertBeforeEnd(element, html);

The HTMLDocument class also provides methods for traversing the document tree.
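A fuller sketch of this, using only classes shipped with the JDK, might look like the following. The class name `InsertHtmlDemo`, the helper `buildText`, and the sample markup are assumptions made for the example. Note that the inserted fragment is still parsed, so this avoids regenerating the whole document but does not bypass the parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

/** Hypothetical demo class; the Swing calls themselves are standard JDK API. */
class InsertHtmlDemo {

    /** Creates a small document, appends a paragraph to its body, and returns the plain text. */
    static String buildText() {
        try {
            HTMLEditorKit kit = new HTMLEditorKit();
            // createDefaultDocument() also installs the parser that the insert methods need.
            HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
            kit.read(new StringReader("<html><body><p>First</p></body></html>"), doc, 0);

            // Locate the <body> element by walking down from the root element.
            Element body = doc.getElement(
                doc.getDefaultRootElement(), StyleConstants.NameAttribute, HTML.Tag.BODY);

            // Append a new paragraph as the last child of <body>.
            doc.insertBeforeEnd(body, "<p>Second</p>");

            return doc.getText(0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            // Wrapped so callers do not have to declare the checked exceptions.
            throw new IllegalStateException("Could not build document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildText());
    }
}
```

The same document instance can then be passed to a JTextPane or JEditorPane with setDocument(), provided the pane uses an HTMLEditorKit.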

Babar answered 6/6, 2009 at 1:30 Comment(0)
