Best XML parser for Java [closed]

I need to read smallish (few MB at the most, UTF-8 encoded) XML files, rummage around looking at various elements and attributes, perhaps modify a few and write the XML back out again to disk (preferably with nice, indented formatting).

What would be the best XML parser for my needs? There are lots to choose from. Some I'm aware of are:

And of course the one in the JDK (I'm using Java 6). I'm familiar with Xerces but find it clunky.

Recommendations?

Antediluvian asked 17/12, 2008 at 6:52 Comment(6)
I think you can find more players here: xml.com/lpt/a/1703 - Magnetochemistry
I think there are real problems with this question. One is that it compares totally unlike things, lumping parsers (Xerces, Crimson) together with DOM-manipulation libraries (dom4j, XOM, JDOM). Also, the answers tend toward advocacy and are not that constructive. - Formenti
@Magnetochemistry your link is not working. - Ernesternesta
Unfortunately, yes, the link is gone. It was posted 9 years ago. I was interested in this topic when I was doing my own research on which DOM manipulation library to use. - Magnetochemistry
The Underscore-java library can read and generate XML strings. - Ethiopia
I know I'm literally 15 years late to the party (and honestly it probably didn't even exist back then), but JSoup is fantastic for XML as well if you already use it for HTML parsing. - Zapateado

If speed and memory are not a concern, dom4j is a really good option. If you need speed, a StAX parser like Woodstox is the right way, but you have to write more code to get things done and you have to get used to processing XML as a stream.
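To give a feel for the "more code" that streaming involves, here is a minimal sketch of a StAX read-modify-write pass written against the standard javax.xml.stream API only. Woodstox implements that same API, so XMLInputFactory.newInstance() should pick it up when its jar is on the classpath; otherwise the JDK's built-in implementation is used. The class name and the element/attribute names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class StaxRewrite {
    // Streams the document event by event and replaces the value of attrName on
    // every element whose local name is elementName. Only the current event is
    // held in memory, which is why StAX is cheap but needs more code than a tree API.
    static String setAttribute(String xml, String elementName, String attrName, String newValue)
            throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(elementName)) {
                StartElement start = event.asStartElement();
                List<Attribute> attrs = new ArrayList<Attribute>();
                Iterator<?> it = start.getAttributes();
                while (it.hasNext()) {
                    Attribute a = (Attribute) it.next();
                    // Swap in the new value for the target attribute, keep the rest as-is
                    attrs.add(a.getName().getLocalPart().equals(attrName)
                            ? events.createAttribute(a.getName(), newValue)
                            : a);
                }
                // Rebuild the start tag with the same name and namespace declarations
                event = events.createStartElement(start.getName(), attrs.iterator(), start.getNamespaces());
            }
            writer.add(event);
        }
        writer.flush();
        writer.close();
        reader.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<list><item status=\"open\"/><other status=\"open\"/></list>";
        System.out.println(setAttribute(xml, "item", "status", "done"));
    }
}
```

With dom4j or DOM the same change is a lookup plus one setter call; the trade-off is that the whole tree sits in memory.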

Duodenal answered 17/12, 2008 at 8:4 Comment(2)
dom4j is pretty good, but definitely not without problems. For good dom4j alternatives, see #832365 - Selmore
@Duodenal are they thread-safe? - Ernesternesta

I think you should not commit to any specific parser implementation. The Java API for XML Processing (JAXP) lets you use any conforming parser implementation in a standard way. The code is much more portable, and when you realise that a specific parser has grown too old, you can replace it with another without changing a line of your code (if you do it correctly).
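A minimal sketch of what coding against JAXP looks like: the code below only touches the JAXP factory and the W3C DOM interfaces, so the parser behind DocumentBuilderFactory.newInstance() can be swapped (for example via the javax.xml.parsers.DocumentBuilderFactory system property) without editing it. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class JaxpDom {
    // Only JAXP factory classes and W3C DOM interfaces appear here; which parser
    // actually runs is decided by DocumentBuilderFactory.newInstance() at runtime.
    static Document parse(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // The parser detects the encoding (e.g. UTF-8) from the bytes / XML declaration
        return builder.parse(in);
    }

    static int countElements(Document doc, String tagName) {
        return doc.getElementsByTagName(tagName).getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config><entry key=\"a\"/><entry key=\"b\"/></config>";
        Document doc = parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        // prints "config has 2 entries"
        System.out.println(doc.getDocumentElement().getTagName()
                + " has " + countElements(doc, "entry") + " entries");
    }
}
```

For a file on disk, DocumentBuilder also has a parse(File) overload.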

Basically, there are three standard ways of handling XML:

  • SAX: This is the simplest API. You read the XML by defining a handler class that receives the data inside elements/attributes as the XML is processed serially. It is faster and simpler if you only plan to read some attributes/elements and/or write some values back (your case).
  • DOM: This method creates an object tree that you can access and modify randomly, so it is better suited to complex XML manipulation and handling.
  • StAX: This sits between SAX and DOM. Instead of receiving callbacks, you write code that pulls the data you are interested in from the parser as the document is processed.
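A minimal sketch of the SAX style from the first bullet, using the standard SAXParserFactory and a DefaultHandler subclass. It only covers the reading side; the helper name and the element/attribute names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxAttributes {
    // Collects the value of one attribute from every element with a given name.
    // The parser pushes events to the handler; nothing is kept except what the
    // handler itself chooses to store.
    static List<String> collect(String xml, final String element, final String attribute)
            throws Exception {
        final List<String> values = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // The default factory is not namespace-aware, so match on qName
                if (qName.equals(element)) {
                    String v = attrs.getValue(attribute);
                    if (v != null) {
                        values.add(v);
                    }
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return values;
    }

    public static void main(String[] args) throws Exception {
        // prints [1, 2]
        System.out.println(collect("<a><b id=\"1\"/><c id=\"x\"/><b id=\"2\"/></a>", "b", "id"));
    }
}
```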

Forget about non-standard APIs such as JDOM or Apache-specific ones (e.g. the Apache Xerces XMLSerializer), because they tie you to a specific implementation that can evolve over time or lose backwards compatibility. That will force you to change your code in the future when you want to upgrade to a new version of JDOM or whatever parser you use. If you stick to the standard Java API (using factories and interfaces), your code will be much more modular and maintainable.
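Since the question also asks for indented output, here is a sketch of writing a modified DOM back out through the standard TrAX Transformer instead of a parser-specific serializer. OutputKeys.INDENT is standard; the indent-amount key is an Apache extension that the JDK's built-in transformer understands, and because it is namespace-qualified, implementations that don't know it should not reject it. The method name and the attribute being set are hypothetical; to write to disk, pass new StreamResult(new File(...)) instead of a StringWriter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class StandardWrite {
    // Parses, changes one attribute on the root element, and serializes the tree
    // with indentation, using only JAXP/TrAX factories and interfaces.
    static String setRootAttributeIndented(String xml, String name, String value) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        doc.getDocumentElement().setAttribute(name, value);

        Transformer t = TransformerFactory.newInstance().newTransformer(); // identity transform
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        // Apache-specific but namespace-qualified key, so non-Apache transformers
        // should not throw on it; the JDK's built-in transformer honours it.
        t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");

        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(setRootAttributeIndented("<root><child/></root>", "version", "2"));
    }
}
```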

Needless to say, all of the parsers proposed (I haven't checked every one, but I'm almost sure) provide a JAXP implementation, so technically you can use any of them.

Wag answered 17/12, 2008 at 8:2 Comment(4)
Actually, there are 3 ways: StAX (javax.xml.stream) is the third standard one. - Vicinal
java-samples.com/showtutorial.php?tutorialid=152 (personally, I love SAX) - Audreyaudri
@Audreyaudri Chrome tells me that page has nasty stuff on it. I used this instead: sce.uhcl.edu/yue/courses/xml/notes/xmlparser/IntroDOM.asp - Maladapted
Good overview. Only one thing I'd disagree with: for incremental/streaming processing, SAX and StAX are good and the standard API is sufficient, but for DOM this is not the case (IMO). There are valid reasons for Java-specific takes like XOM, JDOM and dom4j: the language-agnostic DOM is pretty cumbersome to use. - Vicinal

Here is a nice comparison of DOM, SAX, StAX and TrAX (Source: http://download.oracle.com/docs/cd/E17802_01/webservices/webservices/docs/1.6/tutorial/doc/SJSXP2.html )

Feature           StAX              SAX               DOM               TrAX
API Type          Pull, streaming   Push, streaming   In-memory tree    XSLT rule
Ease of Use       High              Medium            High              Medium
XPath Capability  No                No                Yes               Yes
CPU & Memory      Good              Good              Varies            Varies
Forward Only      Yes               Yes               No                No
Read XML          Yes               Yes               Yes               Yes
Write XML         Yes               No                Yes               Yes
CRUD              No                No                Yes               No
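As a small illustration of the "XPath Capability" row, this sketch evaluates XPath expressions against a DOM tree with the standard javax.xml.xpath API. The helper names and sample expressions are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomXPath {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns every node matched by the expression (random access over the whole tree)
    static NodeList select(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
    }

    // Returns the string value of the expression, e.g. an attribute's text
    static String value(Document doc, String expression) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b n='1'/><b n='2'/></a>");
        System.out.println(select(doc, "/a/b").getLength()); // 2
        System.out.println(value(doc, "/a/b[1]/@n"));        // 1
    }
}
```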

Gaither answered 14/4, 2011 at 15:35 Comment(1)
You can write XML with SAX. The sink provides a handler implementation on which the user can call SAX events to generate XML output. (I see that the table is sourced and not original material, but the table is wrong.) - Dynel
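The comment's point can be sketched with the standard SAXTransformerFactory: newTransformerHandler() returns an identity TransformerHandler that acts as the sink, and calling SAX events on it produces serialized XML. The method name and the element/attribute names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

class SaxWrite {
    // Generates XML by calling SAX events on an identity TransformerHandler (the "sink")
    static String writeItem(String id, String text) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("TransformerFactory does not support SAX input");
        }
        TransformerHandler sink = ((SAXTransformerFactory) tf).newTransformerHandler();
        sink.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        sink.setResult(new StreamResult(out)); // must be set before the first event

        sink.startDocument();
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "id", "id", "CDATA", id);
        sink.startElement("", "item", "item", attrs);
        char[] chars = text.toCharArray();
        sink.characters(chars, 0, chars.length); // the serializer escapes markup characters
        sink.endElement("", "item", "item");
        sink.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeItem("42", "hello"));
    }
}
```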

Simple XML (http://simple.sourceforge.net/) is very easy to use for (de)serializing objects.

Mcnully answered 23/7, 2011 at 19:7 Comment(0)

In addition to SAX and DOM, there is StAX parsing, available through XMLStreamReader, which is an XML pull parser.
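A minimal sketch of pull parsing with XMLStreamReader: the caller asks for the next event instead of receiving callbacks, and getElementText() reads the text content of a simple element. The class name and element names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxPull {
    // Pulls events one at a time and returns the text of every element named `element`
    static List<String> texts(String xml, String element) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<String>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(element)) {
                    // Reads up to the matching end tag; assumes text-only content
                    result.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // prints [a, b]
        System.out.println(texts("<r><name>a</name><x/><name>b</name></r>", "name"));
    }
}
```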

Frias answered 18/12, 2008 at 1:32 Comment(0)

I have found dom4j to be the tool for working with XML, especially compared to Xerces.

Braden answered 17/12, 2008 at 7:11 Comment(0)

I wouldn't recommend this if you've got a lot of "thinking" in your app, but using XSLT could be better (and potentially faster, with XSLT-to-bytecode compilation) than manipulating the XML in Java.
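A minimal sketch of the XSLT route through the standard TrAX API: an identity stylesheet copies everything, and a single override template changes one attribute. The element and attribute names are hypothetical. If you transform many files, compiling the stylesheet once with TransformerFactory.newTemplates() and reusing the resulting Templates object avoids re-parsing it each time.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltRewrite {
    // Identity transform plus one override: every node is copied unchanged except
    // the (hypothetical) status attribute of <item>, which is set to "done".
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' indent='yes' omit-xml-declaration='yes'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='item/@status'>"
        + "<xsl:attribute name='status'>done</xsl:attribute>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String apply(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(apply("<list><item status='open'/><other status='open'/></list>"));
    }
}
```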

Sacramental answered 18/12, 2008 at 1:42 Comment(2)
Better, possibly; faster, very unlikely. - Vicinal
Reading, manipulating, and writing XML is exactly what XSLT is designed to do. This is a nice out-of-the-box answer. - Apeldoorn

If you care less about performance, I'm a big fan of Apache Digester, since it essentially lets you map directly from XML to Java Beans.

Otherwise, you have to first parse, and then construct your objects.
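For comparison, the "parse first, then construct your objects" route might look like this with the standard DOM API and no mapping library. The Book class and the book/isbn/title layout are hypothetical, and the code assumes every book element has a title child.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ManualMapping {
    // Hypothetical bean; a Digester rule set would populate something like this for you
    static class Book {
        final String isbn;
        final String title;

        Book(String isbn, String title) {
            this.isbn = isbn;
            this.title = title;
        }
    }

    static List<Book> parseBooks(String xml) throws Exception {
        // Step 1: parse into a generic tree
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Step 2: walk the tree and build the objects by hand
        NodeList nodes = doc.getElementsByTagName("book");
        List<Book> books = new ArrayList<Book>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            String title = e.getElementsByTagName("title").item(0).getTextContent();
            books.add(new Book(e.getAttribute("isbn"), title));
        }
        return books;
    }
}
```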

Sapota answered 18/12, 2008 at 1:33 Comment(2)
I don't need to make JavaBeans, just manipulate the raw XML elements a little and review certain elements to get data from them, so a DOM-style parser is probably my ideal solution. - Antediluvian
Yeah, dom4j would probably be a better solution there. I used to use it heavily, until I moved one level up to Digester. - Sapota
