What is the advantage of using JAXP instead of DOM / SAX directly in Java?

Being new to XML parsing, I'm trying to understand the different technologies. There is a confusing number of different technologies for different needs:

  • W3C-DOM
  • XOM
  • jDom
  • JAXP
  • JAXB
  • DOM
  • SAX
  • StAX
  • TrAX
  • Woodstox
  • dom4j
  • Crimson
  • VTD-XML
  • Xerces-J
  • Castor
  • XStream
  • ...

Just to name a few.

DOM and SAX seem to be low-level ways of parsing and working with XML, so I decided to focus on the low-level technologies that are mentioned most often across different sources:

DOM, SAX, JAXP.

I've read about parsers in general here on Stack Overflow, the JAXP tutorial from Oracle, XML parsing in general, and so on.

I've also tried some tutorials, like this German one, and others.

I'm grasping a little bit about DOM and SAX now, but the reason to use JAXP is still beyond me. It seems to be more of an interface that uses DOM, SAX, ... internally, so why not use DOM or SAX directly?

What is the advantage of using JAXP, in layman's terms?

Macaroon asked 5/1, 2016 at 9:55 Comment(5)
When I'm working with (manipulating/creating) XML I always use DOM, but that's just my personal opinion! I think it works quite well and provides all the features you need. – Abjure
This may help you: jaxp.java.net/1.4/JAXP-FAQ.html – Phosphorylase
ParkerHalo: DOM seems to be a very intuitive way to work with XML. The main reason not to use DOM is often given as the size of the document, but people only say "if the document is too big, use SAX instead of DOM" without ever defining what "big" means (lines of code, document size in MB, number of XML objects, ...) or at what number this happens. Is 20,000 lines considered big, or 1,000,000, and so on? – Macaroon
@Macaroon You'll notice what's big when you run out of memory (which won't take that much time with DOM). As for JAXP, it's just an old term (Java API for XML Processing) to refer to the SAX/DOM/StAX parsers. You can't really "use" JAXP. – Funnyman
@Kayman Is it something I HAVE to notice (as the environment is different each time I use a parser), or are there "rules of thumb", i.e. more than X MB, more than Y lines of code, etc.? Noticing after doing all of the implementation seems to be too late. – Macaroon

(Although you haven't said so explicitly, your question seems to relate exclusively to the Java world, and this answer reflects that.)

JAXP is a set of interfaces covering XML parsing, XSLT transformation, and XML schema validation. If we just focus on the XML parsing side, its main contribution is to provide a mechanism for locating an XML parser implementation, so your source code isn't locked into a particular product. Frankly that's of limited value these days; the only two SAX/DOM parsers in common use are the one embedded in the JDK, and Apache Xerces. Apache Xerces is better in every respect except that you need to download it separately.
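To make the "locating a parser implementation" point concrete, here is a minimal sketch using only the JDK. `DocumentBuilderFactory.newInstance()` is the JAXP lookup step: roughly, it checks the `javax.xml.parsers.DocumentBuilderFactory` system property, then a `jaxp.properties` file in the JDK, then the `ServiceLoader` mechanism, and finally falls back to the built-in implementation. `DocumentBuilderFactory.newDefaultInstance()` (Java 9 and later) skips that search and always returns the built-in one. The class name `JaxpLookupDemo` and the helper `rootElementName` are illustrative, and the printed class names will depend on your JDK and classpath.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class JaxpLookupDemo {

    // Parses the XML with a builder from the given factory and returns the root element's tag name.
    // Only JAXP and org.w3c.dom types appear here, never an implementation class.
    static String rootElementName(DocumentBuilderFactory factory, String xml) throws Exception {
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTagName();
    }

    public static void main(String[] args) throws Exception {
        // JAXP lookup: whichever implementation the search mechanism finds.
        DocumentBuilderFactory located = DocumentBuilderFactory.newInstance();
        // Java 9+: always the JDK's built-in implementation, bypassing the search.
        DocumentBuilderFactory builtIn = DocumentBuilderFactory.newDefaultInstance();

        System.out.println("located:  " + located.getClass().getName());
        System.out.println("built-in: " + builtIn.getClass().getName());
        System.out.println(rootElementName(located, "<greeting>hi</greeting>"));
    }
}
```

With Apache Xerces on the classpath, the "located" line should change (Xerces registers itself for the `ServiceLoader` lookup) while the source stays the same; that portability is essentially what the lookup buys you.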

As for the other parsing interfaces, they break down into two categories: event-based APIs and tree-based APIs. Tree-based APIs are much easier to work with, but can use a lot of memory when handling large documents.
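A minimal sketch of the tree-based style, using the JDK's built-in DOM: the whole document is parsed into memory first, and only then does the code navigate it, in any order it likes. That convenience is also where the memory cost comes from. The `<order>`/`<item price="...">` document and the class name are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TreeApiDemo {

    // Builds the complete tree in memory, then walks it to sum the price attributes.
    static double totalPrice(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        double total = 0;
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            total += Double.parseDouble(item.getAttribute("price"));
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><item price='2.50'/><item price='4.00'/></order>";
        System.out.println(totalPrice(xml)); // prints 6.5
    }
}
```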

The two dominant event-based APIs are SAX (push) and StAX (pull). Many programmers find pull parsing easier because you can use the program stack to maintain state information; unfortunately, though, the StAX API is a bit buggy, and different implementations have fixed its gaps in different ways. The most complete and reliable implementation of StAX is the Woodstox parser; the most complete and reliable implementation of SAX is Apache Xerces. But don't attempt to use an event-based parsing approach unless your application really needs that level of performance (and unless you have the level of experience needed to avoid losing all the performance gains at the application level).
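The push/pull difference is easiest to see side by side. This is a minimal sketch that uses only the JDK's built-in SAX and StAX implementations (not Woodstox or Xerces) to count elements with a given local name both ways. In the SAX version the parser drives and the count has to live outside the callback; in the StAX version your own loop drives and the count is an ordinary local variable. The class and method names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EventApiDemo {

    // SAX (push): the parser calls the handler for each event, so any state
    // has to live outside the callback (here, a one-element array).
    static int countWithSax(String xml, String name) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed for localName to be filled in
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (localName.equals(name)) {
                    count[0]++;
                }
            }
        };
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    // StAX (pull): our loop asks for each event, so state is just a local variable.
    static int countWithStax(String xml, String name) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(name)) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<list><item/><item><item/></item><other/></list>";
        System.out.println("SAX:  " + countWithSax(xml, "item"));  // 3
        System.out.println("StAX: " + countWithStax(xml, "item")); // 3
    }
}
```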

For tree-based APIs, the DOM remains dominant solely because it was defined by W3C and is implemented in the JDK, and is therefore perceived as "standard"; also it's the one mentioned in all the books on the subject. However, of all the tree models, it is unquestionably the worst designed (mainly because it predates the introduction of namespaces). Alternatives include JDOM2, DOM4J, XOM, and AXIOM. I tend to recommend JDOM2 or XOM.

Dogmatic answered 5/1, 2016 at 12:0 Comment(7)
You're right, I have changed my title to include "Java". So JAXP is some sort of box that contains DOM/SAX (XML parsing), XSLT, ...? And if I use DOM/SAX directly, am I indirectly "using" JAXP, since DOM and SAX originate from JAXP? I've read some reviews of XOM and it seems to be quite good, but the licence (LGPL) might make it hard for me to use in my projects. I have to read more about that. – Macaroon
Note that the SAX/DOM implementation in the JDK is based on Apache Xerces, and it is actually better maintained than the original. – Reducer
@AndreasVeithen, yes, it is a fork of the original. But it has some very serious bugs that have been known for donkey's years (well, at least since 2009) and have never been fixed. You don't even get any kind of acknowledgement when you report them; they just go into a black hole. – Dogmatic
@hamena314, I wouldn't describe JAXP (specifically the XML parsing part of JAXP) as "containing" DOM/SAX services, more as a kind of router that enables you to find a supplier of DOM/SAX services. The distinction is that if you know the class name of the DOM/SAX implementation you want to use, and you don't want portability across different implementations, then you can usually bypass the JAXP search mechanism. – Dogmatic
@AndreasVeithen for an example of such a bug see bugs.java.com/bugdatabase/view_bug.do?bug_id=8145969. Although this was reported recently, it is a very old bug: I reported it at least five years ago, though I cannot find my previous reports in the Oracle database (only an email from me to a customer telling them I had reported it). – Dogmatic
Update: the Oracle bug tracker claims that this bug is fixed in JDK 9. At last. – Dogmatic
Update: from JDK 9 I am no longer advising people against using the JDK version of Xerces; the major problems that existed in earlier JDK versions appear to be fixed. – Dogmatic

JAXP is just Sun's (now Oracle's) name for a collection of SAX and DOM classes they bundle with the JDK. If you're using JAXP, you're also using SAX and/or DOM. It's not a different thing.

JAXP also adds a few helper classes in the javax.xml.parsers package that fill gaps in SAX 1 and DOM 1, i.e. old versions of these libraries from 15+ years ago. However these are not necessary with SAX2/DOM3 that are used today. Worse yet, javax.xml.parsers classes such as DocumentBuilderFactory and SAXParserFactory are designed in a confusing way (they're not namespace aware by default) so they are almost always used incorrectly. Then developers come here to ask why their program doesn't do what they think it should. Just ignore these classes and use XMLReaderFactory (SAX 2) or DOMImplementationLS (DOM 3) instead.
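To see the namespace default in action, the sketch below parses the same prefixed element three ways using only JDK classes: through `DocumentBuilderFactory` with its default settings, through `DocumentBuilderFactory` after `setNamespaceAware(true)`, and through the DOM Level 3 `DOMImplementationLS` route recommended above. With the factory default, the root comes back as a namespace-unaware (DOM Level 1 style) node, so `getLocalName()` and `getNamespaceURI()` both return `null`; the other two report `root` and `urn:example`. The `urn:example` namespace and the class name are placeholders.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.xml.sax.InputSource;

public class NamespaceDefaultsDemo {

    // A prefixed element in a placeholder namespace.
    static final String XML = "<p:root xmlns:p='urn:example'/>";

    // javax.xml.parsers route: the factory is NOT namespace aware unless you ask.
    static Document parseWithFactory(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        if (namespaceAware) {
            factory.setNamespaceAware(true);
        }
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    }

    // DOM Level 3 Load and Save route: the "namespaces" parameter defaults to true.
    static Document parseWithLS() throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS ls = (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = ls.createLSInput();
        input.setStringData(XML);
        return parser.parse(input);
    }

    static String describe(Document doc) {
        Element root = doc.getDocumentElement();
        return "localName=" + root.getLocalName() + ", namespaceURI=" + root.getNamespaceURI();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("factory, default:         " + describe(parseWithFactory(false)));
        System.out.println("factory, namespace aware: " + describe(parseWithFactory(true)));
        System.out.println("DOMImplementationLS:      " + describe(parseWithLS()));
    }
}
```

One caveat for current JDKs: `org.xml.sax.helpers.XMLReaderFactory` has been deprecated since Java 9, and its documentation recommends `SAXParserFactory` instead; if you go that way, call `setNamespaceAware(true)` on it.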

Barker answered 5/1, 2016 at 15:12 Comment(3)
Does "namespace aware" mean that in an XML document a company might have an XML element named address, and later in the document an employee might also have an XML element named address? Is that what you are referring to? And apart from using different factory(?) classes like DOMImplementationLS instead of DocumentBuilderFactory, are there any other differences in usage? – Macaroon
@ElliotteRustyHarold I have always taken the view that JAXP is an interface, but when you say that Oracle/Sun use the name to refer to "a collection of SAX and DOM classes" (that is, a specific implementation), I think you are right. They have a very bad track record of confusing the interface with their specific implementation. – Dogmatic
@Macaroon Besides the builder and factory classes, there are NO differences in usage between JAXP SAX and regular SAX. They are the same classes; they are just bundled with the JDK. Same answer for DOM. Namespace aware, in this context, has to do with how the parser passes local and qualified names to which methods. You always want this turned on, and the javax.xml.parsers classes turn it off by default. :-( – Barker
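A small sketch of what "how the parser passes local and qualified names" looks like in practice, using the JDK's built-in SAX parser via `SAXParserFactory`. It records the `uri`, `localName` and `qName` arguments that `startElement` receives for one prefixed element. With the factory default, the JDK parser reports the prefixed name only as `qName` and passes empty strings for `uri` and `localName`; after `setNamespaceAware(true)`, `uri` is `urn:example` and `localName` is `root`. The SAX2 specification allows `localName` to be empty when namespace processing is off, so other parsers may differ in detail. The class name and namespace are placeholders.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxNamesDemo {

    static final String XML = "<p:root xmlns:p='urn:example'/>";

    // Returns the {uri, localName, qName} triple that startElement receives for the root element.
    static String[] rootNames(boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        List<String[]> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                seen.add(new String[] {uri, localName, qName});
            }
        };
        factory.newSAXParser().parse(new InputSource(new StringReader(XML)), handler);
        return seen.get(0);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("default:         " + Arrays.toString(rootNames(false)));
        System.out.println("namespace aware: " + Arrays.toString(rootNames(true)));
    }
}
```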
