Is XMLReader a SAX parser, a DOM parser, or neither?

I am testing various methods to read XML configuration files in PHP (possibly large, and read very frequently). No writing is ever needed. I have two working implementations, one using SimpleXML (which I know is a DOM parser) and one using XMLReader.

I know that a DOM reader must read the whole tree and therefore uses more memory; my tests reflect that. I also know that a SAX parser is an "event-based" parser that uses less memory because it reads each node from the stream without checking what is next.

XMLReader also reads from a stream, with the cursor providing data about the node it is currently at. So it definitely sounds like XMLReader (https://www.php.net/xmlreader) is not a DOM parser, but my question is: is it a SAX parser, or something else? It seems like XMLReader behaves the way a SAX parser does but does not fire the events itself. (In other words, can you construct a SAX parser with XMLReader?)

If it is something else, does the classification it's in have a name?

Inner asked 15/6, 2010 at 19:40 Comment(2)
See also this question; its answer has a benchmark link. – Knawel
See this other related question about the LibXML2 implementation and the use of the SAX interface instead of the Expat interface, and about the terminology of the present question (muddled here, explained better there). – Knawel

XMLReader calls itself a "pull parser."

The XMLReader extension is an XML Pull parser. The reader acts as a cursor going forward on the document stream and stopping at each node on the way.

It later goes on to say it uses libxml.

This page on Java XML Pull Parsing may be of interest. If XMLReader shares that project's goals and intent, then the answer to your question falls squarely into the "neither" category.

Mameluke answered 15/6, 2010 at 19:47 Comment(0)

A SAX parser is a parser which implements the SAX API. That is: a given parser is a SAX parser if and only if you can code against it using the SAX API. The same goes for a DOM parser: the classification is purely about the API it supports, not about how that API is implemented. Thus a SAX parser might very well be a DOM parser, too, and hence you cannot take lower memory use or other characteristics for granted.
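To make the "API, not implementation" point concrete, here is a minimal sketch in Python rather than PHP (the language switch is an assumption for illustration, and `PullDrivenSaxReader` and `Recorder` are hypothetical names). The reader exposes a SAX-style push interface, where you register an `xml.sax.handler.ContentHandler` and call `parse()`, but underneath it drives the standard library's pull parser, `xml.etree.ElementTree.XMLPullParser`. Only element start/end events are forwarded; character data, namespaces and locators are left out.

```python
import xml.sax.handler
from xml.etree.ElementTree import XMLPullParser
from xml.sax.xmlreader import AttributesImpl


class PullDrivenSaxReader:
    """Hypothetical adapter: a SAX-style push API built on a pull parser."""

    def __init__(self):
        self._handler = xml.sax.handler.ContentHandler()

    def setContentHandler(self, handler):
        self._handler = handler

    def parse(self, chunks):
        pull = XMLPullParser(events=("start", "end"))
        self._handler.startDocument()
        for chunk in chunks:
            pull.feed(chunk)
            self._dispatch(pull)
        pull.close()  # end of input; may release buffered events
        self._dispatch(pull)
        self._handler.endDocument()

    def _dispatch(self, pull):
        # Internally we *pull* ready events, then *push* them to the handler.
        for event, elem in pull.read_events():
            if event == "start":
                attrs = AttributesImpl(dict(elem.attrib))
                self._handler.startElement(elem.tag, attrs)
            else:
                self._handler.endElement(elem.tag)


class Recorder(xml.sax.handler.ContentHandler):
    """Example handler that just records the callbacks it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))


recorder = Recorder()
reader = PullDrivenSaxReader()
reader.setContentHandler(recorder)
reader.parse(["<config><db host='localhost'/>", "</config>"])
print(recorder.events)
```

From the handler's point of view this is push parsing, even though a pull parser does the work underneath.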

However, to get to the real question: XMLReader seems the better choice. Since it is a pull parser, you request exactly the data you want, so there should be less overhead involved.
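As a rough illustration of "requesting the data you want", this Python sketch (not PHP; `find_attribute` and the sample config are hypothetical) uses `XMLPullParser` as a stand-in for XMLReader. Because the application drives the loop, it can simply stop once the element it needs has been seen, leaving the rest of the input unparsed. The helper returns the value together with how many input chunks it actually consumed.

```python
from xml.etree.ElementTree import XMLPullParser


def find_attribute(chunks, tag, attr):
    """Hypothetical helper: return (value, chunks_read) for the first <tag>."""
    pull = XMLPullParser(events=("start",))
    chunks_read = 0
    for chunk in chunks:
        chunks_read += 1
        pull.feed(chunk)
        for _event, elem in pull.read_events():
            if elem.tag == tag:
                # Got what we asked for: stop before reading the remaining input.
                return elem.get(attr), chunks_read
    return None, chunks_read


chunks = [
    "<config><db host='localhost'/>",
    "<cache ttl='60'/>",
    "<log level='debug'/>",
    "</config>",
]
value, chunks_read = find_attribute(chunks, "db", "host")
print(value, chunks_read, "of", len(chunks))
```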

Dot answered 15/6, 2010 at 19:51 Comment(0)

XMLReader is an interface that a SAX2 parser must implement. Thus you could say that you have a SAX parser when you access it through XMLReader, and, for short, that XMLReader is the SAX parser.

See the javadoc of XMLReader.

XMLReader is the interface that an XML parser's SAX2 driver must implement. This interface allows an application to set and query features and properties in the parser, to register event handlers for document processing, and to initiate a document parse.

I think this information is relevant because:

  • It comes from the official Web site for SAX
  • Even though the javadoc is for Java, SAX originated in the Java language.
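The interface that javadoc describes also exists outside Java. As a sketch (Python rather than PHP; `CountElements` and the sample XML are hypothetical), Python's standard library follows the SAX2 design: `xml.sax.make_parser()` returns an `xml.sax.xmlreader.XMLReader`, on which you set and query features, register a `ContentHandler`, and initiate the parse.

```python
import io
import xml.sax
import xml.sax.handler


class CountElements(xml.sax.handler.ContentHandler):
    """Hypothetical handler: counts start-element events per tag name."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # The parser calls (pushes) this once for every opening tag.
        self.counts[name] = self.counts.get(name, 0) + 1


# make_parser() returns an object implementing xml.sax.xmlreader.XMLReader.
reader = xml.sax.make_parser()

# Set and query a feature, as the SAX2 XMLReader contract describes.
reader.setFeature(xml.sax.handler.feature_namespaces, False)
print(reader.getFeature(xml.sax.handler.feature_namespaces))

# Register an event handler, then initiate the parse.
handler = CountElements()
reader.setContentHandler(handler)
reader.parse(io.BytesIO(b"<config><item/><item/><cache/></config>"))
print(handler.counts)
```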
Guillen answered 31/10, 2011 at 5:07 Comment(4)
No, this is wrong. The PHP XMLReader is a pull parser, whereas the Java XMLReader is an event-based push parser. Therefore, the PHP XMLReader is neither a SAX parser nor a DOM parser. – Immorality
You should put your comment in the right place, that is, under the OP's question. – Guillen
Yes, but I was trying to point out that the OP was talking about PHP, not Java. The Java XMLReader and its interface are completely different and unrelated. – Immorality
OK, I didn't read the question carefully enough, and the accepted answer led me to focus on Java. My bad. – Guillen

In short, it is neither.

SAX parsers are stream-oriented, event-based push parsers. You register callback functions to handle events such as startElement and endElement, then call parse() to process the entire XML document, one node at a time. To my knowledge, PHP doesn't have a well-maintained SAX parser. However, there is XMLParser, which uses the very similar Expat library.
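PHP's XML Parser extension has you create a parser with `xml_parser_create()` and register callbacks with functions such as `xml_set_element_handler()`. A comparable push model, sketched here in Python as a non-PHP analogue, is the `xml.parsers.expat` binding: you assign handler functions, feed the document (here in two pieces), and Expat calls them as it goes. The handler names and sample document are illustrative only.

```python
import xml.parsers.expat

log = []


def start_element(name, attrs):
    # Expat pushes this call to us for every opening tag.
    log.append(("start", name, attrs))


def end_element(name):
    log.append(("end", name))


def char_data(data):
    # Ignore whitespace-only runs between tags.
    if data.strip():
        log.append(("text", data))


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.CharacterDataHandler = char_data

# Feed the document in pieces; the last call passes isfinal=True.
parser.Parse("<config><name>demo</name>", False)
parser.Parse("<cache ttl='60'/></config>", True)

for entry in log:
    print(entry)
```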

DOM parsers require you to load the entire XML document into memory, but they provide an object-oriented tree of the XML nodes. Examples of DOM parsers in PHP include SimpleXML and DOM.
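For contrast, a DOM-style parse builds the whole tree first and then lets you navigate it freely. This sketch uses Python's `xml.dom.minidom` as a stand-in for PHP's DOM or SimpleXML (an assumption for illustration); the sample config is hypothetical.

```python
from xml.dom import minidom

# The whole document is parsed into an in-memory tree before we touch it.
doc = minidom.parseString(
    "<config><db host='localhost' port='5432'/><cache ttl='60'/></config>"
)
root = doc.documentElement

# Random access: look things up in any order, as often as needed.
cache = root.getElementsByTagName("cache")[0]
db = root.getElementsByTagName("db")[0]

print(root.tagName, [child.tagName for child in root.childNodes])
print(db.getAttribute("host"), db.getAttribute("port"), cache.getAttribute("ttl"))
```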

The PHP XMLReader is neither of these. It is a stream-oriented "pull parser" that requires you to create a big loop and call the read() function to move the cursor forward, processing one node at a time.
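The loop-and-advance pattern can be sketched with Python's `xml.etree.ElementTree.XMLPullParser`, which, like XMLReader, makes the application ask for the next events instead of receiving callbacks. It is an analogue, not PHP's API: where XMLReader moves a cursor with `read()`, this version feeds input with `feed()` and collects ready events with `read_events()`. The list of string chunks stands in for a file stream, and `read_settings` is a hypothetical name.

```python
from xml.etree.ElementTree import XMLPullParser


def read_settings(chunks):
    """Collect each child element's attributes, pulling events chunk by chunk."""
    pull = XMLPullParser(events=("start",))
    settings = {}

    def drain():
        # The application asks for whatever events are ready right now.
        for _event, elem in pull.read_events():
            if elem.tag != "config":
                settings[elem.tag] = dict(elem.attrib)

    for chunk in chunks:
        pull.feed(chunk)  # advance the stream with more input
        drain()
    pull.close()  # signal end of input; may complete buffered events
    drain()
    return settings


settings = read_settings([
    "<config><db host='localhost' ",
    "port='5432'/><cache ttl='60'/>",
    "</config>",
])
print(settings)
```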

The big benefit of XMLParser and XMLReader vs SimpleXML and DOM is that stream-oriented parsers are memory efficient, only loading the current node into memory. On the other hand, SimpleXML and DOM are easier to use, but they require you to load the entire XML document into memory, and this is bad for very large XML documents.
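One caveat if you try the streaming approach in Python: `xml.etree.ElementTree.iterparse()` still attaches every parsed element to a growing tree, so memory only stays flat if you discard elements once they are handled. A minimal sketch under that assumption (the `<orders>`/`<item qty=...>` layout and `total_quantity` are hypothetical) clears the root's finished children as it goes.

```python
import io
import xml.etree.ElementTree as ET


def total_quantity(stream):
    """Hypothetical helper: sum the qty attributes of <item> elements."""
    total = 0
    events = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(events)  # the first event is the root's start tag
    for event, elem in events:
        if event == "end" and elem.tag == "item":
            total += int(elem.get("qty", "0"))
            # Discard finished children so the in-memory tree does not keep growing.
            root.clear()
    return total


# Build a sample document with 1,000 <item> elements (qty = 1..1000).
body = b"".join(b"<item qty='%d'/>" % i for i in range(1, 1001))
data = io.BytesIO(b"<orders>" + body + b"</orders>")
print(total_quantity(data))
```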

Immorality answered 12/2, 2013 at 1:22 Comment(6)
Oops, as I understand it, PHP's SAX is the XML Parser; see also my answer explaining it. – Knawel
@PeterKrauss, that's an inaccurate statement. Quoting the XML Parser doc that you linked to… "This PHP extension implements support for James Clark's expat in PHP." And quoting the Expat Wikipedia page, "Expat is not a SAX-compliant parser." Of course, there are 3rd-party wrappers for Expat that implement the SAX and SAX2 interface. PHP's XML Parser does NOT implement the SAX interface, although it's very similar. – Immorality
Yes... and the inaccuracy starts with the PHP guide (!), which does not mention the term "SAX" (this author says that it is SAX), and, when citing Expat, does not say whether it means an "Expat interface" or the "original old Expat software". Another part of the PHP guide says that it uses LibXML2, and the LibXML2 page says that it implements SAX: "a SAX2 like interface and a minimal SAX1 implementation compatible with early expat versions". It is ambiguous... – Knawel
No, you are wrong. The Java org.xml.sax package is the normative implementation of SAX. Does the PHP XML Parser interface follow the Java org.xml.sax interface? No, it does not. For example, the PHP XML Parser does not have the startDocument and endDocument event handlers. Therefore, the PHP XML Parser is not SAX compliant. End of story. – Immorality
"SAX was originally a Java-only API... 1998s", but Java only started the history (!). SAX is not a Java concept, nor is Java/Oracle "the SAX owner". SAX is a generic abstraction for XML parsers, an alternative to DOM... So Oracle's org.xml.sax is good, perhaps, for Java users, and is a good "reference model", but it is not a standards body like the W3C. So please don't radicalize the discussion. You have good arguments; wait for them to take effect. – Knawel
PS (for other readers): the libXML2 SAX interface has startDocument and all the other SAX event handlers. Historically, the Expat and SAX APIs were created "simultaneously" around 1998, as competitors... For the end user the effect is similar (use of an event-driven parser), and PHP 4 adopted Expat. – Knawel

© 2022 - 2024 — McMap. All rights reserved.