Pull Parsing versus Push Parsing

The DOM model involves creating in-memory objects representing an entire document tree
and the complete infoset state for an XML document. Once in memory, DOM trees can be
navigated freely and parsed arbitrarily, and as such provide maximum flexibility for developers.
However, the cost of this flexibility is a potentially large memory footprint and significant
processor requirements, because the entire representation of the document must be held in
memory as objects for the duration of the document processing. This may not be an issue when
working with small documents, but memory and processor requirements can escalate quickly
with document size.
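
As a rough illustration of the DOM model described above, the following sketch uses the standard JAXP DOM API to load a whole document into memory and then visit nodes in an arbitrary order. The class name DomSketch, the method name, and the sample <books> document are hypothetical and chosen only for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of any library.
final class DomSketch {

    // Builds the complete tree in memory, then reads the <title> elements
    // out of document order (last one first) to show free navigation.
    static String lastThenFirstTitle(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // The whole document is now held as objects; any node can be
            // reached at any time, in any order.
            NodeList titles = doc.getElementsByTagName("title");
            String last = titles.item(titles.getLength() - 1).getTextContent();
            String first = titles.item(0).getTextContent();
            return last + "," + first;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>A</title></book>"
                   + "<book><title>B</title></book></books>";
        System.out.println(lastThenFirstTitle(xml)); // prints "B,A"
    }
}
```

The convenience of visiting the last title before the first comes at the cost noted above: every node of the document stays in memory until the tree is released.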

Streaming refers to a programming model in which XML infosets are transmitted and parsed
serially at application runtime, often in real time, and often from dynamic sources whose
contents are not precisely known beforehand. Moreover, stream-based parsers can start
generating output immediately, and infoset elements can be discarded and garbage collected
immediately after they are used. Stream processing offers a smaller memory footprint, reduced
processor requirements, and higher performance in certain situations. The primary trade-off
is that you can see the infoset state at only one location in the document at a time.
You are essentially limited to a "cardboard tube" view of the document, the
implication being that you need to know what processing you want to do before reading the
XML document.
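
The forward-only view can be sketched with a single pass that keeps only a running total. The example below assumes the StAX cursor API (javax.xml.stream) as the streaming parser; the class name StreamingTotal, the method name, and the <qty> element are hypothetical. The processing (summing quantities) has to be decided before reading begins, because earlier parts of the document cannot be revisited.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not part of any library.
final class StreamingTotal {

    // Sums the text of every <qty> element in one forward pass.
    // Only the running total is retained; no tree is built.
    static int sumQuantities(String xml) {
        int total = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "qty".equals(reader.getLocalName())) {
                    // getElementText() consumes the text up to the matching
                    // end tag and leaves the cursor on that end tag.
                    total += Integer.parseInt(reader.getElementText().trim());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read sample document", e);
        }
        return total;
    }

    public static void main(String[] args) {
        String xml = "<order><item><qty>2</qty></item>"
                   + "<item><qty>3</qty></item></order>";
        System.out.println(sumQuantities(xml)); // prints 5
    }
}
```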

Streaming models for XML processing are particularly useful when your application has strict
memory limitations, as with a cell phone running J2ME, or when your application needs to
simultaneously process several requests, as with an application server. In fact, it can be argued
that the majority of XML business logic can benefit from stream processing, and does not
require the in-memory maintenance of entire DOM trees.

Pull Parsing versus Push Parsing

Streaming pull parsing refers to a programming model in which a client application calls
methods on an XML parsing library when it needs to interact with an XML infoset; that is, the
client only gets (pulls) XML data when it explicitly asks for it.

Streaming push parsing refers to a programming model in which an XML parser sends (pushes)
XML data to the client as the parser encounters elements in an XML infoset; that is, the parser
sends the data whether or not the client is ready to use it at that time.
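
The two models can be contrasted in a small sketch that collects element names from the same document both ways. It assumes StAX (XMLStreamReader) as the pull parser and SAX (DefaultHandler callbacks) as the push parser; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of any library.
final class PullVersusPush {

    // Pull: the client owns the loop and asks the parser for each event.
    static List<String> pullElementNames(String xml) {
        List<String> names = new ArrayList<String>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    // Push: the client registers a handler, and the parser owns the loop,
    // invoking startElement() whenever it encounters an element.
    static List<String> pushElementNames(String xml) {
        final List<String> names = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<a><b/><c>text</c></a>";
        System.out.println(pullElementNames(xml)); // prints [a, b, c]
        System.out.println(pushElementNames(xml)); // prints [a, b, c]
    }
}
```

Both methods produce the same result for this input. The difference is in who drives: in the pull version the while loop belongs to the client, whereas in the push version the client hands control to parse() and only receives callbacks.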

Pull parsing provides several advantages over push parsing when working with XML streams:

With pull parsing, the client controls the application thread, and can call methods on the
parser when needed. By contrast, with push parsing, the parser controls the application
thread, and the client can only accept invocations from the parser.

Pull parsing libraries can be much smaller, and the client code that interacts with those
libraries much simpler, than with push libraries, even for more complex documents.

Pull clients can read multiple documents at one time with a single thread.
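
The last point can be sketched by advancing two pull readers in lockstep on one thread. The example below assumes StAX as the pull parser; the class LockstepCompare and its methods are hypothetical. It checks whether two documents contain the same sequence of element names, ignoring text and attributes.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not part of any library.
final class LockstepCompare {

    // Advances one reader to its next start tag and returns the element
    // name, or null when that document is exhausted.
    static String nextElement(XMLStreamReader reader) throws XMLStreamException {
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                return reader.getLocalName();
            }
        }
        return null;
    }

    // Reads both documents alternately on the calling thread and stops at
    // the first difference in element names.
    static boolean sameElementSequence(String xmlA, String xmlB) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader a = factory.createXMLStreamReader(new StringReader(xmlA));
            XMLStreamReader b = factory.createXMLStreamReader(new StringReader(xmlB));
            try {
                while (true) {
                    String nameA = nextElement(a);
                    String nameB = nextElement(b);
                    if (nameA == null || nameB == null) {
                        // Equal only if both documents ended together.
                        return nameA == null && nameB == null;
                    }
                    if (!nameA.equals(nameB)) {
                        return false;
                    }
                }
            } finally {
                a.close();
                b.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameElementSequence(
                "<a><b/><c/></a>", "<a><b>text</b><c x='1'/></a>")); // prints true
        System.out.println(sameElementSequence(
                "<a><b/></a>", "<a><c/></a>"));                      // prints false
    }
}
```

Because the client decides when each reader advances, the two documents can be interleaved freely. With a push parser, each parse call would run to completion (or need its own thread) before the other document could be consulted.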

The Java EE 5 Tutorial · September 2007
