gnu.xml.pipeline

Class DomConsumer

Implemented Interfaces:
EventConsumer
Known Direct Subclasses:
Consumer

public class DomConsumer
extends java.lang.Object
implements EventConsumer

This consumer builds a DOM Document from its input, acting either as a pipeline terminus or as an intermediate buffer. When a document's worth of events has been delivered to this consumer, that document is read with a DomParser and sent to the next consumer, if there is one. The document is also available as a read-once property; see getDocument().
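To make the idea of building a DOM from SAX events concrete, here is a minimal sketch that uses only the standard JDK APIs. It is not the DomConsumer implementation: the class name SaxToDomSketch and its methods build and takeDocument are hypothetical, and returning null on a second read is an assumption about what "read-once" might mean.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a SAX-to-DOM consumer; not the gnu.xml.pipeline class.
class SaxToDomSketch extends DefaultHandler {
    private Document doc;
    private final Deque<Node> open = new ArrayDeque<>();

    @Override
    public void startDocument() {
        try {
            // Start each event sequence with a fresh, empty document.
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        open.clear();
        open.push(doc);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Element e = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        open.peek().appendChild(e);
        open.push(e);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        open.peek().appendChild(doc.createTextNode(new String(ch, start, length)));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    // ignorableWhitespace() is left as DefaultHandler's no-op, so such
    // whitespace is dropped, which mirrors "hiding" it.

    // Read-once accessor (assumption: a second call yields null).
    Document takeDocument() {
        Document result = doc;
        doc = null;
        return result;
    }

    // Convenience entry point: parse a string and return the built tree.
    static Document build(String xml) {
        try {
            SaxToDomSketch handler = new SaxToDomSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.takeDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling SaxToDomSketch.build on a small document returns a tree whose element and attribute names are the qNames reported by the parser. Comments, CDATA boundaries and entity references never reach this handler, because it does not implement LexicalHandler.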

The DOM tree is constructed as faithfully as possible. There are some complications, since a DOM should expose behaviors that can't be implemented without API backdoors into that DOM, and because some SAX parsers don't report all the information that DOM permits to be exposed. The general problem areas involve information from the Document Type Declaration (DTD). DOM only represents a limited subset of it, but has some behaviors that depend on much deeper knowledge of a document's DTD. You shouldn't have much to worry about unless you change the handling of "noise" nodes from its default setting, which ignores them all. Such nodes include ignorable whitespace, comments, entity references and CDATA boundaries. Note that if you use JAXP to populate your DOM trees, it saves "noise" nodes by default. Otherwise, your main worry will be a SAX parser that doesn't flag ignorable whitespace unless it's validating (most parsers do flag it).
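The remark about JAXP keeping "noise" nodes can be checked with the standard DocumentBuilderFactory switches. The sketch below is only an illustration with the JDK's own parser, not with DomConsumer; the class NoiseNodesSketch and its sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: counts the children of <a> with and without noise nodes.
class NoiseNodesSketch {
    // One comment, one CDATA section and one ordinary text run.
    static final String XML = "<a><!--note--><![CDATA[x]]>y</a>";

    static int childCount(boolean hideNoise) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Both settings default to false, so JAXP keeps these nodes unless told otherwise.
            factory.setIgnoringComments(hideNoise); // drop Comment nodes
            factory.setCoalescing(hideNoise);       // merge CDATA sections into adjacent text
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the factory defaults, the comment and the CDATA section should show up as separate children next to the text node. With both switches on, only merged text should remain, which is roughly what this consumer's default hiding settings aim for.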

The SAX2 events used as input must contain XML Names for elements and attributes, with original prefixes. In SAX2, this is optional unless the "namespace-prefixes" parser feature is set. Moreover, many application components won't provide completely correct structures anyway. Before you convert a DOM to an output document, you should plan to postprocess it to create or repair such namespace information. The NSFilter pipeline stage does such work.
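As a rough illustration of the "namespace-prefixes" feature mentioned above, the following sketch turns it on or off on a JAXP-supplied XMLReader and records what startElement reports for the root element. The class and method names are hypothetical. What a parser reports as the qName when the feature is off may vary, so the sketch relies only on the attribute count in that case.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class for the SAX2 "namespace-prefixes" feature.
class NamespacePrefixesSketch {
    static final String FEATURE = "http://xml.org/sax/features/namespace-prefixes";

    // Returns "<qName> <attribute count>" for the root element.
    static String describeRoot(boolean namespacePrefixes) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setFeature(FEATURE, namespacePrefixes);

            StringBuilder out = new StringBuilder();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    // With the feature on, xmlns:* declarations are reported as attributes too.
                    out.append(qName).append(' ').append(atts.getLength());
                }
            });
            reader.parse(new InputSource(
                    new StringReader("<p:a xmlns:p='urn:example' p:b='1'/>")));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the feature enabled, the handler sees the prefixed name p:a and two attributes: the xmlns:p declaration and p:b. This is the kind of input the paragraph above says the consumer needs.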

Note: changes late in the DOM L2 process made it impractical to attempt to create the DocumentType node in any implementation-neutral way, much less to populate it (L1 didn't support even creating such nodes). To create and populate such a node, subclass the inner DomConsumer.Handler class and teach it about the backdoors into whatever DOM implementation you want. It's possible that some revised DOM API (L3?) will make this problem solvable again.

See Also:
DomParser

Nested Class Summary

static class
DomConsumer.Handler
Class used to intercept various parsing events and use them to populate a DOM document.

Constructor Summary

DomConsumer(Class impl)
Configures this pipeline terminus to use the specified implementation of DOM when constructing its result value.
DomConsumer(Class impl, EventConsumer n)
Configures this consumer as a buffer/filter, using the specified DOM implementation when constructing its result value.

Method Summary

ContentHandler
getContentHandler()
Returns the document handler being used.
DTDHandler
getDTDHandler()
Returns the DTD handler being used.
Document
getDocument()
Returns the document constructed from the preceding sequence of events.
Object
getProperty(String id)
Returns the lexical handler being used.
boolean
isHidingCDATA()
Returns true if the consumer is hiding CDATA boundaries (the default), and false if those boundaries are saved.
boolean
isHidingComments()
Returns true if the consumer is hiding comments (the default), and false if they should be placed into the output document.
boolean
isHidingReferences()
Returns true if the consumer is hiding entity reference nodes (the default), and false if EntityReference nodes should instead be created.
boolean
isHidingWhitespace()
Returns true if the consumer is hiding ignorable whitespace (the default), and false if such whitespace should be placed into the output document as children of element nodes.
void
setErrorHandler(ErrorHandler handler)
protected void
setHandler(DomConsumer.Handler h)
This is the hook through which a subclass provides a handler which knows how to access DOM extensions, specific to some implementation, to record additional data in a DOM.
void
setHidingCDATA(boolean flag)
Controls whether the consumer hides CDATA boundaries or saves them.
void
setHidingComments(boolean flag)
Controls whether the consumer is hiding comments.
void
setHidingReferences(boolean flag)
Controls whether the consumer will hide entity expansions, or will instead mark them with entity reference nodes.
void
setHidingWhitespace(boolean flag)
Controls whether the consumer hides ignorable whitespace.

Constructor Details

DomConsumer

public DomConsumer(Class impl)
            throws SAXException
Configures this pipeline terminus to use the specified implementation of DOM when constructing its result value.

Parameters:
impl - class implementing Document which publicly exposes a default constructor

Throws:
SAXException - when there is a problem creating an empty DOM document using the specified implementation


DomConsumer

public DomConsumer(Class impl,
                   EventConsumer n)
            throws SAXException
Configures this consumer as a buffer/filter, using the specified DOM implementation when constructing its result value.

This event consumer acts as a buffer and filter: it builds a DOM tree and then writes it out when endDocument is invoked. Because of the limitations of DOM, much of the input information will generally not appear in that replay. To get a full-fidelity copy of the input event stream, use a TeeConsumer.

Parameters:
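impl - class implementing Document which publicly exposes a default constructor
n - the next consumer, which receives a "replayed" sequence of parse events when endDocument is invoked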

Throws:
SAXException - when there is a problem creating an empty DOM document using the specified DOM implementation
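
The buffer-and-replay behavior described for this constructor can be approximated with the standard JDK alone. The sketch below buffers the input as a DOM tree and then replays it as SAX events through an identity Transformer. It stands in for the DomConsumer and DomParser pairing and is not that implementation; the class BufferAndReplaySketch and its replay method are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: buffer as DOM, then replay as SAX events.
class BufferAndReplaySketch {
    // Returns the element names seen by the "next" handler during replay.
    static List<String> replay(String xml) {
        try {
            // Step 1: buffer the whole document as a DOM tree.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document buffered = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Step 2: replay the tree as SAX events to the next handler.
            List<String> seen = new ArrayList<>();
            DefaultHandler next = new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    seen.add(qName);
                }
            };
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(buffered), new SAXResult(next));
            return seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The downstream handler only sees what survived in the tree. Anything the DOM did not record cannot appear in the replay, which is why the text above points to TeeConsumer when a full-fidelity copy is needed.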

Method Details

getContentHandler

public final ContentHandler getContentHandler()
Returns the document handler being used.
Specified by:
getContentHandler in interface EventConsumer


getDTDHandler

public final DTDHandler getDTDHandler()
Returns the DTD handler being used.
Specified by:
getDTDHandler in interface EventConsumer


getDocument

public final Document getDocument()
Returns the document constructed from the preceding sequence of events. This method should not be used again until another sequence of events has been given to this EventConsumer.


getProperty

public final Object getProperty(String id)
            throws SAXNotRecognizedException
Returns the lexical handler being used. (DOM construction can't really use declaration handlers.)
Specified by:
getProperty in interface EventConsumer


isHidingCDATA

public final boolean isHidingCDATA()
Returns true if the consumer is hiding CDATA boundaries (the default), and false if those boundaries are saved.

See Also:
setHidingCDATA(boolean)


isHidingComments

public final boolean isHidingComments()
Returns true if the consumer is hiding comments (the default), and false if they should be placed into the output document.

See Also:
setHidingComments(boolean)


isHidingReferences

public final boolean isHidingReferences()
Returns true if the consumer is hiding entity reference nodes (the default), and false if EntityReference nodes should instead be created. Such EntityReference nodes will normally be empty, unless an implementation arranges to populate them and then turn them back into readonly objects.

See Also:
setHidingReferences(boolean)


isHidingWhitespace

public final boolean isHidingWhitespace()
Returns true if the consumer is hiding ignorable whitespace (the default), and false if such whitespace should be placed into the output document as children of element nodes.

See Also:
setHidingWhitespace(boolean)
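
Whether whitespace counts as "ignorable" depends on the DTD's content model. The sketch below uses the standard JAXP factory rather than DomConsumer to show the effect on a document whose DTD declares element-only content. The class IgnorableWhitespaceSketch and its sample document are hypothetical. Validation is switched on together with the whitespace setting because the JAXP documentation says the setting relies on it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: ignorable whitespace with and without hiding.
class IgnorableWhitespaceSketch {
    // <a> has element-only content, so the spaces around <b/> are ignorable.
    static final String XML =
            "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]><a> <b/> </a>";

    static int childCount(boolean hideWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(hideWhitespace);
            factory.setIgnoringElementContentWhitespace(hideWhitespace);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

When hiding is off, the two whitespace runs should appear as text children around the b element. When it is on, only the b element should remain.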


setErrorHandler

public void setErrorHandler(ErrorHandler handler)
Specified by:
setErrorHandler in interface EventConsumer


setHandler

protected void setHandler(DomConsumer.Handler h)
This is the hook through which a subclass provides a handler which knows how to access DOM extensions, specific to some implementation, to record additional data in a DOM. Treat this as part of construction; don't call it except before (or between) parses.


setHidingCDATA

public final void setHidingCDATA(boolean flag)
Controls whether the consumer hides CDATA boundaries or saves them.

Parameters:
flag - True to hide CDATA boundaries, so that CDATA text is treated like other text; false to save them

See Also:
isHidingCDATA()


setHidingComments

public final void setHidingComments(boolean flag)
Controls whether the consumer is hiding comments.

See Also:
isHidingComments()


setHidingReferences

public final void setHidingReferences(boolean flag)
Controls whether the consumer will hide entity expansions, or will instead mark them with entity reference nodes.

Parameters:
flag - False if entity reference nodes will appear

See Also:
isHidingReferences()


setHidingWhitespace

public final void setHidingWhitespace(boolean flag)
Controls whether the consumer hides ignorable whitespace.

See Also:
isHidingWhitespace()