Re: [Saxdotnet-devel] Requiring EndDocument

Jeff Rafter wrote:
> This is actually something that is already in place and only affects 
> implementations, but we need to clarify when we require EndDocument to
> be called.
> 
> This is the proposal:
> 
> (a) EndDocument is required if StartDocument is called
> (b) Even when there is a FatalError
> (c) In the event of an encoding error, StartDocument should already have 
> been called, so EndDocument still needs to be called.
> (d) If there is an external exception (such as an IOException or a 
> user-generated SaxException in a callback), EndDocument is still required.
> (e) EndDocument should be the final event in a SAX event stream.
> 
> Comments welcome,

Just a few general thoughts about this issue:

A) When is it really important for the SAX consumer to be able to rely
    on EndDocument() being called after StartDocument() has been called?

Clearly this is the case when the link between the SAX event generator
and the event consumer is restricted to the IContentHandler interface.
That is, the event generator has no access to, or control over, the
consumer other than through the IContentHandler call-backs.

This excludes the most common situation, calling a Parse() method
on an XML parser, because returning from the Parse() call (normally or
by exception) is equivalent to EndDocument(). The Parse() call itself
provides the additional link.

It also excludes IXmlFilter chains, because there the process is driven
from the end of the chain, again by calling a Parse() method.
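To illustrate the filter-chain case, here is a sketch using Python's standard xml.sax package as a stand-in for the SAX.NET interfaces. `UppercaseFilter`, `Recorder` and `run_chain()` are hypothetical names. The caller drives the chain by calling parse() on the last filter, which calls parse() on its parent reader. The caller learns that the stream is over when that call returns or raises. As far as I can tell, CPython's expat-based reader does not deliver endDocument() after a fatal error, so the return from parse() really is the only reliable end signal here.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase


class UppercaseFilter(XMLFilterBase):
    """Hypothetical filter: upper-cases element names, forwards everything else."""

    def startElement(self, name, attrs):
        super().startElement(name.upper(), attrs)

    def endElement(self, name):
        super().endElement(name.upper())


class Recorder(ContentHandler):
    """Records the events that reach the end of the chain."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def startElement(self, name, attrs):
        self.events.append("start:" + name)

    def endElement(self, name):
        self.events.append("end:" + name)

    def endDocument(self):
        self.events.append("endDocument")


def run_chain(xml_bytes):
    recorder = Recorder()
    chain = UppercaseFilter(xml.sax.make_parser())  # parent reader is the expat parser
    chain.setContentHandler(recorder)
    try:
        # Driven from the end of the chain: this parse() calls the parent's parse().
        chain.parse(io.BytesIO(xml_bytes))
        return recorder.events, None
    except xml.sax.SAXParseException as exc:
        # Returning (here via an exception) is what tells the caller the stream ended.
        return recorder.events, exc


events, error = run_chain(b"<doc><item/></doc>")
```

Running `run_chain()` on malformed input such as `b"<doc><item></doc>"` returns the partial event list together with the exception.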

Where it does apply is a chain (pipeline) of ContentHandler instances
driven from the event generator, especially when the chain is assembled
dynamically.
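In a push-style pipeline, by contrast, the generator's only link to the consumers is the ContentHandler interface. The following is a minimal sketch in Python's xml.sax terms. `emit_document()`, `build_pipeline()`, `Forwarder` and `Sink` are hypothetical names, and the generator is deliberately trivial (one empty element per name).

```python
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class Forwarder(ContentHandler):
    """Hypothetical pipeline stage: counts elements and forwards every event."""

    def __init__(self, downstream):
        super().__init__()
        self.downstream = downstream
        self.element_count = 0

    def startDocument(self):
        self.downstream.startDocument()

    def startElement(self, name, attrs):
        self.element_count += 1
        self.downstream.startElement(name, attrs)

    def endElement(self, name):
        self.downstream.endElement(name)

    def endDocument(self):
        self.downstream.endDocument()


class Sink(ContentHandler):
    """Last stage: records what arrives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def startElement(self, name, attrs):
        self.events.append("start:" + name)

    def endElement(self, name):
        self.events.append("end:" + name)

    def endDocument(self):
        self.events.append("endDocument")


def build_pipeline(sink, stages):
    """Assemble the chain at run time: each new stage forwards to the previous head."""
    head = sink
    for _ in range(stages):
        head = Forwarder(head)
    return head


def emit_document(handler, element_names):
    """Hypothetical generator. There is no Parse() call; the downstream
    stages see nothing but these ContentHandler call-backs."""
    handler.startDocument()
    for name in element_names:
        handler.startElement(name, AttributesImpl({}))
        handler.endElement(name)
    handler.endDocument()
```

A downstream stage in such a chain cannot tell that the stream has ended unless endDocument() actually arrives.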

B) Is it actually possible for the SAX event generator to guarantee
    to the consumer that it will receive the EndDocument() event?

The general answer is no, because the connection between event
generator and consumer can cross process boundaries. Remote call-backs
in particular simply cannot guarantee that EndDocument() will be called
on the consumer, and even inter-process calls on the same machine are
not fail-safe. That makes it impossible to fulfill the EndDocument()
contract for anything but in-process call-backs (and even then only as
long as the hardware does not fail).
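The effect can be simulated in-process. In the sketch below, a hypothetical `UnreliableChannel` proxy stands in for a remote call-back whose connection drops after a fixed number of calls, and `ConnectionLost` is a made-up transport error. The producer does call endDocument(), as the contract requires, but the real consumer never receives it.

```python
from xml.sax.handler import ContentHandler


class ConnectionLost(Exception):
    """Hypothetical transport error."""


class UnreliableChannel(ContentHandler):
    """Hypothetical proxy for a remote consumer; the link breaks after max_calls calls."""

    def __init__(self, remote, max_calls):
        super().__init__()
        self.remote = remote
        self.remaining = max_calls
        self.attempted = []  # every call the producer made, delivered or not

    def _send(self, method, *args):
        self.attempted.append(method)
        if self.remaining <= 0:
            raise ConnectionLost(method)
        self.remaining -= 1
        getattr(self.remote, method)(*args)

    def startDocument(self):
        self._send("startDocument")

    def characters(self, content):
        self._send("characters", content)

    def endDocument(self):
        self._send("endDocument")


class RemoteConsumer(ContentHandler):
    def __init__(self):
        super().__init__()
        self.got_end = False
        self.text = []

    def characters(self, content):
        self.text.append(content)

    def endDocument(self):
        self.got_end = True


def produce(handler):
    """Producer that honours the contract: endDocument() is always attempted."""
    handler.startDocument()
    try:
        handler.characters("hello")
    finally:
        handler.endDocument()
```

With `max_calls=2`, the start and text events get through, but endDocument() fails in the proxy. The producer has done everything it can, and the consumer still has no end-of-document notification.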

In practice, non-local SAX call-backs will rarely be used, but it is not
a completely unrealistic use case either.

C) So how would one deal with a scenario where the final call to
    EndDocument() is not guaranteed?

The cleanup routines normally called from EndDocument() should also
be callable from Finalize(), or any other "Reset()" kind of method.
Then, whenever the consumer is notified that the event stream has ended
(successfully or not), the cleanup can proceed.
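One way to arrange this is to put the end-of-document work in a single idempotent method. That method is then called from endDocument(), from an explicit reset() and, as a last resort, from the finalizer. The sketch below is in Python, with `__del__` playing the role of .NET's Finalize(). `ResourceHoldingHandler`, `reset()` and `_cleanup()` are hypothetical names.

```python
from xml.sax.handler import ContentHandler


class ResourceHoldingHandler(ContentHandler):
    """Hypothetical consumer that holds per-document state."""

    def __init__(self):
        super().__init__()
        self.buffer = None
        self.cleanup_runs = 0

    def startDocument(self):
        self.buffer = []  # acquire per-document state

    def characters(self, content):
        if self.buffer is not None:
            self.buffer.append(content)

    def endDocument(self):
        self._cleanup()

    def reset(self):
        """For whoever operates the consumer, when the stream ends abnormally."""
        self._cleanup()

    def __del__(self):
        # Last resort only: finalizer timing is not deterministic.
        self._cleanup()

    def _cleanup(self):
        # Idempotent: any entry point may call it, any number of times.
        if getattr(self, "buffer", None) is None:
            return
        self.buffer = None
        self.cleanup_runs += 1
```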

- For in-process consumers (.NET), driving the event generation:
   On return from Parse(), call EndDocument() (or the equivalent cleanup code)
   if it has not been called yet.

- For in-process consumers (.NET) not driving the event generation
   (downstream modules in a content handler chain):
   This depends on whether the consumer knows how it is going to be
   called. If it knows it will never be called across process boundaries,
   then it should assume that EndDocument() will be called. Otherwise it is
   in the same position as an out-of-process consumer (see below).

- For out-of-process consumers:
   The "end-of-document" cleanup code should be put into a separate method
   callable from several points - from EndDocument() and from Finalize(), or
   any other "Reset()" kind of method. It is the responsibility of whatever
   controller/container operates the consumer to make sure it gets notified
   of any communication errors. The SAX event generator cannot make such
   guarantees.
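The first case above, an in-process consumer driving the parse, might look like the sketch below. Python's xml.sax.parseString stands in for IXmlReader.Parse(). `Collector`, `finish()` and `parse_and_finish()` are hypothetical names. The caller runs the end-of-document code in a `finally` block, and the handler guards against running it twice. It therefore does not matter whether the parser delivered endDocument() before failing.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class Collector(ContentHandler):
    """Hypothetical consumer whose end-of-document work must run exactly once."""

    def __init__(self):
        super().__init__()
        self.finished = False
        self.finish_count = 0
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)

    def endDocument(self):
        self.finish()

    def finish(self):
        # Idempotent end-of-document code, shared by endDocument() and the caller.
        if self.finished:
            return
        self.finished = True
        self.finish_count += 1


def parse_and_finish(data):
    """Parse bytes; return the handler and the parse error (or None)."""
    handler = Collector()
    error = None
    try:
        xml.sax.parseString(data, handler)
    except xml.sax.SAXParseException as exc:
        error = exc  # a real application would report or re-raise this
    finally:
        # Returning from the parse call marks the end of the stream, so make
        # sure the cleanup has run even if endDocument() never arrived.
        handler.finish()
    return handler, error
```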

SUMMARY

It seems one has to separate the IXmlReader.Parse() method (and its 
equivalents) from the contract defined for IContentHandler. Exceptions thrown 
during Parse() may or may not be detectable by the event consumer and 
therefore should not have an influence on the contract.

Even if one requires that the SAX event generator must always make a call to 
IContentHandler.EndDocument() (when StartDocument() was called), one cannot 
guarantee that the SAX consumer will also receive that event. This means the 
SAX consumer has to be aware of how it communicates with the SAX generator.

So let's rephrase the requirements:

===========
For a stream of SAX events that represent an XML document, the SAX event 
producer must call IContentHandler.StartDocument() exactly once, *before* any 
part of the input on which the SAX events are based is processed.

IContentHandler.EndDocument() *must* be called by the SAX event producer 
exactly once as the last event in a SAX event stream initiated by an 
IContentHandler.StartDocument() call, regardless of any exceptional or error 
situation encountered. Depending on the communication mechanism, however, 
this does not guarantee that the SAX event consumer will also receive that call.
===========
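A producer written to this rule might look like the following Python sketch. The `produce_events()` function, its (name, text) record input and the `Log` handler are hypothetical. StartDocument() is called before any input is consumed. EndDocument() sits in a `finally` block, so it is the last call made even when the input is malformed or a handler call-back raises, as in Jeff's cases (b) to (e). The original exception still propagates to whoever started the producer.

```python
from xml.sax import SAXException
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


def produce_events(handler, records):
    """Hypothetical producer: turns (name, text) records into SAX events."""
    handler.startDocument()  # exactly once, before any input is processed
    try:
        handler.startElement("records", AttributesImpl({}))
        for name, text in records:  # a malformed record raises ValueError here
            handler.startElement(name, AttributesImpl({}))
            handler.characters(text)
            handler.endElement(name)
        handler.endElement("records")
    finally:
        handler.endDocument()  # exactly once, always the last event


class Log(ContentHandler):
    """Records events; optionally rejects one element name from a call-back."""

    def __init__(self, fail_on=None):
        super().__init__()
        self.events = []
        self.fail_on = fail_on

    def startDocument(self):
        self.events.append("startDocument")

    def startElement(self, name, attrs):
        if name == self.fail_on:
            # A user-generated SAX exception thrown from inside a call-back.
            raise SAXException("consumer rejected " + name)
        self.events.append("start:" + name)

    def characters(self, content):
        self.events.append("text:" + content)

    def endElement(self, name):
        self.events.append("end:" + name)

    def endDocument(self):
        self.events.append("endDocument")
```

Note that this only covers the producer's side of the contract. Whether the consumer actually receives endDocument() still depends on the call mechanism between them.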

Note: It seems to me this covers Jeff's requirements a) to e) above and makes 
it clear that this applies not only to the standard configuration of calling 
IXmlReader.Parse() on a SAX parser (for which these requirements would not 
strictly be necessary) but more generally to any sequence of SAX events that 
represents an XML document.

Karl