Re: [Saxdotnet-devel] Requiring EndDocument
From: Karl W. <ka...@wa...> - 2004-12-18 20:54:16
Jeff Rafter wrote:
> This is actually something that is already in place and only affects
> implementations, but we need to clarify when we require EndDocument to
> be called.
>
> This is the proposal:
>
> (a) EndDocument is required if StartDocument is called
> (b) Even when there is a FatalError
> (c) In the event of an encoding error, StartDocument should have been
>     called so EndDocument still needs to be called.
> (d) If there is an external exception (such as an IOException or a user
>     generated SaxException in a callback) EndDocument is still required.
> (e) EndDocument should be the final event in a SAX event stream.
>
> Comments welcome,

Just a few general thoughts about this issue:

A) When is it really important for the SAX consumer to be able to rely on EndDocument() being called after StartDocument() has been called?

Clearly this is the case when the link between the SAX event generator and the event consumer is restricted to the IContentHandler interface. That is, the event generator has no access to, or control over, the consumer other than through the IContentHandler call-backs.

This excludes the most common situation of calling a Parse() method on an XML parser, as returning from the Parse() call (with or without exception) is the equivalent of EndDocument(). The Parse() call is the additional link. It also excludes IXmlFilter chains, as with them the process is driven from the end of the chain, again by calling a Parse() method.

Where it does apply is a chain (pipeline) of ContentHandler instances driven from the event generator, especially when they are dynamically assembled.

B) Is it actually possible for the SAX event generator to guarantee to the consumer that it will receive the EndDocument() event?

The general answer would be no. The reason is that the connection between event generator and consumer can cross process boundaries.
Remote call-backs in particular simply cannot guarantee that EndDocument() will be called on the consumer, but even inter-process calls on the same machine are not fail-safe. That makes it impossible to fulfill the EndDocument() contract on anything but in-process call-backs (as long as the hardware has no problem, of course). In practice, non-local SAX call-backs will rarely be used, but it is not a completely unrealistic use case either.

C) So how would one deal with a scenario where the final call to EndDocument() is not guaranteed?

The cleanup routines normally called from EndDocument() should also be callable from Finalize(), or any other "Reset()" kind of method. So, once the consumer gets notified of the end of the event stream (successful or not), the cleanup should proceed.

- For in-process consumers (.NET), driving the event generation:
  On return from Parse(), call EndDocument() (or the equivalent cleanup code) if it has not been called yet.

- For in-process consumers (.NET) not driving the event generation (downstream modules in a content handler chain):
  This depends on whether the consumer knows how it is going to be called. If it knows it will never be called across process boundaries, then it should assume that EndDocument() will be called. Otherwise it is the same case as for out-of-process consumers. See below.

- For out-of-process consumers:
  The "end-of-document" cleanup code should be put into a separate method callable from several points: from EndDocument() and from Finalize(), or any other "Reset()" kind of method. It is the responsibility of whatever controller/container operates the consumer to make sure it gets notified of any communication errors. The SAX event generator cannot make such guarantees.

SUMMARY

It seems one has to separate the IXmlReader.Parse() method (and its equivalents) from the contract defined for IContentHandler.
Exceptions thrown during Parse() may or may not be detectable by the event consumer and therefore should not have an influence on the contract. Even if one requires that the SAX event generator must always make a call to IContentHandler.EndDocument() (when StartDocument() was called), one cannot guarantee that the SAX consumer will also receive that event. This means the SAX consumer has to be aware of how it communicates with the SAX generator.

So let's then phrase the requirements again:

===========
For a stream of SAX events that represent an XML document, the SAX event producer must call IContentHandler.StartDocument() exactly once, *before* any part of the input on which the SAX events are based is processed.

IContentHandler.EndDocument() *must* be called by the SAX event producer exactly once as the last event in a SAX event stream initiated by an IContentHandler.StartDocument() call, regardless of any exceptional or error situation encountered. Depending on the call communication mechanism, however, this is no guarantee that the SAX event consumer will also receive that call.
===========

Note: It seems to me this covers Jeff's requirements a) to e) above and makes it clear that this applies not only to the standard configuration of calling IXmlReader.Parse() on a SAX parser (for which these requirements would not strictly be necessary) but more generally to any sequence of SAX events that represent an XML document.

Karl
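
To illustrate point C), here is a minimal C# sketch of a consumer whose end-of-document cleanup can be reached from EndDocument(), from a Reset() method, or from a driver when the parse call returns. Everything named here is an assumption made for illustration: IMiniContentHandler is a hypothetical, cut-down stand-in for IContentHandler (the real interface has many more call-backs), and CleanupAwareHandler and Demo.Drive are invented names, not part of any SAX for .NET API.

```csharp
using System;

// Hypothetical, reduced stand-in for the SAX content handler interface.
// Only the two document-boundary call-backs are modelled here.
public interface IMiniContentHandler
{
    void StartDocument();
    void EndDocument();
}

// A consumer that does not rely on EndDocument() actually arriving.
public sealed class CleanupAwareHandler : IMiniContentHandler
{
    private bool inDocument;

    // Number of times the end-of-document cleanup really ran.
    public int CleanupCount { get; private set; }

    public void StartDocument()
    {
        // If a previous stream never delivered EndDocument(), clean up first.
        Reset();
        inDocument = true;
    }

    public void EndDocument()
    {
        EndOfDocumentCleanup();
    }

    // Entry point for a controller/container that learns by other means
    // (return from Parse(), a communication error) that the stream is over.
    public void Reset()
    {
        EndOfDocumentCleanup();
    }

    // Shared cleanup; the flag makes it safe to call more than once.
    private void EndOfDocumentCleanup()
    {
        if (!inDocument)
        {
            return;
        }
        inDocument = false;
        CleanupCount++;
    }
}

public static class Demo
{
    // Hypothetical driver for the in-process case: whatever happens inside
    // the parse call, the cleanup runs when it returns. Exceptions still
    // propagate to the caller.
    public static void Drive(CleanupAwareHandler handler,
                             Action<IMiniContentHandler> parse)
    {
        try
        {
            parse(handler);
        }
        finally
        {
            handler.Reset();
        }
    }
}
```

Because the cleanup is guarded by a flag, it does not matter whether EndDocument(), Reset(), or both are invoked for a given stream; the cleanup body runs once per started document. A finalizer could call the same private method, with the usual caveat that finalization timing in .NET is not deterministic.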