Thread: [Saxdotnet-devel] Requiring EndDocument
From: Jeff R. <li...@je...> - 2004-12-07 19:35:17
This is actually something that is already in place and only affects implementations, but we need to clarify when we require EndDocument to be called.

This is the proposal:

(a) EndDocument is required if StartDocument is called
(b) Even when there is a FatalError
(c) In the event of an encoding error, StartDocument should have been called, so EndDocument still needs to be called.
(d) If there is an external exception (such as an IOException or a user-generated SaxException in a callback), EndDocument is still required.
(e) EndDocument should be the final event in a SAX event stream.

Comments welcome,
Jeff Rafter
From: Karl W. <ka...@wa...> - 2004-12-07 19:50:12
> This is actually something that is already in place and only affects
> implementations, but we need to clarify when we require EndDocument to
> be called.

Actually, what you state below differs from what is currently documented. The docs state:

<quote>
To be more specific about when EndDocument() must be called: It must always be called once StartDocument() has been called, even after a fatal error, unless IXmlReader.Parse() returned with an exception.
</quote>

> This is the proposal:
>
> (a) EndDocument is required if StartDocument is called
> (b) Even when there is a FatalError
> (c) In the event of an encoding error, StartDocument should have been
> called so EndDocument still needs to be called.
> (d) If there is an external exception (such as an IOException or a user
> generated SaxException in a callback) EndDocument is still required.
> (e) EndDocument should be the final event in a SAX event stream.
>
> Comments welcome,
>

I agree with a, b, c and e.

About exceptions: whether user generated (call-back) or parser generated (IO, AV due to parser bug), they should be a dead stop. When the Parse() (or Resume()) method returns with an exception, EndDocument should not have been called. Nor should there have been any other call-backs after the exception.

If you throw an exception, you want everything to stop until it is handled, except for cleanup, which you should do right away in a finally clause. That is how exceptions normally work. If the parser throws the exception, only Parse()/Resume() can handle it. If a call-back throws an exception (that is not handled by the call-back itself), then who would reliably handle it? Again, only Parse()/Resume(). You do not want any other functions that have nothing to do with exception handling or cleanup called in between.

Karl
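The "dead stop" behavior argued for in the message above could be sketched roughly as follows. This is only an illustration under stated assumptions: IDocHandler, RecordingHandler and ToyReader are hypothetical names invented for this sketch and are not part of the SAX for .NET API, and ToyReader is a toy event producer, not a parser. It shows an exception from a call-back propagating out of Parse() with no further call-backs, while producer-side cleanup still runs in a finally clause.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for IContentHandler, reduced to three call-backs.
public interface IDocHandler
{
    void StartDocument();
    void Characters(string text);
    void EndDocument();
}

// Records every call-back so the event order can be inspected afterwards.
public class RecordingHandler : IDocHandler
{
    public readonly List<string> Events = new List<string>();
    public bool ThrowOnCharacters;

    public void StartDocument() { Events.Add("StartDocument"); }

    public void Characters(string text)
    {
        Events.Add("Characters");
        if (ThrowOnCharacters)
            throw new InvalidOperationException("raised by call-back");
    }

    public void EndDocument() { Events.Add("EndDocument"); }
}

// Toy event producer (an assumption for illustration, not a real parser).
public static class ToyReader
{
    public static bool ResourcesReleased;

    public static void Parse(string[] chunks, IDocHandler handler)
    {
        ResourcesReleased = false;
        try
        {
            handler.StartDocument();
            foreach (string chunk in chunks)
                handler.Characters(chunk);
            // Reached only when no exception is propagating.
            handler.EndDocument();
        }
        finally
        {
            // Producer-side cleanup only; no call-backs are made here.
            ResourcesReleased = true;
        }
    }
}
```

With a handler whose ThrowOnCharacters flag is set, the recorded events stop at the first Characters call and EndDocument is never recorded, which is the behavior the currently documented rule describes for a Parse() that returns with an exception.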
From: Karl W. <ka...@wa...> - 2004-12-18 20:54:16
Jeff Rafter wrote:
> This is actually something that is already in place and only affects
> implementations, but we need to clarify when we require EndDocument to
> be called.
>
> This is the proposal:
>
> (a) EndDocument is required if StartDocument is called
> (b) Even when there is a FatalError
> (c) In the event of an encoding error, StartDocument should have been
> called so EndDocument still needs to be called.
> (d) If there is an external exception (such as an IOException or a user
> generated SaxException in a callback) EndDocument is still required.
> (e) EndDocument should be the final event in a SAX event stream.
>
> Comments welcome,

Just a few general thoughts about this issue:

A) When is it really important for the SAX consumer to be able to rely on EndDocument() being called after StartDocument() has been called?

Clearly this is the case when the link between the SAX event generator and the event consumer is restricted to the IContentHandler interface. That is, the event generator has no access to, or control over, the consumer other than through the IContentHandler call-backs.

This excludes the most common situation of calling a Parse() method on an XML parser, as returning from the Parse() call (with or without exception) is the equivalent of EndDocument(). The Parse() call is the additional link. It also excludes IXmlFilter chains, as with them the process is driven from the end of the chain, again by calling a Parse() method.

Where it applies would be a chain (pipeline) of ContentHandler instances driven from the event generator, especially when they are dynamically assembled.

B) Is it actually possible for the SAX event generator to guarantee to the consumer that it will receive the EndDocument() event?

The general answer would be no. The reason is that the connection between event generator and consumer can cross process boundaries.
Remote call-backs in particular simply cannot guarantee that EndDocument() will be called on the consumer, but even inter-process calls on the same machine are not fail-safe. That makes it impossible to fulfill the EndDocument() contract on anything but in-process call-backs (as long as the hardware has no problem, of course). In practice, non-local SAX call-backs will rarely be used, but it is not a completely unrealistic use case either.

C) So how would one deal with a scenario where the final call to EndDocument() is not guaranteed?

The cleanup routines normally called from EndDocument() should also be callable from Finalize(), or any other "Reset()" kind of method. So, once the consumer gets notified of the end of the event stream (successful or not), the cleanup should proceed.

- For in-process consumers (.NET) driving the event generation: On return from Parse(), call EndDocument() (or the equivalent cleanup code) if it has not been called yet.
- For in-process consumers (.NET) not driving the event generation (downstream modules in a content handler chain): This depends on whether the consumer knows how it is going to be called. If it knows it will never be called across process boundaries, then it should assume that EndDocument() will be called. Otherwise it is the same case as for out-of-process consumers. See below.
- For out-of-process consumers: The "end-of-document" cleanup code should be put into a separate method callable from several points: from EndDocument() and from Finalize(), or any other "Reset()" kind of method. It is the responsibility of whatever controller/container operates the consumer to make sure it gets notified of any communication errors. The SAX event generator cannot make such guarantees.

SUMMARY

It seems one has to separate the IXmlReader.Parse() method (and its equivalents) from the contract defined for IContentHandler.
Exceptions thrown during Parse() may or may not be detectable by the event consumer and therefore should not have an influence on the contract.

Even if one requires that the SAX event generator must always make a call to IContentHandler.EndDocument() (when StartDocument() was called), one cannot guarantee that the SAX consumer will also receive that event. This means the SAX consumer has to be aware of how it communicates with the SAX generator.

So let's then phrase the requirements again:

===========
For a stream of SAX events that represent an XML document, the SAX event producer must call IContentHandler.StartDocument() exactly once, *before* any part of the input on which the SAX events are based is processed.

IContentHandler.EndDocument() *must* be called by the SAX event producer exactly once as the last event in a SAX event stream initiated by an IContentHandler.StartDocument() call, regardless of any exceptional or error situation encountered.

Depending on the call communication mechanism, however, this is no guarantee that the SAX event consumer will also receive that call.
===========

Note: It seems to me this covers Jeff's requirements a) to e) above and makes it clear that this applies not only to the standard configuration of calling IXmlReader.Parse() on a SAX parser (for which these requirements would not strictly be necessary) but more generally to any sequence of SAX events that represent an XML document.

Karl
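The consumer-side advice in point C above (put the end-of-document cleanup into one method that can be reached from EndDocument() and from a "Reset()" kind of method, and have the driver run it on return from Parse() if it has not run yet) might look roughly like the sketch below. All names here are hypothetical: CleanupAwareHandler and Driver are invented for illustration, the produceEvents delegate merely stands in for a call such as IXmlReader.Parse(), and the Finalize() path mentioned in the message is left out to keep the sketch short.

```csharp
using System;

// Hypothetical consumer; only the two document-level call-backs are shown.
public class CleanupAwareHandler
{
    private bool documentOpen;

    // Counts how often the cleanup body actually ran.
    public int CleanupCount;

    public void StartDocument()
    {
        documentOpen = true;
    }

    public void EndDocument()
    {
        EndOfDocumentCleanup();
    }

    // "Reset()" kind of method for a driver or controller to call when the
    // event stream has ended, whether or not EndDocument() arrived.
    public void Reset()
    {
        EndOfDocumentCleanup();
    }

    // Safe to call repeatedly; does work at most once per StartDocument().
    private void EndOfDocumentCleanup()
    {
        if (!documentOpen) return;
        documentOpen = false;
        CleanupCount++; // stands in for releasing buffers, closing outputs, etc.
    }
}

public static class Driver
{
    // produceEvents is a hypothetical stand-in for IXmlReader.Parse().
    public static void Run(CleanupAwareHandler handler,
                           Action<CleanupAwareHandler> produceEvents)
    {
        try
        {
            produceEvents(handler);
        }
        finally
        {
            // On return from "Parse", with or without an exception, run the
            // cleanup if EndDocument() has not already done so.
            handler.Reset();
        }
    }
}
```

Because the cleanup is guarded by a flag, it runs exactly once per document whether the stream ends with EndDocument(), with an exception, or with both EndDocument() and a later Reset().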
From: Elliotte H. <el...@me...> - 2004-12-20 16:08:53
Folks,

I just got an e-mail from David Megginson informing me that the JavaDocs and some other docs at sax.sourceforge.net have been updated. Apparently, he did not have access to the web site for some time, and the documentation there was not up to date with the latest round of revisions for SAX 2.0.2 that went on some months back for Java 1.5.

Anyway, this probably fixes at least some but not all of the inconsistencies that have been noted here about what's null and what's the empty string. I haven't checked in detail yet, but it's worth double checking all of our assumptions and comments over the last month or so against the latest docs.

--
Elliotte Rusty Harold el...@me...
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
From: Karl W. <ka...@wa...> - 2004-12-20 19:09:54
Elliotte Harold wrote:
> Folks,
>
> I just got an e-mail from David Megginson informing me that the JavaDocs
> and some other docs at sax.sourceforge.net have been updated.
> Apparently, he did not have access to the web site for some time, and
> the documentation there was not up to date with the latest round of
> revisions for SAX 2.0.2 that went on some months back for Java 1.5.
>
> Anyway, this probably fixes at least some but not all of the
> inconsistencies that have been noted here about what's null and what's
> the empty string. I haven't checked in detail yet, but it's worth double
> checking all of our assumptions and comments over the last month or so
> against the latest docs.

I asked Dave Megginson and he does not remember exactly why namespace URI strings and prefixes are treated differently, but he thinks it might have been programming convenience.

One xml-dev reference I found was by Tim Bray, arguing that since namespace URI references are not allowed to be empty strings (as per the namespace specs), passing an empty string should indicate an absent URI ref, and then one can use the Equals method on it without worrying about null. (RFC 2396 gives a different meaning to empty URI references.)

In C# this seems less of an issue, as there are two other null-safe options:

1) static method: if (String.Equals(uri, "http://my.namespace.com")) {...}
2) operator: if (uri == "http://my.namespace.com") {...}

Of these, the latter would be the standard way to compare strings.

Karl
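The two null-safe options listed above, next to the instance Equals call that a Java-style port would most likely use, can be compared side by side in a small sketch. NamespaceCompare and its method names are hypothetical and exist only for this illustration; the namespace name is the example string from the message.

```csharp
using System;

public static class NamespaceCompare
{
    // Example namespace name taken from the message above.
    public const string Expected = "http://my.namespace.com";

    // 1) Static method: accepts null on either side.
    public static bool ViaStaticEquals(string uri)
    {
        return String.Equals(uri, Expected);
    }

    // 2) Operator: with both operands typed as string, == compares
    //    values and accepts null on either side.
    public static bool ViaOperator(string uri)
    {
        return uri == Expected;
    }

    // Instance method: throws NullReferenceException when uri is null.
    public static bool ViaInstanceEquals(string uri)
    {
        return uri.Equals(Expected);
    }
}
```

One caveat on option 2: the value comparison only applies when both operands are statically typed as string. If either side is typed as object, == falls back to reference comparison, so a handler that stores the URI in an object variable would need String.Equals instead.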
From: Karl W. <ka...@wa...> - 2004-12-20 19:27:30
Jeff Rafter wrote:
>> In C# this seems less of an issue, as there are two
>> other null-safe options:
>> 1) static method: if (String.Equals(uri, "http://my.namespace.com"))
>> {...}
>> 2) operator: if (uri == "http://my.namespace.com") {...}
>
>
> Admittedly this is pretty standard in C#, however it is plausible that
> some Java coder will come to C# with limited training and just hack
> away-- in which case they may not follow this pattern (or if someone is
> porting a Java library for instance...)
>
> In any event it can go either way and I will code to it... but I believe
> that it should be very very clearly documented if we do not go with
> String.Empty. If I get a vote I still vote for String.Empty-- though
> your arguments have made me less zealous.

Would you vote for String.Empty in all cases, or just for namespace URIs and prefixes? The former would make us even less conformant with the Java specs.

> OFFTOPIC: sorry I have been out of the discussion for the past two
> weeks-- I have been doing a lot of unexpected business travel...

You may have a few posts to reply to... :-)

Karl