saxdotnet-devel Mailing List for SAX for .NET
Brought to you by:
jeffrafter,
kwaclaw
| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2004 | | | | | | | | | | | | (45) |
| 2005 | (20) | (1) | | | | | | | | | | |
From: Karl W. <ka...@wa...> - 2005-02-15 15:24:07
|
I think we have everything at a release level except for the Conformance demo. Anyone interested should check out current CVS. Karl |
From: Karl W. <ka...@wa...> - 2005-01-17 18:07:50
|
Jeff Rafter wrote:
>> However, I think I mentioned that RPC is not a common SAX use case.
>
> I agree, it is not common. I think that when we have that documentation
> around end document, if we include information about this, we need to
> explain it clearly (as you did above). Also, I think we need to be
> explicit about what happens in the case of a user generated exception in
> a callback (i.e., EndDocument is *still* called).

This is what I have in CVS currently as doc for EndDocument():

/// <summary>See <see href="http://www.saxproject.org/apidoc/org/xml/sax/ContentHandler.html#endDocument()">
/// ContentHandler.endDocument</see> on www.saxproject.org.</summary>
/// <remarks>Differences to Java:
/// <list type="bullet">
/// <item>Stricter about when to call: <c>EndDocument</c> <b>must</b> be called by the
/// SAX event producer exactly once as the last event in a SAX event stream initiated
/// by a <see cref="IContentHandler.StartDocument"/> call, regardless of any exceptional
/// or error situation encountered. Depending on the call communication mechanism, however,
/// this is no guarantee that the SAX event consumer will also receive that call.</item>
/// </list></remarks>

>> I am not sure I understand you fully.
>
> Well, I was imagining the case where there was a two-gig element name,
> or two-gig (count) of attributes on an element. We cannot pass the
> information back clearly... maybe a specialized exception, or something
> similar to an API FatalError would be useful at that point... so that
> the cause is clearly identified. Again, this is not something for this
> release. I just feel as though there are things that we could do to fix
> this...

Normally, overflow situations are already handled by the runtime system. What is it we should do over and above that?

Karl
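[Editorial sketch] The EndDocument rule in the doc comment above can be illustrated from the producer's side. This is a minimal sketch in Java against the JDK's org.xml.sax interfaces, not code from the SAX for .NET CVS; a C# producer calling IContentHandler would presumably have the same shape. The names EventProducer, produce and FailingHandler are hypothetical.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical event producer; not part of SAX or SAX for .NET.
final class EventProducer {
    // Emits a single empty element. endDocument() sits in a finally block,
    // so it is the last event even when a handler callback throws.
    static void produce(ContentHandler handler, String elementName) throws SAXException {
        handler.startDocument();
        try {
            handler.startElement("", elementName, elementName, new AttributesImpl());
            handler.endElement("", elementName, elementName);
        } finally {
            handler.endDocument();
        }
    }
}

// Hypothetical consumer that fails in startElement and records endDocument.
final class FailingHandler extends DefaultHandler {
    boolean ended = false;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        throw new SAXException("user generated exception in a callback");
    }

    @Override
    public void endDocument() {
        ended = true;
    }
}
```

Calling EventProducer.produce(new FailingHandler(), "foo") propagates the SAXException to the caller, but only after endDocument() has run. As the doc comment notes, this guarantees that the call is made, not that a remote consumer receives it.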
From: Jeff R. <li...@je...> - 2005-01-17 17:23:30
|
> What I mean is that in a scenario like DCOM (or some other RPC mechanism)
> one cannot guarantee that the call reaches the target, since RPC
> mechanisms may have communication failures. You may guarantee that
> the call is made (i.e. that it originates), but not that it arrives.
> So in such a situation the callee cannot rely on the EndDocument()
> callback.
> However, I think I mentioned that RPC is not a common SAX use case.

I agree, it is not common. I think that when we have that documentation around end document, if we include information about this, we need to explain it clearly (as you did above). Also, I think we need to be explicit about what happens in the case of a user generated exception in a callback (i.e., EndDocument is *still* called).

> A feature called reader-control has been added. It is obviously a read-only
> feature.

Excellent.

> I am not sure I understand you fully.

Well, I was imagining the case where there was a two-gig element name, or two-gig (count) of attributes on an element. We cannot pass the information back clearly... maybe a specialized exception, or something similar to an API FatalError would be useful at that point... so that the cause is clearly identified. Again, this is not something for this release. I just feel as though there are things that we could do to fix this...

> Very good indeed!

Yes quite. I also took Elliotte's advice and looked through the CVS on the sax.sf.net site for changes to documentation. There were no changes regarding the issues we discussed (as near as I can tell) since last April. So we have been working with the latest documentation base...

Cheers,
Jeff
From: Karl W. <ka...@wa...> - 2005-01-17 01:51:13
|
Jeff Rafter wrote:
>> D) Requiring EndDocument
>
>> However, this can be guaranteed only for in-process call-backs.
>
> Can you explain this a little more? Is this the point about non SAX
> generated Exceptions (i.e., user throws an exception in a callback)?
> Otherwise, agreed.

What I mean is that in a scenario like DCOM (or some other RPC mechanism) one cannot guarantee that the call reaches the target, since RPC mechanisms may have communication failures. You may guarantee that the call is made (i.e. that it originates), but not that it arrives. So in such a situation the callee cannot rely on the EndDocument() callback.

However, I think I mentioned that RPC is not a common SAX use case.

>> F) Merging the IXmlReaderControl with IXmlReader
>
> I agree with Elliotte's opposition here. But at the time we were talking
> about adding a lot more exceptional circumstances. Since that time, we
> have scaled back that approach quite a bit. So I think that we can merge
> still. We also wanted to have an accompanying mechanism to check support
> without raising the exception (a feature?). But I think this is
> manageable. So I think the 2:1 is accurate.

A feature called reader-control has been added. It is obviously a read-only feature.

>> G) SAX holes
>>
>> - Max string length: String.Length is defined as int,
>> not much we can do here.
>
> The only thing we could do is come up with some mechanism for (a)
> passing the stream info in a callback, (b) allowing subsequent calls on
> all callbacks for the items to be aggregated (c) raise an exception in
> the event that something exceeds the limit.

I am not sure I understand you fully.

>> I) Re-define what "parsing" means in SAX.
>> allow for "optional" argument or return values.
>
> Agreed, I will work on the new wording.

Thanks!

> So based on that we have very little left undecided... just a little
> more work to do...

Very good indeed!

Karl
From: Karl W. <ka...@wa...> - 2005-01-14 17:11:49
|
Another review of the current status for the various threads:
(only 3 participants so far)

When I mark an item/solution as accepted, please speak up if you have objections.

A) "Unifying" the core and extension interfaces:

No opposition, seems accepted.

B) Changing Exceptions in the API

SaxNotSupportedException and SaxNotRecognizedException have been removed in favour of the built-in .NET exceptions ArgumentException and NotSupportedException.

I assume this is accepted.

C) Reducing the versioning / eliminating the Ixxx2 interfaces

When unifying IAttributes and IAttributes2, the IsDeclared() methods have been removed in favour of an additional return value for GetType(). This value is UNDECLARED. Also, GetType() differentiates between NMTOKEN and ENUMERATION, adding the latter as another new return value.

I assume this is accepted as well.

D) Requiring EndDocument

For the purpose of enabling content handler chains (filter pipelines), we should require that EndDocument() is always called when StartDocument() was called. However, this can be guaranteed only for in-process call-backs.

Documentation has been updated, and I assume this is accepted.

E) StartElement, when URI is not present

We decided to follow what the standard .NET libraries expect as an input for an URI, if the URI is supposed to be absent. It was found that many of them expect an empty string and not null.

So, the solution proposed is to follow the Java API and require that a namespace URI has the value "" when a qualified name is not in any namespace. This applies to IAttributes.GetUri() and the URI arguments passed to Start/EndElement(). As a similar consequence we require that the prefix argument passed to Start/EndPrefixMapping is "" for the default namespace.

For the rest of the API, absence of a string parameter will still be indicated through a null reference. This is especially necessary for IDeclHandler.AttributeDecl(), where the value parameter has a defined meaning for "".

Assuming acceptance.

F) Merging the IXmlReaderControl with IXmlReader

Some opposition from Elliotte, that is, 2:1 in favour of adding Suspend/Resume/Abort() to IXmlReader. I propose we go with the merge and make these methods optional (allowed to throw NotSupportedException). However, the IXmlReader.Status property will be non-optional, as it is easy to implement.

Still open, but I hope acceptance is not far.

G) SAX holes

- Max string length: String.Length is defined as int, not much we can do here.
- Skipped entities in attribute values: possibility of marking spot in attribute value (at user option).

I suggest we do nothing for this release, as the API changes might be invasive.

Assuming acceptance.

H) Replace is-standalone feature with ILocator member

The is-standalone feature was removed and a new property was added to ILocator called EntityType. It is an enumeration of this type:

public enum ParsedEntityType
{
  /// <summary>Document entity without specified value for the standalone flag.</summary>
  Document,
  /// <summary>Document entity with standalone="no".</summary>
  NotStandalone,
  /// <summary>Document entity with standalone="yes".</summary>
  Standalone,
  /// <summary>External general entity.</summary>
  General,
  /// <summary>External parameter entity.</summary>
  Parameter
}

No discussion yet (but off-list agreement between Jeff and Karl). We might want to add a value of "Other" or "Unknown" for those cases where we are not actually parsing an XML document.

I) Re-define what "parsing" means in SAX.

This is a new issue that came up while discussing the fact that certain API members and/or names only make sense when parsing an actual XML document (ParseError, SaxParseException, IXmlReader.Parse(), ILocator.PublicId, ...). This puts unnecessary limitations on SAX event generation not based on parsing a document.

We have two choices, rename and re-design part of the SAX API, or re-define the meaning of parsing. Luckily, it seems that those API members that force a document interpretation allow for "optional" argument or return values.

So, as a starting point this was proposed:

"For the purpose of the SAX API we define "parsing" as generating a sequence of call-backs on SAX handler interfaces that represent a well-formed XML document. This applies even if no actual XML document is being processed."

Jeff thinks it needs some refinement, and I asked him to refine ...

I also think we should add this: Whenever the term "document" is used in SAX, it should be interpreted more generally as "source of well-formed SAX events". This would for instance make it unnecessary to add a new member (see above) to ParsedEntityType.

SUMMARY

So, we have F, H and I as open issues, but I think we are getting close. If anyone sees new issues, please come forward.

Karl
From: Karl W. <ka...@wa...> - 2005-01-13 22:55:07
|
Jeff Rafter wrote:
>> One of the things Elliotte mentioned was passing the URI to
>> other libraries, instead of checking it. That is what D.Megginson
>> probably
>> meant with convenience. So, which libs in .NET would one use?
>> And what do they accept? It would be funny if after all the
>> discussion they accepted null for an URI.

I just checked a few:
Uri constructor accepts an empty relative Uri, but not null.
XmlQualifiedName has an empty string for "no namespace".

So I guess it's settled, prefix and Uri are empty strings in StartElementHandler and IAttributes when no namespace is present. Incorrect, but convenient.

> Elliotte also mentioned that he didn't like to include if (uri == null)
> everywhere-- and that knowing the uri would never be null saved him from
> the clutter.

Yes, but that only applies if he can just pass it on and let someone else make the check. If some of *his* code needs to be executed depending on whether the name is in a namespace or not, then he needs to check anyway, and it makes no difference whether you check against null or "".

Just imagine if all those framework classes above would accept null instead. Our decision would certainly be different.

>> I am hesitant to dictate my opinion here. In the end, anything is
>> workable.
>> Unfortunately not too much feedback.
>
> So you are a benevolent dictator... I like that...
>
>> If we go with String.Empty for URIs and prefixes, then I would suggest
>> they should be the only such case, as then we would be pretty much in
>> agreement
>> with the original Java API, and the work effort to change all other such
>> API cases and their implementations could be quite inconvenient.
>
> 100% agree...

OK.

>>> In any event-- we need to decide, we need to document the decision
>>> very clearly and we need to make sure our SAX conformance application
>>> checks for all of the appropriate cases. Which may involve adding a
>>> secondary test suite.
>>
>> Yes, definitely.
>
> As a side note-- the Java SAX Conformance suite relies on the fact that
> the URI will not be null (as a side effect). Anything that does not do
> so is not "SAX Conformant" according to Elliotte's suite. Now, I think
> that is wrong because it is not legislated-- but every parser he tested
> is conformant on that point... meaning they all pass string.empty and
> not null.
>
> ===============
> The XmlReader.NamespaceURI has this:
>
> Property Value
> The namespace URI of the current node; otherwise an empty string.
>
> Remarks
> This property is relevant to Element and Attribute nodes only.
>
> In XmlDocument.CreateElement:
>
> namespaceURI
> The namespace URI of the new element (if any). String.Empty and a
> null reference (Nothing in Visual Basic) are equivalent.
>
> In XmlElement.NamespaceURI:
>
> The namespace URI of this node. If there is no namespace URI, this
> property returns String.Empty.
>
> And finally, XmlNamespaceManager.AddNamespace Method throws an
> ArgumentNullException in the case:
>
> The value for prefix or uri is a null reference (Nothing in Visual
> Basic).
> ===========
>
> This all seems pretty compelling... :)

Yes, as I said - convenience interacting with the libraries. It would be interesting if these are new libs adjusting to a precedent set by SAX originally?

Karl
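[Editorial sketch] The empty-string convention settled on in this message can be observed in the original Java API. This is a minimal sketch, assuming the SAX parser bundled with the JDK with namespace processing switched on; NamespaceProbe and startTags are hypothetical names, not part of SAX or SAX for .NET.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only.
final class NamespaceProbe {
    // Parses the XML text and returns "{uri}localName" for each start tag.
    static List<String> startTags(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        final List<String> seen = new ArrayList<>();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // No null check: a name in no namespace arrives with uri == "".
                seen.add("{" + uri + "}" + localName);
            }
        });
        return seen;
    }
}
```

For the input `<foo><p:bar xmlns:p="http://foo"/><baz xmlns=""/></foo>` the handler records `{}foo`, `{http://foo}bar` and `{}baz`: both the element without any declaration and the one with xmlns="" report the empty string, which is the behaviour proposed here for StartElement and IAttributes.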
From: Jeff R. <li...@je...> - 2005-01-13 21:01:49
|
> One of the things Elliotte mentioned was passing the URI to
> other libraries, instead of checking it. That is what D.Megginson probably
> meant with convenience. So, which libs in .NET would one use?
> And what do they accept? It would be funny if after all the
> discussion they accepted null for an URI.

Elliotte also mentioned that he didn't like to include if (uri == null) everywhere-- and that knowing the uri would never be null saved him from the clutter.

> I am hesitant to dictate my opinion here. In the end, anything is workable.
> Unfortunately not too much feedback.

So you are a benevolent dictator... I like that...

> If we go with String.Empty for URIs and prefixes, then I would suggest
> they should be the only such case, as then we would be pretty much in
> agreement
> with the original Java API, and the work effort to change all other such
> API cases and their implementations could be quite inconvenient.

100% agree...

>> In any event-- we need to decide, we need to document the decision
>> very clearly and we need to make sure our SAX conformance application
>> checks for all of the appropriate cases. Which may involve adding a
>> secondary test suite.
>
> Yes, definitely.

As a side note-- the Java SAX Conformance suite relies on the fact that the URI will not be null (as a side effect). Anything that does not do so is not "SAX Conformant" according to Elliotte's suite. Now, I think that is wrong because it is not legislated-- but every parser he tested is conformant on that point... meaning they all pass string.empty and not null.

===============
The XmlReader.NamespaceURI has this:

Property Value
The namespace URI of the current node; otherwise an empty string.

Remarks
This property is relevant to Element and Attribute nodes only.

In XmlDocument.CreateElement:

namespaceURI
The namespace URI of the new element (if any). String.Empty and a null reference (Nothing in Visual Basic) are equivalent.

In XmlElement.NamespaceURI:

The namespace URI of this node. If there is no namespace URI, this property returns String.Empty.

And finally, XmlNamespaceManager.AddNamespace Method throws an ArgumentNullException in the case:

The value for prefix or uri is a null reference (Nothing in Visual Basic).
===========

This all seems pretty compelling... :)

Jeff
From: Karl W. <ka...@wa...> - 2005-01-13 20:38:36
|
Jeff Rafter wrote:
>> So, I simply don't see a contradiction.
>
> It seems that we are the only ones arguing about this. Elliotte seemed
> to be in favor of string.empty as well and his last email on the subject
> was very strong-- but he also added the caveat that he is a Java guy for
> the sake of this discussion.

When I asked David Megginson, he said the original reason was for programming convenience.

> So if we took a vote:
>
> ======================================
> 2 use string.empty for the URI param
> when xmlns="" or xmlns is not present.
>
> 1 use null for the above
> ======================================
>
> This would indicate that we would either need to change such params in
> other callbacks or live with the inconsistency.

I guess we already have one inconsistency at our hands that we cannot bypass: In the AttributeDecl() call-back, there are defined meanings for value=="" and value==null, so we must allow both.

One of the things Elliotte mentioned was passing the URI to other libraries, instead of checking it. That is what D.Megginson probably meant with convenience. So, which libs in .NET would one use? And what do they accept? It would be funny if after all the discussion they accepted null for an URI.

> Now of course, you are project admin and SAX is historically a
> dictatorship-- as the only other implementer I can tell you that I will
> implement it however you decide.

I am hesitant to dictate my opinion here. In the end, anything is workable. Unfortunately not too much feedback.

If we go with String.Empty for URIs and prefixes, then I would suggest they should be the only such case, as then we would be pretty much in agreement with the original Java API, and the work effort to change all other such API cases and their implementations could be quite inconvenient.

> In any event-- we need to decide, we need to document the decision very
> clearly and we need to make sure our SAX conformance application checks
> for all of the appropriate cases. Which may involve adding a secondary
> test suite.

Yes, definitely.

Karl
From: Karl W. <ka...@wa...> - 2005-01-13 20:34:35
|
Jeff Rafter wrote:
>> For the purpose of the SAX API we define "parsing" as
>> generating a sequence of call-backs on SAX handler interfaces
>> that represent a well-formed XML document. This applies even
>> if no actual XML document is being processed.
>
> I can agree with that-- it would be good to couch it in such wording
> though with the statement that we know what SAX Processing is and the
> difference between an XML Processor and Application [1] proper are...
> also, it should be noted (maybe in an example) that in the case of
> something like a CSV parser that generates SAX events line number and
> column number are still useful concepts. When not useful they should be
> -1 and the entity should be (dare I say it?) null...
>
> [1] http://www.w3.org/TR/REC-xml/#dt-xml-proc

Would you mind making such additions/corrections? You already seem to know what should be added. I can do the CVS stuff, to save you time, as there are already lots of changes in CVS that you would have to check out first. :-)

Karl
From: Jeff R. <li...@je...> - 2005-01-13 20:20:29
|
> For the purpose of the SAX API we define "parsing" as
> generating a sequence of call-backs on SAX handler interfaces
> that represent a well-formed XML document. This applies even
> if no actual XML document is being processed.

I can agree with that-- it would be good to couch it in such wording though with the statement that we know what SAX Processing is and the difference between an XML Processor and Application [1] proper are... also, it should be noted (maybe in an example) that in the case of something like a CSV parser that generates SAX events line number and column number are still useful concepts. When not useful they should be -1 and the entity should be (dare I say it?) null...

[1] http://www.w3.org/TR/REC-xml/#dt-xml-proc

Otherwise, sounds good.

Jeff
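[Editorial sketch] The CSV case mentioned in this message fits the proposed definition of "parsing": no XML document exists, yet the handler receives a well-formed event sequence. This is a minimal sketch in Java against the JDK's org.xml.sax interfaces. CsvEventSource, CellCollector and the element names "rows", "row" and "cell" are hypothetical, the splitting is naive (no quoting), and no locator is supplied.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical CSV event source: "parses" by emitting SAX events.
final class CsvEventSource {
    static void parse(String csv, ContentHandler handler) throws SAXException {
        AttributesImpl noAttributes = new AttributesImpl();
        handler.startDocument();
        try {
            handler.startElement("", "rows", "rows", noAttributes);
            for (String line : csv.split("\n")) {
                handler.startElement("", "row", "row", noAttributes);
                for (String cell : line.split(",")) {   // naive: no quoted fields
                    handler.startElement("", "cell", "cell", noAttributes);
                    handler.characters(cell.toCharArray(), 0, cell.length());
                    handler.endElement("", "cell", "cell");
                }
                handler.endElement("", "row", "row");
            }
            handler.endElement("", "rows", "rows");
        } finally {
            handler.endDocument();
        }
    }
}

// Hypothetical consumer: collects the text of every cell in event order.
final class CellCollector extends DefaultHandler {
    final List<String> cells = new ArrayList<>();
    private StringBuilder text;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (localName.equals("cell")) text = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (localName.equals("cell")) {
            cells.add(text.toString());
            text = null;
        }
    }
}
```

A handler written for XML input, such as CellCollector, works unchanged on this source. A fuller version could also pass a locator whose line and column refer to the CSV text, as suggested above.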
From: Karl W. <ka...@wa...> - 2005-01-13 20:13:22
|
Jeff Rafter wrote:
>> I must have missed this. Could you explain it again?
>
> Something like this could be placed in your object hierarchy. One could
> also very easily create a tee-like observer pattern and some
> ExceptionHandler class. This could be designed into a subclass
> ContentHandler and used from within callbacks
>
> public class FooContentHandler : ContentHandler {
>
>   public ExceptionHandler exceptionHandler;
>
>   public void startElement(...) {
>     // some code happens, we need to throw an exception
>     exceptionHandler.Handle(new FooException());
>   }
> }
>
> public class ExceptionHandler {
>
>   public void Handle(Exception e)
>     throws SAXException {
>     if (....) {
>
>     } else
>       throw new SAXException(e);
>   }
> }

Yes, this is definitely possible. Although I like the name of ErrorHandler better, since passing error information does not have to be based on using Exception objects. This was already discussed (sort of) on xml-dev, but I really would prefer to use the existing API.

The API above would be similar to any (proprietary) way of exchanging information between those content handlers that you have control over, and your application. The question is, do we need to standardize on this in SAX?

As far as all the other interfaces are concerned, they form a contract for an IXmlReader implementation, so standardizing them is good. But I don't think we should make too many restrictions on how content handler implementations and the application communicate. This could not even be tested for conformance.

What about a validating IXmlFilter implementation? Well, it must conform to the IXmlReader contract, and therefore should call back on IErrorHandler.

> Now these are just some random ideas thrown out after a long night in
> the rain so I could be way off. Also, in the back of my mind I am
> wondering about how Java handles that exception class if not actually
> thrown.

I guess the GC disposes of them.

> On top of that I saw in some of your examples that you handled a
> FileNotFound exception in a slightly different way (it seems that IO
> exceptions would need a slightly different pattern anyway because they
> are not treated strictly as SAXParseExceptions to begin with). But all
> of this leads me to think that you can have your cake and eat it too...
>
>> It would be a simple documentation change.
>
> I like simple...

What do you think of this:

For the purpose of the SAX API we define "parsing" as generating a sequence of call-backs on SAX handler interfaces that represent a well-formed XML document. This applies even if no actual XML document is being processed.

Karl
From: Jeff R. <li...@je...> - 2005-01-13 20:10:24
|
> So, I simply don't see a contradiction.

It seems that we are the only ones arguing about this. Elliotte seemed to be in favor of string.empty as well and his last email on the subject was very strong-- but he also added the caveat that he is a Java guy for the sake of this discussion. So if we took a vote:

======================================
2 use string.empty for the URI param
when xmlns="" or xmlns is not present.

1 use null for the above
======================================

This would indicate that we would either need to change such params in other callbacks or live with the inconsistency. Now of course, you are project admin and SAX is historically a dictatorship-- as the only other implementer I can tell you that I will implement it however you decide.

In any event-- we need to decide, we need to document the decision very clearly and we need to make sure our SAX conformance application checks for all of the appropriate cases. Which may involve adding a secondary test suite.

Cheers,
Jeff
From: Karl W. <ka...@wa...> - 2005-01-13 19:37:06
|
Jeff Rafter wrote:
>> We really only have two cases: foo has a namespace, or it doesn't.
>>
>> Or did you mean something else?
>
> <snip/>
>
>>> <foo bar="">
>>> Would the bar attribute's value be null or string.empty?
>>
>> String.Empty. null would mean: no value.
>
> This is the contradiction I am referring to... technically the value of
> bar is empty which is more or less null. But we make the distinction
> because it is helpful to know that even though there is no value, the
> attribute bar is present.

I am not sure if so documented, but if an attribute is passed through SAX (IAttributes), it can never have a null value argument, simply because an attribute without value is nonsense. null means "no value" (see below).

> This is the same as the xmlns="" declarations.
> Technically the value means null-- literally it is empty-- and it is
> helpful to be able to distinguish between the two (at least in editor
> applications)...

I don't quite agree - null means "no value". There is no null value for strings. null is a value for references/pointers. The string arguments in SAX are not strings, but string references, and as such they can be null, meaning they do not point to any string object.

This is also one of the reasons why namespace URI references are not allowed to be empty strings in XML, even though URI references in general are allowed to, and do have a specific meaning assigned to an empty string. Otherwise one could not express "absence" or "removal" of namespaces in the serialized format, because XML is a text format - everything must be expressed as text.

xmlns="" can be interpreted on two levels:
1) an attribute with name xmlns and value "".
2) an expression of the fact that the default namespace is turned off.

The concept of null comes into play at a later stage, and that is when the parser wants to pass a name to the app. How does it express that this name has/does not have a namespace? If the uri argument, which is a string reference, points to nothing (==null) then we don't have an uri string object, and therefore no namespace.

So, I simply don't see a contradiction.

Karl
From: Jeff R. <li...@je...> - 2005-01-13 18:54:58
|
> We really only have two cases: foo has a namespace, or it doesn't.
>
> Or did you mean something else?

<snip/>

>> <foo bar="">
>> Would the bar attribute's value be null or string.empty?
> String.Empty. null would mean: no value.

This is the contradiction I am referring to... technically the value of bar is empty which is more or less null. But we make the distinction because it is helpful to know that even though there is no value, the attribute bar is present. This is the same as the xmlns="" declarations. Technically the value means null-- literally it is empty-- and it is helpful to be able to distinguish between the two (at least in editor applications)...

Jeff
From: Jeff R. <li...@je...> - 2005-01-13 18:48:23
|
> I must have missed this. Could you explain it again?

Something like this could be placed in your object hierarchy. One could also very easily create a tee-like observer pattern and some ExceptionHandler class. This could be designed into a subclass ContentHandler and used from within callbacks

public class FooContentHandler : ContentHandler {

  public ExceptionHandler exceptionHandler;

  public void startElement(...) {
    // some code happens, we need to throw an exception
    exceptionHandler.Handle(new FooException());
  }
}

public class ExceptionHandler {

  public void Handle(Exception e)
    throws SAXException {
    if (....) {

    } else
      throw new SAXException(e);
  }
}

Now these are just some random ideas thrown out after a long night in the rain so I could be way off. Also, in the back of my mind I am wondering about how Java handles that exception class if not actually thrown. On top of that I saw in some of your examples that you handled a FileNotFound exception in a slightly different way (it seems that IO exceptions would need a slightly different pattern anyway because they are not treated strictly as SAXParseExceptions to begin with). But all of this leads me to think that you can have your cake and eat it too...

> It would be a simple documentation change.

I like simple...

Jeff
From: Jeff R. <li...@je...> - 2005-01-13 18:36:13
|
test2 |
From: Karl W. <ka...@wa...> - 2005-01-13 18:18:03
|
Jeff Rafter wrote:
> Karl,
>
> What did you think about my idea for a supplemental Exception handler
> interface? This could be grafted on without much change to the
> fundamentals of the API.

I must have missed this. Could you explain it again?

> Otherwise, might I suggest making ParseError
> descend from SAXError and making SAXError more generic. Then you could
> simply use "is" or a cast to get the more complete parse error
> information when applicable.

If you look at SAX more closely, a lot of stuff that now is based on the assumption of "parsing an XML document" could be re-defined more generically in terms of a well-formed event sequence. That could be a lot of work.

For instance: ParseError.Throw() has this code: throw new SaxParseException(this). So, if we change to SAXError, we should first make "Throw()" a virtual method and add it to SAXError where it throws a SaxException. Then we need to add a SAXError member to SaxException and a corresponding constructor. We have to remove it from SAXParseException (as it already exists in the base class) and change its constructor. Then we need to override ParseError.Throw(). Then we have to modify the ParseErrorImpl class accordingly. And then we still haven't covered all areas where "parsing" should be generalized.

Yes, I could introduce a SAXError class, but what if we just document that "parsing" really does not mean that there has to be an underlying XML document? Any well-formed event stream is "parsing". Then we could leave ParseError and even SaxParseException as is. The Locator related fields in ParseError (and SaxParseException) are already optional (they can return null or -1), even in Java.

It would be a simple documentation change.

Karl
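[Editorial sketch] The refactoring steps listed in this message (an overridable Throw() on a generic error class, an error member on the base exception, an override in ParseError) would look roughly like this. This is a minimal sketch in Java, not the SAX for .NET code: every class below is a hypothetical stand-in that only mirrors the names used in the message, and throwError plays the role of Throw().

```java
// Hypothetical generic error: the proposed base class.
class SaxError {
    final String message;

    SaxError(String message) {
        this.message = message;
    }

    // Overridable, so subclasses can throw a more specific exception.
    void throwError() throws SaxException {
        throw new SaxException(this);
    }
}

// Base exception now carries the error object (the "new member").
class SaxException extends Exception {
    final SaxError error;

    SaxException(SaxError error) {
        super(error.message);
        this.error = error;
    }
}

// Parse-specific error with an optional location.
class ParseError extends SaxError {
    final int lineNumber;   // -1 when no location is available

    ParseError(String message, int lineNumber) {
        super(message);
        this.lineNumber = lineNumber;
    }

    @Override
    void throwError() throws SaxException {
        throw new SaxParseException(this);
    }
}

// No error field of its own: it already exists in the base class.
class SaxParseException extends SaxException {
    SaxParseException(ParseError error) {
        super(error);
    }
}
```

A caller that catches SaxException can test `e.error instanceof ParseError` (the equivalent of "is" in C#) to reach the location data. The sketch also shows why the documentation-only alternative is attractive: four types have to change together, where re-defining "parsing" leaves the existing ParseError and SaxParseException untouched.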
From: Karl W. <ka...@wa...> - 2005-01-13 18:09:08
|
Jeff Rafter wrote:
>> One could think of this as a guideline:
>>
>> - If we would say: this string parameter/argument/value
>> can be absent, then let's use null to indicate it.
>> - If we would rather say: this string parameter/argument/value
>> can be empty, then let's use "" to indicate it.
>>
>> Coming back to the SAX API:
>> How would the above guideline be resolved for namespace URIs
>> and prefixes when an XML name is not in any namespace?
>
> I think that these are good guidelines-- and tough to argue with... but
> for namespace URIs I think there is some ambiguity still...
>
> a) <foo/>
> b) <foo xmlns="http://foo"/>
> c) <foo xmlns=""/>
>
> Most naturally I would see this as
>
> a) null
> b) "http://foo"
> c) string.empty

Are you asking what to pass for the namespace uri of foo in the StartElementHandler()?

a) don't know, depends if there is a default namespace
b) "http://foo"
c) null (there is no namespace for foo)

We really only have two cases: foo has a namespace, or it doesn't.

Or did you mean something else?

> But I could see this making for needlessly complex handler code. Which
> is why we want to land on either null or string.empty. string.empty
> gives you less chance of a runtime exception but does not represent case
> (a) very well. Using null does not represent (c) very well. I think that
> in the XML Corpus this is one of the few areas where "" has a specific
> meaning. You brought up arguments about API consistency and should we
> use string.empty if no Public ID is provided (for instance)... but what
> about the case where you have:
>
> <foo bar="">
>
> Would the bar attribute's value be null or string.empty?

String.Empty. null would mean: no value.

Karl
From: Karl W. <ka...@wa...> - 2005-01-11 01:03:31
|
Trying to get the empty string vs. null discussion restarted. Karl Waclawek wrote: > > I had another look at the Java API. > It seems it is quite inconsistent with respect to empty string vs. null. > > Examples: > > - In EntityResolver, all occurrences of publicId or baseUri > are supposed to be null, when absent. > > - DTDHandler.notationDecl(): publicId, systemId can be null, > if not provided > > - DTDHandler.unparsedEntityDecl: publicId can be null if not provided > > - DeclHandler.attributeDecl(): mode and value can be null, where > in the latter there is actually a semantic difference between > value = null (meaning: none defined) and value = empty string > (meaning: a value is specified, and it is the empty string). > > - DeclHandler.externalEntityDecl: publicId can be null, if not provided > > - LexicalHandler.startDTD(): publicId, systemId can be null, > if not declared > > It actually seems that the prevalent approach is to use null for absence > of a string parameter, and only in the case of namespaces does the API > stray from this rule. > > However, I would strongly suggest that we remain consistent in > SAX for .NET. If we pick String.Empty, then we need to allow one > inconsistency - and that is for the Value parameter passed to the > attributeDecl() call-back. Originally, this thread was about namespace URI references and prefixes. The question was whether they should be passed as empty strings or as null references when absent/not applicable. However, whatever the outcome of this discussion, it should be applicable to the whole API, but not as a general rule of either always null or always empty. The reason is that in the case of DeclHandler.AttributeDecl(), the "value" parameter has well defined meanings for both, empty string and null, so both must be allowed depending on the intended semantics. Maybe we should first ask, when such a problem of deciding between an empty string or null actually exists? 
I would say that is the case when the meaning of both is "roughly" the same: an undefined or absent string value. Two examples where I would intuitively make different decisions: 1) ParseError.Message, or SAXParseException.Message: In this case I would always assume that there is a message, even if it is an empty string. It looks wrong to me to even allow a null value. 2) Locator.PublicId: If there is no public identifier, then one should make its absence clear. null is better at that. One could think of this as a guideline: - If we would say: this string parameter/argument/value can be absent, then let's use null to indicate it. - If we would rather say: this string parameter/argument/value can be empty, then let's use "" to indicate it. Coming back to the SAX API: How would the above guideline be resolved for namespace URIs and prefixes when an XML name is not in any namespace? Karl |
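The case where both values carry meaning can be illustrated with the "value" parameter of the attributeDecl() call-back. The sketch below only shows how a consumer might tell the two cases apart; the AttributeDefaults class and its Describe method are hypothetical and not part of SAX for .NET.

```csharp
public static class AttributeDefaults
{
    // Describes the "value" argument of an attribute declaration call-back.
    // null  : no default value was declared at all.
    // ""    : a default value was declared, and it is the empty string.
    public static string Describe(string value)
    {
        if (value == null)
            return "no default declared";
        if (value.Length == 0)
            return "default is the empty string";
        return "default is \"" + value + "\"";
    }
}
```

A handler written this way relies on the producer never substituting "" for an absent value, which is why this one parameter has to allow both null and the empty string whatever is decided for the rest of the API.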
From: Karl W. <ka...@wa...> - 2005-01-11 00:30:58
|
Reply to self: Karl Waclawek wrote: > On xml-dev there is a thread called "[xml-dev] SAXException, checked, but why?". > The problem that came up is what to do when trying to pass recoverable > errors to the application? Alan Gutierrez thinks one is not allowed to use > error handler call-backs, but I think that maybe this is just a result > of underspecification. In any case, the objects passed to the error > call-backs are SAXParseException objects in Java, and ParseError objects > in SAX for .NET. > > I am thinking we should rename the class from ParseError to SAXError, > as errors from application code (as you might have in your content handlers) > are not really parse errors, and you should still be allowed to pass > them to the error handlers, to avoid throwing an exception. > > However, all members of ParseError (or SAXParseException) are geared towards > errors that can occur when parsing an XML document (publicId, SystemId, > Line/ColumnNumber, etc.). There is also the problem that SaxParseException has ParseError as a member. So, renaming ParseError gives cascading problems. > So, what to do? > Should we stick with the limitation that only parse errors can > be reported to the error handlers? Is that not rather limiting? Maybe the right approach is to simply document a broader definition of "parsing". One could say that in the context of SAX, "parsing" denotes any form of generating a stream of ContentHandler/LexicalHandler events that correspond to a well-formed XML document. With that definition, a "parse" error could also be an error in the underlying event generation when it is not XML document based. So, by giving "parsing" a broader definition we also give ParseError a broader meaning. Karl |
From: Karl W. <ka...@wa...> - 2005-01-03 18:35:01
|
On xml-dev there is a thread called "[xml-dev] SAXException, checked, but why?". The problem that came up is what to do when trying to pass recoverable errors to the application? Alan Gutierrez thinks one is not allowed to use error handler call-backs, but I think that maybe this is just a result of underspecification. In any case, the objects passed to the error call-backs are SAXParseException objects in Java, and ParseError objects in SAX for .NET. I am thinking we should rename the class from ParseError to SAXError, as errors from application code (as you might have in your content handlers) are not really parse errors, and you should still be allowed to pass them to the error handlers, to avoid throwing an exception. However, all members of ParseError (or SAXParseException) are geared towards errors that can occur when parsing an XML document (publicId, SystemId, Line/ColumnNumber, etc.). So, what to do? Should we stick with the limitation that only parse errors can be reported to the error handlers? Is that not rather limiting? Karl |
From: Karl W. <ka...@wa...> - 2004-12-20 19:27:30
|
Jeff Rafter wrote: >> In C# this seems less of an issue, as there are two >> other null-safe options: >> 1) static method: if (String.Equals(uri, "http://my.namespace.com")) >> {...} >> 2) operator: if (uri == "http://my.namespace.com") {...} > > > Admittedly this is pretty standard in C#, however it is plausible that > some Java coder will come to C# with limited training and just hack > away-- in which case they may not follow this pattern (or if someone is > porting a Java library for instance...) > > In any event it can go either way and I will code to it... but I believe > that it should be very very clearly documented if we do not go with > string.empty. If I get a vote I still vote for string.empty-- though > your arguments have made me less zealous. Would you vote for String.Empty in all cases, or just for namespace URIs and prefixes? The former would make us even less conformant with the Java specs. > OFFTOPIC: sorry I have been out of the discussion for the past two > weeks-- I have been doing a lot of unexpected business travel... You may have a few posts to reply to... :-) Karl |
From: Karl W. <ka...@wa...> - 2004-12-20 19:09:54
|
Elliotte Harold wrote: > Folks, > > I just got an e-mail from David Megginson informing me that the JavaDocs > and some other docs at sax.sourceforge.net have been updated. > Apparently, he did not have access to the web site for some time, and > the documentation there was not up to date with the latest round of > revisions for SAX 2.0.2 that went on some months back for Java 1.5. > > Anyway, this probably fixes at least some but not all of the > inconsistencies that have been noted here about what's null and what's > the empty string. I haven't checked in detail yet, but it's worth double > checking all of our assumptions and comments over the last month or so > against the latest docs. I asked Dave Megginson and he does not remember why exactly namespace URI string and prefixes are treated differently, but he thinks it might have been programming convenience. One xml-dev reference I found, was by Tim Bray, arguing that since namespace URI references are not allowed to be empty strings (as per the namespace specs), passing an empty string should indicate an absent URI ref and then one can use the Equals method on it without worrying about null. (RFC 2396 gives a different meaning to empty URI references). In C# this seems less of an issue, as there are two other null-safe options: 1) static method: if (String.Equals(uri, "http://my.namespace.com")) {...} 2) operator: if (uri == "http://my.namespace.com") {...} Of which the latter would be the standard way to compare strings. Karl |
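To make the null-safety point concrete, here is a small sketch comparing the two options mentioned above with the instance method that a Java-style port would most likely use. The NamespaceCheck class and its method names are made up for this example; only String.Equals and the == operator are standard .NET.

```csharp
using System;

public static class NamespaceCheck
{
    private const string MyNamespace = "http://my.namespace.com";

    // Option 1: static String.Equals accepts null on either side.
    public static bool ViaStaticEquals(string uri)
    {
        return String.Equals(uri, MyNamespace);
    }

    // Option 2: the == operator on strings compares values and accepts null.
    public static bool ViaOperator(string uri)
    {
        return uri == MyNamespace;
    }

    // Java-style instance call: throws NullReferenceException when uri is null.
    public static bool ViaInstanceEquals(string uri)
    {
        return uri.Equals(MyNamespace);
    }
}
```

Only the third form breaks if an absent namespace URI is passed as null, which is the situation the empty-string convention in Java was apparently meant to avoid.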
From: Elliotte H. <el...@me...> - 2004-12-20 16:08:53
|
Folks, I just got an e-mail from David Megginson informing me that the JavaDocs and some other docs at sax.sourceforge.net have been updated. Apparently, he did not have access to the web site for some time, and the documentation there was not up to date with the latest round of revisions for SAX 2.0.2 that went on some months back for Java 1.5. Anyway, this probably fixes at least some but not all of the inconsistencies that have been noted here about what's null and what's the empty string. I haven't checked in detail yet, but it's worth double checking all of our assumptions and comments over the last month or so against the latest docs. -- Elliotte Rusty Harold el...@me... XML in a Nutshell 3rd Edition Just Published! http://www.cafeconleche.org/books/xian3/ http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim |
From: Karl W. <ka...@wa...> - 2004-12-18 20:54:16
|
Jeff Rafter wrote: > This is actually something that is already in place and only affects > implementations, but we need to clarify when we require EndDocument to > be called. > > This is the proposal: > > (a) EndDocument is required if StartDocument is called > (b) Even when there is a FatalError > (c) In the event of an encoding error, StartDocument should have been > called so EndDocument still needs to be called. > (d) If there is an external exception (such as an IOException or a user > generated SaxException in a callback) EndDocument is still required. > (e) EndDocument should be the final event in a SAX event stream. > > Comments welcome, Just a few general thoughts about this issue: A) When is it really important for the SAX consumer to be able to rely on EndDocument() being called after StartDocument() has been called? Clearly this is the case when the link between the SAX event generator and the event consumer is restricted to the IContentHandler interface. That is, the event generator has no access to, or control over, the consumer other than through the IContentHandler call-backs. This excludes the most common situation of calling a Parse() method on an XML parser, as returning from the Parse() call (with or without exception) is the equivalent of EndDocument(). The Parse() call is the additional link. It also excludes IXmlFilter chains, as with them the process is driven from the end of the chain, again calling a Parse() method. Where it applies would be a chain (pipeline) of ContentHandler instances driven from the event generator, especially when they are dynamically assembled. B) Is it actually possible for the SAX event generator to guarantee to the consumer that it will receive the EndDocument() event? The general answer would be no. The reason is that the connection between event generator and consumer can cross process boundaries. 
Especially remote call-backs simply cannot guarantee that EndDocument() will be called on the consumer, but even inter-process calls on the same machine are not fail-safe. That makes it impossible to fulfill the EndDocument() contract on anything but in-process call-backs (as long as the hardware has no problem, of course). In practice, non-local SAX call-backs will rarely be used, but it is not a completely unrealistic use case either. C) So how would one deal with a scenario where the final call to EndDocument() is not guaranteed? The cleanup routines normally called from EndDocument() should also be callable from Finalize(), or any other "Reset()" kind of method. So, once the consumer gets notified of the end of the event stream (successful or not), the cleanup should proceed. - For in-process consumers (.NET), driving the event generation: On return from Parse(), call EndDocument() (or the equivalent cleanup code) if it has not been called yet. - For in-process consumers (.NET) not driving the event generation (downstream modules in a content handler chain): This depends if the consumer has knowledge of how it is going to be called. If it knows it will never be called across process boundaries, then it should assume that EndDocument() will be called. Otherwise it is the same case as for out-of-process consumers. See below. - For out-of-process consumers: The "end-of-document" cleanup code should be put into a separate method callable from several points - from EndDocument() and from Finalize(), or any other "Reset()" kind of method. It is the responsibility of whatever controller/container operates the consumer to make sure it gets notified of any communication errors. The SAX event generator cannot make such guarantees. SUMMARY It seems one has to separate the IXmlReader.Parse() method (and its equivalents) from the contract defined for IContentHandler. 
Exceptions thrown during Parse() may or may not be detectable by the event consumer and therefore should not have an influence on the contract. Even if one requires that the SAX event generator must always make a call to IContentHandler.EndDocument() (when StartDocument() was called), one cannot guarantee that the SAX consumer will also receive that event. This means, the SAX consumer has to be aware of how it communicates with the SAX generator. So let's then phrase the requirements again: =========== For a stream of SAX events that represent an XML document, the SAX event producer must call IContentHandler.StartDocument() exactly once, *before* any part of the input, on which the SAX events are based, is processed. IContentHandler.EndDocument() *must* be called by the SAX event producer exactly once as the last event in a SAX event stream initiated by a IContentHandler.StartDocument() call, regardless of any exceptional or error situation encountered. Depending on the call communication mechanism, however, this is no guarantee that the SAX event consumer will also receive that call. =========== Note: It seems to me this covers Jeff's requirements a) to e) above and makes it clear that this not only applies to the standard configuration of calling IXmlReader.Parse() on a SAX parser - for which these requirements would not strictly be necessary - but more generally to any sequence of SAX events that represent an XML document. Karl |
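As an illustration of the requirement as phrased above, the following sketch shows one way an in-process producer could make sure EndDocument() is the last event even when a call-back throws, and how a consumer could keep its cleanup callable from more than one place. IDocumentEvents is a reduced, hypothetical stand-in for IContentHandler, and EventProducer, RecordingConsumer and Reset() are assumptions made for the example, not part of the SAX for .NET API.

```csharp
using System;
using System.Collections.Generic;

// Reduced stand-in for IContentHandler (hypothetical, three events only).
public interface IDocumentEvents
{
    void StartDocument();
    void Characters(string text);
    void EndDocument();
}

public static class EventProducer
{
    // Once StartDocument() has been called, EndDocument() is called exactly
    // once as the last event, whether or not an exception is in flight.
    public static void Produce(IDocumentEvents handler, IEnumerable<string> chunks)
    {
        handler.StartDocument();
        try
        {
            foreach (string chunk in chunks)
                handler.Characters(chunk);
        }
        finally
        {
            handler.EndDocument();
        }
    }
}

public class RecordingConsumer : IDocumentEvents
{
    private bool open;

    public int CleanupCount { get; private set; }

    public void StartDocument() { open = true; }

    public void Characters(string text)
    {
        // Stands in for a user generated exception in a call-back.
        if (text == null) throw new ArgumentNullException(nameof(text));
    }

    public void EndDocument() { Cleanup(); }

    // Called by whatever controller operates the consumer when the event
    // stream is known to have ended without EndDocument() arriving.
    public void Reset() { Cleanup(); }

    // Idempotent: safe to reach from EndDocument() and from Reset().
    private void Cleanup()
    {
        if (!open) return;
        open = false;
        CleanupCount++;
    }
}
```

The finally block only covers the producer side of the contract. Across process boundaries the EndDocument() call may still never arrive, which is the reason the consumer keeps a separate Reset() path. The sketch also does not address what should happen when EndDocument() itself throws.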