Thread: [saxdotnet-devel] Discussion Status 2
From: Karl W. <ka...@wa...> - 2005-01-14 17:11:49
Another review of the current status for the various threads:
(only 3 participants so far)

When I mark an item/solution as accepted, please speak up
if you have objections.

A) "Unifying" the core and extension interfaces:

No opposition, seems accepted.

B) Changing Exceptions in the API

SaxNotSupportedException and SaxNotRecognizedException have been
removed in favour of the built-in .NET exceptions ArgumentException
and NotSupportedException. I assume this is accepted.

C) Reducing the versioning / eliminating the Ixxx2 interfaces

When unifying IAttributes and IAttributes2, the IsDeclared() methods
have been removed in favour of an additional return value for
GetType(). This value is UNDECLARED. Also, GetType() differentiates
between NMTOKEN and ENUMERATION, adding the latter as another new
return value. I assume this is accepted as well.

D) Requiring EndDocument

For the purpose of enabling content handler chains (filter pipelines),
we should require that EndDocument() is always called when
StartDocument() was called.
However, this can be guaranteed only for in-process call-backs.
Documentation has been updated, and I assume this is accepted.

E) StartElement, when URI is not present

We decided to follow what the standard .NET libraries expect as
an input for a URI, if the URI is supposed to be absent.
It was found that many of them expect an empty string and not null.

So, the solution proposed is to follow the Java API and require that
a namespace URI has the value "" when a qualified name is not in any
namespace. This applies to IAttributes.GetUri() and the URI arguments
passed to Start/EndElement(). As a similar consequence we require that
the prefix argument passed to Start/EndPrefixMapping is "" for the
default namespace.

For the rest of the API, absence of a string parameter will still
be indicated through a null reference. This is especially necessary
for IDeclHandler.AttributeDecl(), where the value parameter has a
defined meaning for "".
Assuming acceptance.

F) Merging the IXmlReaderControl with IXmlReader

Some opposition from Elliotte, that is, 2:1 in favour of adding
Suspend/Resume/Abort() to IXmlReader. I propose we go with the merge
and make these methods optional (allowed to throw
NotSupportedException). However, the IXmlReader.Status property will
be non-optional, as it is easy to implement.
Still open, but I hope acceptance is not far.

G) SAX holes

- Max string length: String.Length is defined as int,
  not much we can do here.
- Skipped entities in attribute values: possibility of marking spot
  in attribute value (at user option).

I suggest we do nothing for this release, as the API changes
might be invasive. Assuming acceptance.

H) Replace is-standalone feature with ILocator member

The is-standalone feature was removed and a new property was added
to ILocator called EntityType. It is an enumeration of this type:

  public enum ParsedEntityType
  {
    /// <summary>Document entity without specified value for the standalone flag.</summary>
    Document,
    /// <summary>Document entity with standalone="no".</summary>
    NotStandalone,
    /// <summary>Document entity with standalone="yes".</summary>
    Standalone,
    /// <summary>External general entity.</summary>
    General,
    /// <summary>External parameter entity.</summary>
    Parameter
  }

No discussion yet (but off-list agreement between Jeff and Karl).
We might want to add a value of "Other" or "Unknown" for those cases
where we are not actually parsing an XML document.

I) Re-define what "parsing" means in SAX.

This is a new issue that came up while discussing the fact that
certain API members and/or names only make sense when parsing an
actual XML document (ParseError, SaxParseException,
IXmlReader.Parse(), ILocator.PublicId, ...). This puts unnecessary
limitations on SAX event generation not based on parsing a document.

We have two choices, rename and re-design part of the SAX API,
or re-define the meaning of parsing.
Luckily, it seems that those API members that force a document
interpretation allow for "optional" argument or return values.
So, as a starting point this was proposed:

  "For the purpose of the SAX API we define "parsing" as generating
  a sequence of call-backs on SAX handler interfaces that represent
  a well-formed XML document. This applies even if no actual XML
  document is being processed."

Jeff thinks it needs some refinement, and I asked him to refine ...

I also think we should add this:
Whenever the term "document" is used in SAX, it should be interpreted
more generally as "source of well-formed SAX events".
This would for instance make it unnecessary to add a new member
(see above) to ParsedEntityType.

SUMMARY

So, we have F, H and I as open issues, but I think we are getting
close. If anyone sees new issues, please come forward.

Karl
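The proposed definition of "parsing" in item I, together with the "" convention from item E, can be illustrated with a small sketch. It generates a well-formed event sequence from an in-memory dictionary, so no XML text is ever read. The interface IMiniContentHandler, the classes DictionaryEventSource and RecordingHandler, and the root element name "settings" are all hypothetical and exist only for this illustration; the real SAX for .NET IContentHandler has more members and may use different signatures.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, trimmed-down handler interface for illustration only.
// It is NOT the SAX for .NET IContentHandler.
public interface IMiniContentHandler
{
    void StartDocument();
    void StartElement(string uri, string localName, string qName);
    void Characters(string text);
    void EndElement(string uri, string localName, string qName);
    void EndDocument();
}

// "Parses" a dictionary: the call-back sequence represents a well-formed
// document even though no actual XML document is being processed.
public static class DictionaryEventSource
{
    public static void Parse(IDictionary<string, string> data,
                             IMiniContentHandler handler)
    {
        // Item E: "" (not null) signals that a name is in no namespace.
        const string NoNamespace = "";

        // Assumption: every key is a valid XML name. Error handling is
        // omitted here to keep the sketch short.
        handler.StartDocument();
        handler.StartElement(NoNamespace, "settings", "settings");
        foreach (KeyValuePair<string, string> entry in data)
        {
            handler.StartElement(NoNamespace, entry.Key, entry.Key);
            handler.Characters(entry.Value);
            handler.EndElement(NoNamespace, entry.Key, entry.Key);
        }
        handler.EndElement(NoNamespace, "settings", "settings");
        handler.EndDocument();
    }
}

// Records each event as a string so the sequence can be inspected.
public sealed class RecordingHandler : IMiniContentHandler
{
    public readonly List<string> Events = new List<string>();

    public void StartDocument() { Events.Add("StartDocument"); }

    public void StartElement(string uri, string localName, string qName)
    {
        Events.Add("StartElement {" + uri + "}" + localName);
    }

    public void Characters(string text) { Events.Add("Characters " + text); }

    public void EndElement(string uri, string localName, string qName)
    {
        Events.Add("EndElement {" + uri + "}" + localName);
    }

    public void EndDocument() { Events.Add("EndDocument"); }
}
```

A consumer written against the handler interface cannot tell whether the events came from an XML parser or from a producer like this one, which is the point of reading "document" as "source of well-formed SAX events".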
From: Karl W. <ka...@wa...> - 2005-01-17 01:51:13
Jeff Rafter wrote:

>> D) Requiring EndDocument
>
>> However, this can be guaranteed only for in-process call-backs.
>
> Can you explain this a little more? Is this the point about non SAX
> generated Exceptions (i.e., user throws an exception in a callback)?
> Otherwise, agreed.

What I mean is that in a scenario like DCOM (or some other RPC
mechanism) one cannot guarantee that the call reaches the target,
since RPC mechanisms may have communication failures. You may
guarantee that the call is made (i.e. that it originates), but not
that it arrives. So in such a situation the callee cannot rely on
the EndDocument() callback.
However, I think I mentioned that RPC is not a common SAX use case.

>> F) Merging the IXmlReaderControl with IXmlReader
>
> I agree with Elliotte's opposition here. But at the time we were talking
> about adding a lot more exceptional circumstances. Since that time, we
> have scaled back that approach quite a bit. So I think that we can merge
> still. We also wanted to have an accompanying mechanism to check support
> without raising the exception (a feature?). But I think this is
> manageable. So I think the 2:1 is accurate.

A feature called reader-control has been added. It is obviously a
read-only feature.

>> G) SAX holes
>>
>> - Max string length: String.Length is defined as int,
>> not much we can do here.
>
> The only thing we could do is come up with some mechanism for (a)
> passing the stream info in a callback, (b) allowing subsequent calls on
> all callbacks for the items to be aggregated (c) raise an exception in
> the event that something exceeds the limit.

I am not sure I understand you fully.

>> I) Re-define what "parsing" means in SAX.
>> allow for "optional" argument or return values.
>
> Agreed, I will work on the new wording.

Thanks!

> So based on that we have very little left undecided... just a little
> more work to do...

Very good indeed!

Karl
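The combination discussed under F (optional Suspend/Resume/Abort that may throw NotSupportedException, plus a read-only feature for checking support first) might be used roughly as follows. This is a sketch under assumptions: IReaderSketch, ReaderControl, NoControlReader and ControllableReader are hypothetical names, GetFeature(string) is an assumed signature, and the thread gives only the short feature name "reader-control", not its full identifier.

```csharp
using System;

// Hypothetical stand-in for the merged reader interface. Only the method
// name Suspend comes from the thread; the rest is assumed.
public interface IReaderSketch
{
    bool GetFeature(string name);
    void Suspend();
}

public static class ReaderControl
{
    // Assumed identifier: the thread only mentions the short name.
    public const string Feature = "reader-control";

    // Checks the feature first, so an unsupported reader is detected
    // without raising NotSupportedException.
    public static bool TrySuspend(IReaderSketch reader)
    {
        if (!reader.GetFeature(Feature))
        {
            return false;
        }
        reader.Suspend();
        return true;
    }
}

// A reader that does not implement the optional control methods.
public sealed class NoControlReader : IReaderSketch
{
    public bool GetFeature(string name) { return false; }

    public void Suspend()
    {
        throw new NotSupportedException("Suspend is not supported.");
    }
}

// A reader that does implement them.
public sealed class ControllableReader : IReaderSketch
{
    public bool IsSuspended;

    public bool GetFeature(string name)
    {
        return name == ReaderControl.Feature;
    }

    public void Suspend() { IsSuspended = true; }
}
```

Callers that prefer not to query the feature can still call Suspend() directly and catch NotSupportedException; the feature check only avoids using an exception for an expected condition.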
From: Jeff R. <li...@je...> - 2005-01-17 17:23:30
> What I mean is that in a scenario like DCOM (or some other RPC mechanism)
> one cannot guarantee that the call reaches the target, since RPC
> mechanisms may have communication failures. You may guarantee that
> the call is made (i.e. that it originates), but not that it arrives.
> So in such a situation the callee cannot rely on the EndDocument()
> callback.
> However, I think I mentioned that RPC is not a common SAX use case.

I agree, it is not common. I think that when we have that
documentation around end document, if we include information about
this, we need to explain it clearly (as you did above). Also, I think
we need to be explicit about what happens in the case of a user
generated exception in a callback (i.e., EndDocument is *still*
called).

> A feature called reader-control has been added. It is obviously a read-only
> feature.

Excellent.

> I am not sure I understand you fully.

Well, I was imagining the case where there was a two-gig element name,
or two-gig (count) of attributes on an element. We cannot pass the
information back clearly... maybe a specialized exception, or
something similar to an API FatalError would be useful at that
point... so that the cause is clearly identified. Again, this is not
something for this release. I just feel as though there are things
that we could do to fix this...

> Very good indeed!

Yes quite. I also took Elliotte's advice and looked through the cvs on
the sax.sf.net site for changes to documentation. There were no
changes regarding the issues we discussed (as near as I can tell)
since last April. So we have been working with the latest
documentation base...

Cheers,
Jeff
From: Karl W. <ka...@wa...> - 2005-01-17 18:07:50
Jeff Rafter wrote:

>> However, I think I mentioned that RPC is not a common SAX use case.
>
> I agree, it is not common. I think that when we have that documentation
> around end document, if we include information about this, we need to
> explain it clearly (as you did above). Also, I think we need to be
> explicit about what happens in the case of a user generated exception in
> a callback (i.e., EndDocument is *still* called).

This is what I have in CVS currently as doc for EndDocument():

  /// <summary>See <see href="http://www.saxproject.org/apidoc/org/xml/sax/ContentHandler.html#endDocument()">
  /// ContentHandler.endDocument</see> on www.saxproject.org.</summary>
  /// <remarks>Differences to Java:
  /// <list type="bullet">
  /// <item>Stricter about when to call: <c>EndDocument</c> <b>must</b> be called by the
  /// SAX event producer exactly once as the last event in a SAX event stream initiated
  /// by a <see cref="IContentHandler.StartDocument"/> call, regardless of any exceptional
  /// or error situation encountered. Depending on the call communication mechanism, however,
  /// this is no guarantee that the SAX event consumer will also receive that call.</item>
  /// </list></remarks>

>> I am not sure I understand you fully.
>
> Well, I was imagining the case where there was a two-gig element name,
> or two-gig (count) of attributes on an element. We cannot pass the
> information back clearly... maybe a specialized exception, or something
> similar to an API FatalError would be useful at that point... so that
> the cause is clearly identified. Again, this is not something for this
> release. I just feel as though there are things that we could do to fix
> this...

Normally, overflow situations are already handled by the runtime
system. What is it we should do over and above that?

Karl
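One way an event producer could honour the EndDocument() contract quoted above, including the case of an exception thrown by user code in a callback, is a try/finally around everything that follows StartDocument(). The sketch below is only an illustration: IDocumentEvents, Producer and ThrowingHandler are hypothetical names, and the single Content() method stands in for all events between start and end. It also assumes that a stream counts as "initiated" only once StartDocument() has returned normally, which the quoted documentation does not spell out.

```csharp
using System;

// Hypothetical, minimal handler interface for illustration only.
public interface IDocumentEvents
{
    void StartDocument();
    void Content();      // stands in for every event between start and end
    void EndDocument();
}

public static class Producer
{
    // Emits the event stream. Once StartDocument() has returned,
    // EndDocument() is called exactly once, even if a callback throws.
    public static void Run(IDocumentEvents handler)
    {
        handler.StartDocument();
        try
        {
            handler.Content();
        }
        finally
        {
            handler.EndDocument();
        }
    }
}

// A consumer whose callback fails, to exercise the guarantee.
public sealed class ThrowingHandler : IDocumentEvents
{
    public int EndDocumentCalls;

    public void StartDocument() { }

    public void Content()
    {
        throw new InvalidOperationException("user code failed");
    }

    public void EndDocument() { EndDocumentCalls++; }
}
```

The original exception still propagates to the caller of Run() after EndDocument() has been delivered. As the documentation notes, this covers only the producer's side: with an RPC mechanism in between, the call is made but may not arrive.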