saxdotnet-devel Mailing List for SAX for .NET (Page 2)
Brought to you by:
jeffrafter,
kwaclaw
From: Karl W. <ka...@wa...> - 2004-12-12 16:45:07
Karl Waclawek wrote:
>
> In any case, the fact that the ArgumentException in .NET has
> a field for the parameter's name should be sufficient to make
> that distinction. If it refers to the feature's name then we have
> the equivalent of a SAXNotRecognizedException, if it refers to the
> value, then this is equivalent to a SAXNotSupportedException.

To summarize, this is how it would look in the API:

* bool IXmlReader.GetFeature(name)
  - ArgumentException(name) means: not recognized
  - NotSupportedException means: this feature cannot be read (right now)

* void IXmlReader.SetFeature(name, value)
  - ArgumentException(name) means: not recognized
  - ArgumentException(value) means: value not supported (right now)
  - NotSupportedException means: this feature cannot be set (right now)

* IProperty IXmlReader.GetProperty(name)
  - ArgumentException(name) means: not recognized
  - NotSupportedException: this exception makes no sense here

* object IProperty.Value { get; }
  - NotSupportedException means: the value cannot be read (right now); unlikely to happen

* object IProperty.Value { set; }
  - ArgumentException(value) means: value not supported (right now)
  - NotSupportedException means: the value cannot be set (right now)

If the above makes sense then we can rely on built-in exceptions for feature and property access.

Karl

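The mapping summarized above could be sketched roughly as follows. This is only an illustration under assumptions: `SketchReader`, its `IsParsing` flag, and the choice of which features it knows are hypothetical and not part of the SAX for .NET source; the two feature names are the standard SAX ones.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reader; only the feature methods from the summary are sketched.
public class SketchReader
{
    // Standard SAX feature names. That this reader knows exactly these two is an assumption.
    public const string Namespaces = "http://xml.org/sax/features/namespaces";
    public const string Validation = "http://xml.org/sax/features/validation";

    private readonly Dictionary<string, bool> features = new Dictionary<string, bool>
    {
        { Namespaces, true },
        { Validation, false }
    };

    // Would be set by the (omitted) Parse method while a document is being processed.
    public bool IsParsing { get; set; }

    public bool GetFeature(string name)
    {
        bool value;
        if (!features.TryGetValue(name, out value))
            throw new ArgumentException("Feature not recognized.", "name");
        return value;
    }

    public void SetFeature(string name, bool value)
    {
        if (!features.ContainsKey(name))
            throw new ArgumentException("Feature not recognized.", "name");
        if (IsParsing)
            throw new NotSupportedException("Feature cannot be set while parsing.");
        // This sketch pretends to be a non-validating parser.
        if (name == Validation && value)
            throw new ArgumentException("This parser cannot validate.", "value");
        features[name] = value;
    }
}

public static class FeatureDemo
{
    public static void Main()
    {
        var reader = new SketchReader();
        try
        {
            reader.SetFeature(SketchReader.Validation, true);
        }
        catch (ArgumentException e)
        {
            // ParamName tells the caller which argument was rejected.
            Console.WriteLine(e.ParamName);
        }
    }
}
```

The point of the sketch is that no SAX-specific exception class is needed: the rejected argument is identified by `ParamName`, and the "not right now" case by `NotSupportedException`.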
From: Karl W. <ka...@wa...> - 2004-12-12 05:34:05
Elliotte Harold wrote:
> I don't know what really makes sense exception wise in .NET. It's
> certainly plausible that you don't want to follow Java here. However
> there is an important difference between the SAXNotSupportedException
> and SAXNotRecognizedException, even if I always have trouble remembering
> which is which. Hoping I don't get these backwards (as I often do),
> SAXNotRecognizedException means that the parser will never allow you to
> set or read this feature/property. It's simply not available in this
> implementation. SAXNotSupportedException means something a little
> different. Either
>
> 1. You can't read/write the value of this property at the current time,
> but you might be able to later. For instance, you can't turn on
> validation when you're halfway through parsing a document. Only before
> you begin parsing a document or after you finish parsing one.
>
> 2. The specific value you tried to set is not supported. For instance,
> you passed an object that does not implement LexicalHandler as the value
> for the lexical-handler property, or you passed true for the validation
> feature with a parser that does not validate.
>
> By contrast, you might see a SAXNotRecognizedException if you tried to
> use a Xerces specific feature/property with AElfred or vice versa.
>
> It's a subtle distinction, but I do think it's valuable to maintain it.

I think I pretty much understood these the same way.

The question is, is it enough to differentiate these two types of situations through the error message alone? Which would mean that the difference is only useful for informational purposes. Or would you want to react to this difference through different behaviour in the application? I have difficulty coming up with a plausible scenario for that.

In any case, the fact that the ArgumentException in .NET has a field for the parameter's name should be sufficient to make that distinction. If it refers to the feature's name then we have the equivalent of a SAXNotRecognizedException, if it refers to the value, then this is equivalent to a SAXNotSupportedException.

Karl

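From the caller's side, reacting to that distinction might look like the sketch below. It assumes, hypothetically, that implementations pass "name" or "value" as the `paramName` argument; `Classify` and the sample lambdas are made up for illustration.

```csharp
using System;

public static class ClassifyDemo
{
    // Runs a feature or property access and reports which failure category occurred.
    public static string Classify(Action access)
    {
        try
        {
            access();
            return "ok";
        }
        catch (ArgumentException e)
        {
            // Assumption: the implementation names the rejected parameter.
            return e.ParamName == "name" ? "not recognized" : "value not supported";
        }
        catch (NotSupportedException)
        {
            return "not available right now";
        }
    }

    public static void Main()
    {
        Console.WriteLine(Classify(() => { throw new ArgumentException("unknown feature", "name"); }));
        Console.WriteLine(Classify(() => { throw new ArgumentException("bad value", "value"); }));
        Console.WriteLine(Classify(() => { throw new NotSupportedException("parsing in progress"); }));
        Console.WriteLine(Classify(() => { }));
    }
}
```

Whether an application would ever branch like this is exactly the open question in the message; the sketch only shows that the information is available without custom exception types.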
From: Elliotte H. <el...@me...> - 2004-12-12 03:08:37
Karl Waclawek wrote:
>
> In the .NET line of thinking we should probably get rid of both,
> the SAXNotSupportedException and SAXNotRecognizedException,
> as in both cases, what you really have is an invalid argument
> (the name of a property or feature), which therefore should
> throw a standard ArgumentException with a message (or code) indicating
> "not supported" or "not recognized".
> How significant is the difference anyway?

I don't know what really makes sense exception wise in .NET. It's certainly plausible that you don't want to follow Java here. However there is an important difference between the SAXNotSupportedException and SAXNotRecognizedException, even if I always have trouble remembering which is which. Hoping I don't get these backwards (as I often do), SAXNotRecognizedException means that the parser will never allow you to set or read this feature/property. It's simply not available in this implementation. SAXNotSupportedException means something a little different. Either

1. You can't read/write the value of this property at the current time, but you might be able to later. For instance, you can't turn on validation when you're halfway through parsing a document. Only before you begin parsing a document or after you finish parsing one.

2. The specific value you tried to set is not supported. For instance, you passed an object that does not implement LexicalHandler as the value for the lexical-handler property, or you passed true for the validation feature with a parser that does not validate.

By contrast, you might see a SAXNotRecognizedException if you tried to use a Xerces specific feature/property with AElfred or vice versa.

It's a subtle distinction, but I do think it's valuable to maintain it.

--
Elliotte Rusty Harold el...@me...
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim

From: Karl W. <ka...@wa...> - 2004-12-12 01:56:27
Karl Waclawek wrote:
>>Currently in the .NET API there is redundancy with respect to exceptions:
>>
>>(1) SaxNotSupportedException and NotSupportedException
>>(2) SaxArgumentException and ArgumentException
>>
>>Should these be removed in favor of the .NET equivalents?
>
> I think there is actually no SAXArgumentException in the Java API.
> So we can remove it without much discussion.
>
> Mostly, .NET approaches exception handling differently.
> There are no checked exceptions, the "best practice"
> is to use the built-in exceptions, and only define
> your own if you cannot find an existing exception
> class that has adequate semantics.
> A good explanation (vs. checked exceptions) is in this article:
> http://www.artima.com/intv/handcuffs.html
>
> In our case here one can argue that the built-in
> NotSupportedException is meant to be used when
> a method is not implemented, whereas the SAXNotSupportedException
> is meant to be used for unsupported features and properties.

In the .NET line of thinking we should probably get rid of both, the SAXNotSupportedException and SAXNotRecognizedException, as in both cases, what you really have is an invalid argument (the name of a property or feature), which therefore should throw a standard ArgumentException with a message (or code) indicating "not supported" or "not recognized".

How significant is the difference anyway?

Karl

From: Karl W. <ka...@wa...> - 2004-12-11 17:35:34
Elliotte Harold wrote:
> Karl Waclawek wrote:
>
>> I can understand this, but you would usually check the Uri anyway,
>> since you want to know if the name has a namespace. So you would
>> get used to "if (uri == null)" fairly quickly.
>
> In fact, I find that I don't do that nearly as often as you'd think. As
> long as I know the URI is not null there are a lot of cases where I can
> treat the empty space exactly the same as any other namespace. If I had
> to worry about it being null, I absolutely couldn't do that and I'd have
> to litter my code with a lot of null checks or catch
> (NullPointerException). It just works out cleaner to be able to treat no
> namespace the same as any other namespace, which I can do most (not all)
> of the time if it's the empty string and none of the time if it's null.

I had another look at the Java API. It seems it is quite inconsistent with respect to empty string vs. null. Examples:

- In EntityResolver, all occurrences of publicId or baseUri are supposed to be null, when absent.
- DTDHandler.notationDecl(): publicId, systemId can be null, if not provided.
- DTDHandler.unparsedEntityDecl(): publicId can be null, if not provided.
- DeclHandler.attributeDecl(): mode and value can be null, where in the latter there is actually a semantic difference between value = null (meaning: none defined) and value = empty string (meaning: a value is specified, and it is the empty string).
- DeclHandler.externalEntityDecl(): publicId can be null, if not provided.
- LexicalHandler.startDTD(): publicId, systemId can be null, if not declared.

It actually seems that the prevalent approach is to use null for absence of a string parameter, and only in the case of namespaces does the API stray from this rule.

However, I would strongly suggest that we remain consistent in SAX for .NET. If we pick String.Empty, then we need to allow one inconsistency, and that is for the Value parameter passed to the attributeDecl() call-back.

Karl

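The one exception named above, the attribute default value, can be illustrated with a small sketch. The helper names are hypothetical; the only thing taken from the message is that null means "no default declared" while the empty string is a real, declared value, and that every other optional string would use String.Empty for absence.

```csharp
using System;

public static class DefaultValueDemo
{
    // Attribute default: null and "" carry different meanings and must both be preserved.
    public static string DescribeDefault(string value)
    {
        if (value == null) return "no default declared";
        if (value.Length == 0) return "default is the empty string";
        return "default is \"" + value + "\"";
    }

    // Any other optional string (publicId, systemId, namespace URI) under the
    // String.Empty convention: absence is simply a zero-length string.
    public static bool IsAbsent(string optional)
    {
        return optional.Length == 0;
    }

    public static void Main()
    {
        Console.WriteLine(DescribeDefault(null));
        Console.WriteLine(DescribeDefault(""));
        Console.WriteLine(DescribeDefault("left"));
        Console.WriteLine(IsAbsent(String.Empty));
    }
}
```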
From: Elliotte H. <el...@me...> - 2004-12-10 14:58:17
Karl Waclawek wrote:
> I don't seem to get list messages anymore, except when cc'ed to me.
> Are you experiencing the same? Did you get the "IsStandalone feature"
> related message?

I'm not sure off the top of my head, but I am getting duplicates of most messages, which suggests I'm getting them personally and through the list.

--
Elliotte Rusty Harold

From: Karl W. <ka...@wa...> - 2004-12-10 14:05:45
Elliotte Harold wrote:
> Karl Waclawek wrote:
>
>> - Should GetType() potentially differentiate between "NMTOKEN"
>> and an enumeration? Choices: return "ENUMERATION" or the
>> actual list of values.
>
> One more note on this: The XML Infoset spec states:
>
> # [attribute type] An indication of the type declared for this attribute
> in the DTD. Legitimate values are ID, IDREF, IDREFS, ENTITY, ENTITIES,
> NMTOKEN, NMTOKENS, NOTATION, CDATA, and ENUMERATION. If there is no
> declaration for the attribute, this property has no value. If no
> declaration has been read, but the [all declarations processed] property
> of the document information item is false (so there may be an unread
> declaration), then the value of this property is unknown. Applications
> should treat no value and unknown as equivalent to a value of CDATA. The
> value of this property is not affected by the validity of the attribute
> value.
>
> So it does distinguish between ENUMERATION and NMTOKEN.

Now I remember. Going back to my source in SAXExpat, there is actually a comment about the Infoset, and that is why SAXExpat returns "ENUMERATION". So, my suggestion is: "ENUMERATION", to be in line with the Infoset.

>> Question: What is GetType() actually useful for?
>

Thanks for the clarification.

One administrative question: I don't seem to get list messages anymore, except when cc'ed to me. Are you experiencing the same? Did you get the "IsStandalone feature" related message?

Karl

From: Elliotte H. <el...@me...> - 2004-12-10 11:51:43
Karl Waclawek wrote:
> - Should GetType() potentially differentiate between "NMTOKEN"
> and an enumeration? Choices: return "ENUMERATION" or the
> actual list of values.

One more note on this: The XML Infoset spec states:

# [attribute type] An indication of the type declared for this attribute in the DTD. Legitimate values are ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION, CDATA, and ENUMERATION. If there is no declaration for the attribute, this property has no value. If no declaration has been read, but the [all declarations processed] property of the document information item is false (so there may be an unread declaration), then the value of this property is unknown. Applications should treat no value and unknown as equivalent to a value of CDATA. The value of this property is not affected by the validity of the attribute value.

So it does distinguish between ENUMERATION and NMTOKEN.

> Question: What is GetType() actually useful for?

It is important to know which attributes have ID, IDREF, and IDREFS types. That has a lot of practical implications for XPath, XPointer, and other applications. ENTITY, ENTITIES, and NOTATION are also practically important in the rare use cases where someone is using unparsed entities and notations. For serialization it's sometimes useful to know whether an attribute has type CDATA or not, to support proper round-tripping.

However, the distinction between ENUMERATION and NMTOKEN is pretty minimal and somewhat tautological. For instance, you need it to make XInclude work, because XInclude is defined in terms of the Infoset, but the practical impact is pretty small. I can imagine it might be useful if you were trying to derive a DTD from an instance document.

--
Elliotte Rusty Harold

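The uses of the attribute type listed above could be sketched as a few checks on the reported type string. The helper names are hypothetical; the type names and the rule that "no value" and "unknown" are treated as CDATA come from the Infoset text quoted in the message. Representing "no value" as null or an empty string is an assumption of this sketch.

```csharp
using System;

public static class AttributeTypeDemo
{
    // "No value" and "unknown" are treated as CDATA, as the Infoset text recommends.
    public static string EffectiveType(string reportedType)
    {
        return string.IsNullOrEmpty(reportedType) ? "CDATA" : reportedType;
    }

    // ID, IDREF and IDREFS matter for ID lookups in XPath, XPointer and similar tools.
    public static bool IsIdRelated(string reportedType)
    {
        switch (EffectiveType(reportedType))
        {
            case "ID":
            case "IDREF":
            case "IDREFS":
                return true;
            default:
                return false;
        }
    }

    // For round-tripping, only non-CDATA attributes carry type information worth keeping.
    public static bool IsNonCData(string reportedType)
    {
        return EffectiveType(reportedType) != "CDATA";
    }

    public static void Main()
    {
        Console.WriteLine(IsIdRelated("IDREF"));
        Console.WriteLine(IsIdRelated("ENUMERATION"));
        Console.WriteLine(EffectiveType(null));
        Console.WriteLine(IsNonCData("NMTOKEN"));
    }
}
```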
From: Karl W. <ka...@wa...> - 2004-12-10 01:49:04
Just a brief review of the current status for the various threads (only 3 participants so far):

A) "Unifying" the core and extension interfaces
   No opposition, seems accepted.

B) Changing Exceptions in the API
   Issue around SaxNotSupportedException vs. built-in NotSupportedException.

C) Reducing the versioning / eliminating the Ixxx2 interfaces
   It seems there is no opposition to the actual proposal, but there are a few details to consider:
   - Should we remove the IsDeclared method from ILocator and replace it with returning "UNDECLARED" from GetType()?
   - Should GetType() potentially differentiate between "NMTOKEN" and an enumeration? Choices: return "ENUMERATION" or the actual list of values.
   Question: What is GetType() actually useful for?

D) Requiring EndDocument
   Jeff and I differ on one point: whether EndDocument() should be called when IXmlReader.Parse() throws an exception (originating from the parser or a call-back). No feedback yet.

E) StartElement, when URI is not present
   The basic question is, do we indicate the absence of a namespace by returning uri = "" or uri = null? So far, there is a 2:1 vote for uri = "". For consistency, the same approach (indicating absence of a string value) should be used throughout, so also for the arguments in Start/EndPrefixMapping().

F) Merging the IXmlReaderControl with IXmlReader
   Some opposition from Elliotte, that is, 2:1 in favour of adding Suspend/Resume() to IXmlReader.

G) SAX holes
   - Max string length: String.Length is defined as int, not much we can do here.
   - Skipped entities in attribute values: possibility of marking the spot in the attribute value (at user option).

H) Replace is-standalone feature with ILocator member
   No discussion yet (but off-list agreement between Jeff and Karl).

As far as the current source and documentation are concerned, A, F, H are implemented, some of C is done (except for open issues). D and E are documented along Karl's line of thinking, B is partially implemented by removing SaxArgumentException, G is unresolved.

Karl

From: Karl W. <ka...@wa...> - 2004-12-09 01:29:09
It seems quite unnatural to report the standalone flag in the XML declaration through a parser feature. If anything, it should have been a property.

We propose to remove the is-standalone feature and add a new property - let's call it "EntityType" - to the ILocator interface, whose type is defined like this (copied from the source, incl. doc comments):

    /// <summary>Describes the kind of entity that is being parsed.</summary>
    /// <remarks>For document entities, the *declared* rather than the effective
    /// value of the standalone flag is reported.</remarks>
    public enum ParsedEntityType
    {
      /// <summary>Document entity without specified value for the standalone flag.</summary>
      Document,
      /// <summary>Document entity with standalone="no".</summary>
      NotStandalone,
      /// <summary>Document entity with standalone="yes".</summary>
      Standalone,
      /// <summary>External general entity.</summary>
      General,
      /// <summary>External parameter entity.</summary>
      Parameter
    }

This enumerated type covers more than just the standalone flag, but does it in a way so that the standalone flag is mutually exclusive with external entities.

ILocator would then look like this (incl. ILocator2 methods):

    public interface ILocator
    {
      string PublicId { get; }
      string SystemId { get; }
      int LineNumber { get; }
      int ColumnNumber { get; }
      string XmlVersion { get; }
      string Encoding { get; }
      ParsedEntityType EntityType { get; }
    }

Karl

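A consumer of the proposed enumeration might recover the old is-standalone information roughly like this. The enum is copied from the message; the helper methods `DeclaredStandalone` and `IsDocumentEntity` are hypothetical and only illustrate that the standalone flag and the external-entity cases never overlap.

```csharp
using System;

// Copied from the proposal (doc comments omitted).
public enum ParsedEntityType
{
    Document,
    NotStandalone,
    Standalone,
    General,
    Parameter
}

public static class EntityTypeDemo
{
    // true/false: the declared standalone value.
    // null: no declared value, or the entity is not a document entity.
    public static bool? DeclaredStandalone(ParsedEntityType type)
    {
        switch (type)
        {
            case ParsedEntityType.Standalone: return true;
            case ParsedEntityType.NotStandalone: return false;
            default: return null;
        }
    }

    // The first three members all describe the document entity.
    public static bool IsDocumentEntity(ParsedEntityType type)
    {
        return type == ParsedEntityType.Document
            || type == ParsedEntityType.NotStandalone
            || type == ParsedEntityType.Standalone;
    }

    public static void Main()
    {
        Console.WriteLine(DeclaredStandalone(ParsedEntityType.Standalone));
        Console.WriteLine(DeclaredStandalone(ParsedEntityType.Document).HasValue);
        Console.WriteLine(IsDocumentEntity(ParsedEntityType.Parameter));
    }
}
```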
From: Karl W. <ka...@wa...> - 2004-12-08 00:24:20
> Karl Waclawek wrote:
>
>> <quote>
>> Because of the streaming event model that SAX uses, some entity boundaries cannot be reported
>> under any circumstances:
>>
>> a.. general entities within attribute values
>> b.. parameter entities within declarations
>> These will be silently expanded, with no indication of where the original entity boundaries were.
>> </quote>
>
> The problem is not failing to report entity boundaries. That's minor. The problem is when an
> entity in an attribute value is not expanded. This has the potential to lose information silently,
> which I think is a very bad thing.

Absolutely, but the reason for that is that the boundaries can't be reported. If they could, then SkippedEntity() could be called.

While working on Expat we discussed that potentially we could insert invalid XML characters into the attribute value reported to the application, to serve as markers for the start and end of the entity (as per user option). But nothing really caught on.

Karl

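The marker idea mentioned above was never adopted, so the following is purely a hypothetical sketch of how an application could read such markers. It assumes U+FFFE and U+FFFF as the marker characters (both are excluded from the XML 1.0 Char production, so they cannot occur in parsed attribute text) and assumes the name of the skipped entity is placed between them; neither choice comes from Expat or SAX for .NET.

```csharp
using System;
using System.Collections.Generic;

public static class EntityMarkerDemo
{
    // Hypothetical marker characters; not legal XML characters.
    public const char Start = '\uFFFE';
    public const char End = '\uFFFF';

    // Returns the entity names found between marker pairs in an attribute value.
    public static List<string> SkippedEntityNames(string attributeValue)
    {
        var names = new List<string>();
        int position = 0;
        while (true)
        {
            int start = attributeValue.IndexOf(Start, position);
            if (start < 0) break;
            int end = attributeValue.IndexOf(End, start + 1);
            if (end < 0) break; // unbalanced marker: stop scanning
            names.Add(attributeValue.Substring(start + 1, end - start - 1));
            position = end + 1;
        }
        return names;
    }

    public static void Main()
    {
        // What a parser with the option enabled might report for value="see &chap1; and &chap2;".
        string value = "see " + Start + "chap1" + End + " and " + Start + "chap2" + End;
        foreach (string name in SkippedEntityNames(value))
            Console.WriteLine(name);
    }
}
```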
From: Elliotte H. <el...@me...> - 2004-12-07 21:03:22
Karl Waclawek wrote:
>
> In general, adding such functionality to the factory might be useful,
> regardless of the above. Do you have a link at hand?

http://java.sun.com/j2se/1.4.2/docs/api/javax/xml/parsers/SAXParserFactory.html

Be careful. There's a lot of stuff in that class (like its lack of namespace awareness by default) we don't want to imitate.

--
Elliotte Rusty Harold

From: Elliotte H. <el...@me...> - 2004-12-07 21:01:34
Karl Waclawek wrote:
> <quote>
> Because of the streaming event model that SAX uses, some entity boundaries cannot be reported under
> any circumstances:
>
> a.. general entities within attribute values
> b.. parameter entities within declarations
> These will be silently expanded, with no indication of where the original entity boundaries were.
> </quote>

The problem is not failing to report entity boundaries. That's minor. The problem is when an entity in an attribute value is not expanded. This has the potential to lose information silently, which I think is a very bad thing.

--
Elliotte Rusty Harold

From: Karl W. <ka...@wa...> - 2004-12-07 20:56:13
> I know of two holes in the SAX API where it fails to address all
> possible XML documents. It would be worth considering these and if
> possible plugging them:
>
> 1. Maximum limits:
>
> XML places no size limits on anything. SAX does. There cannot be more
> than 2.1 billion or so attributes in an element, characters in an
> attribute value, characters in an element name, data in a processing
> instruction. Possibly .NET does not have these issues. Does .NET have a
> maximum string size or array size like Java does? If it does, this
> concern should at least be addressed in the documentation if not solved.

The size is limited by the data type of the Length property, which is int. Seems to be the same as in Java.

> 2. Skipped entities in attribute values: SAX says nothing about how to
> handle these. Most parsers just drop them out, though a few add
> something like &entity; to the attribute's string value. This should be
> fixed, but it might require some major redesign.

It says something in LexicalHandler.startEntity():

<quote>
Because of the streaming event model that SAX uses, some entity boundaries cannot be reported under any circumstances:

a.. general entities within attribute values
b.. parameter entities within declarations
These will be silently expanded, with no indication of where the original entity boundaries were.
</quote>

It would be really hard to find a way to do that.

Karl

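The bound mentioned above can be checked directly. This sketch only shows the upper limit implied by the type of `String.Length`; the practical limit of a given runtime may well be lower. `FitsInString` is a hypothetical helper for a parser that accumulates character counts in a `long` before building a string.

```csharp
using System;

public static class LengthLimitDemo
{
    // A parser could track the running total in a long and check it
    // before trying to materialize one string from the collected chunks.
    public static bool FitsInString(long totalChars)
    {
        return totalChars >= 0 && totalChars <= int.MaxValue;
    }

    public static void Main()
    {
        // String.Length is declared as int (System.Int32).
        Console.WriteLine(typeof(string).GetProperty("Length").PropertyType);
        // The largest value that type can hold: 2147483647.
        Console.WriteLine(int.MaxValue);
        Console.WriteLine(FitsInString(3000000000L));
    }
}
```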
From: Karl W. <ka...@wa...> - 2004-12-07 20:34:03
> Karl Waclawek wrote:
> >
> > Mostly for simplicity of coding.
> > Let's say your factory returns an IXmlReader.
> > Then you just check the reader-control feature
> > (I think we haven't mentioned that yet)
> > and go on coding with Resume() and Suspend().
>
> I don't find this simpler at all. The simple, obvious solution is to
> either have a separate factory that creates suspendable parsers or have
> a separate getSuspendableReader method. If the client knows they want a
> suspendable parser, then let them ask the factory for one instead of
> asking for some parser and getting back Lord knows what.

And what if the client doesn't know? What if an application provides extra functionality depending on whether the system configured parser supports it? Even the demo application works like that.

> It might be worth looking at how JAXP's SAXParserFactory class works and
> consider using something along those lines that allows you to set
> features and properties on the factory rather than on each individual
> XmlReader.

In general, adding such functionality to the factory might be useful, regardless of the above. Do you have a link at hand? If not, don't bother, I'll search Google.

Karl

From: Elliotte H. <el...@me...> - 2004-12-07 20:29:30
I know of two holes in the SAX API where it fails to address all possible XML documents. It would be worth considering these and if possible plugging them:

1. Maximum limits:

XML places no size limits on anything. SAX does. There cannot be more than 2.1 billion or so attributes in an element, characters in an attribute value, characters in an element name, data in a processing instruction. Possibly .NET does not have these issues. Does .NET have a maximum string size or array size like Java does? If it does, this concern should at least be addressed in the documentation if not solved.

2. Skipped entities in attribute values:

SAX says nothing about how to handle these. Most parsers just drop them out, though a few add something like &entity; to the attribute's string value. This should be fixed, but it might require some major redesign.

--
Elliotte Rusty Harold

From: Karl W. <ka...@wa...> - 2004-12-07 20:28:01
> Karl Waclawek wrote:
> >
> > I can understand this, but you would usually check the Uri anyway,
> > since you want to know if the name has a namespace. So you would
> > get used to "if (uri == null)" fairly quickly.
>
> In fact, I find that I don't do that nearly as often as you'd think. As
> long as I know the URI is not null there are a lot of cases where I can
> treat the empty space exactly the same as any other namespace. If I had
> to worry about it being null, I absolutely couldn't do that and I'd have
> to litter my code with a lot of null checks or catch
> (NullPointerException). It just works out cleaner to be able to treat no
> namespace the same as any other namespace, which I can do most (not all)
> of the time if it's the empty string and none of the time if it's null.

As I said before, I am not dead-set against String.Empty. If the majority of discussion participants vote for it, then I will change it in the API.

> > Somewhat confusing is that RFC 2396 assigns some semantics to empty URIs:
> > <quote>
> > 4.2. Same-document References
> >
> > A URI reference that does not contain a URI is a reference to the
> > current document. In other words, an empty URI reference within a
> > document is interpreted as a reference to the start of that document,
> > and a reference containing only a fragment identifier is a reference
> > to the identified fragment of that document. Traversal of such a
> > reference should not result in an additional retrieval action.
> > However, if the URI reference occurs in a context that is always
> > intended to result in a new request, as in the case of HTML's FORM
> > element, then an empty URI reference represents the base URI of the
> > current document and should be replaced by that URI when transformed
> > into a request.
> > </quote>
>
> This is irrelevant here. Namespace URIs do not indicate any document,
> whether they're the empty string or not.

Well, this is not clear to me. There is a current document, which has a URI, and the URI reference would point to it. So, one could make a case that we have a URI and therefore a namespace.

> > xmlns="" is just syntax for null, as it means that there is no default
> > namespace (where one might have been before), it doesn't mean that the
> > default namespace has an empty string as URI (which is not a valid URI).
>
> There is no such thing as null in XML. The concept of null is completely
> foreign to it. There is no null namespace URI, no null element, no null
> value.

The concept of null is simply the concept of absence. Since there is no special syntax provided, the above is the one that has been assigned that meaning (for namespaces, not in general).

Karl

From: Elliotte H. <el...@me...> - 2004-12-07 20:24:28
Karl Waclawek wrote:
> Mostly for simplicity of coding.
> Let's say your factory returns an IXmlReader.
> Then you just check the reader-control feature
> (I think we haven't mentioned that yet)
> and go on coding with Resume() and Suspend().

I don't find this simpler at all. The simple, obvious solution is to either have a separate factory that creates suspendable parsers or have a separate getSuspendableReader method. If the client knows they want a suspendable parser, then let them ask the factory for one instead of asking for some parser and getting back Lord knows what.

It might be worth looking at how JAXP's SAXParserFactory class works and consider using something along those lines that allows you to set features and properties on the factory rather than on each individual XmlReader.

--
Elliotte Rusty Harold

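The factory-based alternative suggested above might be sketched like this. Everything here is hypothetical: `ReaderFactory`, `CreateSuspendableReader`, `ISuspendableXmlReader` and the reader classes are invented for illustration, and the `IXmlReader` shown is a one-method stand-in for the real interface. The design loosely follows the idea of setting features once on the factory rather than on each reader.

```csharp
using System;
using System.Collections.Generic;

// One-method stand-in for the real interface.
public interface IXmlReader
{
    bool GetFeature(string name);
}

// Hypothetical extension interface for readers that can be paused.
public interface ISuspendableXmlReader : IXmlReader
{
    void Suspend();
    void Resume();
}

public class BasicReader : IXmlReader
{
    private readonly Dictionary<string, bool> features;

    public BasicReader(IDictionary<string, bool> defaults)
    {
        // Copy, so later factory changes do not affect existing readers.
        features = new Dictionary<string, bool>(defaults);
    }

    public bool GetFeature(string name)
    {
        bool value;
        return features.TryGetValue(name, out value) && value;
    }
}

public class SuspendableReader : BasicReader, ISuspendableXmlReader
{
    public SuspendableReader(IDictionary<string, bool> defaults) : base(defaults) { }
    public bool IsSuspended { get; private set; }
    public void Suspend() { IsSuspended = true; }
    public void Resume() { IsSuspended = false; }
}

public class ReaderFactory
{
    private readonly Dictionary<string, bool> features = new Dictionary<string, bool>();

    // Features are configured once and applied to every reader created afterwards.
    public void SetFeature(string name, bool value) { features[name] = value; }

    public IXmlReader CreateReader() { return new BasicReader(features); }

    // The static return type tells the client what it is getting.
    public ISuspendableXmlReader CreateSuspendableReader() { return new SuspendableReader(features); }
}

public static class FactoryDemo
{
    public static void Main()
    {
        var factory = new ReaderFactory();
        factory.SetFeature("http://xml.org/sax/features/namespaces", true);
        ISuspendableXmlReader reader = factory.CreateSuspendableReader();
        reader.Suspend();
        Console.WriteLine(reader.GetFeature("http://xml.org/sax/features/namespaces"));
    }
}
```

This covers the case where the client knows in advance that it wants a suspendable parser; it does not address the case of discovering the capability on whatever parser the system provides.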
From: Elliotte H. <el...@me...> - 2004-12-07 20:19:49
Karl Waclawek wrote:
> I can understand this, but you would usually check the Uri anyway,
> since you want to know if the name has a namespace. So you would
> get used to "if (uri == null)" fairly quickly.

In fact, I find that I don't do that nearly as often as you'd think. As long as I know the URI is not null there are a lot of cases where I can treat the empty space exactly the same as any other namespace. If I had to worry about it being null, I absolutely couldn't do that and I'd have to litter my code with a lot of null checks or catch (NullPointerException). It just works out cleaner to be able to treat no namespace the same as any other namespace, which I can do most (not all) of the time if it's the empty string and none of the time if it's null.

> Somewhat confusing is that RFC 2396 assigns some semantics to empty URIs:
> <quote>
> 4.2. Same-document References
>
> A URI reference that does not contain a URI is a reference to the
> current document. In other words, an empty URI reference within a
> document is interpreted as a reference to the start of that document,
> and a reference containing only a fragment identifier is a reference
> to the identified fragment of that document. Traversal of such a
> reference should not result in an additional retrieval action.
> However, if the URI reference occurs in a context that is always
> intended to result in a new request, as in the case of HTML's FORM
> element, then an empty URI reference represents the base URI of the
> current document and should be replaced by that URI when transformed
> into a request.
> </quote>

This is irrelevant here. Namespace URIs do not indicate any document, whether they're the empty string or not.

> xmlns="" is just syntax for null, as it means that there is no default
> namespace (where one might have been before), it doesn't mean that the
> default namespace has an empty string as URI (which is not a valid URI).

There is no such thing as null in XML. The concept of null is completely foreign to it. There is no null namespace URI, no null element, no null value. I've seen developers of various stripes try to introduce null into XML, and it's almost always a disaster. There is null in Java and, I assume, C#. But this is not something that has any reflection in XML.

--
Elliotte Rusty Harold

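The practical side of the argument above shows up as soon as namespace URIs are used as keys. In the sketch below (names hypothetical), the empty string works like any other namespace in a `Dictionary`, whereas a null key makes `TryGetValue` throw `ArgumentNullException`.

```csharp
using System;
using System.Collections.Generic;

public static class NamespaceKeyDemo
{
    // Counts elements per namespace URI; "" stands for "no namespace".
    public static int CountFor(Dictionary<string, int> counts, string uri)
    {
        int n;
        return counts.TryGetValue(uri, out n) ? n : 0;
    }

    public static void Main()
    {
        var counts = new Dictionary<string, int>();
        counts[""] = 2;                              // elements without a namespace
        counts["http://www.w3.org/1999/xhtml"] = 5;  // elements in a namespace

        // No special case needed for "no namespace".
        Console.WriteLine(CountFor(counts, ""));

        try
        {
            CountFor(counts, null);
        }
        catch (ArgumentNullException)
        {
            // With null for "no namespace" every lookup needs a guard.
            Console.WriteLine("null key rejected");
        }
    }
}
```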
From: Karl W. <ka...@wa...> - 2004-12-07 20:13:48
> Is it possible these could be no-ops instead of throwing exceptions? I
> find all this talk of methods that sometimes work and sometimes don't to
> be very confusing.
>
> For a class like XmlReader that is accessed directly by the programmer,
> rather than being passed in as an argument or used through a callback,
> wouldn't it be preferable to have subclasses with extra functionality?
> Why do we need to stuff every method into one interface?

Mostly for simplicity of coding. Let's say your factory returns an IXmlReader. Then you just check the reader-control feature (I think we haven't mentioned that yet) and go on coding with Resume() and Suspend().

Otherwise you have to perform a type cast, and you have to use a second variable of type IXmlReaderSubclass, as initially all you got is an IXmlReader. Or, as it was with the old and separate IXmlReaderControl interface, you have to keep two references to the parser, one for the standard IXmlReader functionality, and one for using the suspend/resume feature.

Karl

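The two coding styles compared above can be put side by side. This is a hedged sketch: the interfaces are cut down to the members needed here, `DemoReader` is invented, and "reader-control" is used as a placeholder feature name because the thread does not give its full identifier.

```csharp
using System;

// Merged design: Suspend/Resume live on the reader interface itself.
public interface IXmlReader
{
    bool GetFeature(string name);
    void Suspend();
    void Resume();
}

// Separate design: control is a second interface a reader may also implement.
public interface IXmlReaderControl
{
    void Suspend();
    void Resume();
}

public class DemoReader : IXmlReader, IXmlReaderControl
{
    public const string ReaderControl = "reader-control"; // placeholder feature name
    public bool Suspended { get; private set; }
    public bool GetFeature(string name) { return name == ReaderControl; }
    public void Suspend() { Suspended = true; }
    public void Resume() { Suspended = false; }
}

public static class ControlDemo
{
    // Merged design: one reference, one feature check.
    public static bool SuspendViaFeature(IXmlReader reader)
    {
        if (!reader.GetFeature(DemoReader.ReaderControl)) return false;
        reader.Suspend();
        return true;
    }

    // Separate design: a cast and a second variable are needed.
    public static bool SuspendViaCast(object reader)
    {
        var control = reader as IXmlReaderControl;
        if (control == null) return false;
        control.Suspend();
        return true;
    }

    public static void Main()
    {
        var reader = new DemoReader();
        Console.WriteLine(SuspendViaFeature(reader));
        reader.Resume();
        Console.WriteLine(SuspendViaCast(reader));
        Console.WriteLine(SuspendViaCast("not a reader"));
    }
}
```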
From: Karl W. <ka...@wa...> - 2004-12-07 20:06:52
|
> Jeff Rafter wrote:
>
> > We have been discussing the various possibilities of enforcing a rule
> > about what to do when a URI is not present. For example consider this:
> >
> > <foo></foo>
> >
> > What should the StartElement call consist of?
> >
> > QName : "foo"
> > LocalName : "foo"
> > Uri : null or string.Empty?
>
> This really, really, really needs to be the empty string, unless null
> works very differently in C# than in Java. It's just incredibly useful
> to be able to assume the namespace URI (and other arguments to this
> method) are never null. I see this again and again in my SAX work. API
> quality wise, null should be avoided here.

I can understand this, but you would usually check the Uri anyway, since you want to know if the name has a namespace. So you would get used to "if (uri == null)" fairly quickly.

In general, using a special value to indicate presence or absence of the value is more fragile than using a separate indicator. Most generally, this would be a separate boolean flag, as some types (integers, booleans) do not provide the extra level of indirection that allows the use of null. In this case, however, we have a reference type (C# lingo).

> > I am not saying that I am against String.Empty, but from
> > a programming point of view it is simply a weaker choice.
> >
> > - null indicates absence better than picking a special value.
> > - what if an empty string ever got a special meaning for URIs?
>
> It already has a special meaning for URIs and namespace URIs. There's
> approximately zero chance of this changing.

Somewhat confusingly, RFC 2396 assigns some semantics to empty URIs:

<quote>
4.2. Same-document References

A URI reference that does not contain a URI is a reference to the current document. In other words, an empty URI reference within a document is interpreted as a reference to the start of that document, and a reference containing only a fragment identifier is a reference to the identified fragment of that document. Traversal of such a reference should not result in an additional retrieval action. However, if the URI reference occurs in a context that is always intended to result in a new request, as in the case of HTML's FORM element, then an empty URI reference represents the base URI of the current document and should be replaced by that URI when transformed into a request.
</quote>

> XML-wise these two elements are equivalent:
>
> <data />
> <data xmlns="" />
>
> Unless you're prepared to use null for both of them, which I think would
> be harder for implementers, you need to pick the empty string here.

xmlns="" is just syntax for null, as it means that there is no default namespace (where one might have been before); it doesn't mean that the default namespace has an empty string as URI (which is not a valid URI).

> Unlike SQL, XML namespaces do not distinguish between the empty string
> and no value. Namespaces in XML says, "The default namespace can be set
> to the empty string. This has the same effect, within the scope of the
> declaration, of there being no default namespace."
>

See above.

Karl
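For comparison, here is a minimal C# sketch of what handler-side code might look like under the two conventions discussed above. `NamespaceUriSketch`, `HasNamespace` and `Count` are hypothetical names used only for illustration; none of them is part of SAX for .NET, and the sketch assumes nothing about how a real parser reports the URI.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of SAX for .NET.
public static class NamespaceUriSketch
{
    // True when the name is in a namespace, whichever value the parser
    // uses for "no namespace" (null or the empty string).
    public static bool HasNamespace(string uri)
    {
        return !string.IsNullOrEmpty(uri);
    }

    // Counts elements per namespace URI. Dictionary keys may not be null,
    // so a null "no namespace" value has to be normalized first, whereas
    // an empty string can be used as a key directly.
    public static void Count(IDictionary<string, int> counts, string uri)
    {
        string key = uri ?? string.Empty;
        int n;
        counts.TryGetValue(key, out n); // n stays 0 when the key is absent
        counts[key] = n + 1;
    }
}
```

With the empty-string convention the `??` normalization in `Count` becomes unnecessary, which is the kind of "never null" assumption Elliotte describes. With the null convention, the check in `HasNamespace` is the "if (uri == null)" habit Karl mentions.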
From: Elliotte H. <el...@me...> - 2004-12-07 19:56:39
Karl Waclawek wrote:

> While we are at it: Why are enumerations reported as NMTOKEN in the Java API?
> <quote>
> For an enumerated attribute that is not a notation,
> the parser will report the type as "NMTOKEN".
> </quote>
>
> Why not use the type as it is declared? Or use the term "ENUMERATION"?

I've actually seen Java SAX parsers that do each of these, i.e. return "ENUMERATION" as the type or return "(value1 | value2 | value3)". This has actually died out in the Java space as parsers come into tighter compliance with the spec, but either one is reasonable as long as you specify it. I see no reason why SAX.NET has to follow Java SAX here.

--
Elliotte Rusty Harold el...@me...
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
From: Elliotte H. <el...@me...> - 2004-12-07 19:53:36
Jeff Rafter wrote:

> Karl has proposed merging the IXmlReaderControl functionality into the
> IXmlReader. This would add a "Suspend" and "Resume" function to the core
> XmlReader. For implementations that do not support this behavior
> (AElfred) calling these methods would generate a NotSupportedException.
>

Is it possible these could be no-ops instead of throwing exceptions?

I find all this talk of methods that sometimes work and sometimes don't to be very confusing. For a class like XmlReader that is accessed directly by the programmer, rather than being passed in as an argument or used through a callback, wouldn't it be preferable to have subclasses with extra functionality? Why do we need to stuff every method into one interface?

--
Elliotte Rusty Harold el...@me...
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
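The "extra functionality in a separate type" alternative raised above could be sketched roughly as follows. This is only an illustration under stated assumptions: the interface names echo the thread, but the members shown are simplified stand-ins rather than the actual SAX for .NET declarations, and `BasicReader`, `PausableReader` and `ReaderExtras.TrySuspend` are hypothetical.

```csharp
namespace ReaderSketch
{
    // Simplified stand-in for the core reader (assumed shape, one member only).
    public interface IXmlReader
    {
        void Parse(string systemId);
    }

    // Optional capability kept in its own interface, so that only parsers
    // that can actually pause declare Suspend/Resume.
    public interface IXmlReaderControl : IXmlReader
    {
        void Suspend();
        void Resume();
    }

    // Hypothetical parser without suspend support.
    public class BasicReader : IXmlReader
    {
        public void Parse(string systemId) { /* parsing omitted */ }
    }

    // Hypothetical parser with suspend support.
    public class PausableReader : IXmlReaderControl
    {
        public bool Suspended { get; private set; }
        public void Parse(string systemId) { /* parsing omitted */ }
        public void Suspend() { Suspended = true; }
        public void Resume() { Suspended = false; }
    }

    public static class ReaderExtras
    {
        // Returns false when the reader lacks the capability, so neither a
        // NotSupportedException nor a silent no-op is involved.
        public static bool TrySuspend(IXmlReader reader)
        {
            IXmlReaderControl control = reader as IXmlReaderControl;
            if (control == null) return false;
            control.Suspend();
            return true;
        }
    }
}
```

The trade-off is that callers must test for the capability before using it, whereas the merged-interface proposal lets them call `Suspend` on any reader and find out at run time whether it is supported.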
From: Karl W. <ka...@wa...> - 2004-12-07 19:50:12
> This is actually something that is already in place and only affects
> implementations, but we need to clarify when we require EndDocument to
> be called.

Actually, what you state below differs from what is currently documented. The docs state:

<quote>
To be more specific about when EndDocument() must be called: It must always be called once StartDocument() has been called, even after a fatal error, unless IXmlReader.Parse() returned with an exception.
</quote>

> This is the proposal:
>
> (a) EndDocument is required if StartDocument is called
> (b) Even when there is a FatalError
> (c) In the event of an encoding error, StartDocument should have been
> called so EndDocument still needs to be called.
> (d) If there is an external exception (such as an IOException or a user
> generated SaxException in a callback) EndDocument is still required.
> (e) EndDocument should be the final event in a SAX event stream.
>
> Comments welcome,
>

I agree with a, b, c and e.

About exceptions: whether user generated (call-back) or parser generated (IO, AV due to parser bug), they should be a dead stop. When the Parse() (or Resume()) method returns with an exception, EndDocument should not have been called. Nor should there have been any other call-backs after the exception.

If you throw an exception, you want everything to stop until it is handled, except for cleanup, which you should do right away in a finally clause. That is how exceptions normally work. If the parser throws the exception, only Parse()/Resume() can handle it; if a call-back throws an exception (that is not handled by the call-back itself), then who would reliably handle it? Again, only Parse()/Resume(). You do not want any other functions that have nothing to do with exception handling or cleanup called in between.

Karl
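The difference between the documented rule and point (d) of the proposal can be made concrete with a toy driver. This is a sketch under stated assumptions, not the SAX for .NET implementation: `EndDocumentSketch`, its `Events` list and the `endDocumentOnException` switch are hypothetical, and the `body` delegate merely stands in for the parser's main loop together with any call-backs it makes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy driver; it records event names instead of calling handlers.
public class EndDocumentSketch
{
    public readonly List<string> Events = new List<string>();

    // endDocumentOnException == false models the documented rule (no
    // EndDocument when Parse() exits with an exception);
    // endDocumentOnException == true models point (d) of the proposal.
    public void Parse(Action body, bool endDocumentOnException)
    {
        Events.Add("StartDocument");
        bool completed = false;
        try
        {
            body();
            completed = true;
        }
        finally
        {
            // Runs while the exception propagates; it is never swallowed here.
            if (completed || endDocumentOnException)
                Events.Add("EndDocument");
        }
    }

    // A fatal error reported through the error handler rather than thrown,
    // so parsing ends normally and EndDocument follows under either rule.
    public void FatalError()
    {
        Events.Add("FatalError");
    }
}
```

Under both settings a reported fatal error yields StartDocument, FatalError, EndDocument. The settings differ only when `body` throws: with `false` the recorded stream ends at the point of the exception, while with `true` an EndDocument is appended while the exception is still propagating, which is exactly the in-between call-back the reply above argues against.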
From: Jeff R. <li...@je...> - 2004-12-07 19:35:17
This is actually something that is already in place and only affects implementations, but we need to clarify when we require EndDocument to be called. This is the proposal:

(a) EndDocument is required if StartDocument is called
(b) Even when there is a FatalError
(c) In the event of an encoding error, StartDocument should have been called so EndDocument still needs to be called.
(d) If there is an external exception (such as an IOException or a user generated SaxException in a callback) EndDocument is still required.
(e) EndDocument should be the final event in a SAX event stream.

Comments welcome,
Jeff Rafter