Thread: Re: [Sax-devel] XInclude and SAX Incompatibility
From: Peter M. <pe...@ca...> - 2003-06-24 17:40:09

> What do you mean? Unparsed entity and Notation declarations are reported
> before the EndDTD event, so you know which ones are declared before
> the first StartElement call.

Perhaps I didn't explain well enough. Suppose you had two XML files,
foo.xml and bar.xml.

foo.xml:
<!DOCTYPE x [
  <!ELEMENT x (include)>
  <!ELEMENT include EMPTY>
  <!ATTLIST include
    href CDATA #REQUIRED>
]>
<x>
  <include href="bar.xml"/>
</x>

bar.xml:
<!DOCTYPE stuff [
  <!ENTITY foobar "hello world">
  <!ELEMENT stuff (#PCDATA)>
]>
<stuff>&foobar;</stuff>

The document after processing would have the same infoset as if foo.xml
was:
<x>
  <stuff>&foobar;</stuff>
</x>

When this is parsed, all of the DTD events from foo.xml are sent, and then
an EndDTD event. When we encounter the include, we start parsing bar.xml,
and merging it into the same information set as foo.xml. It's not
important that we send most of the DTD events -- XInclude doesn't specify
that the document type information needs to be correct. However, it is
important that the foobar entity is added to the [entities] property of the
result infoset, because at some point &foobar; will need to be resolved. So
we need to put an UnparsedEntityDecl event into the stream. But this
UnparsedEntityDecl event can't be sent, because we already sent an EndDTD
event on the stream, at the end of the DTD for foo.xml.

The only way (that I can see) to solve this is to either relax the
restriction on UnparsedEntityDecl (and NotationDecl) events, or to buffer
all of the events at the level of XInclude processing.

Peter McCracken

"Karl Waclawek" <ka...@wa...>
06/24/2003 12:00 PM
To: <xm...@li...>, <sax...@li...>, <sax...@li...>, Peter McCracken/Toronto/IBM@IBMCA
cc:
Subject: Re: [Sax-devel] XInclude and SAX Incompatibility

> (1) When attributes with references to unparsed entities or notations are
> encountered in included documents, these unparsed entities and notations
> must be added to the [unparsed entities] and [notations] properties of the
> document information item, as defined in the XML Infoset specification [2],
> [3]. In SAX, this means sending DTD events.

Why? These are not declarations. They are reported by the content handler.

> SAX specifies that no DTD
> events can be sent after the endDTD/startElement event. However, there is
> no way of knowing which unparsed entities and notations must be sent until
> after element events start being processed.

What do you mean? Unparsed entity and Notation declarations are reported
before the EndDTD event, so you know which ones are declared before
the first StartElement call.

Maybe I am misunderstanding you?

Karl
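The second option Peter mentions, buffering all events at the XInclude layer, could look roughly like the sketch below. It is written in Python rather than against the Java SAX API, and it assumes events are modeled as plain (name, args) tuples. The names `hoist_declarations` and `DECL_EVENTS` are hypothetical and not part of SAX or of any XInclude implementation. Late declarations are moved in front of endDTD, and duplicates are dropped, following the "if it is not a duplicate" wording quoted later in the thread.

```python
from typing import List, Tuple

# An event is modeled as (event name, argument tuple). This is an assumption
# made for illustration; real SAX delivers events as handler callbacks.
Event = Tuple[str, tuple]

# Declaration events that SAX only allows before endDTD.
DECL_EVENTS = {"unparsedEntityDecl", "notationDecl"}


def hoist_declarations(events: List[Event]) -> List[Event]:
    """Return a reordered copy of a fully buffered event list.

    Declarations seen after endDTD (for example, ones discovered while
    processing an included document) are moved to just before endDTD.
    Declarations that duplicate an earlier one are dropped.
    """
    end_dtd = next(
        (i for i, (name, _args) in enumerate(events) if name == "endDTD"),
        None,
    )
    if end_dtd is None:
        return list(events)  # no DTD section to hoist into

    head, tail = events[:end_dtd], events[end_dtd + 1:]
    seen = {e for e in head if e[0] in DECL_EVENTS}
    late, rest = [], []
    for event in tail:
        if event[0] in DECL_EVENTS:
            if event not in seen:  # keep only the first copy
                seen.add(event)
                late.append(event)
        else:
            rest.append(event)
    return head + late + [events[end_dtd]] + rest


buffered = [
    ("startDTD", ("x",)),
    ("endDTD", ()),
    ("startElement", ("x",)),
    ("unparsedEntityDecl", ("pic", None, "pic.gif", "gif")),
    ("startElement", ("stuff",)),
]
reordered = hoist_declarations(buffered)
```

The cost is the one Peter points out: nothing can be forwarded downstream until the whole document, including every inclusion, has been seen, so the streaming property of SAX is lost.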
From: Peter M. <pe...@ca...> - 2003-06-24 20:03:17

> >(1) When attributes with references to unparsed entities or notations are
> >encountered in included documents, these unparsed entities and notations
> >must be added to the [unparsed entities] and [notations] properties of the
> >document information item, as defined in the XML Infoset specification [2],

> Actually no. That's a may, not a must. XInclude implementations are
> not required to support notation and unparsed entity information
> items/properties.

Hmmmm... in the XInclude spec (at http://www.w3.org/TR/xinclude/) it says
in section 4.5.1:

"Any unparsed entity information item appearing in the [references]
property of an attribute on the included items or any descendant thereof is
added to the [unparsed entities] property of the source infoset's document
information item, if it is not a duplicate of an existing member."

While I agree that it doesn't say "must", I do not see any allowance for
this to be optional, either. Are you certain?

> >(2) The XInclude spec allows document fragments to be included using
> >XPointer [4] paths. These create lots of problems with stream-based
> >processing. For instance, an XML document could include a fragment of
> >itself which has already been processed, and unless the document stream can
> >be re-opened and reparsed, that information is not available.

> Yes, that's tricky. In general however, I would claim it is possible
> to reopen and reparse a stream in this circumstance. You can only
> point to the stream with an XPointer if it has a URL. Given that it
> has a URL you can open it. The content at the URL may be time varying
> but that it is not an issue unique to streaming implementations of
> XInclude.

I'm envisioning the scenario when no URL is given
(href="#fragment-identifier"). This would be resolved against the current
document. And the current document might not necessarily be able to be
reopened, if it is coming from an input stream other than a regular file.

> No, I don't think SAX needs to change. Given the very limited use of
> unparsed entities and notations in the community today, I think
> simply omitting them from the subset of the infoset that is handled
> is fully adequate for almost all practical purposes.

I agree that these issues are edge cases, and an implementation can be made
that works most of the time. But still, it remains that issues exist which
prevent full compliance with XInclude. I'm bringing this up here to see
what people's thoughts are about having a fully compliant XInclude
implementation for SAX.

Cheers,
Peter McCracken
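One way around the non-reopenable stream Peter describes is to read the input once and serve every later "reopen" from memory. The sketch below is only an illustration under that assumption. `ReplayableSource` is a hypothetical helper, not something SAX or any XInclude processor provides.

```python
import io
from typing import BinaryIO


class ReplayableSource:
    """Wrap a one-shot byte stream so the document can be read again.

    The whole stream is held in memory, which trades the constant-memory
    property of streaming for the ability to resolve same-document
    inclusions such as href="#fragment-identifier".
    """

    def __init__(self, stream: BinaryIO) -> None:
        self._data = stream.read()  # consume the original stream exactly once

    def open(self) -> BinaryIO:
        """Return a fresh, independent reader positioned at the start."""
        return io.BytesIO(self._data)


# A pipe or socket would normally be passed in; BytesIO stands in for it here.
source = ReplayableSource(io.BytesIO(b"<x><include href='#frag'/></x>"))
first_pass = source.open().read()
second_pass = source.open().read()
```

This only helps when the document fits in memory. For large inputs the bytes could be spooled to a temporary file instead, at the cost of extra I/O.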
From: Elliotte R. H. <el...@me...> - 2003-06-25 01:14:42

At 4:03 PM -0400 6/24/03, Peter McCracken wrote:

>"Any unparsed entity information item appearing in the [references]
>property of an attribute on the included items or any descendant thereof is
>added to the [unparsed entities] property of the source infoset's document
>information item, if it is not a duplicate of an existing member."
>
>While I agree that it doesn't say "must", I do not see any allowance for
>this to be optional, either. Are you certain?

Yes. See section 5.3. There need not be any unparsed entity information
items in the infoset. The section you describe merely indicates what
should happen if they are present.

>> >(2) The XInclude spec allows document fragments to be included using
>> >XPointer [4] paths. These create lots of problems with stream-based
>> >processing. For instance, an XML document could include a fragment of
>> >itself which has already been processed, and unless the document stream
>> >can be re-opened and reparsed, that information is not available.
>
>> Yes, that's tricky. In general however, I would claim it is possible
>> to reopen and reparse a stream in this circumstance. You can only
>> point to the stream with an XPointer if it has a URL. Given that it
>> has a URL you can open it. The content at the URL may be time varying
>> but that it is not an issue unique to streaming implementations of
>> XInclude.
>
>I'm envisioning the scenario when no URL is given
>(href="#fragment-identifier"). This would be resolved against the current
>document. And the current document might not necessarily be able to be
>reopened, if it is coming from an input stream other than a regular file.

That's not legal. According to section 5.3 all element and document
information items *must* have base URI properties, against which a
relative URL can be resolved.

>I agree that these issues are edge cases, and an implementation can be made
>that works most of the time. But still, it remains that issues exist which
>prevent full compliance with XInclude. I'm bringing this up here to see
>what people's thoughts are about having a fully compliant XInclude
>implementation for SAX.

This is backwards. If this is important to you, bring it up with the
XInclude working group, and insist that they make a fully SAX compatible
version of XInclude. I suspect you'll find they think they already have.

--
Elliotte Rusty Harold
el...@me...
Processing XML with Java (Addison-Wesley, 2002)
http://www.cafeconleche.org/books/xmljava
http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA
From: Norman W. <nd...@nw...> - 2003-07-08 20:10:53

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

/ Elliotte Rusty Harold <el...@me...> was heard to say:
| At 4:03 PM -0400 6/24/03, Peter McCracken wrote:
[...]
|>I'm envisioning the scenario when no URL is given
|>(href="#fragment-identifier"). This would be resolved against the current
|>document. And the current document might not necessarily be able to be
|>reopened, if it is coming from an input stream other than a regular file.
|
| That's not legal. According to section 5.3 all element and document
| information items *must* have base URI properties, against which a
| relative URL can be resolved.

Yes, but per Section 4.2 of RFC2396:

  4.2. Same-document References

  A URI reference that does not contain a URI is a reference to the
  current document. In other words, an empty URI reference within a
  document is interpreted as a reference to the start of that document,
  and a reference containing only a fragment identifier is a reference
  to the identified fragment of that document. Traversal of such a
  reference should not result in an additional retrieval action.
  However, if the URI reference occurs in a context that is always
  intended to result in a new request, as in the case of HTML's FORM
  element, then an empty URI reference represents the base URI of the
  current document and should be replaced by that URI when transformed
  into a request.

a reference of the form "#foo" is always a reference to the current
document. It *is not* resolved according to the current base URI.

You might argue that XInclude, like HTML's FORM, "is always intended to
result in a new request", but I wouldn't.

Be seeing you,
norm

- --
Norman Walsh <nd...@nw...> | I'm NOT in denial!
http://nwalsh.com/ |
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.7 <http://mailcrypt.sourceforge.net/>

iD8DBQE/CyUGOyltUcwYWjsRAlIrAKCcV0LittlyKVoedHFjhLttJseE1wCfWqot
Ra1UP2CRxokprtDPTpn2HXw=
=kHQK
-----END PGP SIGNATURE-----
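The rule Norman quotes from RFC 2396 can be checked mechanically: a reference is a same-document reference when it has no scheme, no authority, no path, and no query, so that at most a fragment remains. The sketch below uses Python's standard `urllib.parse.urlsplit`. The function name `is_same_document_reference` is made up for illustration.

```python
from urllib.parse import urlsplit


def is_same_document_reference(href: str) -> bool:
    """True if href is empty or consists only of a fragment identifier.

    Following RFC 2396 section 4.2, such a reference points at the current
    document and should not trigger a new retrieval, so it is not resolved
    against the base URI.
    """
    parts = urlsplit(href)
    # Any scheme, authority, path, or query means a (possibly relative) URI
    # is present, and the reference must be resolved and retrieved normally.
    return not (parts.scheme or parts.netloc or parts.path or parts.query)


same_doc = is_same_document_reference("#fragment-identifier")
other_doc = is_same_document_reference("bar.xml#fragment-identifier")
```

An XInclude processor built this way would branch on the result before attempting any retrieval. Whether XInclude counts as a context "always intended to result in a new request" is exactly the point under dispute in the thread, and the sketch does not settle it.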
From: Karl W. <ka...@wa...> - 2003-06-24 18:31:21

> > What do you mean? Unparsed entity and Notation declarations are reported
> > before the EndDTD event, so you know which ones are declared before
> > the first StartElement call.
>
> Perhaps I didn't explain well enough. Suppose you had two XML files,
> foo.xml and bar.xml.
>
> foo.xml:
> <!DOCTYPE x [
>   <!ELEMENT x (include)>
>   <!ELEMENT include EMPTY>
>   <!ATTLIST include
>     href CDATA #REQUIRED>
> ]>
> <x>
>   <include href="bar.xml"/>
> </x>
>
> bar.xml:
> <!DOCTYPE stuff [
>   <!ENTITY foobar "hello world">
>   <!ELEMENT stuff (#PCDATA)>
> ]>
> <stuff>&foobar;</stuff>
>
> The document after processing would have the same infoset as if foo.xml
> was:
> <x>
>   <stuff>&foobar;</stuff>
> </x>
>
> When this is parsed, all of the DTD events from foo.xml are sent, and then
> an EndDTD event. When we encounter the include, we start parsing bar.xml,
> and merging it into the same information set as foo.xml. It's not
> important that we send most of the DTD events -- XInclude doesn't specify
> that the document type information needs to be correct. However, it is
> important that the foobar entity is added to the [entities] property of the
> result infoset, because at some point &foobar; will need to be resolved.

OK, I understand your problem better now - although the above is not an
unparsed entity problem. I assume you just wanted to demonstrate the point.

I know nothing about XInclude, but I would think that it is quite
problematic to merge schema/doctype information from two different
documents. Conflicting declarations?

Are the two infosets supposed to depend on each other? Is it not possible
to process both files independently (using separate parsers)?

Why does your process need to retain the unparsed entity declaration? Once
bar.xml is parsed (separately) and the ENTITY attribute has been reported
(within the context of that parse), should the unparsed entity declaration
not be discarded? Or do you say that the rest of foo.xml might depend on
that declaration?

> So
> we need to put an UnparsedEntityDecl event into the stream. But this
> UnparsedEntityDecl event can't be sent, because we already sent an EndDTD
> event on the stream, at the end of the DTD for foo.xml.

I guess I would need to know XInclude to be able to really understand your
concerns. Sorry. :-)

Karl
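Karl's suggestion of parsing the included file with a separate parser can be tried on the bar.xml example from the thread. The sketch below uses Python's expat-based `xml.sax` reader instead of a Java SAX parser. `TextCollector` and `parse_included` are names made up for illustration. It relies on the parser expanding the internal entity foobar itself, so the including process only ever sees character data for it.

```python
import xml.sax

# bar.xml from the thread, held in memory so the example is self-contained.
BAR_XML = b"""<!DOCTYPE stuff [
  <!ENTITY foobar "hello world">
  <!ELEMENT stuff (#PCDATA)>
]>
<stuff>&foobar;</stuff>"""


class TextCollector(xml.sax.ContentHandler):
    """Collect the character data reported for the included document."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []

    def characters(self, content: str) -> None:
        # A parser may deliver text in several pieces, so accumulate them.
        self.chunks.append(content)


def parse_included(data: bytes) -> str:
    """Parse an included document with its own parser and return its text."""
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return "".join(handler.chunks)


included_text = parse_included(BAR_XML)
```

This covers the parsed internal entity in the example, which matches Karl's remark that it is not really an unparsed entity problem. Unparsed entities are different: they are only named in ENTITY attributes, and the declaration carrying the system identifier and notation has to reach whoever consumes the merged stream, which is the case Peter is concerned about.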