Thread: Re: [xml-dev] Re: [Sax-devel] XInclude and SAX Incompatibility
From: Peter M. <pe...@ca...> - 2003-06-24 19:06:40
> OK, I understand your problem better now - although the above is
> not an unparsed entity problem. I assume you just wanted to
> demonstrate the point.

Yes, I made a mistake there. A more appropriate example would have been:

foo.xml:
<!DOCTYPE x [
  <!ELEMENT x (include)>
  <!ELEMENT include EMPTY>
  <!ATTLIST include href CDATA #REQUIRED>
]>
<x>
  <include href="bar.xml"/>
</x>

bar.xml:
<!DOCTYPE stuff [
  <!NOTATION jpg SYSTEM "jpg">
  <!ENTITY foobar SYSTEM "pic.jpg" NDATA jpg>
  <!ELEMENT stuff EMPTY>
  <!ATTLIST stuff pic ENTITY #IMPLIED>
]>
<stuff pic="foobar"/>

The document after processing would have the same infoset as if foo.xml
was:
<x>
  <stuff pic="foobar"/>
</x>

The rest of the argument is still relevant, however.

> I know nothing about XInclude, but I would think that it is quite
> problematic to merge schema/doctype information from two different
> documents. Conflicting declarations? Are the two infosets supposed
> to depend on each other?

The XInclude spec explains what to do in the case of conflicting
declarations. And the two infosets are supposed to both be able to create
standalone infosets, but some things are allowed to depend on each other
(like IDREFs). But yes, it can be quite problematic :)

> Is it not possible to process both files independently (using
> separate parsers)? Why does your process need to retain the
> unparsed entity declaration? Once bar.xml is parsed (separately)
> and the ENTITY attribute has been reported (within the context of
> that parse), should the unparsed entity declaration not be
> discarded? Or do you say that the rest of foo.xml might depend on
> that declaration?

Well, the point of XInclude is to have the two (or more) documents merged
into a single document -- and in SAX, that means we need to create a stream
of events that combines the separate documents in the appropriate way. So
we could process both files independently, and then, when both documents
are parsed, figure out what events need to be sent in what order to create
a valid SAX stream. But this would mean buffering all of the events of all
of the documents, and that's not in the spirit of a stream-based API at
all. So a good XInclude processor for SAX would have to do the merging on
the fly.

As for whether the entity declaration could be discarded -- the merged
document needs to be self-contained in its references. So if it has a
reference to an unparsed entity (or notation), that unparsed entity must
be declared in the document, which means an UnparsedEntityDecl event must
be present in the SAX stream.

> I guess I would need to know XInclude to be able to really
> understand your concerns.
> Sorry. :-)

No apology necessary, Karl. You're making me explain myself better, and
I'm glad someone is taking an interest, too.

Peter
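The ordering problem Peter describes can be made concrete with a small sketch. It is not from the thread and is not a real XInclude processor: it assumes Python's standard `xml.sax` module, treats the thread's plain `include` element as the inclusion marker, and keeps both documents in a hypothetical in-memory `DOCS` table. The names `NaiveIncluder`, `parse_into`, and `hoist_declarations` are invented for illustration. The handler merges on the fly by starting a second parser when it meets `include`, and a helper shows the buffering alternative, which is only possible once every event has been collected.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, DTDHandler

# Hypothetical in-memory stand-ins for the two files from the thread.
DOCS = {
    "foo.xml": b"""<!DOCTYPE x [
  <!ELEMENT x (include)>
  <!ELEMENT include EMPTY>
  <!ATTLIST include href CDATA #REQUIRED>
]>
<x>
  <include href="bar.xml"/>
</x>""",
    "bar.xml": b"""<!DOCTYPE stuff [
  <!NOTATION jpg SYSTEM "jpg">
  <!ENTITY foobar SYSTEM "pic.jpg" NDATA jpg>
  <!ELEMENT stuff EMPTY>
  <!ATTLIST stuff pic ENTITY #IMPLIED>
]>
<stuff pic="foobar"/>""",
}


class NaiveIncluder(ContentHandler, DTDHandler):
    """Records events into one shared list, splicing in included documents."""

    def __init__(self, events):
        super().__init__()
        self.events = events

    def notationDecl(self, name, publicId, systemId):
        self.events.append(("notationDecl", name))

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.events.append(("unparsedEntityDecl", name))

    def startElement(self, name, attrs):
        if name == "include":
            # Merge on the fly: parse the target with a second parser
            # while the outer parse is still in progress.
            parse_into(attrs["href"], self.events)
        else:
            self.events.append(("startElement", name))

    def endElement(self, name):
        if name != "include":
            self.events.append(("endElement", name))


def parse_into(doc_name, events):
    """Parse one document from DOCS, appending its events to `events`."""
    handler = NaiveIncluder(events)
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.setDTDHandler(handler)
    parser.parse(io.BytesIO(DOCS[doc_name]))


DECL_EVENTS = {"notationDecl", "unparsedEntityDecl"}


def hoist_declarations(buffered):
    """Buffering alternative: move all declarations ahead of the content."""
    decls = [e for e in buffered if e[0] in DECL_EVENTS]
    rest = [e for e in buffered if e[0] not in DECL_EVENTS]
    return decls + rest


events = []
parse_into("foo.xml", events)
hoisted = hoist_declarations(events)

for event in events:
    print(event)
```

In the on-the-fly stream, the declarations from bar.xml can only be reported after the start of `x` has already gone out, which is the incompatibility under discussion. `hoist_declarations` restores the usual order, but only because the complete event list was buffered first.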
From: Karl W. <ka...@wa...> - 2003-06-24 19:50:49
> foo.xml:
> <!DOCTYPE x [
>   <!ELEMENT x (include)>
>   <!ELEMENT include EMPTY>
>   <!ATTLIST include href CDATA #REQUIRED>
> ]>
> <x>
>   <include href="bar.xml"/>
> </x>
>
> bar.xml:
> <!DOCTYPE stuff [
>   <!NOTATION jpg SYSTEM "jpg">
>   <!ENTITY foobar SYSTEM "pic.jpg" NDATA jpg>
>   <!ELEMENT stuff EMPTY>
>   <!ATTLIST stuff pic ENTITY #IMPLIED>
> ]>
> <stuff pic="foobar"/>
>
> The document after processing would have the same infoset as if foo.xml
> was:
> <x>
>   <stuff pic="foobar"/>
> </x>

Yes, this example looks more like what I imagined.

> The rest of the argument is still relevant, however.
>
> > I know nothing about XInclude, but I would think that it is quite
> > problematic to merge schema/doctype information from two different
> > documents. Conflicting declarations? Are the two infosets supposed
> > to depend on each other?
>
> The XInclude spec explains what to do in the case of conflicting
> declarations. And the two infosets are supposed to both be able to create
> standalone infosets, but some things are allowed to depend on each other
> (like IDREFs). But yes, it can be quite problematic :)

I assume you need a non-validating parser then?

> > Is it not possible to process both files independently (using
> > separate parsers)? Why does your process need to retain the
> > unparsed entity declaration? Once bar.xml is parsed (separately)
> > and the ENTITY attribute has been reported (within the context of
> > that parse), should the unparsed entity declaration not be
> > discarded? Or do you say that the rest of foo.xml might depend on
> > that declaration?
>
> Well, the point of XInclude is to have the two (or more) documents merged
> into a single document -- and in SAX, that means we need to create a stream
> of events that combines the separate documents in the appropriate way. So
> we could process both files independently, and then, when both documents
> are parsed, figure out what events need to be sent in what order to create
> a valid SAX stream.

OK, I get it now. You combine the SAX events and want the combined event
stream to represent a proper SAX2 stream. Changing the specs may cause
problems for many SAX2-based applications. I am just writing a SAX2-based
validator (hence my interest) and this would really throw me off, as this
validator performs some global processing in the EndDTD event.

> But this would mean buffering all of the events of all
> of the documents, and that's not in the spirit of a stream-based API at
> all. So a good XInclude processor for SAX would have to do the merging on
> the fly.

If there was a "stoppable/interruptible" SAX parser, you could run all
parsers to the EndDTD event first, which means you get all DTD events
before any elements are reported, and then have the parsers continue in
the right order of dependency. But this would require knowing all
"included" elements beforehand. If you queue/buffer the events, you would
need to do so until you encountered the last inclusion - but you don't
know when this happens, so you have to buffer all of them. This points to
a basic "mismatch" between the XML specs and the concept of inclusion - at
least, that is my impression.

> As for whether the entity declaration could be discarded -- the
> merged document needs to be self-contained in its references. So if it has
> a reference to an unparsed entity (or notation), that unparsed entity must
> be declared in the document, which means an UnparsedEntityDecl event must
> be present in the SAX stream.

Yes, based on the lights going on, this is clear to me now. ;-)

> > I guess I would need to know XInclude to be able to really
> > understand your concerns.
> > Sorry. :-)
>
> No apology necessary, Karl. You're making me explain myself better, and
> I'm glad someone is taking an interest, too.

:-)

Unfortunately, I can't really see a way out of your dilemma. I don't think
changing the SAX2 specs will be possible. Also, the way the XML specs are
written, there needs to be some EndDTD event where one can do things like
matching up ATTLIST declarations with ELEMENT declarations, checking
unparsed entity declarations against notation declarations, and building a
validation model, since these declarations can occur in any order. One
can't delay this until after the first element is encountered.

Karl
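Karl's point about EndDTD can be illustrated with a rough sketch of the kind of end-of-DTD cross-checking a validator might do. This is not Karl's validator and does not use any real SAX binding: the class `DtdCollector` and its method names are hypothetical, loosely modelled on the SAX2 declaration callbacks, and the declarations are fed in by hand. It assumes that an unparsed entity naming an undeclared notation is treated as an error, and that an ATTLIST for an undeclared element only merits a warning.

```python
class DtdCollector:
    """Collects DTD declarations and cross-checks them once the DTD ends."""

    def __init__(self):
        self.elements = set()        # names from ELEMENT declarations
        self.attlists = []           # (element name, attribute name) pairs
        self.notations = set()       # names from NOTATION declarations
        self.unparsed_entities = {}  # entity name -> notation name

    def element_decl(self, name):
        self.elements.add(name)

    def attribute_decl(self, element_name, attr_name):
        self.attlists.append((element_name, attr_name))

    def notation_decl(self, name):
        self.notations.add(name)

    def unparsed_entity_decl(self, name, notation_name):
        self.unparsed_entities[name] = notation_name

    def end_dtd(self):
        """Run the checks that only make sense once every declaration is in."""
        findings = []
        for element_name, attr_name in self.attlists:
            if element_name not in self.elements:
                findings.append((
                    "warning",
                    f"attribute {attr_name!r} declared for "
                    f"undeclared element {element_name!r}",
                ))
        for name, notation_name in self.unparsed_entities.items():
            if notation_name not in self.notations:
                findings.append((
                    "error",
                    f"unparsed entity {name!r} uses "
                    f"undeclared notation {notation_name!r}",
                ))
        return findings


# bar.xml's declarations, deliberately fed with the entity before its notation.
ok = DtdCollector()
ok.unparsed_entity_decl("foobar", "jpg")
ok.attribute_decl("stuff", "pic")
ok.notation_decl("jpg")
ok.element_decl("stuff")
ok_findings = ok.end_dtd()

# The same entity with no notation declaration at all.
broken = DtdCollector()
broken.unparsed_entity_decl("foobar", "jpg")
broken_findings = broken.end_dtd()
```

Because declarations may arrive in any order, none of these checks can run until `end_dtd` is called. A declaration that turns up after that point, as an included document's would, arrives too late to be checked.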