Thread: [Sax-devel] Support of W3C XML Schema
From: Eric v. d. V. <vd...@dy...> - 2002-02-12 12:51:33
|
Hi, Just wondering if you have in mind anything to send PSVI contributions to the applications when a W3C XML Schema validation is involved. Thanks Eric -- Rendez-vous a Paris pour mes formations XML/XSLT. http://dyomedea.com/formation/ ------------------------------------------------------------------------ Eric van der Vlist http://xmlfr.org http://dyomedea.com http://xsltunit.org http://4xt.org http://examplotron.org ------------------------------------------------------------------------ |
From: Elliotte R. H. <el...@me...> - 2002-02-12 15:12:41
|
At 1:51 PM +0100 2/12/02, Eric van der Vlist wrote:
>Hi,
>
>Just wondering if you have in mind anything to send PSVI
>contributions to the applications when a W3C XML Schema validation
>is involved.

There is no formal effort to do this as far as I know. I've given a little thought to it on my own, and for the moment here's where I am:

1. The key new feature for a possible SAX 3.0 would be type information.

That's as far as I've gotten. :-) The questions currently on my table are:

1. How are types represented? As strings? As instances of some class? As subclasses of some class? I really don't know. The simple solution is just to provide the names. The complex solution is to define a Type class, but it's unclear what it would look like, what its subclasses would be, and how implementations would deal with types discovered at runtime. It does not seem possible to have a Type subclass for every possible type.

2. Should the primary methods like characters and Attr.getValue() continue to return text data only? Or should they be revised to actually provide some reasonable Java equivalent to each type, e.g. return ints for xsd:int data? The dichotomy between object and primitive types in Java makes this much harder.

3. Do we need to handle more than just the W3C XML Schema Language? E.g. do we want a more abstract model that would be accessible to RELAX NG and Schematron?

Once those questions were answered, then we could begin considering other questions, like whether this required a revised variant of ContentHandler (analogous to the transition from DocumentHandler to ContentHandler between SAX1 and SAX2 that added namespace support) or whether it could be layered on top of the existing SAX2 APIs.

-- 
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | el...@me...            | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
+----------------------------------+---------------------------------+
|
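Questions 1 and 2 above can be made a little more concrete with a small Java sketch. Everything in it is hypothetical: TypedValueSketch, TypeName and toJavaValue are illustration names only and are not part of SAX or of any proposal in this thread. It assumes the "simple solution" (a type is just a namespace URI plus a local name) and shows one way a lexical value could be turned into a Java object. Because the return type is Object, xsd:int comes back as a boxed Integer rather than a primitive int, which is the object/primitive dichotomy mentioned in question 2.

```java
import java.math.BigDecimal;
import java.util.Objects;

/** Hypothetical sketch only; none of these names exist in SAX. */
public class TypedValueSketch {

    /** Namespace URI of the W3C XML Schema built-in datatypes. */
    public static final String XSD = "http://www.w3.org/2001/XMLSchema";

    /** The "simple solution": a type is identified purely by its name. */
    public static final class TypeName {
        public final String namespaceUri;
        public final String localName;

        public TypeName(String namespaceUri, String localName) {
            this.namespaceUri = Objects.requireNonNull(namespaceUri);
            this.localName = Objects.requireNonNull(localName);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TypeName)) return false;
            TypeName t = (TypeName) o;
            return namespaceUri.equals(t.namespaceUri) && localName.equals(t.localName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(namespaceUri, localName);
        }
    }

    /**
     * Maps a lexical value to a Java object for a few built-in types.
     * Unknown types fall back to the original text, which is what SAX2
     * reports today.
     */
    public static Object toJavaValue(TypeName type, String lexical) {
        if (!XSD.equals(type.namespaceUri)) {
            return lexical; // not a W3C XML Schema type: leave as text
        }
        String s = lexical.trim(); // these types collapse surrounding whitespace
        switch (type.localName) {
            case "int":
                return Integer.valueOf(s); // boxed, not a primitive int
            case "boolean":
                return Boolean.valueOf(s.equals("true") || s.equals("1"));
            case "decimal":
                return new BigDecimal(s);
            default:
                return lexical;
        }
    }

    public static void main(String[] args) {
        Object v = toJavaValue(new TypeName(XSD, "int"), " 42 ");
        System.out.println(v.getClass().getSimpleName() + " " + v);
    }
}
```

A name-only representation like this has no obvious answer for types that have no name, and a fixed switch cannot cover types discovered at runtime, so it illustrates the open questions rather than settling them.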
From: Eric v. d. V. <vd...@dy...> - 2002-02-12 15:40:13
|
Elliotte Rusty Harold wrote:

> At 1:51 PM +0100 2/12/02, Eric van der Vlist wrote:
>
>>Hi,
>>
>>Just wondering if you have in mind anything to send PSVI
>>contributions to the applications when a W3C XML Schema validation
>>is involved.
>
> There is no formal effort to do this as far as I know. I've given a
> little thought to it on my own, and for the moment here's where I am:
>
> 1. The key new feature for a possible SAX 3.0 would be type information.
>
> That's as far as I've gotten. :-) The questions currently on my table are:
>
> 1. How are types represented? as strings? as instances of some class?
> as some subclasses of some class? I really don't know. The simple
> solution is just to provide the names. The complex solution is to
> define a Type class, but it's unclear what it would look like, what
> its subclasses would be, and how implementations would deal with
> types discovered at runtime. It does not seem possible to have a Type
> subclass for every possible type.

Supporting W3C XML Schema anonymous type declarations would be a challenge with strings... For all the other cases, using a URI as a string identifier could be a solution.

DOM Level 3 Abstract Schemas seems to be searching for a solution to the same issue:

http://www.w3.org/TR/DOM-Level-3-ASLS/

I am not sure if these AS are abstract enough to support RELAX NG, though...

> 2. Should the primary methods like characters and Attr.getValue()
> continue to return text data only? or should they be revised to
> actually provide some reasonable Java equivalent to each type? e.g.
> return ints for xsd:int data. The dichotomy between object and
> primitive types in Java makes this much harder.

Yes!

> 3. Do we need to handle more than just the W3C XML Schema Language?
> e.g. do we want a more abstract model that would be accessible to
> RELAX NG and Schematron?

Schematron looks like another kind of beast, but other grammar-based languages such as RELAX NG and XML DTDs are needed IMO.

> Once those questions were answered, then we could begin considering
> other questions, like whether this required a revised variant of
> ContentHandler (analogous to the transition from DocumentHandler to
> ContentHandler between SAX1 and SAX2 that added namespace support) or
> whether it could be layered on top of the existing SAX2 APIs.

What about an "Infoset ornament" using specific namespaces?

There is this nasty problem of the attributes, which may not easily be adorned, but this would provide a flexible way to add meta information to the original document, probably flexible enough to support even the contributions generated by Schematron.

Eric
|
From: Elliotte R. H. <el...@me...> - 2002-02-12 16:29:13
|
At 4:40 PM +0100 2/12/02, Eric van der Vlist wrote:
>Supporting W3C XML schema anonymous type declarations would be a
>challenge with strings... For all the other cases, using a URI as a
>string identifier could be a solution.

Yes, that is tough. I hadn't thought of that.

>DOM Level 3 Abstract Schemas seems to be searching a solution for
>the same issue:
>
>http://www.w3.org/TR/DOM-Level-3-ASLS/
>
>I am not sure if these AS are abstract enough to support RELAX NG, though...

I've been spending quite a bit of time with AS lately, and my opinion is that it's a horrible design that will sink under the waves very quickly. The AS type system is most definitely not extensible enough to handle even W3C XML schemas, much less RELAX NG. There are many other flaws, but the bottom line is I'd think we'd be better off starting from scratch.

>> Once those questions were answered, then we could begin
>>considering other questions, like whether this required a revised
>>variant of ContentHandler (analogous to the transition from
>>DocumentHandler to ContentHandler between SAX1 and SAX2 that added
>>namespace support) or whether it could be layered on top of the
>>existing SAX2 APIs.
>
>What about an "Infoset ornament" using specific namespaces?

Can you elaborate? I'm not sure what you mean?
|
From: Mikael S. <mik...@ho...> - 2002-02-13 20:54:59
|
At 10:10 2002-02-12 -0500, Elliotte Rusty Harold wrote: >1. How are types represented? as strings? As far as I know, there is no string syntax for XML Schema datatypes with facets and stuff. It would be useful to have a standard string syntax for it though. |
From: Mikael S. <mik...@ho...> - 2002-02-12 15:57:25
|
At 10:10 2002-02-12 -0500, Elliotte Rusty Harold wrote: >3. Do we need to handle more than just the W3C XML Schema Language? e.g. >do we want a more abstract model that would be accessible to RELAX NG and >Schematron? If we only handle types, we can restrict it to XML Schema part 2: datatypes, which actually is independent from XML Schema part 1: structures. You can use XML Schema datatypes with RELAX NG. |
From: Elliotte R. H. <el...@me...> - 2002-02-12 16:29:11
|
At 4:50 PM +0100 2/12/02, Mikael Ståldal wrote:
>If we only handle types, we can restrict it to XML Schema part 2:
>datatypes, which actually is independent from XML Schema part 1:
>structures. You can use XML Schema datatypes with RELAX NG.

Yes, restricting it to simple types would make this easier, but I'm not convinced that's on the right side of the 80/20 split. I suspect we need complex types too.

Also keep in mind that although you can use XML Schema data types with RELAX NG you can use other data types as well. I don't know that we want a solution that is limited to just the W3C XML Schema data types.
|
From: Eric v. d. V. <vd...@dy...> - 2002-02-12 16:55:53
|
Elliotte Rusty Harold wrote:

> I've been spending quite a bit of time with AS lately, and my opinion
> is that it's a horrible design that will sink under the waves very
> quickly. The AS type system is most definitely not extensible enough
> to handle even W3C XML schemas, much less RELAX NG. There are many
> other flaws, but the bottom line is I'd think we'd be better off
> starting from scratch.

My first feeling looking at the Abstract Schemas was that it seemed to be as horrible as W3C XML Schema but just more abstract :=) ...

After a second glance, I wonder if there are not things to keep. This idea of asking questions of the API (such as "can I do this and this?"), for instance, seems to be very flexible and to allow a great deal of schema independence.

>>> Once those questions were answered, then we could begin
>>>considering other questions, like whether this required a revised
>>>variant of ContentHandler (analogous to the transition from
>>>DocumentHandler to ContentHandler between SAX1 and SAX2 that added
>>>namespace support) or whether it could be layered on top of the
>>>existing SAX2 APIs.
>>
>>What about an "Infoset ornament" using specific namespaces?
>
> Can you elaborate? I'm not sure what you mean?

I was just referring to my old idea of defining a simple XML serialization for the PSVI.

In this case, <foo/> could become <foo psvi:type="bar"/>, but we're left with the issue of defining the type of the attributes. OTOH, this principle would give lots of flexibility, such as defining a common namespace which could be complemented by specific namespaces for such and such language without having to change the API.

Writing this makes me think that this serialization could be hidden and that we could add generic methods such as getProperty("http://whatever.org/psvi#type") or getProperties("http://whatever.org/psvi") to get the property (or properties) associated with an element or attribute for the language defined by whatever.org.

This would be the same basic idea as SAX 2 properties, but applied to elements and attributes...

Eric
|
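The generic getProperty/getProperties idea can be sketched in a few lines of Java. This is only an illustration under stated assumptions: ItemPropertiesSketch and setProperty are invented names, the "#validity" property is made up for the example, and SAX2 has no per-element object that could carry such methods. The sketch only shows how URI-keyed properties for one element or attribute might be stored and then filtered by the namespace of the schema language that contributed them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical holder for the URI-keyed properties of one element or
 * attribute. Not part of SAX; the names are for illustration only.
 */
public class ItemPropertiesSketch {

    private final Map<String, Object> properties = new LinkedHashMap<>();

    /** Records one property, e.g. "http://whatever.org/psvi#type". */
    public void setProperty(String propertyUri, Object value) {
        properties.put(propertyUri, value);
    }

    /** Returns the value of a single property, or null if it is absent. */
    public Object getProperty(String propertyUri) {
        return properties.get(propertyUri);
    }

    /**
     * Returns every property whose URI is "baseUri#something", i.e. all
     * contributions made under one language's namespace.
     */
    public Map<String, Object> getProperties(String baseUri) {
        Map<String, Object> result = new LinkedHashMap<>();
        String prefix = baseUri + "#";
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ItemPropertiesSketch foo = new ItemPropertiesSketch();
        foo.setProperty("http://whatever.org/psvi#type", "bar");
        foo.setProperty("http://whatever.org/psvi#validity", "valid"); // made-up property
        System.out.println(foo.getProperty("http://whatever.org/psvi#type"));
        System.out.println(foo.getProperties("http://whatever.org/psvi").keySet());
    }
}
```

The design keeps the API unchanged when a new schema language appears, since each language only adds property URIs, at the cost of returning untyped Object values.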
From: Elliotte R. H. <el...@me...> - 2002-02-12 19:19:00
|
At 5:55 PM +0100 2/12/02, Eric van der Vlist wrote:
>My first feeling looking at the Abstract Schemas is that it seemed
>to be as horrible as W3C XML Schema but just more abstract :=) ...
>After a second glance, I wonder if there are not things to keep.
>This idea of asking questions to the API (such as can I do this and
>this) for instance seems to be very flexible and allow a great deal
>of schema independence.

Perhaps that's a good idea in principle (I still think the design is atrocious), but it's not really relevant to SAX, which is a read-only API.

>I was just referring to my old idea to define a simple XML
>serialization for the PSVI.
>
>In this case, <foo/> could become <foo psvi:type="bar"/> but we're
>left with the issue of defining the type of the attributes. OTOH,
>this principle would give lots of flexibility such as defining a
>common namespace which could be complemented by specific namespaces
>for such and such language without having to change the API.

No, I don't think it's a good idea to add phantom attributes that aren't really in the document. Furthermore, this adds a level of indirection that doesn't really buy us anything. A more direct API is cleaner and easier to understand.

>Writing this makes me think that this serialization could be hidden
>and that we could add generic methods such as
>getProperty("http://whatever.org/psvi#type") or
>getProperties("http://whatever.org/psvi") to get the property (or
>properties) associated to an element or attribute for the language
>defined by whatever.org.

That's more generic. It would work for attributes, but not really for elements because there's no element object. I suppose you could put this on the XMLReader, and have it apply to the element whose startElement() method you're currently in, but then it doesn't work for attributes.

I also think it's maybe too generic for data typing. I think data typing is a common enough desire that something more concrete is justified. I'd like to see if there could be a standard SAX class or interface for types, and a standard SAX representation of them, rather than just a bag of different bindings for different languages.
|
From: David B. <da...@pa...> - 2002-02-12 20:12:23
|
> Just wondering if you have in mind anything to send PSVI contributions > to the applications when a W3C XML Schema validation is involved. A clean layer should do the job nicely. New handler interfaces, reporting new PSVI events. Parser implementations have no real reason to change at all to support these new extension events, so there'd be no reason to make an incompatible change to the SAX2 core. Though I'd expect that there wouldn't be many implementations of a SAX filter (or whatever) adding a stream of PSVI events to the infoset event stream which SAX already produces. (Of course, turning SAX2 infoset event streams into ones with PSVI contributions preserves the basic flexibility of SAX: it's not just for parsers.) Since the primitive types are often used without the rest of the W3C schema baggage, it'd be good to start with some way to represent those. I'd suspect some kind of tools would be needed to work with derived types, so the most appropriate implementation of the abstract type could be used. Such tools might best differ depending on which schema models are used. One person suggested looking at Microsoft's MSXML 4.0 PSVI support, starting with IMXSchemaDeclHandler, for such issues. I didn't ... anyone have any comments on it? - Dave |
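The layering described above can be sketched with the standard org.xml.sax.helpers.XMLFilterImpl class. The rest is assumption: TypeHandler and its elementType method are a hypothetical extension handler, not an existing SAX2 interface, and the element-to-type map merely stands in for a real schema validator. The point of the sketch is that the underlying parser is untouched; the filter sits on top of an ordinary SAX2 event stream and adds one extra event before each startElement it can annotate.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter that adds type events on top of plain SAX2 events. */
public class PsviFilterSketch extends XMLFilterImpl {

    /** Hypothetical extension handler; not part of SAX2. */
    public interface TypeHandler {
        void elementType(String qName, String typeName) throws SAXException;
    }

    private final Map<String, String> typesByElement; // stand-in for a validator
    private final TypeHandler typeHandler;

    public PsviFilterSketch(XMLReader parent, Map<String, String> typesByElement,
                            TypeHandler typeHandler) {
        super(parent);
        this.typesByElement = typesByElement;
        this.typeHandler = typeHandler;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        String type = typesByElement.get(qName);
        if (type != null) {
            typeHandler.elementType(qName, type); // the extra, layered event
        }
        super.startElement(uri, localName, qName, atts); // unchanged SAX2 event
    }

    /** Parses the given XML and returns the type events that were reported. */
    public static List<String> run(String xml) {
        List<String> events = new ArrayList<>();
        Map<String, String> types = new HashMap<>();
        types.put("price", "xsd:decimal"); // assumed schema knowledge
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            PsviFilterSketch filter = new PsviFilterSketch(
                    parser, types, (qName, type) -> events.add(qName + " -> " + type));
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run("<order><price>9.50</price></order>"));
    }
}
```

Because the filter is itself an XMLReader, it can be fed by any SAX2 event source, not only a parser, which preserves the flexibility mentioned in the message.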
From: Eric v. d. V. <vd...@dy...> - 2002-02-13 09:07:38
|
David Brownell wrote:

> A clean layer should do the job nicely. New handler interfaces,
> reporting new PSVI events.
>
> Parser implementations have no real reason to change at all to
> support these new extension events, so there'd be no reason
> to make an incompatible change to the SAX2 core.

Yes. This makes me think that there are (at least) 2 reasons why SAX is important!

The first one is as a simple API to build my own applications and, you're right, I neither need nor want them to be exposed to all the complexity of the PSVI.

The second one is interoperability, and I don't remember ever having seen the level of interoperability that has been achieved with SAX. Not only can I choose the parser I want to use with any application and make combinations such as using AElfred with Xalan or Xerces with Saxon, but I can also write my own custom parser to apply a transformation to data coming out of CSV files or an RDBMS.

If we want this to be still possible with XPath and XSLT 2.0, we need to find a way to expose all the information needed from the PSVI; otherwise (unless another open API is defined elsewhere) we will have to buy monolithic applications with their embedded parsers.

The question here is probably not so much whether such APIs are needed, but whether this is in the scope of SAX, which might lose the "Simplicity" that gave it its first letter: "CAX" doesn't sound as fine as "SAX"!

Eric
|
From: David B. <da...@pa...> - 2002-02-13 21:37:49
|
> > A clean layer should do the job nicely. New handler interfaces,
> > reporting new PSVI events.
> >
> > Parser implementations have no real reason to change at all to
> > support these new extension events, so there'd be no reason
> > to make an incompatible change to the SAX2 core.
>
> Yes. This makes me think that there are (at least) 2 reasons why SAX is
> important!
>
> ...
>
> If we want this to be still possible with XPath and XSLT 2.0, we need to
> find out a way to expose all the information needed from the PSVI,
> otherwise (unless another open API is defined elsewhere) we will have to
> buy monolithic applications with their embedded parsers.

I detest that "otherwise", but purveyors of monolithic software won't agree with me on that.

> The question here is probably not so much to know if such APIs are
> needed, but to decide if it's in the scope of SAX which might lose the
> "Simplicity" which made its first letter: "CAX" doesn't sound as fine as
> "SAX"!

That makes me think of the mythical Apple engineering paradigm: name first, then t-shirt, product announcement, last the code ... :)

I'd like to see a conventional/standard SAX-oriented way to get at the W3C Schema data. Enough people want such an API that it'd make sense to me to at least start the design discussions here, with an eye to "blessing" a good result as a SAX extension. Just be sure the model remains "stream of (PSV) Infoset events".

We're at least partially stuck with complex W3C schemas. Better to have good ways to work with them in SAX than not.

- Dave
|