Thread: Re: [Sax-devel] Support of W3C XML Schema
From: <ne...@ca...> - 2002-02-12 19:24:14
Hi folks,

We faced this very problem of how to expose XML schema infoset augmentations--and indeed infoset augmentations generally--during the design of Apache Xerces2's Xerces Native Interface (XNI). The way we solved it was to add a very general container to most of the XNI callbacks; before this addition, these tended to be very SAX-like. This container object can carry whatever kinds of augmentation a component (a scanner, a validator for a certain kind of grammar, or whatever) wishes to provide to some later component (e.g., an application, a standard API generator, etc.).

If the SAX community ever wanted to contemplate some kind of AugmentedContentHandler interface, perhaps a very extensible approach like this might merit some consideration. At any rate I thought it might serve as a starting point for discussion, since it is used in a production-quality product that supports the lightweight PSVI (and will hopefully support the full PSVI at some point soon).

Cheers,
Neil

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone: 905-413-3519, T/L 969-3519
E-mail: ne...@ca...

Elliotte Rusty Harold <el...@me...>@lists.sourceforge.net on 02/12/2002 10:10:24 AM

Please respond to Elliotte Rusty Harold <el...@me...>
Sent by: sax...@li...
To: Eric van der Vlist <vd...@dy...>, sax...@li...
cc:
Subject: Re: [Sax-devel] Support of W3C XML Schema

At 1:51 PM +0100 2/12/02, Eric van der Vlist wrote:
>Hi,
>
>Just wondering if you have in mind anything to send PSVI
>contributions to the applications when a W3C XML Schema validation
>is involved.
>

There is no formal effort to do this as far as I know. I've given a little thought to it on my own, and for the moment here's where I am:

1. The key new feature for a possible SAX 3.0 would be type information.

That's as far as I've gotten. :-) The questions currently on my table are:

1. How are types represented? As strings? As instances of some class? As some subclasses of some class? I really don't know. The simple solution is just to provide the names. The complex solution is to define a Type class, but it's unclear what it would look like, what its subclasses would be, and how implementations would deal with types discovered at runtime. It does not seem possible to have a Type subclass for every possible type.

2. Should the primary methods like characters and Attr.getValue() continue to return text data only? Or should they be revised to actually provide some reasonable Java equivalent to each type, e.g. return ints for xsd:int data? The dichotomy between object and primitive types in Java makes this much harder.

3. Do we need to handle more than just the W3C XML Schema Language? E.g. do we want a more abstract model that would be accessible to RELAX NG and Schematron?

Once those questions were answered, then we could begin considering other questions, like whether this required a revised variant of ContentHandler (analogous to the transition from DocumentHandler to ContentHandler between SAX1 and SAX2 that added namespace support) or whether it could be layered on top of the existing SAX2 APIs.

--
Elliotte Rusty Harold | el...@me... | Writer/Programmer
The XML Bible, 2nd Edition (Hungry Minds, 2001)
http://www.ibiblio.org/xml/books/bible2/
http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/
Read Cafe au Lait for Java News: http://www.cafeaulait.org/
Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/

_______________________________________________
sax-devel mailing list
sax...@li...
https://lists.sourceforge.net/lists/listinfo/sax-devel
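Editorial sketch: the "general container on each callback" idea from the message above might look roughly like this in Java. This is only an illustration under stated assumptions, not the actual XNI API: the names AugmentationBag, AugmentedHandler, and the key string are all invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a general augmentation container (not the real XNI class). */
final class AugmentationBag {
    private final Map<String, Object> items = new HashMap<>();

    Object put(String key, Object value) { return items.put(key, value); }

    Object get(String key) { return items.get(key); }
}

/** Hypothetical SAX-like callback that carries the container on the call itself. */
interface AugmentedHandler {
    void startElement(String uri, String localName, AugmentationBag augs);
}

final class AugmentationDemo {
    // Key name invented for this example.
    static final String TYPE_NAME = "example.psvi.type-name";

    /** Plays the role of a validator: annotates the event, then forwards it downstream. */
    static void validateAndForward(String uri, String localName, AugmentedHandler next) {
        AugmentationBag augs = new AugmentationBag();
        if ("quantity".equals(localName)) {
            augs.put(TYPE_NAME, "xsd:int");   // pretend the schema declared this type
        }
        next.startElement(uri, localName, augs);
    }

    /** Sends one event through the pipeline and returns the type the consumer saw, if any. */
    static String typeSeenFor(String localName) {
        String[] seen = new String[1];
        validateAndForward("", localName, (uri, name, augs) -> {
            Object t = augs.get(TYPE_NAME);   // untyped: the consumer must know what to expect
            seen[0] = (t == null) ? null : (String) t;
        });
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(typeSeenFor("quantity"));
        System.out.println(typeSeenFor("title"));
    }
}
```

The augmentation travels with the event it describes, which is the property the message emphasizes; the price is that everything in the bag is an Object keyed by a string.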
From: <ne...@ca...> - 2002-02-12 20:35:52
Hi Elliotte,

>My general take is that this approach, or the SAX features and
>properties approach, is appropriate when there are going to be many
>implementations providing various different kinds of information
>which cannot reasonably be predicted in advance.

Right, and given one of XNI's main goals is to provide the maximum possible flexibility, that lack of predictability is definitely true in our case.

I would point out though that, especially for applications like exposing the PSVI, property-based and parameter-augmentation-based approaches aren't quite equivalent. The trouble with a property-based approach is that the information augmentations aren't rigidly tethered to the information they augment. With an augmentation, it's blatantly obvious which element or character content or attribute an augmentation pertains to--it's right on the call. It would take a lot of documentation--and I suspect many a bug report--before users would figure out which properties of what objects were to be relied upon when.

Just for background, another approach we mulled over long and hard was to construct some sort of "schemaHandler" which would receive events in well-defined sequences w.r.t. the ContentHandler. Eventually though we gave up: we decided this would impose too great a co-ordination burden on the application, particularly for complex APIs like the PSVI.

>It is not something that should be used for core
>functionality. Among other problems it relegates pretty much
>everything to an object, so type checking is significantly weakened.

But unless you want to support a JAXB or Castor-style binding API, then it's not obvious to me how you'll bind general types tightly to objects.

>I think we can design a good API for typed parsing that would be
>portable across parsers. Now maybe this API is SAX 3.0 and maybe it
>isn't, but either way the functionality should be selected directly
>with methods, interfaces and classes rather than through URLs hidden
>in strings. It is *not* a good thing to make a request for the type
>of an element look exactly the same as a request to get the base URL
>of the element or the number of characters in the element or anything
>else someone might want to store in a property. The augmentation
>approach is a hack to layer arbitrary information on top of an
>existing API. However, it is not a good way to provide core
>functionality.

Defining what core functionality is might not be such an easy task though. But if one were to keep expectations low enough--perhaps an API that supported passing the various Java primitive types or something--you could probably get something quite workable.

Given that there isn't currently a standard API for exposing the PSVI (although of course the W3C is trying to work on this), there's certainly a void on this front. So I guess it depends whether the consensus is that something very lightweight and datatype-oriented is needed, or whether something heavier and perhaps a bit more ugly would serve the need better.

Cheers,
Neil
From: Jeff R. <jef...@de...> - 2002-02-12 22:44:08
> So I guess it depends whether the consensus is that
> something very lightweight and datatype-oriented is needed, or whether
> something heavier and perhaps a bit more ugly would serve the need better.

Right-- this definitely could be something very ugly very quickly. For me, one thing that would help is to separate the notion of type identity from type functionality. By functionality I suppose that I mean the method of validation (for simpleTypes this is lexical+value based checking and for complexTypes it is grammar based). I think with PSVI we are trying to simply come up with identity (and what has been added), not the ability to check/validate the contents or map the type to actual Java types. Though I am open to being wrong here. Maybe we need both-- but it seems that it would be wise to separate the two.

It strikes me that for identity we may or may not need objects-- for validation, objects representing types (or at least in the position of a type system) make good sense.

Cheers,
Jeff Rafter
Defined Systems
http://www.defined.net
XML Development and Developer Web Hosting
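Editorial sketch: one way to picture the split between type identity and type functionality suggested above. All names here (TypeId, LexicalChecker, IdentityVsFunctionality) are hypothetical, and the integer check is deliberately simplified; real xsd:int validation has more rules (whitespace handling, for one).

```java
import java.util.Objects;

/** Identity only: a namespace URI plus a local name, compared by value. */
final class TypeId {
    final String uri;
    final String localName;

    TypeId(String uri, String localName) {
        this.uri = Objects.requireNonNull(uri);
        this.localName = Objects.requireNonNull(localName);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof TypeId
                && uri.equals(((TypeId) o).uri)
                && localName.equals(((TypeId) o).localName);
    }

    @Override
    public int hashCode() { return Objects.hash(uri, localName); }
}

/** Functionality, kept separate from identity: something that can check a lexical value. */
interface LexicalChecker {
    boolean accepts(String lexical);
}

final class IdentityVsFunctionality {
    static final String XSD = "http://www.w3.org/2001/XMLSchema";

    /** An application that only needs to know "which type is this?" stops here. */
    static final TypeId XSD_INT = new TypeId(XSD, "int");

    /** A validator would additionally need something like this (simplified check). */
    static final LexicalChecker INT_CHECKER = s -> {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    };
}
```

With this split, identity can be passed around as plain data (or even just a pair of strings), while checking behaviour lives in objects that only validating components need to load.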
From: Elliotte R. H. <el...@me...> - 2002-02-12 23:51:43
At 3:35 PM -0500 2/12/02, ne...@ca... wrote:
>>It is not something that should be used for core
>>functionality. Among other problems it relegates pretty much
>>everything to an object, so type checking is significantly weakened.
>
>But unless you want to support a JAXB or Castor-style binding API, then
>it's not obvious to me how you'll bind general types tightly to objects.
>

I didn't explain myself carefully enough. I apologize. This really isn't what I meant at all. I was referring to the types of the things at the other ends of the property, not the schema type or the types that the schema types would be cast to.

By way of elucidation, consider the existing SAX getProperty method. This is declared to return a java.lang.Object. However, depending on which property you request you may get back a String, an Element, a DeclHandler, a LexicalHandler, or a Node, and that's just in Xerces alone. Other parsers add to this list. If I invoke getType() on an Attr or have a type argument passed to startElement() then I know it has type org.someone.Type and has the appropriate methods. If I make a mistake the compiler can catch it. I don't have such strong type checking with the Object/URL approach.
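Editorial sketch: the point about getProperty returning java.lang.Object can be seen with the standard SAX2 lexical-handler property. The sketch assumes the parser obtained through JAXP recognizes that property (it is a standard SAX2 property, but support is optional); the class and method names are invented for the example.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

final class PropertyCastDemo {
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    /** Creates a reader and registers a handler under the string-named property. */
    static XMLReader newReader() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setProperty(LEXICAL_HANDLER, new DefaultHandler2());
            return reader;
        } catch (Exception e) {
            // Assumption violated: this parser does not support the property.
            throw new IllegalStateException(e);
        }
    }

    /** getProperty is declared to return Object, whatever the property actually holds. */
    static Object property(XMLReader reader, String name) {
        try {
            return reader.getProperty(name);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    /** A wrong guess about the property's class compiles fine and fails only when run. */
    static boolean wrongCastFailsAtRuntime() {
        Object value = property(newReader(), LEXICAL_HANDLER);
        try {
            String s = (String) value;   // the compiler cannot object to this cast
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

A method declared as returning a specific type would turn the same mistake into a compile-time error, which is the contrast the message draws.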
From: Elliotte R. H. <el...@me...> - 2002-02-12 23:51:45
At 3:35 PM -0500 2/12/02, ne...@ca... wrote:
>Defining what core functionality is might not be such an easy task though.
>But if one were to keep expectations low enough--perhaps an API that
>supported passing the various Java primitive types or something--you could
>probably get something quite workable.
>
>Given that there isn't currently a standard API for exposing the PSVI
>(although of course the W3C is trying to work on this), there's certainly a
>void on this front. So I guess it depends whether the consensus is that
>something very lightweight and datatype-oriented is needed, or whether
>something heavier and perhaps a bit more ugly would serve the need better.
>

It will certainly take work. However I wonder if we can divide the work. Specifically, I've also been thinking about typing in the context of JDOM. It occurs to me that a sufficiently generic org.xml.types library might be useful for both SAX and JDOM and perhaps other APIs as well.

What would this library look like? A root Type class with ComplexType and SimpleType subclasses. We'd define base classes for all the W3C Schema Language simple types. The root Type interface would probably have getName() and getURI() methods, e.g.

    public interface Type {
        public String getName();
        public String getURI();
    }

The SimpleType class would add a getClass() method to return an appropriate Java class for a type and a toObject() method for casting to that Java type. (I really wish Java didn't differentiate primitive and object types. We could make this so much cleaner.) Then again maybe this belongs in the root Type interface?

We could then derive an org.xml.types.xschema package that knew about all the W3C XML Schema Language simple types and implemented this stuff. It would also add more specifically declared methods for casting to more specific Java types.

The schema aware content handler would be much like today's ContentHandler except for the startElement() method, which would look like this:

    public void startElement(String namespaceURI, String localName,
            String qName, Attributes atts, Type type)

The Attributes class would add the necessary getSchemaType(int i) method to return the type of each attribute; e.g.

    public SimpleType getSchemaType(int i)

Anyway, this is very rough and I can already see some problems with it, but it should give you the rough idea of what I'm thinking.
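Editorial sketch: the proposal above, written out as compilable Java so the shapes are easier to see. Nothing here is an existing API; the interfaces follow the message's names where possible, and XsdInt, TypedAttributes, and TypedContentHandler are invented for the example. One deliberate deviation: the accessor is called getJavaClass() rather than getClass(), because Object.getClass() is final and an interface cannot redeclare it with a different meaning.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;

/** Hypothetical root of the proposed type library. */
interface Type {
    String getName();
    String getURI();
}

/** Hypothetical simple type: identity plus a conversion the caller may request. */
interface SimpleType extends Type {
    Class<?> getJavaClass();           // not getClass(): Object.getClass() is final
    Object toObject(String lexical);   // performed only when the client calls it
}

/** One concrete simple type, standing in for an org.xml.types.xschema-style package. */
final class XsdInt implements SimpleType {
    public String getName() { return "int"; }
    public String getURI() { return "http://www.w3.org/2001/XMLSchema"; }
    public Class<?> getJavaClass() { return Integer.class; }
    public Object toObject(String lexical) { return Integer.valueOf(lexical.trim()); }

    /** A "more specifically declared" conversion that avoids boxing. */
    public int toInt(String lexical) { return Integer.parseInt(lexical.trim()); }
}

/** Hypothetical extension of SAX2 Attributes with per-attribute type lookup. */
interface TypedAttributes extends Attributes {
    SimpleType getSchemaType(int i);
}

/** Hypothetical schema-aware handler: startElement gains a Type argument. */
interface TypedContentHandler {
    void startElement(String namespaceURI, String localName, String qName,
                      TypedAttributes atts, Type type) throws SAXException;
}
```

The pair toObject()/toInt() also shows the object-versus-primitive awkwardness mentioned in the message: a single generic conversion method can only return Object, so each primitive needs its own extra method.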
From: <ne...@ca...> - 2002-02-13 22:25:30
Hi folks,

On the xerces lists, we certainly get lots of requests for PSVI support--both for DOM and SAX. So the demand definitely is out there...

One thing I think it would be useful to be clear on at the start though is just what we want: Do we want to write an API that will expose an attribute value as an Integer (or an org.xml.sax.types.Integer or whatever) when that attribute is of type xsd:integer? Or do we want to build a framework that can handle various kinds of infoset augmentation, with a particular view to exposing the PSVI (in its lightweight or heavyweight form) in all its glory? That is, replete with validation/assessment outcomes, error codes, schema normalized values etc.?

I hope it's the latter. Although compared with a lot of people around here I'm a newcomer to this stuff, my understanding is that SAX was intended to do a pretty good job of exposing the XML Infoset. And by and large it does, with some exceptions. Likewise, I think doing a fairly good (perfect would be best! :-)) job of representing the PSVI should be the ultimate goal of this discussion since, like it or not, W3C schemas are here to stay.

Of course if we can do it in a way that makes exposing other kinds of infoset augmentations easy, so much the better!

Cheers,
Neil

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone: 905-413-3519, T/L 969-3519
E-mail: ne...@ca...

David Brownell <da...@pa...>@lists.sourceforge.net on 02/13/2002 04:35:52 PM

Please respond to David Brownell <da...@pa...>
Sent by: sax...@li...
To: Eric van der Vlist <vd...@dy...>
cc: sax...@li...
Subject: Re: [Sax-devel] Support of W3C XML Schema

> > A clean layer should do the job nicely. New handler interfaces,
> > reporting new PSVI events.
> >
> > Parser implementations have no real reason to change at all to
> > support these new extension events, so there'd be no reason
> > to make an incompatible change to the SAX2 core.
>
> Yes. This makes me think that there are (at least) 2 reasons why SAX is
> important!
>
> ...
>
> If we want this to be still possible with XPath and XSLT 2.0, we need to
> find out a way to expose all the information needed from the PSVI,
> otherwise (unless another open API is defined elsewhere) we will have to
> buy monolithic applications with their embedded parsers.

I detest that "otherwise", but purveyors of monolithic software won't agree with me on that.

> The question here is probably not so much to know if such APIs are
> needed, but to decide if it's in the scope of SAX which might lose the
> "Simplicity" which made its first letter: "CAX" doesn't sound as fine as
> "SAX"!

That makes me think of the mythical Apple engineering paradigm: name first, then t-shirt, product announcement, last the code ... :)

I'd like to see a conventional/standard SAX-oriented way to get at the W3C Schema data. Enough people want such an API that it'd make sense to me to at least start the design discussions here, with an eye to "blessing" a good result as a SAX extension. Just be sure the model remains "stream of (PSV) Infoset events".

We're at least partially stuck with complex W3C schemas. Better to have good ways to work them with SAX than not.

- Dave
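Editorial sketch: the "clean layer" idea quoted above (new handler interfaces reporting extra events, with the SAX2 core untouched) could be approximated with the existing XMLFilterImpl helper class. TypeInfoHandler and TypeReportingFilter are hypothetical names, and the element-to-type map is a stand-in for whatever a real schema processor would know.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical extension handler, in the spirit of LexicalHandler and DeclHandler. */
interface TypeInfoHandler {
    void elementType(String qName, String typeName) throws SAXException;
}

/** A layer over an unmodified SAX2 event stream; the map stands in for a real schema. */
final class TypeReportingFilter extends XMLFilterImpl {
    private final Map<String, String> typesByElement;
    private final TypeInfoHandler typeHandler;

    TypeReportingFilter(Map<String, String> typesByElement, TypeInfoHandler typeHandler) {
        this.typesByElement = typesByElement;
        this.typeHandler = typeHandler;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        String type = typesByElement.get(qName);
        if (type != null) {
            typeHandler.elementType(qName, type);            // extension event first
        }
        super.startElement(uri, localName, qName, atts);     // then the ordinary SAX2 event
    }
}

final class LayerDemo {
    /** Feeds two start-element events through the filter and records what each handler saw. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        TypeReportingFilter filter = new TypeReportingFilter(
                Map.of("quantity", "xsd:int"),
                (qName, type) -> log.add("type " + qName + " = " + type));
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start " + qName);
            }
        });
        try {
            filter.startElement("", "quantity", "quantity", new AttributesImpl());
            filter.startElement("", "note", "note", new AttributesImpl());
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }
}
```

The ordinary ContentHandler is unchanged and unaware of the extra events; only applications that register a TypeInfoHandler see them. The ordering rule chosen here (type event immediately before its startElement) is one arbitrary convention, and agreeing on such sequencing is the co-ordination burden raised elsewhere in the thread.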
From: Eric v. d. V. <vd...@dy...> - 2002-02-13 22:32:22
ne...@ca... wrote:
> Hi folks,
>
> On the xerces lists, we certainly get lots of requests for PSVI
> support--both for DOM and SAX. So the demand definitely is out there...
>
> One thing I think it would be useful to be clear on at the start though is
> just what we want: Do we want to write an API that will expose an
> attribute value as an Integer (or an org.xml.sax.types.Integer or whatever)
> when that attribute is of type xsd:integer? Or do we want to build a
> framework that can handle various kinds of infoset augmentation, with a
> particular view to exposing the PSVI (in its lightweight or heavyweight
> form) in all its glory? That is, replete with validation/assessment
> outcomes, error codes, schema normalized values etc.?

If one of the motivations is to keep parsers and applications interoperable, we would need to expose at least all the information needed by XPath 2.0/XSLT 2.0/XQuery 1.0, which is raising the bar quite high!

Eric
--
See you in Paris for my XML/XSLT training courses.
http://dyomedea.com/formation/
------------------------------------------------------------------------
Eric van der Vlist http://xmlfr.org http://dyomedea.com
http://xsltunit.org http://4xt.org http://examplotron.org
------------------------------------------------------------------------
From: Elliotte R. H. <el...@me...> - 2002-02-14 04:46:09
At 5:25 PM -0500 2/13/02, ne...@ca... wrote:
>Hi folks,
>
>On the xerces lists, we certainly get lots of requests for PSVI
>support--both for DOM and SAX. So the demand definitely is out there...
>
>One thing I think it would be useful to be clear on at the start though is
>just what we want: Do we want to write an API that will expose an
>attribute value as an Integer (or an org.xml.sax.types.Integer or whatever)
>when that attribute is of type xsd:integer? Or do we want to build a
>framework that can handle various kinds of infoset augmentation, with a
>particular view to exposing the PSVI (in its lightweight or heavyweight
>form) in all its glory? That is, replete with validation/assessment
>outcomes, error codes, schema normalized values etc.?
>

Neither. I want something even simpler. I want a specific augmentation for type information and type information only. I do *not* want a general purpose augmentation for anything that someone might choose to add to an XML document. That would only complexify the practical use for type information.

Furthermore, I do not want an automatic conversion to some Java type other than String or char[]. I simply want to be able to find out what the type of each thing is, and have some methods available to convert the strings if I so choose. However, I want any conversions to be requested by the client, not automatically performed by the parser.
From: David B. <da...@pa...> - 2002-02-15 01:15:22
> >One thing I think it would be useful to be clear on at the start though is
> >just what we want:

More like "essential" ... though some evolution is to be expected! :)

> > Do we want to write an API that will expose an
> >attribute value as an Integer (or an org.xml.sax.types.Integer or whatever)
> >when that attribute is of type xsd:integer? Or do we want to build a
> >framework that can handle various kinds of infoset augmentation, with a
> >particular view to exposing the PSVI (in its lightweight or heavyweight
> >form) in all its glory? That is, replete with validation/assessment
> >outcomes, error codes, schema normalized values etc.?

Validation errors would be reported as usual, through the ErrorHandler.error() routine ... yes?

> Neither. I want something even simpler. I want a specific
> augmentation for type information and type information only. I do
> *not* want a general purpose augmentation for anything that someone
> might choose to add to an XML document. That would only complexify
> the practical use for type information.

I tend to agree, though I could stand to invest more time in the W3C schema world. Policy about how to use the type info is something that should be provided by the handler. It's easiest to see that for non-primitive types ... there are lots of ways to map them to data structures, if that's even the goal in a given context.

> Furthermore, I do not want an automatic conversion to some Java type
> other than String or char[]. I simply want to be able to find out
> what the type of each thing is, and have some methods available to
> convert the strings if I so choose. However, I want any conversions
> to be requested by the client, not automatically performed by the
> parser.

Yes. Though I suspect that for non-primitive types, the way that value components get identified will become an issue. Likely for primitive types we'll find ourselves wanting standard ways to convert the types, but those costs should only be incurred when the handler's policy says that's the way to go.

- Dave
From: Mikael S. <mik...@ho...> - 2002-02-19 22:13:33
At 23:43 2003-02-13 -0500, Elliotte Rusty Harold wrote:
>Furthermore, I do not want an automatic conversion to some Java type other
>than String or char[].

Why not?
From: Elliotte R. H. <el...@me...> - 2002-02-19 22:34:25
At 10:56 PM +0100 2/19/02, Mikael Ståldal wrote:
>At 23:43 2003-02-13 -0500, Elliotte Rusty Harold wrote:
>>Furthermore, I do not want an automatic conversion to some Java
>>type other than String or char[].
>
>Why not?
>
>

1. Most of the time I don't need it. Most of the time I just want text.

2. If I do need it, chances are good that I don't want it as the exact type the library wants to give me. I want it as some other type.
From: David B. <da...@pa...> - 2002-02-20 03:27:51
3. It's called layering. Because of (1) and (2) it's not appropriate to have it at all times ... but such policies can easily be implemented in layers on top of one that handles type recognition.

----- Original Message -----
From: "Elliotte Rusty Harold" <el...@me...>
To: <sax...@li...>
Sent: Tuesday, February 19, 2002 2:30 PM
Subject: Re: [Sax-devel] Support of W3C XML Schema

At 10:56 PM +0100 2/19/02, Mikael Ståldal wrote:
>At 23:43 2003-02-13 -0500, Elliotte Rusty Harold wrote:
>>Furthermore, I do not want an automatic conversion to some Java
>>type other than String or char[].
>
>Why not?
>
>

1. Most of the time I don't need it. Most of the time I just want text.

2. If I do need it, chances are good that I don't want it as the exact type the library wants to give me. I want it as some other type.
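Editorial sketch: points (1) to (3) above suggest a lower layer that only reports text plus a type name, and an optional upper layer where the client chooses which conversions happen and to what. The classes below are hypothetical illustrations of that layering, not part of SAX or any parser.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Lower layer (hypothetical): reports text plus the name of its schema type, nothing more. */
final class TypedText {
    final String text;
    final String typeName;

    TypedText(String text, String typeName) {
        this.text = text;
        this.typeName = typeName;
    }
}

/** Upper layer (hypothetical): converts only types the client registered, the way it asked. */
final class ConversionLayer {
    private final Map<String, Function<String, ?>> converters = new HashMap<>();

    <T> ConversionLayer register(String typeName, Function<String, T> converter) {
        converters.put(typeName, converter);
        return this;
    }

    /** Returns the converted value, or the untouched text when no conversion was requested. */
    Object convert(TypedText value) {
        Function<String, ?> f = converters.get(value.typeName);
        return (f == null) ? value.text : f.apply(value.text);
    }
}

final class LayeringDemo {
    /** This client wants xsd:int values as Long, not whatever a library might pick by default. */
    static ConversionLayer clientPolicy() {
        return new ConversionLayer().register("xsd:int", s -> Long.valueOf(s.trim()));
    }

    public static void main(String[] args) {
        ConversionLayer layer = clientPolicy();
        System.out.println(layer.convert(new TypedText("42", "xsd:int")));
        System.out.println(layer.convert(new TypedText("hello", "xsd:string")));
    }
}
```

Clients that just want text never instantiate the upper layer, so they pay nothing for conversions they do not use.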
From: Elliotte R. H. <el...@me...> - 2002-02-12 20:07:53
At 2:24 PM -0500 2/12/02, ne...@ca... wrote:
>Hi folks,
>
>We faced this very problem of how to expose XML schema infoset
>augmentations--and indeed infoset augmentations generally--during the
>design of Apache Xerces2's Xerces Native Interface (XNI). The way we
>solved it is to add a very general container to most of the XNI callbacks;
>before this addition, these tended to be very SAX-like. This container
>object can carry whatever kinds of augmentation a component (a scanner,
>validator for a certain kind of grammar or whatever) wishes to provide to
>some later component (i.e., an application, a standard API generator etc.)
>

I knew I'd seen something like that somewhere. :-)

My general take is that this approach, or the SAX features and properties approach, is appropriate when there are going to be many implementations providing various different kinds of information which cannot reasonably be predicted in advance. It is an extension mechanism. It is not something that should be used for core functionality. Among other problems it relegates pretty much everything to an object, so type checking is significantly weakened. Of course the actual classes vary a lot from parser to parser so the code isn't really all that portable. In the case of SAX properties, you can even see cases where different parsers provide the same features and properties but use different names for them.

I think we can design a good API for typed parsing that would be portable across parsers. Now maybe this API is SAX 3.0 and maybe it isn't, but either way the functionality should be selected directly with methods, interfaces and classes rather than through URLs hidden in strings. It is *not* a good thing to make a request for the type of an element look exactly the same as a request to get the base URL of the element or the number of characters in the element or anything else someone might want to store in a property. The augmentation approach is a hack to layer arbitrary information on top of an existing API. However, it is not a good way to provide core functionality.
From: Simon St.L. <sim...@si...> - 2002-02-16 17:13:23
On Tue, 2002-02-12 at 15:05, Elliotte Rusty Harold wrote:
> My general take is that this approach, or the SAX features and
> properties approach, is appropriate when there are going to be many
> implementations providing various different kinds of information
> which cannot reasonably be predicted in advance. It is an extension
> mechanism. It is not something that should be used for core
> functionality. Among other problems it relegates pretty much
> everything to an object, so type checking is significantly weakened.
> Of course the actual classes vary a lot from parser to parser so the
> code isn't really all that portable. In the case of SAX properties,
> you can even see cases where different parsers provide the same
> features and properties but use different names for them.

MOE [1] is designed to support such functionality through a general Annotations object, which I hope to use for PSVI, CSS, and other decoration. I'm not sure how that kind of Annotation could be integrated with the SAX2 API, however.

I'd support a more direct approach to adding typing information to SAX (3.0?), but I suspect it would be at least as dramatic an API change as adding namespaces, and likely more intrusive.

[1] - http://moe.sourceforge.net

--
Simon St.Laurent
Ring around the content, a pocket full of brackets
Errors, errors, all fall down!
http://simonstl.com