From: Gary H. <how...@nt...> - 2005-05-20 00:07:44
Hello there. The recent posting to change-ringers about methods.ringing.org has prompted me to join the fray regarding the use of XML in computational campanology. My personal bias is toward the use of Java (and my Java ringing class library) and XSLT processing of methods data. A quick email to Martin Bright came up with the suggestion that I get things going here. So here goes...

Using the Xerces Java parser (version 2.6.2) and the current method.xsd schema with full validation turned on, I get the following schema errors:

  [Error] method.xsd:78:36: InvalidRegex: Pattern value '(([-xX]|[A-HJ-NP-WYZa-hj-np-wyz0-9]+)\.?)*' is not a valid regular expression. The reported error was: ''-' is an invalid character range. Write '\-'.'.
  [Error] method.xsd:150:46: src-resolve: Cannot resolve the name 'xlink:type' to a(n) 'attribute declaration' component.
  [Error] method.xsd:150:46: s4s-elt-invalid-content.1: The content of 'linkedPerformanceType' is invalid. Element 'attribute' is invalid, misplaced, or occurs too often.

These can be resolved by modifying the following elements:

  <simpleType name="pnType">
    <restriction base="xsd:string">
      <pattern value="(([\-xX]|[A-HJ-NP-WYZa-hj-np-wyz0-9]+)\.?)*"/>
                         ^ add this escape character
    </restriction>
  </simpleType>

  <complexType name="linkedPerformanceType">
    <complexContent>
      <extension base="m:performanceType">
        <attribute ref="xlink:show" default="none"/>
                        ^ replace path with a known xlink reference
      </extension>
    </complexContent>
  </complexType>

This second case is a bit of a fudge, just to get rid of any schema errors. It's not clear what the xlink:type is trying to achieve either. Finding a standard schema for XLink was not easy, so I would question its usefulness overall, but that's a debate for later as I have other more pressing comments.
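For anyone wanting to check the corrected pnType pattern in isolation, here is a minimal sketch using the JAXP validation API (javax.xml.validation) that ships with the JDK, rather than a standalone Xerces 2.6.2 jar. The JDK's bundled parser may not report the original [-xX] form exactly as Xerces 2.6.2 did. The PnTypeCheck class and the <notation> wrapper element are hypothetical, made up just for the test; neither is part of method.xsd.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/**
 * Compiles a cut-down schema containing only the corrected pnType and checks
 * place-notation strings against it. The <notation> element is a hypothetical
 * wrapper used only for this test; it does not come from method.xsd.
 */
class PnTypeCheck {
    // Doubled backslashes: "\\-" in Java source is "\-" in the schema text.
    private static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:simpleType name='pnType'>"
      + "    <xsd:restriction base='xsd:string'>"
      + "      <xsd:pattern value='(([\\-xX]|[A-HJ-NP-WYZa-hj-np-wyz0-9]+)\\.?)*'/>"
      + "    </xsd:restriction>"
      + "  </xsd:simpleType>"
      + "  <xsd:element name='notation' type='pnType'/>"
      + "</xsd:schema>";

    private static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // Throws SAXException if the pattern is not a legal XSD regular expression.
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema failed to compile", e);
        }
    }

    /** Returns true if the string is valid place notation under pnType. */
    static boolean isValid(String pn) {
        // The notation becomes element text, so escape XML's special characters.
        String escaped = pn.replace("&", "&amp;").replace("<", "&lt;");
        String doc = "<notation>" + escaped + "</notation>";
        try {
            Validator validator = SCHEMA.newValidator();
            // With no ErrorHandler set, any validation error is thrown as a SAXException.
            validator.validate(new StreamSource(new StringReader(doc)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {"x38x14x58x16x12x38x14x78", "-38-14-1258", "3.8.I", "38 14"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The first two samples should pass. The last two should fail: "I" is deliberately excluded from the character class, and xsd:string preserves whitespace, so the space is not matched.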
Running a query returns a document with the following structure:

  <methods xmlns="http://methods.ringing.org/NS/method"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:db="http://methods.ringing.org/NS/database"
           version="0.1" db:page="0" db:pagesize="100" db:rows="7">
    <method xmlns:a="http://methods.ringing.org/NS/method"
            xmlns:default="http://methods.ringing.org/NS/method"
            xmlns:sql="http://boojum.org.uk/NS/XMLServer"
            id="m15469">
      <name>Yorkshire</name>
      ...
      <meta>
        <db:timestamp>2005-05-12T13:24:54</db:timestamp>
      </meta>
    </method>
    ....

Wow, so many namespaces! On further analysis, those on the <method> tag are redundant, as the document will parse without them. However, I believe the use of the id attribute is incorrect. The supporting text for the schema states: "This contains a unique ID for the method within the database." My understanding of XML ID attributes (which pre-date XML Schema) is supported by the following quote from the XML Schema definition: "the scope of an ID is fixed to be the whole document".

My first point here is that if you wish to have a unique database reference, it should not be of type ID, but simply a number or whatever, as its scope goes beyond the document generated as the result of a query. I have used IDs (and corresponding IDREFs) in those cases where a relationship needs to be expressed between two elements in an XML document that doesn't fit the standard hierarchical model provided by the document (e.g. networks of nodes and the links between them). In a ringing context I could see them being used in a document that contains method definitions and touches: several touches could "point" to the same method definition (via its ID) using an IDREF. In such a case (as in a network definition) the actual value of the ID is irrelevant and can be generated on the fly for that document instance. The only constraint is that all IDs are unique within that document.
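To illustrate generating ID values per document instance, here is a minimal Java sketch. The element and attribute names (<ringing>, <touch method="...">) and the DocumentIds class are hypothetical and do not appear in method.xsd. A real document would also need a DTD or schema declaring id as type ID and method as type IDREF before a validating parser would check the references.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper that hands out document-local ID values while a
 * document is written. The values ("m1", "m2", ...) mean nothing outside
 * this one document; they only have to be unique within it.
 */
class DocumentIds {
    // Insertion order is kept so method definitions are written in first-use order.
    private final Map<String, String> ids = new LinkedHashMap<>();

    /** Returns the ID for a method, allocating the next free one on first use. */
    String idFor(String methodName) {
        String id = ids.get(methodName);
        if (id == null) {
            id = "m" + (ids.size() + 1);
            ids.put(methodName, id);
        }
        return id;
    }

    /** Minimal escaping for element text and double-quoted attribute values. */
    private static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    /**
     * Writes each method definition once, with touches pointing at it by IDREF.
     * Each touch is {touchName, methodName}; element names are illustrative only.
     */
    static String write(List<String[]> touches) {
        DocumentIds doc = new DocumentIds();
        StringBuilder touchXml = new StringBuilder();
        for (String[] t : touches) {
            touchXml.append("  <touch name=\"").append(esc(t[0]))
                    .append("\" method=\"").append(doc.idFor(t[1])).append("\"/>\n");
        }
        StringBuilder xml = new StringBuilder("<ringing>\n");
        for (Map.Entry<String, String> e : doc.ids.entrySet()) {
            xml.append("  <method id=\"").append(e.getValue()).append("\"><name>")
               .append(esc(e.getKey())).append("</name></method>\n");
        }
        xml.append(touchXml).append("</ringing>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        List<String[]> touches = Arrays.asList(
            new String[] {"Touch A", "Yorkshire"},
            new String[] {"Touch B", "Cambridge"},
            new String[] {"Touch C", "Yorkshire"});
        System.out.print(write(touches));
    }
}
```

Running main writes Yorkshire and Cambridge once each, as m1 and m2, and touches A and C both refer to m1. Whatever database the methods came from, the IDs are just m1, m2 and so on. Regenerating the document may renumber them, and that is harmless because nothing outside the document refers to them.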
For me the rule of thumb is: don't use an ID if you don't have a corresponding IDREF to refer to it.

The second point is one of separation of concerns: the id attribute you propose relates to a reference in your database; it is not a fundamental attribute of a method. I therefore believe it should be replaced by a more general mechanism allowing a method definition to include an annotation for a database-specific id, for example within the <meta> tag.

Having suggested this, I don't particularly like the <meta> tag being part of a method definition either. It is noise; it doesn't contain any information I would want to query a method database for. In a similar way, the <methods> tag as it currently stands also has "noisy" attributes with the db: namespace prefix. I think a mechanism for allowing a list of methods is required, but it is more general and should not be encumbered with attributes for one specific query mechanism. This would allow other sites to provide lists of methods in a standard way. If you feel that query information is necessary, then it should be provided by a separate wrapper tag:

  <db:results xmlns:db="http://methods.ringing.org/NS/database"
              db:page="0" db:pagesize="100" db:rows="7">
    <methods xmlns="urn:cccbr-org-uk:methods-1.0">
      <method>
        ...
      </method>
    </methods>
  </db:results>

I would also propose that the version attribute of the <methods> tag be eliminated and version identification be incorporated into the namespace URI. This would make it much easier to build a system that can parse either version 1 or version 2 documents using standard entity resolver parsing techniques. Over the years my preference has evolved towards using URNs rather than URLs as namespace URIs, because parsers have been known to hit the named site when trying to resolve URL references. A URN makes it clear that the URI is just a formal naming convention. The example above brings some of these suggestions together.

The schema currently allows "nillable" fields. For example:
  "<firsthand xsi:nil="true"/> indicates that the method has never been rung to a peal on handbells."

I have never needed to use this feature, as the three common states for a field are easily covered by the following much simpler syntax:

  <tag>value</tag>        - a value is present for "tag"
  <tag></tag> or <tag/>   - an empty value exists for "tag" (e.g. a zero-length string)
  no tag element          - "tag" does not exist (i.e. is nil)

This keeps the generated document much simpler to read for humans and to parse in programs: there is no need to declare the xsi namespace (which clutters the document); absent fields are just that, absent, keeping the document readable; and when parsing something like <firsthand xsi:nil="true"/>, I would have to write extra code in a (SAX) parser to spot the xsi:nil attribute as a special case and process the tag in a completely different manner from what I would do if it read <firsthand date="2005-05-19"/>. For me, this last case is the most compelling reason for not using nillable fields.

I have also just done some experiments with XPath expressions in an XSLT stylesheet, where I selected all <firsthand> nodes for processing [e.g. method/performances/firsthand], and found the results included those with xsi:nil="true" as well as those without. This means extra code would have to be added to stylesheets if nil fields were to be ignored, and it would have to be done for each nillable field. I think the simple absence of a tag to indicate a nil value is much more intuitive and simpler to process.

Here endeth the first lesson. I hope I haven't preached too much, but I fear another sermon is likely to follow regarding some aspects of method definition; I wanted to get the discussion going on these aspects first.

Regards,
Gary Howard