From: Richard S. <ri...@ex...> - 2005-05-20 15:58:57
Martin Bright wrote:

> > [Error] method.xsd:78:36: InvalidRegex: Pattern value
> > '(([-xX]|[A-HJ-NP-WYZa-hj-np-wyz0-9]+)\.?)*' is not a valid regular
> > expression. The reported error was: ''-' is an invalid character range.
> > Write '\-'.'.
>
> Many regex parsers would allow this. The relevant standard (Appendix
> F.1 of XML Schema Part 2) is contradictory:
>
> >    * The [, ], - and \ characters are not valid character ranges;

I'm inclined to assume that the inclusion of '-' in this list is a
mistake. Given the third bullet point, it simply doesn't make sense.
I've emailed the relevant W3C mailing list: let's see if they can
clarify things.

> >    * The ^ character is only valid at the beginning of a ·positive
> >      character group· if it is part of a ·negative character group·
> >    * The - character is a valid character range only at the
> >      beginning or end of a ·positive character group·.
>
> I suppose there's no harm in putting the backslash in to make it work.

Agreed. I've now done this.

> > [Error] method.xsd:150:46: src-resolve: Cannot resolve the name
> > 'xlink:type' to a(n) 'attribute declaration' component.
> > [Error] method.xsd:150:46: s4s-elt-invalid-content.1: The content of
> > 'linkedPerformanceType' is invalid. Element 'attribute' is invalid,
> > misplaced, or occurs too often.
>
> I think maybe the fault here lies with whatever schema you're using for
> XLink -- that should be a valid attribute declaration.

I've put a suitable XLink schema here:

  http://www.ex-parrot.com/~richard/schemas/xlink.xsd

Can you try using that one and tell us whether you still have problems?

> > Finding a standard schema for xlink was not easy either
> > so I would question its usefulness overall but that's a
> > debate for later as I have other more pressing comments.

There's a good reason for this. XLink does not (currently) have a
normative schema, but I don't see why this should be an issue -- it's
easy enough to provide one.
Having said that, I'm not one of XLink's greatest fans, and if you have
an alternative suggestion, I'd be interested to hear it.

> > Wow, so many namespaces! On further analysis, those on the <method> tag
> > are redundant
>
> Yes, I know. It's an issue with DBIx::XMLServer. It's quite a long way
> down the list of priorities to fix, though, because the extra
> declarations make no semantic difference to the document.

Sorting this is complicated. Whilst libxml2 has a clean_namespaces()
method, this only removes duplicate namespace declarations, not unused
ones. As there are no duplicate namespaces, this doesn't help.
(Finding out which namespace declarations are unused is a difficult
problem. How do you tell, in general, whether a text node or an
attribute value contains a QName or something else sensitive to
namespace bindings, such as an XPath expression?)

That said, we should aim to sort this out eventually.

[Martin: The sql namespace can probably be suppressed with an
exclude-result-prefixes attribute in xmlout.xsl.]

> > The second point is one of separation of concerns: the id attribute you
> > propose
> > relates to a reference in your database; it is not a fundamental
> > attribute of a method.
>
> No, but historically it's been common for almost everything in any XML
> document to have an ID attribute which contains some arbitrary ID. This
> is an exception to the general rule that irrelevant data should be put
> in a different namespace.

Martin's comment has just reminded me of the W3C's xml:id specification.

  http://www.w3.org/TR/xml-id/

The suggestion is that future schemas should put IDs in the xml
namespace. At the moment I think doing this will break more than it
fixes, but we may wish to revisit this decision in the future. (In
particular, the ability of parsers to correctly access fragment
identifiers -- i.e. #mXXXX on a URL -- might break.)
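
To make the compatibility concern concrete, here is a minimal sketch of how a namespace-aware consumer sees the two spellings. The element names and the values m1 and m2 are made up for illustration, and the method namespace is left out for brevity; only the xml namespace URI is fixed by the XML specifications.

```python
import xml.etree.ElementTree as ET

# Namespace URI permanently bound to the "xml" prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical document: one method uses a plain id, the other xml:id.
doc = '<methods><method id="m1"/><method xml:id="m2"/></methods>'
root = ET.fromstring(doc)

# Code that asks for the unqualified attribute misses the xml:id one...
plain_ids = [m.get("id") for m in root]

# ...and code that asks for the qualified attribute misses the plain one.
xml_ids = [m.get("{%s}id" % XML_NS) for m in root]
```

Any existing client that looks up the unqualified id attribute would therefore need changing if the IDs moved into the xml namespace.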

> > Having suggested this, I don't particularly like the <meta> tag being
> > part of a method
> > definition either. It is noise; it doesn't contain any information I
> > would want to
> > query a method database for.

Martin has already responded to this, so I'm not going to, except to say
that all our search script currently puts in the <meta> element is a
database timestamp. This is something for which I can easily imagine
wanting to query the database. If I have a local (perhaps off-line)
database, I might want to regularly sync this to the server database.
Downloading just those methods changed since my previous snapshot was
created is an obvious way of doing this.

And if you don't like having this in the output, you can always set the
'fields' parameter to specify which fields you want.

  http://methods.ringing.org/query.html#fields

(Thinking about it, we might want to add a way of saying all fields
except those in a given list.)

> Instead of the <meta> element, we would have liked to allow arbitrary
> elements from other namespaces as direct children of the <method>
> element. But it seems that XML Schema won't allow you to specify this -
> something to do with being a regular language. RAS or DFM can no doubt
> expand.

Yes. Basically we had a choice. Either we could describe the <method>
element as an <xsd:sequence>, in which case we would have been allowed
to put elements from other namespaces there. Similarly we could have
put the performances and classification data directly there. (They
can't be in the current schema as they can occur multiple times.) The
cost of this flexibility is that we would have had to specify an order
for all of these elements, and the XML would only have been valid if
these elements were in the right order. The reason for this, as Martin
says, is that an <xsd:sequence> is effectively taken as a grammar for a
regular language, and keeping track of the number of times elements have
occurred rapidly becomes extremely difficult.
(The schema required grows combinatorially with the number of elements.)

The alternative, which is what we've decided to do, is to describe it
with an <xsd:all> element which allows child elements to occur at most
once. It also does not allow elements from other namespaces. We get
around these restrictions by having container elements, such as the
<meta>, <refs>, <performances> and <classification> elements. I think
we felt that this was preferable to having an arbitrary order in which
the child elements had to occur.

> > I would also propose that the version attribute of the <methods> tag
> > should be
> > eliminated and version identification be incorporated into the namespace
> > URI. This
> > would make it very much easier to build a system that can parse either
> > version 1
> > or version 2 documents using standard entity resolver parsing techniques.

Namespaces aren't referenced via entity references so I don't see how
this is relevant. (Or are you suggesting a DTD that adds an implicit
namespace declaration on the root entity? If so, I think this would be
a very bad idea.)

I assume you're referring to the XML Catalog-like techniques that can be
used to select a schema for a namespace. This helps, but so long as you
keep the XML Schema documents backwards compatible (which should be easy
as the conceptual schemas need to be backwards compatible), this is a
non-issue. Always using the most recent schema for the namespace might
result in lots of unnecessary schema for unknown elements, but should
otherwise be fine.

Finally, it should be remembered that in many real-world applications,
you don't actually use the schema -- it's simply a piece of
documentation on what is allowed in the XML. The parser presumably
already knows this.

> There's a big difference between not telling you
> the date of the first peal and telling you that it definitely hasn't
> been pealed.

Of course, we're not actually saying it definitely hasn't been pealed,
we're saying that our database has no knowledge of it being pealed.
This, of course, is not an absolute statement, but is dependent on when
the database was last updated (and our ability to update and query the
database without cocking up, but let's ignore that). Given that, it
*might* make more sense to use an attribute in the db namespace, though
given xsi:nil exists for exactly this sort of thing, it's sensible to
use it.

(We might want to add a global timestamp to signify when our database
was synced against its upstream data source (currently the MC website).)

Thinking further, we *are* inconsistent in our use of xsi:nil, as method
names are handled differently from anything else. For a method name,
there are three things that I might want to convey in the XML:

  - the method is named, but has the null name (i.e. it is Little Bob);
  - according to the database, the method is unnamed; or
  - it is unspecified whether the method is named.

Currently, we use xsi:nil for the first case, ignore the second case (we
have no unnamed methods in the database at the moment), and omit the
element in the third case. Really we should be able to distinguish all
three cases.

RAS
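
For reference, the name handling described above could be read by a consumer roughly as follows. This is only a sketch: the unqualified element names method and name, the sample document and the state labels are assumptions made for illustration (the real documents are namespaced). Only the xsi namespace URI and the xsi:nil attribute come from XML Schema itself.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def name_state(method):
    """Classify a method element by how its name child is encoded."""
    name = method.find("name")
    if name is None:
        return "unspecified"   # element omitted: nothing is said about the name
    if name.get("{%s}nil" % XSI) in ("true", "1"):
        return "named-null"    # xsi:nil currently stands for the null name
    return "named"

# Hypothetical sample; element names are placeholders, not the real schema.
doc = """<methods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <method><name>Plain Bob</name></method>
  <method><name xsi:nil="true"/></method>
  <method/>
</methods>"""

states = [name_state(m) for m in ET.fromstring(doc)]
```

With this encoding there is no state left over for "unnamed according to the database", which is the gap noted above.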