From: Richard S. <ri...@ex...> - 2005-05-22 23:12:31
Gary Howard wrote:
> Richard Smith wrote:
>
> >I've put a suitable XLink schema here:
> >
> >  http://www.ex-parrot.com/~richard/schemas/xlink.xsd
> >
> >Can you try using that one and tell us whether you still
> >have problems?
>
> The problem goes away.

OK.  I've updated the schema to refer explicitly to this XLink
schema (and have put a copy of it on the project website rather
than relying on my own site).

> >>>Finding a standard schema for xlink was not easy either
> >>>so I would question its usefulness overall but that's a
> >>>debate for later as I have other more pressing comments.
> >>>
> >There's a good reason for this.  XLink does not (currently)
> >have a normative schema, but I don't see why this should be
> >an issue -- it's easy enough to provide one.
> >
> Then why is the one that I found different? It ain't that easy
> to match the XLink "standard".

As I said, there is no normative schema, and there are dozens of
ways of expressing different concepts depending on how reusable
and how strict you want it to be.

> >Having said that, I'm not one of XLink's greatest fans, and
> >if you have an alternative suggestion, I'd be interested to
> >hear it.
>
> I'll be thinking about it. My first thoughts were to have links
> to Dove/Felstead etc. for the relevant info.

That sort of thing is one of the reasons we want a linking
mechanism.  However, I think we should also have a way of
including the data inline in the XML.

> >Martin has already responded to this, so I'm not going to,
> >except to say that all our search script currently puts in
> >the <meta> element is a database timestamp.  This is something
> >for which I can easily imagine wanting to query the
> >database.  If I have a local (perhaps off-line) database, I
> >might want to regularly sync this to the server database.
> >Downloading just those methods changed since my previous
> >snapshot was created is an obvious way of doing this.
>
> See my comments to Martin but we're still in the
> database-centric view.  A method definition schema should not
> be expected to support database synchronisation.

And ours doesn't explicitly.  The method schema (you have read
it, haven't you?) doesn't make *any* mention of database
timestamps.  It simply provides a point for extension.  One way
that we have chosen to extend it in the results provided by our
database is with a database timestamp.  That doesn't mean that
anyone else needs to do the same.  And that is why the timestamp
is in the database namespace, not the methods namespace.

> To perform this task you should create a synchronisation
> document with timestamps etc. and include method definitions
> where required.

But that is *precisely* what we *have* done.

> Even so, for the example cited above, just a list of methods
> (no annotations) matching the criterion of being newer than a
> supplied timestamp would suffice.

And the schema allows you to do this if that's what you want to
do.  And so does our database application.

> E.g. the HTTP request "get if newer" (or whatever it is) will
> have an HTML document returned if there is a newer one but the
> document itself does not contain a timestamp to say it is
> newer: why should method definitions be any different?

The HTTP 1.1 If-Modified-Since header should be used to
conditionally return the *whole* document if the *whole* document
has been modified since a given date.  It isn't designed as a way
of extracting the subset of a document that has been modified
since a given date, and if you tried to do this it would
seriously fuck up the operation of web caches.  Getting our
script to respect this header (and related ones) may be
advantageous, but I think we have far more important things to
implement.

> Separation of concerns again.
>
> >And if you don't like having this in the output, you can
> >always set the 'fields' parameter to specify which fields
> >you want.
> >
> >  http://methods.ringing.org/query.html#fields
> >
> >(Thinking about it, we might want to add a way of saying all
> >fields except those in a given list.)
> >
> See comments to Martin. I think you're both missing the point
> with field selection. That's a SQL thing where all data has
> been flattened into a table:

Not at all.  The fields selection is not *just* a simple list of
fields to be included (although it can be, and usually is).  As I
said in my reply to your earlier mail, it is an XPath filter that
allows pretty much any part of the whole conceptual XML document
to be included.  Yes, in many cases this is as simple as a list
of fields from the database, but it needn't be.

> CSV would be just as good in this case.

There are many reasons why this wouldn't be satisfactory, but as
I doubt you're seriously suggesting using CSV, I won't go into
them here.

> I want to see some structure in the results from an
> object-oriented viewpoint (this is after all a discussion list
> about a C++ library).

The fact that this is the discussion list for a C++ library is
largely irrelevant.  We're simply using this list as a convenient
place to discuss the methods schema.  Almost none of the methods
database is currently implemented in C++, though we have provided
a C++ client library to access it.

As to structure in the results, I think there's plenty of
structure, and insofar as it's meaningful to bandy around terms
like "object-orientation" without any context, I think we've gone
quite some distance in that direction.  Take a look at the
abstract class and ref elements and the way in which third
parties are encouraged to extend these.

> >Yes.  Basically we had a choice.  Either we could describe
> >the <method> element as an <xsd:sequence>, in which case we
> >would have been allowed to put elements from other
> >namespaces there.  Similarly we could have put the
> >performances and classification data directly there.
> >(They can't be in the current schema as they can occur
> >multiple times.)  The cost of this flexibility is that we
> >would have had to specify an order for all of these elements,
> >and the XML would only have been valid if these elements were
> >in the right order.
>
> Nothing wrong with that.

Maybe not seriously wrong, no, but it's hardly desirable, is it?
And personally, I think the container classes required by the
alternative approach provide better separation of the distinct
but related concepts.  (You can perhaps argue that refs and
classifications are the same, but this is a relatively trivial
gripe.)

> >The reason for this, as Martin says, is that an
> ><xsd:sequence> is effectively taken as a grammar for a
> >regular language, and keeping track of the number of times
> >elements have occurred rapidly becomes extremely difficult.
> >(The schema required grows combinatorially with the number
> >of elements.)
>
> I don't understand the combinatorial explosion. A sequence
> gives an "approved" order to elements which may be optional -
> what's the problem with that?

Try writing an XML schema for an element that can have, in any
order, up to one <foo> child and an arbitrary number of <bar>
children.  Fairly easy?  Now try up to one each of <foo>, <bar>
and <baz> as children, plus an arbitrary number of <quux> child
elements.  Quite a lot more complicated.  Now do you see?

> >The alternative, which is what we've decided to do, is to
> >describe it with an <xsd:all> element which allows child
> >elements to occur at most once.  It also does not allow
> >elements from other namespaces.  We get around these
> >restrictions by having container elements, such as the
> ><meta>, <refs>, <performances> and <classification>
> >elements. I think we felt that this was preferable to having
> >an arbitrary order in which the child elements had to occur.
>
> On the other hand you now have arbitrary flexibility which can
> also cause problems.

Of course it *can*.
If you intend to design elements to extend the schema, then it's
your job not to fuck it up.  It's really not very difficult to do
correctly.  Look, say, at the <cc-class> or the <rwref> elements
which "extend", respectively, the <class> and the <ref> abstract
elements.

> No DTDs, we're using XML Schema after all! The DTD equivalent
> is the public id and they are used in exactly the manner I
> described (see the different versions of HTML all distinguished
> by the public id in the DOCTYPE declaration).

Yeah.  Distinguishing versions of a document by different public
ids in the DTD is fine: it causes no problems and can be very
convenient.  Doing the same with namespace names does cause
problems -- see my previous email.

> > This helps, but so long as you keep the XML Schema documents
> >backwards compatible (which should be easy as the
> >conceptual schemas need to be backwards compatible), this is
> >a non-issue.  Always using the most recent schema for the
> >namespace might result in lots of unnecessary schema for
> >unknown elements, but should otherwise be fine.
> >
> >Finally, it should be remembered that in many real-world
> >applications, you don't actually use the schema -- it's
> >simply a piece of documentation on what is allowed in the
> >XML.  The parser presumably already knows this.
> >
> Depends on your parser and your document validation policy. The
> point being that the standard should help in a standard way
> those who wish to process multiple versions using standard
> techniques like XML Catalog. Those who wish to ignore
> validation do so at their own risk: their parser may or may not
> cope but that's no fault of the schemas.

This is all true, but entirely irrelevant.

> >For a method name, there are three things
> >that I might want to convey in the XML:
> >
> >  - the method is named, but has the null name (i.e.
> >    it is Little Bob);
> >
> >  - according to the database, the method is unnamed; or
> >
> >  - it is unspecified whether the method is named.
[...]
> I'm still thinking about this one. Can you give me an example
> of the third case and how it is distinct from case 2?

Yeah.  A computer program might generate a list of methods that
are true substitutes for Yorkshire as the first lead of Smith's
23.  It might very well output this using this XML format.  (And
indeed, the program I usually use for such tasks does exactly
that.)  It might not have ready access to whether or not a method
has been named, or it might not think it relevant, in which case
you have case 3.

A second example: you request from our database a list of methods
with some properties, but, by using a fields option, you tell it
not to include method names.  From the point of view of a
consumer of this document, this is case 3.

By contrast, case 2 arises when the program (from the former
example) or the database (in the latter example) does output
information about the name of the method.  If it knows (having
recently synced with the MC website) that the method is unnamed,
you have case 2.

RAS
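
P.S.  To make the three cases concrete, here is a rough Python
sketch of how a consumer might hold them in memory.  It is only an
illustration: the names NameStatus, MethodName and describe are
made up for this mail, and it says nothing about how the methods
schema itself encodes the distinction in XML.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NameStatus(Enum):
    NAMED = 1        # case 1: a name is given, possibly the null (empty) name
    UNNAMED = 2      # case 2: the source positively states there is no name
    UNSPECIFIED = 3  # case 3: the source says nothing either way


@dataclass(frozen=True)
class MethodName:
    # Hypothetical in-memory representation, not part of the schema.
    status: NameStatus
    name: Optional[str] = None  # only meaningful when status is NAMED

    def __post_init__(self):
        # A name string must be present exactly when the method is named.
        if (self.status is NameStatus.NAMED) != (self.name is not None):
            raise ValueError("name must be set if and only if status is NAMED")


def describe(n: MethodName) -> str:
    """Return a short human-readable summary of the naming state."""
    if n.status is NameStatus.NAMED:
        return "named, null name" if n.name == "" else f"named {n.name!r}"
    if n.status is NameStatus.UNNAMED:
        return "known to be unnamed"
    return "name not specified"


little_bob = MethodName(NameStatus.NAMED, "")   # case 1 with the null name
unnamed = MethodName(NameStatus.UNNAMED)        # case 2
omitted = MethodName(NameStatus.UNSPECIFIED)    # case 3
```

The point of keeping a separate status rather than a single
optional string is that an empty name (Little Bob), a known
absence of a name, and no information at all stay distinguishable.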