Re: [Ringing-lib-discussion] XML Format

Gary Howard wrote:

> Richard Smith wrote:
>
> >I've put a suitable XLink schema here:
> >
> >  http://www.ex-parrot.com/~richard/schemas/xlink.xsd
> >
> >Can you try using that one and tell us whether you still
> >have problems?
>
> The problem goes away.

OK.  I've updated the schema to explicitly refer to this
XLink schema (and have put a copy of it on the project
website rather than relying on my own website).

> >>>Finding a standard schema for xlink was not easy either
> >>>so I would question its usefulness overall but that's a
> >>>debate for later as I have other more pressing comments.
> >>>
> >>>
> >There's a good reason for this.  XLink does not (currently)
> >have a normative schema, but I don't see why this should be
> >an issue -- it's easy enough to provide one.
> >
> >
> Then why is the one that I found different? It ain't that easy to match
> the XLink "standard".

As I said, there is no normative schema, and there are
dozens of ways of expressing different concepts depending
on how reusable and how strict you want it to be.

> >Having said that, I'm not one of XLink's greatest fans, and
> >if you have an alternative suggestion, I'd be interested to
> >hear it.
>
> I'll be thinking about it. My first thoughts were to have links to
> Dove/Felstead etc. for the relevant info.

That sort of thing is one of the reasons we want a linking
mechanism.  However, I think we should also have a way of
including the data inline in the XML.

> >Martin has already responded to this, so I'm not going to,
> >except to say that all our search script currently puts in
> >the <meta> elt is a database timestamp.  This is something
> >for which I can easily imagine wanting to query the
> >database.  If I have a local (perhaps off-line) database, I
> >might want to regularly sync this to the server database.
> >Downloading just those methods changed since my previous
> >snapshot was created is an obvious way of doing this.
>
> See my comments to Martin but we're still in the database-centric view.
> A method definition schema
> should not be expected to support database synchronisation.

And ours doesn't explicitly.  The method schema (you have
read it, haven't you?) doesn't make *any* mention of
database time stamps.  It simply provides a point for
extension.  One way that we have chosen to extend it in
the results provided by our database is with a database
timestamp.  That doesn't mean that anyone else needs to do
the same.  And that is why the time stamp is in the database
namespace, not the methods namespace.

> To perform
> this task you should create
> a synchronisation document with timestamps etc. and include method
> definitions where required.

But that is *precisely* what we *have* done.

> Even so, for the example cited above, just a list of methods (no
> annotations) matching the criterion of
> being newer than a supplied timestamp would suffice.

And the schema allows you to do this if that's what you want
to do.  And so does our database application.

> E.g. the HTTP
> request "get if newer" (or what
> ever it is) will have an HTML document returned if there is a newer one
> but the document itself does
> not contain a timestamp to say it is newer: why should method
> definitions be any different?

The HTTP 1.1 If-Modified-Since header should be used to
conditionally return the *whole* document if the *whole*
document has been modified since a given date.  It isn't
designed as a way of extracting a subset of a document that
has been modified since a given date, and if you tried to do
this it would seriously fuck up the operation of web caches.

Getting our script to respect this header (and related ones)
may be advantageous, but I think we have far more important
things to implement.
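
To make the whole-document point concrete, here is a rough
sketch in Python of how a server might decide between 200 and
304.  The function name conditional_get and its arguments are
made up for illustration; this is not what our script does.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get(last_modified, if_modified_since=None):
    """Return (status, send_body) for one whole document.

    last_modified is a timezone-aware datetime; if_modified_since is
    the raw header value, or None if the client did not send one.
    """
    if if_modified_since is None:
        return 200, True
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # An unparseable date is ignored and the full document is sent.
        return 200, True
    if since.tzinfo is None:
        # Treat a date without a usable zone as UTC.
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution.
    if last_modified.replace(microsecond=0) <= since:
        return 304, False   # nothing at all is sent
    return 200, True        # everything is sent
```

Note that there are only two outcomes: the entire document or
no body at all.  There is no answer meaning "just the parts that
changed", which is what a per-method timestamp query needs.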

> Separation of concerns again.
>
> >And if you don't like having this in the output, you can
> >always set the 'fields' parameter to specify which fields
> >you want.
> >
> >  http://methods.ringing.org/query.html#fields
> >
> >(Thinking about it, we might want to add a way of saying all
> >fields except those in a given list.)
> >
> >
> See comments to Martin. I think you're both missing the point with field
> selection. That's a SQL
> thing where all data has been flattened into a table:

Not at all.  The fields selection is not *just* a simple
list of fields to be included (although it can be, and usually is).
As I said in my reply to your earlier mail, it is an XPath
filter that allows pretty much any part of the whole
conceptual XML document to be included.  Yes, in many cases
this is as simple as a list of fields from the database, but
it needn't be.

> CSV would be just as good in this case.

There are many reasons why this wouldn't be satisfactory,
but as I doubt you're seriously suggesting using CSV, I
won't go into them here.

> I want to see some structure in the results from an object-oriented
> viewpoint (this is after all
> a discussion list about a C++ library).

The fact that this is the discussion list for a C++ library
is largely irrelevant.  We're simply using this list as a
convenient place to discuss the methods schema.  Almost none
of the methods database is currently implemented in C++,
though we have provided a C++ client library to access it.

As to structure in the results, I think there's plenty of
structure, and insofar as it's meaningful to bandy around
terms like "object-orientation" without any context, I think
we've gone quite some distance in that direction.  Take a
look at the abstract class and ref elements and the way in
which third parties are encouraged to extend these.

> >Yes.  Basically we had a choice.  Either we could describe
> >the <method> element as an <xsd:sequence>, in which case we
> >would have been allowed to put elements from other
> >namespaces there.  Similarly we could have put the
> >performances and classification data directly there.  (They
> >can't be in the current schema as they can occur multiple
> >times.)  The cost of this flexibility is that we would have
> >had to specify an order for all of these elements, and the
> >XML would only have been valid if these elements were in the
> >right order.
>
> Nothing wrong with that.

Maybe not seriously wrong, no, but it's hardly desirable,
is it?  And personally, I think the container classes
required by the alternative approach provide better
separation of the separate, related concepts.  (You can
perhaps argue that refs and classifications are the same,
but this is a relatively trivial gripe.)

> >The reason for this, as Martin says, is that an
> ><xsd:sequence> is effectively taken as a grammar for a
> >regular language, and keeping track of the number of times
> >elements have occurred rapidly becomes extremely difficult.
> >(The schema required grows combinatorially with the number
> >of elements.)
>
> I don't understand the combinatorial explosion. A sequence gives an
> "approved" order to elements
> which may be optional - what's the problem with that?

Try writing an XML schema for an element that can have, in
any order, up to 1 <foo> child and an arbitrary number of
<bar> children.  Fairly easy?  Now try up to 1 each of <foo>,
<bar> and <baz> as children, plus an arbitrary number of
<quux> child elements.  Quite a lot more complicated.
Now do you see?
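
If you'd rather not do it by hand, the following Python sketch
writes out one possible content model using only sequences,
choices, "?" and "*", in regex-like notation rather than real
XML Schema syntax.  The function names are mine, and this is
just one way of spelling the model, but any spelling built from
those pieces has to track which of the at-most-once elements
have already been seen.

```python
def content_model(once, repeated):
    """Content model allowing each name in `once` at most once and
    `repeated` any number of times, in any order."""
    star = f"{repeated}*"
    if not once:
        return star
    branches = []
    for name in once:
        # After seeing `name`, only the remaining names may still occur.
        rest = [n for n in once if n != name]
        branches.append(f"{name} {content_model(rest, repeated)}")
    return f"{star} ({' | '.join(branches)})?"


def element_refs(n):
    """Number of element references in the model for n at-most-once names."""
    return 1 if n == 0 else 1 + n * (1 + element_refs(n - 1))


print(content_model(["foo"], "bar"))
print(content_model(["foo", "bar", "baz"], "quux"))
print([element_refs(n) for n in range(6)])
```

The first case comes out as "bar* (foo bar*)?", with three
element references.  The second needs 31, and the count grows
roughly like n factorial after that.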

> >The alternative, which is what we've decided to do, is to
> >describe it with an <xsd:all> element which allows child
> >elements to occur at most once.  It also does not allow
> >elements from other namespaces.  We get around these
> >restrictions by having container elements, such as the
> ><meta>, <refs>, <performances> and <classification>
> >elements.  I think we felt that this was preferable to having
> >an arbitrary order in which the child elements had to occur.
>
> On the other hand you now have arbitrary flexibility which can also
> cause problems.

Of course it *can*.  If you intend to design elements to
extend the schema, then it's your job not to fuck it up.
It's really not very difficult to do correctly.  Look, say,
at the <cc-class> or the <rwref> elements which "extend",
respectively, the <class> and the <ref> abstract elements.

> No DTDs, we're using XML Schema after all! The DTD equivalent is the
> public id and they are
> used in exactly the manner I described (see the different versions of
> HTML all distinguished by
> the public id in the DOCTYPE declaration).

Yeah.  Distinguishing versions of a document by different
public ids in the DTD is fine: it causes no problems and can
be very convenient.  Doing the same with namespace names
does cause problems -- see my previous email.

> > This helps, but so long as you keep the XML Schema documents
> >backwards compatible (which should be easy as the
> >conceptual schemas need to be backwards compatible), this is
> >a non-issue.  Always using the most recent schema for the
> >namespace might result in lots of unnecessary schema for
> >unknown elements, but should otherwise be fine.
> >
> >Finally, it should be remembered that in many real-world
> >applications, you don't actually use the schema -- it's
> >simply a piece of documentation on what is allowed in the
> >XML.  The parser presumably already knows this.
> >
> >
> Depends on your parser and your document validation policy. The point
> being that the standard
> should help in a standard way those who wish to process multiple
> versions using standard
> techniques like XML Catalog. Those who wish to ignore validation do so
> at their own risk: their
> parser may or may not cope but that's no fault of the schemas.

This is all true, but entirely irrelevant.

> >For a method name, there are three things
> >that I might want to convey in the XML:
> >
> > - the method is named, but has the null name (i.e. it is
> >   Little Bob);
> >
> > - according to the database, the method is unnamed; or
> >
> > - it is unspecified whether the method is named.
[...]
> I'm still thinking about this one. Can you give me an example of the
> third case and how it is
> distinct from case 2?

Yeah.  A computer program might generate a list of methods
that are true substitutes for Yorkshire as the first lead
of Smith's 23.  It might very well output this using this
XML format.  (And indeed, the program I usually use for
such tasks does exactly that.)  It might not have ready
access to whether or not a method has been named, or it might
not think it relevant, in which case you have case 3.

A second example: you request from our database a list of
methods with some properties, but, by using a fields option,
you tell it not to include method names.  From the point of
view of a consumer of this document, this is case 3.

By contrast, case 2 arises when the program (from the former
example) or the database (in the latter example) does output
information about the name of the method.  If it knows
(having recently synced with the MC website) that the method
is unnamed, you have case 2.
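
For what it's worth, here is how a consumer might model the
three cases in Python.  This is only a sketch: NameStatus,
MethodName and describe are names I've made up for the example,
and it says nothing about how the cases are spelt in the XML.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NameStatus(Enum):
    NAMED = "named"              # case 1: has a name, possibly the null name
    UNNAMED = "unnamed"          # case 2: the source says it has no name
    UNSPECIFIED = "unspecified"  # case 3: the source says nothing either way


@dataclass(frozen=True)
class MethodName:
    status: NameStatus
    name: Optional[str] = None

    def __post_init__(self):
        # A name string is present exactly when the method is named.
        if (self.status is NameStatus.NAMED) != (self.name is not None):
            raise ValueError("name must be given if and only if NAMED")


def describe(m):
    if m.status is NameStatus.NAMED:
        return f"named {m.name!r}"
    return m.status.value


little_bob = MethodName(NameStatus.NAMED, "")   # named, with the null name
unnamed = MethodName(NameStatus.UNNAMED)        # known to be unnamed
no_info = MethodName(NameStatus.UNSPECIFIED)    # name fields not requested
```

The point is that an empty string, "known to be unnamed" and
"nothing said" are three different values, so a single optional
string can't carry all of them.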

RAS