Re: [Ringing-lib-discussion] XML Format

Gary Howard wrote:
> Martin Bright wrote:
>
> >One thing to bear in mind with all of this is that the schema is meant
> >to be of use to any applications exchanging method data, not just for
> >our database.
>
> This is the crux of some of my gripes: there's a lot that's geared to
> your database.

I think you need to be careful to distinguish our core
method schema (the http://methods.ringing.org/NS/method
namespace) from the additional stuff that is specific to our
database (in the http://methods.ringing.org/NS/database
namespace).  This separation into two namespaces is
precisely to avoid the specifics of our database impinging
on the design of the core methods schema.  This means that
you can use the core schema by itself, or in conjunction
with our extra database-specific schema, or, indeed, with
some other schema(s) suited to your own needs.

Look, for example, in the schema itself

  http://methods.ringing.org/method.xsd

or in the following example

  http://methods.ringing.org/method.xsd.txt

and you'll see there is no mention of anything from the
database namespace.
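
As a rough illustration of what the separation buys you, here is a
minimal Python sketch of a consumer that only cares about the core
namespace.  The two namespace URIs are the real ones; the <name>
element and the db:owner and db:timestamp names are invented for the
example and are not taken from either schema.

```python
import xml.etree.ElementTree as ET

METHOD_NS = "http://methods.ringing.org/NS/method"
DB_NS = "http://methods.ringing.org/NS/database"

# Hypothetical document: <name>, db:owner and db:timestamp are invented
# for illustration and are not taken from the real schemas.
DOC = f"""
<methods xmlns="{METHOD_NS}" xmlns:db="{DB_NS}">
  <method id="m123" db:timestamp="2004-01-01">
    <name>Yorkshire</name>
    <db:owner>someone</db:owner>
  </method>
</methods>
"""

def strip_foreign(elem, keep_ns=METHOD_NS):
    """Remove, in place, everything that is not in keep_ns."""
    prefix = "{" + keep_ns + "}"
    for child in list(elem):
        if child.tag.startswith(prefix):
            strip_foreign(child, keep_ns)
        else:
            elem.remove(child)
    for attr in list(elem.attrib):
        # Qualified attributes are stored as "{uri}local"; unqualified
        # attributes (such as id) belong to the element and are kept.
        if attr.startswith("{") and not attr.startswith(prefix):
            del elem.attrib[attr]
    return elem

root = strip_foreign(ET.fromstring(DOC))
method = root.find(f"{{{METHOD_NS}}}method")
print(sorted(method.attrib), [child.tag for child in method])
```

An application that understands the database namespace would simply
skip the stripping step; one that doesn't loses nothing from the core
method data.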

> >I think maybe the fault here lies with whatever schema you're using for
> >XLink -- that should be a valid attribute declaration.
> >
> >Anyway, the whole XLink thing is a bit messy and probably not the best
> >way of doing it.  I wanted to be able to have performances either
> >inline, or referred to in another document.  If you can think of a
> >better way to do it then we'd be pleased to hear it.
>
> I got mine from here: http://schemas.opengis.net/gml/2.1.2/xlinks.xsd
> I get the impression that these guys are quite hooked on xlink.

They do seem to be, but that doesn't in any way make the
schema "official".  The schema I suggested came from another
W3C specification -- the XForms specification -- but again,
that doesn't make it official either.  Clearly the fact that
one validates our example methods XML, whilst the other
doesn't, is, on the face of it, a little worrying.

However, the problem isn't with our schema, or, for that
matter, either of the XLink schemas: it's in the interaction
between them.  Your XLink schema does not define the
xlink:type attribute *except* within various named
attributeGroups.  This means that another schema using
xlink:type attributes must reference these attributeGroups
by name.  This would be fine if we'd written our schema to
work with this specific XLink schema, but we haven't:
we've written it to work with the one I quoted.
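
To show the kind of mismatch I mean, here is a small Python sketch.
The two schema fragments are simplified stand-ins that I have made up
for the two styles, not copies of either real XLink schema, and I am
assuming that the importing schema refers to the attribute with
something like ref="xlink:type".  Only a declaration that is a direct
child of xs:schema can be referenced that way.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
XLINK = "http://www.w3.org/1999/xlink"

# Simplified stand-in: every XLink attribute is declared globally.
GLOBAL_STYLE = f"""
<xs:schema xmlns:xs="{XS}" targetNamespace="{XLINK}">
  <xs:attribute name="type" type="xs:string"/>
  <xs:attribute name="href" type="xs:anyURI"/>
</xs:schema>
"""

# Simplified stand-in: "type" exists only inside a named attributeGroup.
GROUP_STYLE = f"""
<xs:schema xmlns:xs="{XS}" targetNamespace="{XLINK}">
  <xs:attribute name="href" type="xs:anyURI"/>
  <xs:attributeGroup name="simpleLink">
    <xs:attribute name="type" type="xs:string" fixed="simple"
                  form="qualified"/>
  </xs:attributeGroup>
</xs:schema>
"""

def global_attributes(xsd_text):
    """Names of attributes declared as direct children of xs:schema."""
    root = ET.fromstring(xsd_text)
    return {a.get("name") for a in root.findall(f"{{{XS}}}attribute")}

def can_reference(xsd_text, name):
    """True if an importing schema could use ref="xlink:<name>"."""
    return name in global_attributes(xsd_text)

for label, xsd in (("global", GLOBAL_STYLE), ("group", GROUP_STYLE)):
    print(label, can_reference(xsd, "type"), can_reference(xsd, "href"))
```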

It might be worth changing this as the schema you quote does
seem better in some ways.  In any event, we should
definitely import the XLink schema from a specific location
as it's clear that just any XLink schema will not do.

> >I think what you're missing here is that IDs are very handy when you
> >want to refer to a particular element from the outside.  So maybe I want
> >to put up a collection of my new unrung cyclic Royal principles on my
> >web site; then you can refer to one of them as something like
> >  http://martins.web.site/cyclicroyal.xml#r3xx .
> >
> >The facts that the IDs are unique within the database, and that they're
> >consistent from one document to another, are not relevant to the XML
> >schema; but they do guarantee that they're also unique within any XML
> >document that the database produces, which is what IDs are about.
> >
> >
> I don't believe the XML standard says anything about IDs being
> consistent or unique across documents.

Indeed not.  It says they're unique within a particular
document.  But if they're unique across a set of documents,
then they're unique in any one of those documents.

However, this is getting away from the core point.  Our
general-purpose XML schema only says that a method *may*
have an ID (it needn't) and that if it does, it must behave
as an XML ID should -- i.e. it must be unique within that
particular document.

In our database application, we have decided that it is
advantageous to extend XML's guarantee and say the ID will
be unique for *any* method in our database.  This does not
mean you need to take advantage of this extra guarantee, nor
does it mean an application you write needs to offer this
additional guarantee.

The main purpose of ID attributes is to make fragment
locators work (the #m1234 you might see on the end of a
URL).  If you want a canonical reference number for a
method, it is better to use one of the fields from the
<refs> container element.  (This will, for example, contain
methods' numbers in various CC collections.)
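
For what it's worth, the per-document guarantee and the fragment
lookup amount to something like the following Python sketch.  I am
assuming here that the ID lives in an unqualified attribute called id
and that each method has a <name> child; the example.org URL is made
up.  ElementTree does not read the schema, so it has no idea which
attributes are of type ID, and the code just treats every id attribute
as one.

```python
import xml.etree.ElementTree as ET
from collections import Counter

METHOD_NS = "http://methods.ringing.org/NS/method"

# Hypothetical document; the <name> child is an assumption.
DOC = f"""
<methods xmlns="{METHOD_NS}">
  <method id="m1234"><name>Yorkshire</name></method>
  <method id="m5678"><name>Little Bob</name></method>
</methods>
"""

def duplicate_ids(root):
    """Return the id values that occur more than once in one document."""
    counts = Counter(
        e.get("id") for e in root.iter() if e.get("id") is not None
    )
    return sorted(i for i, n in counts.items() if n > 1)

def resolve_fragment(root, url):
    """Return the element named by the #fragment of url, or None."""
    _, _, fragment = url.partition("#")
    if not fragment:
        return None
    for e in root.iter():
        if e.get("id") == fragment:
            return e
    return None

root = ET.fromstring(DOC)
target = resolve_fragment(root, "http://example.org/methods.xml#m1234")
print(duplicate_ids(root), target.get("id"))
```

Nothing in this depends on the IDs being unique across documents; that
extra guarantee only matters to someone who wants to rely on it.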

[...]

> Arbitrary IDs are fine: so long as they can be referenced within the
> document by an IDREF.

Surely any ID can be referenced by an IDREF?

> I don't object to the ID coming from the database; I object to the
> notion that it is unique across documents.

Why on earth should you object to that?  No one is forcing
*you* to make *your* IDs unique across documents.  We've
chosen to, as we feel it might be useful for our users;
you're free not to if you'd rather not.

> It is a generic attribute (many applications may need to reference
> method definitions with IDs) but they
> may not be able to guarantee uniqueness outside the document; why should
> your database have special privileges?

Because our database *is* able to make this guarantee and we
feel it's useful for us to do so.

> >[...] I think that being able to add extra information using
> >other namespaces is one of the things that XML is very good at.  It's
> >done all the time in the various W3C applications:  take the xsi:type
> >attribute, for instance, or the xlink: attributes, or the xsl:version
> >attribute in XSL.
> >
> >
> True, but I think it can be done more elegantly.

Would you like to give a couple of specific examples?

> >>I would also propose that the version attribute of the <methods> tag
> >>should be eliminated and version identification be incorporated into
> >>the namespace URI.
> >>
> >This is definitely a bad idea.  It kills any chance of backwards
> >compatibility:  a version 1 parser has no chance at all of reading a
> >version 2 document.
> >
> Rubbish! I imagine it would be very easy to construct a multi-version
> parser. For example: schema version 1
> recognises methods in the form:
> <methods xmlns="urn:cccbr-org-uk:schemas/methods/1.0">
> ...
> </methods>
> Sometime later version 2 is released and documents now look like this:
> <methods xmlns="urn:cccbr-org-uk:schemas/methods/2.0">
> ...
> </methods>
> It is clear which schema is required by which document and many parsers
> today will handle this simply
> within their entity resolver.
> Where do you see a problem?

With your suggestion, the name of every single element
changes.  Yes: that is what I mean.  The name of an element
is a pair of namespace name (*not* prefix), which in your
suggestion will change with every (major?) new methods
schema, and the NCName of the element itself.  Anyone using
modern namespace-aware parsing tools will find this an
almighty pain.  Just for example, every XSL template will
break when a new schema is used, even if the template was
written in such a way as to be robust against new, unknown
elements.  Likewise any new tools acting on a version 2
file will fail to parse a version 1 file even if the schemas
are backwards compatible (as they really ought to be).
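
Here is a small Python sketch of the effect, using the two URNs from
your example and, for comparison, a single namespace with a version
attribute on <methods>.  The <method> child and the "2.0" value are
only there for illustration.

```python
import xml.etree.ElementTree as ET

NS_V1 = "urn:cccbr-org-uk:schemas/methods/1.0"
NS_V2 = "urn:cccbr-org-uk:schemas/methods/2.0"
NS_FIXED = "http://methods.ringing.org/NS/method"

# Hypothetical documents; only the namespace or version differs.
DOC_V1 = f'<methods xmlns="{NS_V1}"><method id="a"/></methods>'
DOC_V2 = f'<methods xmlns="{NS_V2}"><method id="a"/></methods>'
DOC_ATTR = (
    f'<methods xmlns="{NS_FIXED}" version="2.0">'
    '<method id="a"/></methods>'
)

def count_methods(xml_text, namespace):
    """Count <method> children whose namespace name is `namespace`."""
    root = ET.fromstring(xml_text)
    return len(root.findall(f"{{{namespace}}}method"))

# A tool written against the 1.0 namespace finds nothing in a 2.0
# document, because {1.0}method and {2.0}method are different names.
print(count_methods(DOC_V1, NS_V1), count_methods(DOC_V2, NS_V1))

# With one namespace and a version attribute, the same query keeps
# working and the version is still there to be inspected.
root = ET.fromstring(DOC_ATTR)
print(count_methods(DOC_ATTR, NS_FIXED), root.get("version"))
```

A multi-version tool can of course be written for the first scheme,
but every query, template and match pattern then has to be duplicated
or parameterised by namespace.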

There's a very good reason why, in recent years, almost
every major XML-based technology has gone the version-number
route rather than the new-namespace route.

[re xsi:nil values:]

> I'm beginning to see the problem: you are seeing the XML from a
> database-centric point of view and
> are treating the XML representation as a database result set which can
> have a variable number of
> selectable fields.

I don't really think this is true.  If it's database-centric
to distinguish between knowledge of absence and absence of
knowledge, then yes, we're taking a database-centric
attitude.  But that's as far as it goes.  I for one think
it's critically important that we can distinguish between
the following three statements:

  - "this method is unnamed";
  - "this method is named Little Bob"; and
  - "this method may or may not be named".

If that's database-centric, so be it.
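
To be concrete about how a consumer might tell the three apart, here
is a Python sketch.  The mapping is an assumption made for the
example: a <name> element with xsi:nil="true" means "unnamed", a
<name> with content gives the name, and a missing <name> means we
don't know.  The element name is likewise assumed.

```python
import xml.etree.ElementTree as ET

METHOD_NS = "http://methods.ringing.org/NS/method"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

def make_method(body):
    """Build a hypothetical <method> element around `body`."""
    return ET.fromstring(
        f'<method xmlns="{METHOD_NS}" xmlns:xsi="{XSI_NS}">{body}</method>'
    )

def name_status(method):
    """Classify a method by what its <name> child says, if anything."""
    name = method.find(f"{{{METHOD_NS}}}name")
    if name is None:
        # Absence of knowledge: nothing was said about the name.
        return "this method may or may not be named"
    if name.get(f"{{{XSI_NS}}}nil") in ("true", "1"):
        # Knowledge of absence: the name is explicitly nil.
        return "this method is unnamed"
    return f"this method is named {name.text}"

for body in ('<name xsi:nil="true"/>', "<name>Little Bob</name>", ""):
    print(name_status(make_method(body)))
```

An application that doesn't care about the distinction can treat the
first and third cases identically; nothing forces it to do otherwise.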

> This is a choice that is very specific to this
> application and I would contest has the
> danger of making the resultant method definitions useless to another
> application.

How might this happen?  Can you give me an example of how
having the ability (but not the compulsion) to distinguish
the above cases could possibly make the "resultant
definitions useless to another application"?

> I believe your application would function just as well with a generic
> SQL to XML format (such as generated
> by Oracle tools):
> <resultset>
>    <record id="m123">
>       <field name="methodname">Yorkshire</field>
>       <field name="stage">8</field>
>   ...
>    </record>
> </resultset>

Well, we *could* have done this, but for human consumption,
I think I'd rather use a dataset that looked more like the
one we've produced.

> In fact I think this fits the nature of queries where fields are
> selectable by name much better.

Why?  Conceptually, the list of fields is really just a
collection of XPath filters applied to a complete document
to determine which bits to return.  (In fact, we optimise
the process so that we don't generate unwanted data, but the
principle is the same.)
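
In Python terms the idea is roughly the following.  This is a sketch
of the concept, not of our implementation: the field names, the
child elements and the paths are invented for the example, and
ElementTree only supports a subset of XPath.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://methods.ringing.org/NS/method"}

# Hypothetical mapping from a selectable field name to a path.
FIELD_PATHS = {
    "name": "m:name",
    "stage": "m:stage",
    "notation": "m:notation",
}

# Hypothetical complete document.
DOC = """
<methods xmlns="http://methods.ringing.org/NS/method">
  <method id="m123">
    <name>Yorkshire</name>
    <stage>8</stage>
    <notation>placeholder</notation>
  </method>
</methods>
"""

def select_fields(method, wanted):
    """Return a copy of `method` holding only the wanted fields."""
    result = ET.Element(method.tag, method.attrib)
    for field in wanted:
        for match in method.findall(FIELD_PATHS[field], NS):
            result.append(match)
    return result

root = ET.fromstring(DOC)
method = root.find("m:method", NS)
subset = select_fields(method, ["name", "stage"])
print(ET.tostring(subset, encoding="unicode"))
```

The result is still an ordinary <method> element in the core
namespace, just with fewer children, which is why no separate
result-set vocabulary is needed.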

RAS