Re: [Ringing-lib-discussion] XML Format


Richard Smith wrote:

>>I suppose there's no harm in putting the backslash in to make it work.
>>    
>>
>
>Agreed.  I've now done this.
>  
>
Thanks. I think it might be a function of the Java regexp processor.
Personally I think it's clearer with it in, as it's obviously not part
of a character range then.
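
A small sketch of the distinction, using Python's re module rather than the
Java processor, and a made-up character class rather than the pattern from
the schema (both are assumptions for illustration only):

```python
import re

# Unescaped hyphen between two characters: a range, so 'b' is included.
as_range = re.compile(r"[a-c]")

# Escaped hyphen: a literal '-', so the class is exactly 'a', '-' and 'c'.
as_literal = re.compile(r"[a\-c]")

for text in ("b", "-"):
    print(text,
          bool(as_range.fullmatch(text)),
          bool(as_literal.fullmatch(text)))
```

With the backslash the reader does not have to work out whether the hyphen
sits in a position where it would be taken as a range operator.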

>I've put a suitable XLink schema here:
>
>  http://www.ex-parrot.com/~richard/schemas/xlink.xsd
>
>Can you try using that one and tell us whether you still
>have problems?
>  
>
The problem goes away.

>>>Finding a standard schema for xlink was not easy either
>>>so I would question its usefulness overall but that's a
>>>debate for later as I have other more pressing comments.
>>>      
>>>
>There's a good reason for this.  XLink does not (currently)
>have a normative schema, but I don't see why this should be
>an issue -- it's easy enough to provide one.
>  
>
Then why is the one that I found different? It ain't that easy to match 
the XLink "standard".
BTW mine came from: http://schemas.opengis.net/gml/2.1.2/xlinks.xsd

>Having said that, I'm not one of XLink's greatest fans, and
>if you have an alternative suggestion, I'd be interested to
>hear it.
>  
>
I'll be thinking about it. My first thoughts were to have links to 
Dove/Felstead etc. for the relevant info.

>>>Wow, so many namespaces! On further analysis, those on the <method> tag
>>>are redundant
>>>      
>>>
>>Yes, I know.  It's an issue with DBIx::XMLServer.  It's quite a long way
>>down the list of priorities to fix, though, because the extra
>>de
>>
>That said, we should aim to sort this out eventually.
>  
>
I'll leave you to wrestle with your own libraries: the Java ones aren't 
always that obvious either.

>Martin has already responded to this, so I'm not going to,
>except to say that all our search script currently puts in
>the <meta> elt is a database timestamp.  This is something
>for which I can easily imagine wanting to query the
>database.  If I have a local (perhaps off-line) database, I
>might want to regularly sync this to the server database.
>Downloading just those methods changed since my previous
>snapshot was created is an obvious way of doing this.
>  
>
See my comments to Martin, but we're still in the database-centric view.
A method definition schema should not be expected to support database
synchronisation. To perform this task you should create a synchronisation
document with timestamps etc. and include method definitions where
required. Even so, for the example cited above, just a list of methods
(no annotations) matching the criterion of being newer than a supplied
timestamp would suffice. E.g. the HTTP request "get if newer" (or whatever
it is) will have an HTML document returned if there is a newer one, but
the document itself does not contain a timestamp to say it is newer: why
should method definitions be any different?
Separation of concerns again.
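
A rough sketch of what such a synchronisation document might look like. The
element names (sync, method, title), the attribute names and the dates are
all made up for illustration; none of them come from the current schema or
the search script:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical server-side store: (method title, last-modified time).
METHODS = [
    ("Cambridge Surprise Minor", datetime(2004, 5, 1, tzinfo=timezone.utc)),
    ("Plain Bob Minor", datetime(2004, 7, 15, tzinfo=timezone.utc)),
]

def build_sync_document(since, now):
    """Wrap the methods changed after `since` in a sync envelope.

    The timestamps live on the envelope; the method definitions
    themselves carry no database metadata.
    """
    root = ET.Element("sync", {"since": since.isoformat(),
                               "generated": now.isoformat()})
    for title, modified in METHODS:
        if modified > since:
            method = ET.SubElement(root, "method")
            ET.SubElement(method, "title").text = title
    return root

doc = build_sync_document(datetime(2004, 6, 1, tzinfo=timezone.utc),
                          datetime(2004, 8, 1, tzinfo=timezone.utc))
print(ET.tostring(doc, encoding="unicode"))
```

The client keeps the "generated" value from the envelope and sends it back as
"since" next time, much as a browser does with a conditional request.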

>And if you don't like having this in the output, you can
>always set the 'fields' parameter to specify which fields
>you want.
>
>  http://methods.ringing.org/query.html#fields
>
>(Thinking about it, we might want to add a way of saying all
>fields except those in a given list.)
>  
>
See comments to Martin. I think you're both missing the point with field
selection. That's a SQL thing where all data has been flattened into a
table: CSV would be just as good in this case.
I want to see some structure in the results from an object-oriented
viewpoint (this is after all a discussion list about a C++ library).

>Yes.  Basically we had a choice.  Either we could describe
>the <method> element as an <xsd:sequence>, in which case we
>would have been allowed to put elements from other
>namespaces there.  Similarly we could have put the
>performances and classification data directly there.  (They
>can't be in the current schema as they can occur multiple
>times.)  The cost of this flexibility is that we would have
>had to specify an order for all of these elements, and the
>XML would only have been valid if these elements were in the
>right order.
>  
>
Nothing wrong with that.

>The reason for this, as Martin says, is that an
><xsd:sequence> is effectively taken as a grammar for a
>regular language, and keeping track of the number of times
>elements have occurred rapidly becomes extremely difficult.
>(The schema required grows combinatorially with the number
>of elements.)
>  
>
I don't understand the combinatorial explosion. A sequence gives an 
"approved" order to elements
which may be optional - what's the problem with that?
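
For reference, one possible reading of the growth being described, assuming
the aim is to let each child appear at most once in any order (this is an
assumption about what was meant, and the child names are made up): written
naively as a choice between fixed sequences, every permitted ordering has to
be listed, and a matcher that tracks which children it has already seen
needs one state per subset.

```python
from itertools import permutations
from math import factorial

def orderings(children):
    """Every fixed order a naive choice-of-sequences model would list."""
    return list(permutations(children))

# Hypothetical child element names, for illustration only.
children = ["title", "name", "stage"]
for order in orderings(children):
    print(" ".join(order))

# Orderings (n!) and "already seen" subsets (2**n) for a few sizes.
for n in (3, 5, 8):
    print(n, factorial(n), 2 ** n)
```

With a single approved order and optional elements none of this arises,
which is the case I am asking about.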

>The alternative, which is what we've decided to do, is to
>describe it with an <xsd:all> element which allows child
>elements to occur at most once.  It also does not allow
>elements from other namespaces.  We get around these
>restrictions by having container elements, such as the
><meta>, <refs>, <performances> and <classification>
>elements.  I think we felt that this was preferable to having
>an arbitrary order in which the child elements had to occur.
>  
>
On the other hand you now have arbitrary flexibility which can also 
cause problems.

>Namespaces aren't referenced via entity references so I
>don't see how this is relevant.  (Or are you suggesting a
>DTD that adds an implicit namespace declaration on the root
>entity?  If so, I think this would be a very bad idea.)
>  
>
No DTDs, we're using XML Schema after all! The DTD equivalent is the 
public id and they are
used in exactly the manner I described (see the different versions of 
HTML all distinguished by
the public id in the DOCTYPE declaration).

>I assume you're referring to the XML Catalog-like techniques
>that can be used to select a schema for a namespace. 
>  
>
Yes.

> This helps, but so long as you keep the XML Schema documents
>backwards compatible (which should be easy as the
>conceptual schemas need to be backwards compatible), this is
>a non-issue.  Always using the most recent schema for the
>namespace might result in lots of unnecessary schema for
>unknown elements, but should otherwise be fine.
>
>Finally, it should be remembered that in many real-world
>applications, you don't actually use the schema -- it's
>simply a piece of documentation on what is allowed in the
>XML.  The parser presumably already knows this.
>  
>
Depends on your parser and your document validation policy. The point 
being that the standard
should help in a standard way those who wish to process multiple 
versions using standard
techniques like XML Catalog. Those who wish to ignore validation do so 
at their own risk: their
parser may or may not cope but that's no fault of the schemas.

<snip>

>Thinking further, we *are* inconsistent in our use of
>xsi:nil, as method names are handled differently from
>anything else.  For a method name, there are three things
>that I might want to convey in the XML:
>
> - the method is named, but has the null name (i.e. it is
>   Little Bob);
>
> - according to the database, the method is unnamed; or
>
> - it is unspecified whether the method is named.
>
>Currently, we use xsi:nil for the former case, ignore the
>second case (we have no unnamed methods in the database at
>the moment), and omit the element in the latter case.
>
>Really we should be able to distinguish all three cases.
>  
>
I'm still thinking about this one. Can you give me an example of the 
third case and how it is
distinct from case 2?
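
To check I have the current behaviour straight, here is a sketch of how a
consumer would read it today. The unqualified element name "name" and the
sample documents are assumptions for illustration (the real output uses
namespaces); only the xsi namespace URI is the standard one:

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def name_status(method):
    """Classify the name of a <method> element under the current usage."""
    name = method.find("name")
    if name is None:
        # Element omitted: unspecified whether the method is named.
        return "unspecified"
    if name.get(f"{{{XSI}}}nil") in ("true", "1"):
        # xsi:nil currently means "named, but with the null name".
        return "named, null name"
    return "named"

samples = [
    f'<method xmlns:xsi="{XSI}"><name xsi:nil="true"/></method>',
    '<method><name>Cambridge</name></method>',
    '<method/>',
]
for xml in samples:
    print(name_status(ET.fromstring(xml)))
```

As written there is no return value for "unnamed according to the database",
which I take to be the gap being described.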

Gary.