Re: [Obo-format] Tag order


Erick Antezana wrote:
> David,
>
> I believe that you agree that any hand editing process is 
> error-prone... those simple constraints (such as ordering) could, on 
> the one hand, help us to actually avoid (or minimise) those possible 
> issues (mainly syntactic) with the help of strict parsers that could 
> point them out before our ontology goes public (so we keep our users 
> happy, or at least they don't have to spend time re-checking the 
> syntax, submitting bugs, etc...); on the other hand, the experience 
> some of us have with "relatively relaxed" formats/specs demonstrates 
> that the imagination of parser developers or in general tool 
> developers, and/or data providers in a given format could have a 
> negative impact on the evolution of a specification which in turn has 
> an impact on the systems that have adopted a given "dialect" of a 
> specification... Let's take for instance the case of the GFF format, 
> which is a very comprehensive and useful format; however, you may know 
> that there are a few issues related to the GFF 2 spec (fixed in GFF3) 
> due to a lack of "strictness" in a few spec details (I think column 9?)...
>
> Anyway, as Chris mentioned, the OBO files will still be valid... but 
> the recommendation will still be there...

Erick,

While there should be no doubt that a precise, unambiguous syntax 
specification is desirable and greatly helps keep tools based on the 
same format interoperable, I wonder how much you would gain by 
insisting on a specific tag order in the particular case of OBO.

In principle, if tags were required to appear in a fixed order t1, t2, 
t3, ..., tn, a parser could detect the following issues before it had 
finished reading the whole stanza:

1. missing tags, e.g., t3 found directly after t1 (so t2 is missing);
2. out-of-order tags, e.g., t1 found after t3.
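A minimal Python sketch of that single-pass check, under loudly stated 
assumptions: the tag order (TAG_ORDER) and the set of obligatory tags 
(OBLIGATORY) are hypothetical and purely illustrative, not taken from 
the OBO spec, and the function name check_stanza_order is made up for 
this example. It reports a skipped obligatory tag as soon as a later 
tag appears, and flags any tag that arrives after one ranked further 
down the order.

```python
# Hypothetical tag order and obligatory set, for illustration only;
# these are NOT taken from the OBO specification.
TAG_ORDER = ["id", "name", "namespace", "def", "is_a"]
OBLIGATORY = {"id", "name"}


def check_stanza_order(tags, order=TAG_ORDER, obligatory=OBLIGATORY):
    """Scan a stanza's tags left to right and report ordering problems.

    Returns a list of (kind, tag) tuples, where kind is "missing" or
    "out_of_order". Tags not listed in `order` are ignored in this sketch.
    """
    position = {tag: i for i, tag in enumerate(order)}
    issues = []
    last = -1  # position of the furthest-down tag seen so far
    for tag in tags:
        if tag not in position:
            continue
        p = position[tag]
        if p < last:
            # Case 2: e.g. t1 found after t3.
            issues.append(("out_of_order", tag))
            continue
        # Case 1: every obligatory tag strictly between `last` and `p`
        # was skipped, so it can be reported immediately.
        for skipped in order[last + 1:p]:
            if skipped in obligatory:
                issues.append(("missing", skipped))
        last = p  # repeated tags (p == last) are allowed
    # Obligatory tags after the last one seen can only be reported
    # once the end of the stanza is reached.
    for remaining in order[last + 1:]:
        if remaining in obligatory:
            issues.append(("missing", remaining))
    return issues
```

For example, the tag sequence id, namespace, name yields both 
('missing', 'name') and ('out_of_order', 'name'): the tag is present, 
just late, so a strict single-pass checker reports it twice.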

This might make sense in that a parser could detect the absence of an 
obligatory tag as soon as it finds a tag placed further down in the 
order.  It might also make sense if there were dependencies between 
tags, e.g., if it made no sense to include one tag unless another tag 
is present.  (However, I'd imagine that a partial ordering would be 
more appropriate here.)  *But* OBO allows one to spread the 
specification of an object across multiple stanzas, which obviously 
conflicts with checking tag order within a single stanza.  Since the 
parser would have to defer its reaction until it had read the whole 
document (or even a batch of documents), there would be no obvious 
gain here.

After parsing the whole document, a parser can decide whether 
obligatory tags are missing or tag dependencies are violated.  True, 
having to wait until the end of the document and make a second pass 
through the whole data (its internal representation) postpones error 
reporting, but again, this is enforced by the design of the language 
(multiple stanzas per object).  I can imagine that repetitive error 
reports caused by wrong tag order while parsing OBO files (e.g., 
'missing t1' when t3 is found first) would be more annoying than 
helpful for curators.  On the other hand, it should not be a problem to 
have a tool reorder the tags for you -- and if you assume that all OBO 
files have in fact been automatically reordered, you could indeed get 
faster error detection at parse time.
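Such a reordering tool can be very small.  A minimal Python sketch, 
assuming a hypothetical canonical order (TAG_ORDER is illustrative, not 
the OBO recommendation) and a made-up function name reorder_tags:

```python
# Hypothetical canonical tag order, for illustration only.
TAG_ORDER = ["id", "name", "namespace", "def", "is_a"]


def reorder_tags(tags, order=TAG_ORDER):
    """Return a stanza's (tag, value) pairs sorted into canonical order.

    sorted() is stable, so repeated tags (e.g. several is_a lines) keep
    their relative order; tags not in `order` are moved to the end,
    also keeping their original relative order.
    """
    rank = {tag: i for i, tag in enumerate(order)}
    return sorted(tags, key=lambda pair: rank.get(pair[0], len(order)))
```

Relying on a stable sort means the tool never changes the meaning of a 
stanza whose semantics depend on the order of repeated values; it only 
moves whole tag groups.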

Order constraints are essential in languages that require, e.g., a 
variable to be declared before it is used (though that is not 
necessarily a syntactic issue).  OBO is a declarative language, and 
allowing as much freedom as possible within an unambiguous 
specification seems to me a virtue.

vQ