Re: [Obo-format] Tag order

On 20 Aug 2010, at 09:15:25, Erick Antezana wrote:

> Wacek,
>
> as mentioned, I just wanted to make a point related to the advantages
> we could have with such ordering. I totally agree that having
> freedom in a language also has its positive sides... I forgot to
> mention in my previous email that such ordering has been helping us
> (for many years now) and probably other people too (by means of
> CVS clients for instance) to easily make a 'diff' between two  
> versions of the same ontology. Nowadays, there are some tools to  
> perform ontology alignment (matching, etc.) which also assist
> users to perform that comparison task, but there is not really (as  
> far as I know) a "good" OBO-formatted ontology versioning system and  
> I believe most of the ontology builders still rely on systems such  
> as CVS... (our serialised ontology versions/files actually have the
> terms ordered based on their IDs/names, too...).  I still see a few
> more advantages in adopting/*recommending* such an ordering.

Good point.  I have to admit that we take advantage of OBO-Edit's  
ordering of stanzas and tags within them to do diffs.
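
As a rough illustration of why a fixed order helps with diffs, here is a minimal sketch that rewrites each stanza with its tags in a fixed order before the file goes to diff or CVS. The tag list and the function names are made up for the example; they are not OBO-Edit's actual ordering or code.

```python
# Hypothetical tag order, for illustration only (not OBO-Edit's real order).
TAG_ORDER = ["id", "name", "namespace", "def", "synonym", "xref", "is_a", "relationship"]


def tag_of(line):
    """Return the tag name of a 'tag: value' line."""
    return line.split(":", 1)[0].strip()


def sort_key(line):
    tag = tag_of(line)
    # Known tags sort by their position in TAG_ORDER; unknown tags go last, by name.
    rank = TAG_ORDER.index(tag) if tag in TAG_ORDER else len(TAG_ORDER)
    return (rank, tag, line)


def normalize_stanza(lines):
    """Keep the '[Term]'-style header first and sort the tag lines below it."""
    header = [l for l in lines if l.startswith("[")]
    tags = [l for l in lines if l.strip() and not l.startswith("[")]
    return header + sorted(tags, key=sort_key)


# Two hand-edited versions of the same stanza, tags in different order.
a = ["[Term]", "name: wing", "is_a: X:0000001", "id: X:0000002"]
b = ["[Term]", "id: X:0000002", "is_a: X:0000001", "name: wing"]
assert normalize_stanza(a) == normalize_stanza(b)
```

Once both versions are normalised this way, a plain line-based diff only shows real changes, not reshuffled tags.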

>
> cheers,
> Erick
>
> On 19 August 2010 16:23, Wacek Kusnierczyk <wa...@id...> wrote:
> Erick Antezana wrote:
> David,
>
> I believe that you agree that any hand editing process is
> error-prone... those simple constraints (such as ordering) could, on the
> one hand, help us to actually avoid (or minimise) those possible  
> issues (mainly syntactic) with the help of strict parsers that could  
> point them out before our ontology goes public (so we keep our users  
> happy, or at least they don't have to spend time re-checking the  
> syntax, submitting bugs, etc...); on the other hand, the experience  
> some of us have with "relatively relaxed" formats/specs demonstrates  
> that the imagination of parser developers or in general tool  
> developers, and/or data providers in a given format could have a  
> negative impact on the evolution of a specification, which in turn
> has an impact on the systems that have adopted a given "dialect" of a
> specification... Let's take for instance the case of the GFF format,
> which is a very comprehensive and useful format; however, you may
> know that there are a few issues related to the GFF2 spec (fixed in
> GFF3) due to a lack of "strictness" in a few spec details (I think
> column 9?)...
>
> Anyway, as Chris mentioned, the OBO files will still be valid... but  
> the recommendation will still be there...
>
> Erick,
>
> While there should be no doubt that a precise, unambiguous syntax  
> specification is desirable and greatly helps to keep various tools  
> based on the same format interoperable, I wonder how much you'd gain  
> by insisting on a specific tag order, in the particular case of OBO.
> In principle, if there were some ordering imposed on tags as t1, t2,  
> t3, ..., tn, then parsers could discover, before being done with the  
> whole stanza, the following issues:
>
> 1. missing tags, e.g., t3 found after t1;
> 2. unordered tags, e.g., t1 found after t3.
>
> This might make sense in that a parser could discover a missing  
> obligatory tag (oops, the lack of a tag) as soon as it finds a tag  
> placed further down in the order.  This might also make sense if  
> there were any dependencies between tags, e.g., if it made no sense to
> include a tag if another tag is not present.  (However, I'd imagine  
> that partial ordering would be more appropriate here.)  *But*, OBO  
> allows one to spread the specification of an object across multiple  
> stanzas, and this would obviously be in conflict.  As the parser  
> would have to wait with its reaction until it has read the whole  
> document (or even a batch of documents), there would be no obvious  
> gain here.
> After having parsed the whole document, a parser can decide if  
> obligatory tags are missing, or if tag dependencies are violated.   
> True, waiting until the end of the document and the need for a  
> second pass through the whole data (its internal representation)  
> postpones error reporting, but again, it's enforced by the design of  
> the language (multiple stanzas per object).  I can imagine that  
> repetitive error reports while parsing OBO files due to wrong tag  
> order (e.g., 'missing t1' when t3 is found first) would be more  
> annoying than helpful for curators.  On the other hand, it should  
> not be a problem to have a tool reorder the tags for you -- and if  
> you assume all OBO files are in fact automatically reordered, you  
> could indeed have faster error detection at parse time.
>
> Order constraints are essential in languages that demand, e.g.,  
> declaration of a variable before it is used.  (Though it's not  
> necessarily a syntactic issue.)  OBO is a declarative language, and  
> as much freedom as possible, within an unambiguous specification,  
> seems rather a virtue.
>
> vQ
>
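
The two checks Wacek lists (missing tags, unordered tags) could be sketched roughly as below, assuming an imposed order t1..tn and a set of obligatory tags. The names TAG_ORDER, REQUIRED and check_stanza are hypothetical, and the sketch looks at a single stanza only; it does not handle an object spread across several stanzas, which is exactly the case where the early report stops being reliable.

```python
TAG_ORDER = ["t1", "t2", "t3", "t4"]   # assumed imposed order
REQUIRED = {"t1", "t2"}                # assumed obligatory tags


def check_stanza(tags):
    """Scan tags in file order and report problems as soon as they are visible."""
    rank = {t: i for i, t in enumerate(TAG_ORDER)}
    problems = []
    seen = set()
    highest = -1  # rank of the furthest-down tag seen so far
    for tag in tags:
        if tag not in rank:
            continue  # tags outside the assumed order are ignored here
        r = rank[tag]
        if r < highest:
            problems.append(f"unordered: {tag}")
        else:
            # Obligatory tags ranked between the last tag and this one
            # can be reported now, without reading the rest of the stanza.
            for earlier in TAG_ORDER[highest + 1:r]:
                if earlier in REQUIRED and earlier not in seen:
                    problems.append(f"missing: {earlier} (noticed at {tag})")
            highest = r
        seen.add(tag)
    # Obligatory tags ranked after the last tag are only known at the end.
    for later in TAG_ORDER[highest + 1:]:
        if later in REQUIRED:
            problems.append(f"missing: {later} (noticed at end of stanza)")
    return problems


print(check_stanza(["t1", "t3"]))  # t2 is reported as soon as t3 is read
print(check_stanza(["t3", "t1"]))  # t1 is reported twice: missing, then unordered
```

The second call shows the repetitive reporting mentioned above: a tag that is merely out of place is first flagged as missing and then as unordered.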

David Osumi-Sutherland, PhD
Curator/ Ontologist
FlyBase / Virtual Fly Brain
Department of Genetics,
University of Cambridge,
Downing Street,
Cambridge, CB2 3EH, UK
Tel: +44 (0)1223 333 963
Fax: +44 (0)1223 766 732