From: David Osumi-S. <dj...@ge...> - 2010-08-20 10:26:15
On 20 Aug 2010, at 09:15:25, Erick Antezana wrote:

> Wacek,
>
> as mentioned, I just wanted to make a point related to the advantages
> we could have with such ordering. I totally agree that having freedom
> in a language also has its positive sides... I forgot to mention in my
> previous email that such ordering has been helping us (for many years)
> and probably other people too (by means of CVS clients, for instance)
> to easily make a 'diff' between two versions of the same ontology.
> Nowadays, there are some tools to perform ontology alignment
> (matching, etc.) which also assist users in performing that comparison
> task, but there is not really (as far as I know) a "good"
> OBO-formatted ontology versioning system and I believe most of the
> ontology builders still rely on systems such as CVS... (our serialised
> ontology versions/files actually have the terms ordered based on their
> IDs/names, too...). I still see a few more advantages in
> adopting/*recommending* such an ordering.

Good point. I have to admit that we take advantage of OBO-Edit's
ordering of stanzas and tags within them to do diffs.

> cheers,
> Erick
>
> On 19 August 2010 16:23, Wacek Kusnierczyk <wa...@id...> wrote:
> Erick Antezana wrote:
> David,
>
> I believe that you agree that any hand editing process is
> error-prone... those simple constraints (such as ordering) could, on
> the one hand, help us to actually avoid (or minimise) those possible
> issues (mainly syntactic) with the help of strict parsers that could
> point them out before our ontology goes public (so we keep our users
> happy, or at least they don't have to spend time re-checking the
> syntax, submitting bugs, etc...); on the other hand, the experience
> some of us have with "relatively relaxed" formats/specs demonstrates
> that the imagination of parser developers or, in general, tool
> developers, and/or data providers in a given format could have a
> negative impact on the evolution of a specification, which in turn
> has an impact on the systems that had adopted a given "dialect" of a
> specification... Let's take for instance the case of the GFF format,
> which is a very comprehensive and useful format; however, you may
> know that there are a few issues related to the spec GFF 2 (fixed in
> GFF3) due to a lack of "strictness" in a few spec details (I think
> column 9?)...
>
> Anyway, as Chris mentioned, the OBO files will still be valid... but
> the recommendation will still be there...
>
> Erick,
>
> While there should be no doubt that a precise, unambiguous syntax
> specification is desirable and greatly helps to keep various tools
> based on the same format interoperable, I wonder how much you'd gain
> by insisting on a specific tag order, in the particular case of OBO.
> In principle, if there were some ordering imposed on tags as t1, t2,
> t3, ..., tn, then parsers could discover, before being done with the
> whole stanza, the following issues:
>
> 1. missing tags, e.g., t3 found after t1;
> 2. unordered tags, e.g., t1 found after t3.
>
> This might make sense in that a parser could discover a missing
> obligatory tag (oops, the lack of a tag) as soon as it finds a tag
> placed further down in the order. This might also make sense if
> there were any dependencies between tags, e.g., it made no sense to
> include a tag if another tag is not present. (However, I'd imagine
> that partial ordering would be more appropriate here.) *But*, OBO
> allows one to spread the specification of an object across multiple
> stanzas, and this would obviously be in conflict. As the parser
> would have to wait with its reaction until it has read the whole
> document (or even a batch of documents), there would be no obvious
> gain here.
> After having parsed the whole document, a parser can decide if
> obligatory tags are missing, or if tag dependencies are violated.
> True, waiting until the end of the document and the need for a
> second pass through the whole data (its internal representation)
> postpones error reporting, but again, it's enforced by the design of
> the language (multiple stanzas per object). I can imagine that
> repetitive error reports while parsing OBO files due to wrong tag
> order (e.g., 'missing t1' when t3 is found first) would be more
> annoying than helpful for curators. On the other hand, it should
> not be a problem to have a tool reorder the tags for you -- and if
> you assume all OBO files are in fact automatically reordered, you
> could indeed have faster error detection at parse time.
>
> Order constraints are essential in languages that demand, e.g.,
> declaration of a variable before it is used. (Though it's not
> necessarily a syntactic issue.) OBO is a declarative language, and
> as much freedom as possible, within an unambiguous specification,
> seems rather a virtue.
>
> vQ

David Osumi-Sutherland, PhD
Curator/ Ontologist
FlyBase / Virtual Fly Brain
Department of Genetics, University of Cambridge,
Downing Street, Cambridge, CB2 3EH, UK
Tel: +44 (0)1223 333 963
Fax: +44 (0)1223 766 732
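
For illustration only, the "have a tool reorder the tags for you" idea from the quoted message, and the diff-friendly ordering mentioned above, could look roughly like the sketch below. Everything in it is an assumption made for the example: the CANONICAL_ORDER list is hypothetical (it is not the order defined by the OBO spec or produced by OBO-Edit), the function names are made up, and the "EX:" IDs are placeholders.

```python
# Hypothetical canonical tag order, chosen only for this example.
# It is NOT taken from the OBO specification or from OBO-Edit.
CANONICAL_ORDER = ["id", "name", "namespace", "def",
                   "synonym", "xref", "is_a", "relationship"]


def tag_rank(line):
    """Return the position of a tag-value line's tag in CANONICAL_ORDER.

    Tags not in the list are ranked after all known tags.
    """
    tag = line.split(":", 1)[0].strip()
    try:
        return CANONICAL_ORDER.index(tag)
    except ValueError:
        return len(CANONICAL_ORDER)


def reorder_stanza(lines):
    """Sort tag-value lines by canonical tag order, then alphabetically.

    A stable, deterministic order keeps line-based diffs between two
    versions of a file small.
    """
    return sorted(lines, key=lambda line: (tag_rank(line), line))


def out_of_order(lines):
    """Return the lines whose tag appears after a later-ranked tag."""
    problems = []
    highest = -1
    for line in lines:
        rank = tag_rank(line)
        if rank < highest:
            problems.append(line)
        else:
            highest = rank
    return problems


if __name__ == "__main__":
    stanza = [
        "name: example term",
        "is_a: EX:0000002",
        "id: EX:0000003",
        "is_a: EX:0000001",
    ]
    print(out_of_order(stanza))
    print(reorder_stanza(stanza))
```

Note that this only looks at a single stanza. As pointed out in the quoted message, an object may be spread across multiple stanzas, so checks for missing obligatory tags would still have to wait until the whole document has been read.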