From: David Osumi-S. <dj...@ge...> - 2009-08-12 05:15:07
Hi all,

I've been thinking about sending out this proposal for some time. I think it fits well with the recent discussion of documentation (see Subject: Re: [Obo-discuss] ontology term comments and provenance). I'm sending it to both OBO format and OBO discuss. Note: I'm interested in the views of OWL users as well as OBO format users.

----

= Proposal for standard syntax for marking up term names in textual definitions and comments =

Term names should be used consistently within textual definitions and comments. (I thought this was a Foundry principle, or at least a proposed one, but I can't seem to find it.) However, because term names may change, references to other terms within the text of a definition can easily become inconsistent with the standard names. Over time, after a number of changes, definitions referring to the same type can come to disagree with each other as well as with the official name. The same issue affects comment fields.

The problem could be solved if we had a standard markup for ontology term mentions in text that included an ID/term name pair every time a particular term was referred to. With this in place, it should be easy to update names automatically, using the ID as a lookup key, via scripts or systems built into the major ontology editing software.

Such a system could also be used to generate hyperlinks, allowing users to click through from definitions to the terms referred to (both actual hyperlinks in web display and some equivalent in editing tools). The markup could also be useful in notes written as part of public discussion of term definitions, for example on a wiki. It should be easy to develop term-picking systems that let users generate this markup easily. The markup could also serve as an indexing system for external comments. Another possible use is in the auto-generation of textual definitions from relationships.

So, how should the markup work?
I'm probably not the right person to specify this, but it seems to me there are two major options:

1. A simple system involving special characters to delimit term/ID pairs, plus a standard syntax for the term/ID pair itself, e.g. @termname;ID:1234567@. This seems a rather hacky option, although it has the advantage of being simple, easy to do by hand, and unobtrusive enough to leave the text readable without further processing.

2. An embedded XML tag. This would be less hacky: it could potentially extend existing standards for XML representation of ontologies and would be easy to mine using standard tools. It has the disadvantage of being verbose and a pain to do by hand. I'm also worried that it may interfere with OWL-XML standards, but I don't know enough about these to say.

I'm sure others on these lists are better placed than I am to make good suggestions regarding the ideal format for this markup. Whatever is chosen should work with (or at least not break) both OWL and OBO formats and their major editors.

One final suggestion: it might be useful to extend this to allow standard markup of references within text definitions and comments.

Cheers,
David

David Osumi-Sutherland, PhD
Ontologist / Curator
Virtual Fly Brain / FlyBase
Department of Genetics
University of Cambridge
Downing Street
Cambridge, CB2 3EH
UK
+44 (0)1223 333 963
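
To make option 1 a little more concrete, here is a rough Python sketch of how a script might refresh term names using the ID as the lookup key. It is only an illustration under stated assumptions: the exact pattern (a name, a semicolon, then a PREFIX:digits ID, all between @ signs) is my reading of the @termname;ID:1234567@ example, and the function names and the ID-to-name dictionary are hypothetical, not part of any existing OBO or OWL tool.

```python
import re

# Assumed syntax for option 1: @term name;PREFIX:1234567@
# The name may not contain ';' or '@'; the ID is letters, a colon, then digits.
MENTION = re.compile(r"@(?P<name>[^;@]+);(?P<id>[A-Za-z_]+:\d+)@")


def update_names(text, current_names):
    """Rewrite each mention so its name matches the current name for its ID.

    current_names is a hypothetical dict mapping term ID -> current term name.
    Mentions whose ID is not in the dict are left unchanged.
    """
    def fix(match):
        term_id = match.group("id")
        name = current_names.get(term_id, match.group("name"))
        return f"@{name};{term_id}@"

    return MENTION.sub(fix, text)


def strip_markup(text):
    """Return the text with each mention replaced by its bare term name."""
    return MENTION.sub(lambda m: m.group("name"), text)


def list_ids(text):
    """Return the IDs of all terms mentioned in the text, in order."""
    return [m.group("id") for m in MENTION.finditer(text)]


definition = "Part of the @old name;ID:1234567@."
print(update_names(definition, {"ID:1234567": "new name"}))
print(strip_markup(definition))
print(list_ids(definition))
```

The same pattern could in principle drive hyperlink generation or indexing of external comments, since each match exposes both the name and the ID. An XML-based syntax (option 2) would replace the regular expression with a standard XML parser.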