From: Peter R. <pet...@ch...> - 2010-03-18 08:39:58
Chris Mungall wrote:
>
> On Mar 11, 2010, at 4:14 AM, David Osumi-Sutherland wrote:
>
>> Hi Chris,
>>
>> Square brackets are fine with me if you think they're better and I'm
>> happy with your suggestion that markup with backticks be officially
>> recognised but optional.
>>
>> Just to clarify my aims: The reason I'd like a markup system for the
>> name strings is to allow automatic checks for when name and ID are
>> out of sync. I wouldn't want to enforce that names in text be
>> identical for the reasons you point out - we don't want enforced
>> mangling of grammar. But there is a serious problem with free text
>> going out of sync when totally new names are chosen for terms. For
>> example, I may change the term 'ventral body' to 'lateral accessory
>> lobe' to fit with a forthcoming atlas. If I do, it would be nice to
>> be able to track down all use of this term in free text and then to
>> keep track of all cases where something other than the official name
>> is used.
>
> So how do you propose to deal with the case where a non-exact term is
> deliberately used?
>
> For example, if I have a definition that reads
>
> "has two or more `nuclei`[GO:0005634]"
>
> This does not match the official label "nucleus", so the sync tool
> would either (a) keep bugging you to change this to an ungrammatical
> form (b) take in additional information - perhaps in the form of
> synonyms or a dictionary - to the effect that "nuclei" is a standard
> plural for "nucleus" or (c) use basic NLP stemming techniques to guess
> that "nuclei" is probably an OK pluralization of "nucleus".
>
> These cases will be common (I'd guess), so I don't think (a) is
> acceptable. (b) makes a lot more work for everyone.
>
> I don't think (c) would be so hard to implement. There would be edge
> cases that would result in false positives or false negatives, but
> these would be acceptable. However, if you're already going to make
> doing some kind of NLP a required capability for the sync tool then
> you may as well toss out the backticks. An NLP tool should be able to
> work backwards from the [ID] to the beginning of the term.
>

My take on this is a little different. The ID [GO:0005634] should be
the actual semantic entity, whose meaning does not change if the name
of the term is changed. Therefore, it should not matter whether the
referring ontology definition uses a different grammatical form.

If we decide to do spring cleaning once in a while, say to check
whether GO has changed the true meaning of [GO:0005634] from "nucleus"
to "mitochondrion" or whatever, then certainly NLP tools could be used
to make the process at least semiautomatic. But presumably, this sort
of mistake will not happen very often.

The use case that the HPO envisages is simply to have HTML links from
our definitions to the corresponding terms in other ontologies. We will
use the IDs in the () or [] to construct the link, and will put
whatever is between the backticks between the <A> </A> tags. The user
of our browser site will therefore never see whether () or [] are used.

Please note that this is not intended to be a way of providing
computer-readable definitions, but is intended to improve the
human-readable ones. The human-readable definitions will be kept in
sync with the PATO definitions; the HTML links make this easier, since
a script could point out synchronisation problems that would otherwise
be harder to spot.

I do not care much whether square or round brackets are used, but think
that () are slightly better because they look different from the []
that are used in other parts of the OBO syntax. I would suggest that it
is better to include the backticks in the non-optional part of the
standard, because then other software tools will be easier to
implement.
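
To make the HTML use case concrete, here is a minimal sketch in Python
of how the backtick-plus-ID markup could be turned into links. It is
only an illustration under assumptions: the names TERM_REF, term_url
and link_terms are made up for this example, the example.org URL
scheme is a placeholder rather than any real resolver, and the pattern
accepts either () or [] since the bracket style is still undecided.

```python
import html
import re

# `label` followed by an ID in () or [], e.g. `nuclei`[GO:0005634]
# or `cell` (CL:0000000). Bracket pairing is not checked in this sketch.
TERM_REF = re.compile(r"`([^`]+)`\s*[\[(]([A-Za-z][A-Za-z0-9_]*:\d+)[\])]")


def term_url(term_id):
    """Hypothetical URL scheme; a real site would use its own resolver."""
    return "http://example.org/term/" + term_id


def link_terms(definition):
    """Return the definition as HTML, with each marked-up term as a link."""
    parts = []
    pos = 0
    for match in TERM_REF.finditer(definition):
        label, term_id = match.group(1), match.group(2)
        # Escape the plain text between matches so it is safe in HTML.
        parts.append(html.escape(definition[pos:match.start()]))
        parts.append('<a href="{}">{}</a>'.format(
            html.escape(term_url(term_id)), html.escape(label)))
        pos = match.end()
    parts.append(html.escape(definition[pos:]))
    return "".join(parts)


print(link_terms("has two or more `nuclei`[GO:0005634]"))
```

The link text is whatever sits between the backticks, so a plural such
as "nuclei" is displayed as written, while the link target depends only
on the ID.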

best wishes,
Peter

>
>> Cheers,
>>
>> David
>>
>> On 10 Mar 2010, at 19:08, Chris Mungall wrote:
>>
>>> On Feb 25, 2010, at 9:53 AM, David Osumi-Sutherland wrote:
>>>
>>>> Hi Peter,
>>>>
>>>> It seems you want a standard markup for ontology term names in free
>>>> text. This is broader than the issue of referring to imported terms
>>>> - it could be extremely useful for keeping names in free text up to
>>>> date, and for conversion into hyperlinks for web display. I brought
>>>> this up last year. See the thread:
>>>> Re: [Obo-discuss] [Obo-format] Proposal for standard syntax for
>>>> marking up term names in textual definitions and coments
>>>>
>>>> It was also discussed briefly at the OBO foundry meeting.
>>>>
>>>> I'd like to arrive at some officially sanctioned syntax for this as
>>>> soon as possible - perhaps as something posted on the OBO foundry
>>>> website?
>>>>
>>>> Here's my suggestion:
>>>>
>>>> = Requirements for a standard markup for ontology terms in free
>>>> text =
>>>>
>>>> 1. Delimiter for name.
>>>> 2. Standard syntax to link ID to name.
>>>> 3. Markup must be sufficiently unobtrusive for the text to be easily
>>>> readable and editable - this rules out XML markup.
>>>> 4. Markup should not clash with standard delimiters used in OBO
>>>> format or XML or other major OWL syntaxes (e.g. Manchester).
>>>> 5. Markup should not use symbols found in OBO term names (difficult
>>>> considering the lack of constraints on these (!) - but we can work
>>>> from usage and then impose a restriction as part of our spec).
>>>> 6. Markup should not be likely to occur as part of normal free text.
>>>>
>>>> = On the basis of the above, we should avoid: =
>>>>
>>>> 1. Double quotes - they are the delimiters used in OBO syntax for
>>>> the whole.
>>>> 2. There are a number of terms in the OBO foundry with apostrophes
>>>> in - unless we ban these, single quotes are ruled out.
>>>> 3. All brackets except simple parentheses - []{}<> make me nervous
>>>> as they have specific meanings in OBO and/or XML.
>>>>
>>>> = Suggestions =
>>>>
>>>> This still leaves plenty of special characters. The least
>>>> obtrusive way I can think of is to combine back-ticks and simple
>>>> parentheses as follows:
>>>>
>>>> `name` (id:nnnnnnn)
>>>>
>>>> Regular readers are likely to read back ticks as a single quote and
>>>> the contents of regular parentheses as an aside. Back-ticks are not
>>>> likely to turn up accidentally in text and so it should be easy to
>>>> write reliable RegEx to find these.
>>>>
>>>> Anybody see a problem with this or have some alternative suggestion?
>>>
>>> My own personal preference is to keep the def field as human-centric
>>> as possible, and to introduce another field if there is a desire to
>>> have a new syntax for a structured field intermediate between
>>> existing human-centric definitions and the computable definitions.
>>> Thus I'm in favour of including IDs parenthetically, like a citation,
>>> but ordinary natural language text doesn't normally have backticks
>>> obtrusively interspersed. Having a standard way of writing the IDs
>>> parenthetically will allow web interfaces to include hyperlinks
>>> trivially.
>>>
>>> If we do bless this syntax, then tool developers should really update
>>> their tools to hide the syntactic notation from users. This seems to
>>> generate more sum-total work than the trivial single tool or plugin
>>> that would be required to spot the class label given the ID as a
>>> hint. The cost of not hiding the notation is relatively low; humans
>>> can learn to ignore the backticks. But I do think this is sacrificing
>>> the human factors aspect too much.
>>>
>>> I see that HPO has already adopted this syntax. I'd like to see it
>>> worked out a little better. For example
>>>
>>> * What is the policy on plurals? E.g.
>>> "a `cell` (CL:0000000) with two `nucleus` (GO:0005634)"
>>> You could always try and rewrite:
>>> "a `cell` (CL:0000000) with `nucleus` (GO:0005634) number of 2"
>>> but this is the tail wagging the dog
>>>
>>> I'm sure there are many instances where being forced to use the exact
>>> string will result in some awkward wording
>>>
>>> Also, this seems like a halfway solution. What about relations? It's
>>> important for a definition to be clear about the relationships
>>> between the referenced entities. Would we eventually have to write:
>>>
>>> "a `cell` (CL:0000000) that `has part` (BFO:nnnnnn) exactly 2
>>> `nucleus` (GO:0005634)"
>>>
>>> Here we have essentially reinvented OWL Manchester syntax. Manchester
>>> syntax is great, but the logical definitions are already essentially
>>> Manchester syntax, so it's not clear what's gained here.
>>>
>>> I think we have a set of differing requirements here, that may
>>> require different solutions
>>>
>>> * Defining a class for humans - the existing def: field, and the
>>> corresponding annotation property in the IAO
>>> * Defining a class for machines - existing mechanisms (intersection/
>>> union in OE, EquivalentTo in OWL)
>>> * Marking up text for html with hyperlinks - parenthetic IDs may be
>>> good enough, but it may be nicer to highlight the whole string, in
>>> which case either human markup or simple matching tools are required
>>> * Ensuring human definitions are in sync with machine definitions -
>>> parenthetic IDs help, marking up the string does not help
>>> * Ensuring human definitions and text in general use the approved
>>> terminology of the appropriate reference ontology - parenthetic IDs
>>> required, marked up text helps a lot, but in general this may be a
>>> difficult problem requiring either sophisticated tools or a heuristic
>>> approach
>>>
>>> I think the ideal solution would be to have a separate field for an
>>> html definition (or a flag to indicate the string is embedded html).
>>> The html can be displayed for humans in exactly the same way as any
>>> html, and it can include unobtrusive metadata that can be used to
>>> reference classes and even deterministically be transformed to a
>>> logical definition (something like the GRDDL standard, but more
>>> ontology-friendly). But this will require additional tool support, so
>>> I appreciate the need for an interim solution.
>>>
>>> Anyway, I'm broadly in favour of any measure that increases explicit
>>> linkages across OBO ontologies. I think we all agree that parenthetic
>>> inclusion of identifiers is a good thing, and relatively unobtrusive.
>>> I'd prefer []s to distinguish these from normal parenthetic comments,
>>> and this use is common as a citation style. If the community really
>>> want to unambiguously denote the start of the label then I agree with
>>> David that backtick is better than any other character. It's easier
>>> to remove these later than to add them.
>>>
>>> With that in mind, here is a first pass at a set of recommendations
>>>
>>> * textual metadata in ontologies should be marked up by including IDs
>>> in []s immediately after the term string
>>> * this is especially important for the first sentence of a textual
>>> definition
>>> * the primary label from the referenced ontology should always be
>>> used, unless this leads to an awkward construct, in which case an
>>> exact synonym should be used. Standard lexical variants can be used
>>> at the author's discretion (e.g. plurals)
>>> * if no exact match is available, include a note and make a request
>>> on the appropriate tracker, and preferably include a link to the
>>> tracker item in the note
>>> * labels can be optionally marked up. If they are marked up, then it
>>> must be using the backtick.
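
A rough sketch, in Python, of how a sync check along the lines of the
recommendations above might look. Everything here is an assumption
made for illustration: labels are taken to be marked with backticks
(which the recommendations leave optional), the ontology is a
hand-written dict rather than a parsed OBO file, EX:0000001 is a
made-up ID standing in for the renamed 'ventral body' term, and the
suffix rules are a crude stand-in for real stemming, so false
positives and false negatives are to be expected.

```python
import re

# `label` followed by an ID in [] or (), e.g. `nuclei` [GO:0005634]
TERM_REF = re.compile(r"`([^`]+)`\s*[\[(]([A-Za-z][A-Za-z0-9_]*:\d+)[\])]")

# Very rough plural-to-singular rewrites (assumption, not real NLP).
SUFFIX_RULES = (("i", "us"), ("ae", "a"), ("a", "um"),
                ("ies", "y"), ("es", ""), ("s", ""))


def candidate_forms(text):
    """Return the lower-cased text plus guessed singular forms."""
    text = text.lower().strip()
    forms = {text}
    for suffix, replacement in SUFFIX_RULES:
        if text.endswith(suffix) and len(text) > len(suffix):
            forms.add(text[:-len(suffix)] + replacement)
    return forms


def out_of_sync(definition, ontology):
    """List (id, text used, current label) for references needing review.

    The current label is None when the ID is not in the ontology dict.
    """
    problems = []
    for match in TERM_REF.finditer(definition):
        used, term_id = match.group(1), match.group(2)
        entry = ontology.get(term_id)
        if entry is None:
            problems.append((term_id, used, None))
            continue
        # Accept the primary label or any exact synonym.
        accepted = {entry["label"].lower()}
        accepted.update(s.lower() for s in entry.get("exact_synonyms", []))
        if not (candidate_forms(used) & accepted):
            problems.append((term_id, used, entry["label"]))
    return problems


ontology = {
    "CL:0000000": {"label": "cell"},
    "GO:0005634": {"label": "nucleus"},
    "EX:0000001": {"label": "lateral accessory lobe"},  # hypothetical ID
}
text = ("a `cell` [CL:0000000] with two `nuclei` [GO:0005634] "
        "near the `ventral body` [EX:0000001]")
print(out_of_sync(text, ontology))
```

With these rules the plural "nuclei" is accepted as a variant of
"nucleus", while the stale name "ventral body" is reported together
with the current label.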
>>>
>>>> Cheers,
>>>>
>>>> David
>>>>
>>>> BTW: Phil Lord discussed a related topic on his blog last year:
>>>>
>>>> OBO term names in the context of Manchester syntax
>>>> http://www.russet.org.uk/blog/2009/09/obo-format-and-manchester-syntax/
>>>>
>>>> Unfortunately, the issue here is reversed - we want to
>>>> parenthetically refer to a name in the context of a formal syntax
>>>> that uses the ID, rather than parenthetically refer to an ID in the
>>>> context of free text. So, I don't think we'll be able to use the
>>>> same solution for both.
>>>>
>>>> On 25 Feb 2010, at 17:36:35, Peter Robinson wrote:
>>>>
>>>>> Chris Mungall wrote:
>>>>>> On Feb 25, 2010, at 2:45 AM, Robinson, Peter wrote:
>>>>>>
>>>>>>> Dear OBOers,
>>>>>>>
>>>>>>> The HPO is making an effort to use cross links to other OBO
>>>>>>> ontologies (GO, FMA, CHEBI, PATO, MPATH and others) within its
>>>>>>> term definitions. It would be nice if there were a standard
>>>>>>> syntax for this within the OBO Foundry ontologies so that
>>>>>>> whatever tools will be made in the future can take advantage of
>>>>>>> it.
>>>>>>>
>>>>>>> We would suggest that the following syntax be made standard:
>>>>>>>
>>>>>>> HP:0006315 Single median maxillary central incisor
>>>>>>> def: The presence of a single, centrally located maxillary {FMA:
>>>>>>> 12823 Incisor tooth} instead of the normal complement of a left
>>>>>>> and a right maxillary incisor tooth.
>>>>>>
>>>>>> The primary consumers of text definitions are humans, so these
>>>>>> should remain human readable and human-centric
>>>>>>
>>>>>> Using a citation style seems less intrusive:
>>>>>> def: "The presence of a single, centrally located maxillary
>>>>>> Incisor tooth[FMA: 12823] instead of the normal complement of a
>>>>>> left and a right maxillary incisor tooth."
>>>>>
>>>>> Of course, the intention is that human readers will only see a link
>>>>> on an HTML site and that the only people reading the OBO file
>>>>> directly will be computer-savvy enough to deal with it. The
>>>>> disadvantage of the above solution is that you need to parse in the
>>>>> entire FMA to know where the FMA term begins, and the amount of
>>>>> work necessary to parse things increases greatly.
>>>>> Since the HPO is moving to a single hierarchy ext file there will
>>>>> definitely be an informatic pipeline between the "ur"-HPO and the
>>>>> files that users actually see. I thought that this was the OBO
>>>>> foundry recommendation? Therefore, I would still vote for the
>>>>> {FMA:12823 Incisor tooth} version rather than single, centrally
>>>>> located maxillary Incisor tooth[FMA: 12823], where there is no way
>>>>> of knowing where the FMA term starts (maxillary or incisor?).
>>>>> At the risk of being monotonous, I would repeat that it would be a
>>>>> good thing to have a single standard in the OBO documentation!
>>>>> -peter
>>>>>
>>>>>> the disadvantage is that a small amount of text matching is
>>>>>> required to keep definitions in sync with labeling changes in the
>>>>>> source ontology. But I think that's fine.
>>>>>>
>>>>>> If the goal is to have something more structured, intermediate
>>>>>> between a human-centric text definition and the existing
>>>>>> computable definitions then I would suggest a new structured text
>>>>>> definition with different guidelines rather than overloading the
>>>>>> existing field.
>>>>>>
>>>>>>> or maybe the alternative:
>>>>>>>
>>>>>>> HP:0006315 Single median maxillary central incisor
>>>>>>> def: The presence of a single, centrally located maxillary
>>>>>>> FMA_12823_Incisor_tooth instead of the normal complement of a
>>>>>>> left and a right maxillary incisor tooth.
>>>>>>>
>>>>>>> In order to use the FMA definition of Incisor tooth as a referent
>>>>>>> for the HPO definition.
>>>>>>>
>>>>>>> Has anybody else come up with a better solution? Can we agree on
>>>>>>> this syntax (or, of course, a better syntax that somebody might
>>>>>>> suggest)? It seems like it is only common sense to have a single
>>>>>>> standard way of doing this in OBO ontologies.
>>>>>>>
>>>>>>> Best wishes from Berlin, where the snow is finally melting,
>>>>>>> albeit slowly.
>>>>>>> Peter
>>>>>>>
>>>>>>> Dr. med. Peter N. Robinson, MSc.
>>>>>>> Institut für Medizinische Genetik
>>>>>>> Charité - Universitätsmedizin Berlin
>>>>>>> Augustenburger Platz 1
>>>>>>> 13353 Berlin
>>>>>>> Germany
>>>>>>> +4930 450566042
>>>>>>> pet...@ch...
>>>>>>> http://compbio.charite.de
>>>>>>> http://www.human-phenotype-ontology.org
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Download Intel® Parallel Studio Eval
>>>>>>> Try the new software tools for yourself. Speed compiling, find
>>>>>>> bugs proactively, and fine-tune applications for parallel
>>>>>>> performance.
>>>>>>> See why Intel Parallel Studio got high marks during beta.
>>>>>>> http://p.sf.net/sfu/intel-sw-dev
>>>>>>> _______________________________________________
>>>>>>> Obo-discuss mailing list
>>>>>>> Obo...@li...
>>>>>>> https://lists.sourceforge.net/lists/listinfo/obo-discuss
>>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> Download Intel® Parallel Studio Eval
>>>>>> Try the new software tools for yourself. Speed compiling, find
>>>>>> bugs proactively, and fine-tune applications for parallel
>>>>>> performance.
>>>>>> See why Intel Parallel Studio got high marks during beta.
>>>>>> http://p.sf.net/sfu/intel-sw-dev
>>>>>> _______________________________________________
>>>>>> Obo-discuss mailing list
>>>>>> Obo...@li...
>>>>>> https://lists.sourceforge.net/lists/listinfo/obo-discuss
>>>>>>
>>>>>
>>>>> --
>>>>> Dr. med. Peter N. Robinson, MSc.
>>>>> Institut für Medizinische Genetik
>>>>> Charité - Universitätsmedizin Berlin
>>>>> Humboldt-Universität
>>>>> Augustenburger Platz 1
>>>>> 13353 Berlin
>>>>> Germany
>>>>> voice: 49-30-450566042
>>>>> fax: 49-30-450569915
>>>>> email: pet...@ch...
>>>>> http://compbio.charite.de/
>>>>> http://www.human-phenotype-ontology.org
>>>>>
>>>> David Osumi-Sutherland, PhD
>>>> Curator / Ontologist
>>>> FlyBase / Virtual Fly Brain
>>>> Department of Genetics,
>>>> University of Cambridge,
>>>> Downing Street,
>>>> Cambridge, CB2 3EH, UK
>>>> Tel: +44 (0)1223 333 963
>>>> Fax: +44 (0)1223 766 732
>>>>
>> David Osumi-Sutherland, PhD
>> Ontologist / Curator
>> Virtual Fly Brain / FlyBase
>> Department of Genetics
>> University of Cambridge
>> Downing Street
>> Cambridge, CB2 3EH
>> UK
>> +44 (0)1223 333 963
>>
>

--
Dr. med. Peter N. Robinson, MSc.
Institut für Medizinische Genetik
Charité - Universitätsmedizin Berlin
Augustenburger Platz 1
13353 Berlin
Germany
voice: 49-30-450566042
fax: 49-30-450569915
email: pet...@ch...
http://compbio.charite.de/
http://www.human-phenotype-ontology.org