From: Alan R. <ala...@gm...> - 2009-08-04 15:28:01
[Trimming to the obo-format list]

I'm convinced CHEBI is doing the right thing, if GO is doing the right thing. I'm also convinced that GO is not doing the right thing :)

I believe that deprecation/obsoletion is of representational units, of which IDs are one sort. There are a number of reasons why this can happen, among which is the finding that two IDs refer to the same entity. There's no reason to get rid of the alternative id annotation as long as

a) The alternatives are also obsoleted.
b) The reason and new id are recorded as part of the reason for obsolescence.
c) The alternative ids are understood to not be valid for use as alternative primary keys or in the assertion of axioms.

Failing that (and I think this is a much worse choice) they need to be represented in the OWL translation using equivalentClass assertions.

-Alan

On Tue, Aug 4, 2009 at 11:17 AM, Chris Mungall<cj...@be...> wrote:
>
> I'm convinced CHEBI is doing the right thing here, so the rest of the
> conversation will be restricted to the obo-format list
>
> On Aug 4, 2009, at 4:31 AM, Paula de Matos wrote:
>
>> Hi Alan,
>>
>> Thanks for the email.
>>
>> So ChEBI has been following the GO lead on this.
>>
>> We use alt_id tag in OBO when the term has not changed meaning but just
>> been moved or merged.
>> http://wiki.geneontology.org/index.php/Curator_Guide:_Merge_Split_Move
>>
>> We use obsoletion (is_obsolete) when the term is incorrect and the meaning
>> has changed.
>> http://wiki.geneontology.org/index.php/Curator_Guide:_Obsoletion
>>
>> We are happy to change to whatever is deemed the standard provided that
>> when ids are removed the user will be able to find where they have been
>> moved to.
>>
>> Perhaps Chris could comment on how GO wants to tackle this and ChEBI will
>> follow suit?
>>
>> Cheers,
>> Paula
>>
>>
>> Alan Ruttenberg wrote:
>>>
>>> On Mon, Aug 3, 2009 at 8:28 AM, Janna Hastings<has...@eb...> wrote:
>>>
>>>> Hello,
>>>>
>>>> I don't think that the intention is that we maintain different *terms*
>>>> which mean the same thing. The intention is that we maintain different
>>>> *identifiers* which refer to the same term, with one of them being
>>>> preferred/primary. Is a term necessarily identical with its identifier?
>>>> It doesn't seem that this is implied in the OBO format. Can't the
>>>> alternative IDs for a particular term be modelled as a kind of synonym
>>>> in OWL rather than as separate terms?
>>>>
>>>
>>> Terms and identifiers are representation. We try to have a one-to-one
>>> correspondence between classes/universals and representational units.
>>> Identifiers are one of the ways we name each of the representational
>>> units.
>>>
>>> There's no problem having synonyms and other non-normative names for a
>>> class. However there is one kind of name that serves a special purpose
>>> (let's call these "sname"s for now) - these are the names that we use
>>> to express our logical axioms, and the ones which we use as our
>>> database keys for integration purposes. Snames are useful if we can
>>> use them for important purposes like querying or expressing logical
>>> axioms and they work correctly.
>>>
>>> The kind of guarantee we want for an sname is that if we ask all the
>>> databases in the world for information about the entity corresponding
>>> to that sname, we get it.
>>>
>>> Usually, the easiest way to make this work is that we only have one
>>> sname for each entity we need to represent. Then everyone uses that
>>> sname to mean that entity.
>>>
>>> If we have more than one sname for an entity, and we want our query to
>>> work, we have to either
>>>
>>> a) Arrange that all the databases in the world register that both of
>>> them mean the same entity so that they can respond to queries about
>>> either
>>> b) Arrange that all of the queriers in the world know all the snames
>>> to ask for so that as they ask across databases they are sure to get
>>> back information in at least one way that the database understands
>>>
>>> Neither of these is particularly appealing.
>>>
>>> (a) is really unlikely to be achieved as there are too many different
>>> databases to coordinate.
>>> (b) is unlikely to be achieved if queriers all have to learn something
>>> special because there are also too many queriers.
>>>
>>> With OWL we have the ability to have axioms that make it so that
>>> queries using one of the snames are inferred to be the same as queries
>>> that use another. These are the sameAs/equivalentClass axioms that I
>>> referred to. They are still a second choice since in order for them to
>>> work the query systems people use need to be OWL aware, but at least
>>> there's a chance.
>>>
>>> We usually consider the snames to be the "identifiers", but not the
>>> synonyms, which is why the strategy of making them another kind of
>>> synonym doesn't cut it - we don't do precise queries using the
>>> synonyms - we do them with the snames.
>>>
>>> So.
>>>
>>> Best: Have a single sname = identifier
>>> Second best: Have multiple snames but assert them to be
>>> logically/queriably equivalent.
>>> Unworkable: Have a bunch of different ways of saying that snames are
>>> equivalent that people need to remember to use when querying or which
>>> providers need to index by.
>>>
>>> We already have to have one special case to deal with in queries -
>>> identifiers that are obsolete. If we have alternative ids then we now
>>> have two special cases.
>>>
>>> So the choice, as far as I am concerned, is to either
>>>
>>> a) Use the existing special case - obsoleting ids - to handle the
>>> situation. This is already somewhat the practice.
>>> b) Add the equivalent class assertions in the OWL so that the
>>> alternative ids can really serve as alternatives without everyone
>>> having to implement special code for them.
>>>
>>> Personally I prefer a)
>>>
>>> -Alan
>>>
>>>> Janna
>>>>
>>>>> -----Original Message-----
>>>>> From: Alan Ruttenberg [mailto:ala...@gm...]
>>>>> Sent: 03 August 2009 12:28
>>>>> To: Paula de Matos
>>>>> Cc: obo...@li...; Chris Mungall; Janna Hastings;
>>>>> che...@li...
>>>>> Subject: Re: [Chebi-ontology] 38636
>>>>>
>>>>> On Aug 3, 2009, at 4:58 AM, Paula de Matos <pm...@eb...> wrote:
>>>>>
>>>>>> Hi Chris,
>>>>>>
>>>>>> Yes that is precisely the case we have. If two entities are merged
>>>>>> then we will get an alternative id. This only happens for term
>>>>>> merges and this can happen quite often as ChEBI imports data from
>>>>>> various sources such as KEGG COMPOUND and MSDchem which will mean
>>>>>> that invariably there will be an overlap.
>>>>>> We follow the same protocol as GO in this respect. The term name is
>>>>>> kept as a synonym and we do try to take the utmost care when merging:
>>>>>> http://www.ebi.ac.uk/chebi/annotationManualForward.do#Merge
>>>>>>
>>>>>> Term obsoletion/deletion is used when a curator is certain that the
>>>>>> entry is incorrect and therefore it should be deleted.
>>>>>>
>>>>> Thanks for the background, Janna.
>>>>>
>>>>> If this practice is to be maintained (foundry wide) - and I think it
>>>>> shouldn't be - then we will need to adjust the OWL rendering so that
>>>>> queries against any of the ids actually work. AFAICT this would mean
>>>>> adding equivalentClass assertions for each of the ids.
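A minimal sketch of what adding equivalentClass assertions for each of the ids could look like, written in Python and emitting OWL 2 functional syntax. The IRI prefix and the helper names `to_iri` and `equivalence_axioms` are assumptions made for illustration; the actual OWL rendering of ChEBI may name its classes differently.

```python
from typing import Iterable, List

# Assumed IRI prefix for illustration only; the real OWL file may use another.
OBO_PREFIX = "http://purl.obolibrary.org/obo/"


def to_iri(curie: str) -> str:
    """Turn an id such as 'CHEBI:49603' into an IRI under OBO_PREFIX."""
    prefix, local = curie.split(":", 1)
    return f"{OBO_PREFIX}{prefix}_{local}"


def equivalence_axioms(primary: str, alt_ids: Iterable[str]) -> List[str]:
    """Build one EquivalentClasses axiom per alternative id."""
    return [
        f"EquivalentClasses(<{to_iri(primary)}> <{to_iri(alt)}>)"
        for alt in alt_ids
    ]


if __name__ == "__main__":
    # The id pair discussed in this thread.
    for axiom in equivalence_axioms("CHEBI:49603", ["CHEBI:38636"]):
        print(axiom)
```

With axioms like these in place, an OWL-aware query system could treat a query against CHEBI:38636 as a query against CHEBI:49603. Query systems that are not OWL aware would still need special handling.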
>>>>>
>>>>> However I note that one of the reasons that we want orthogonality
>>>>> *across* ontologies is to avoid precisely the situation in which we
>>>>> have many terms that mean the same thing, so I can't see why we would
>>>>> want to condone it within ontologies.
>>>>>
>>>>> -Alan
>>>>>
>>>>>> Cheers,
>>>>>> Paula
>>>>>>
>>>>>> Chris Mungall wrote:
>>>>>>
>>>>>>> On Aug 2, 2009, at 2:38 PM, Alan Ruttenberg wrote:
>>>>>>>
>>>>>>>> On Aug 2, 2009, at 8:06 AM, Janna Hastings <jan...@gm...> wrote:
>>>>>>>>
>>>>>>>>> Hi Alan,
>>>>>>>>>
>>>>>>>>> that's right, this is general and will apply to many ChEBI
>>>>>>>>> terms. It's to do with how ChEBI maintains identifiers. When we
>>>>>>>>> create new identifiers we preserve the old ones as alternate IDs
>>>>>>>>> of the new ID rather than making the terms linked to the
>>>>>>>>> original IDs obsolete. So in effect, each ID can have a little
>>>>>>>>> family of alternate IDs which may or may not have been released
>>>>>>>>> previously. Any of the alternate IDs will get you to the same
>>>>>>>>> term, so they are not obsolete in the usual sense.
>>>>>>>>>
>>>>>>>> There are two problems with this approach. First, we have
>>>>>>>> divergence between the different ids used by different
>>>>>>>> applications - this hurts data integration.
>>>>>>>>
>>>>>>>> Second, they aren't really coequal ids. At least in the owl
>>>>>>>> version, you can't do a query with any of the alternative ids
>>>>>>>> interchangeably and get the same answers.
>>>>>>>>
>>>>>>>> So I think it better to move to obsoleting such ids. Keeping the
>>>>>>>> alternative id annotation seems harmless, however.
>>>>>>>>
>>>>>>>>> Do any of the other ontologies maintain multiple IDs for terms?
>>>>>>>>>
>>>>>>>> I don't know.
>>>>>>>> However the practice that we're trying to follow
>>>>>>>> is that of the GO, and I expect they don't, but I'm ccing Chris
>>>>>>>> to ask him.
>>>>>>>>
>>>>>>> http://wiki.geneontology.org/index.php/Curator_Guide:_Merge_Split_Move
>>>>>>> http://wiki.geneontology.org/index.php/Curator_Guide:_Obsoletion
>>>>>>>
>>>>>>> It's perfectly acceptable for CHEBI to have non-obsolete alt_ids
>>>>>>> for their terms. However, I'm not quite sure I understand the
>>>>>>> process of ID generation in CHEBI - it seems there may be more ID
>>>>>>> churn than is desirable. In general you should only have alt_ids
>>>>>>> if there was previously a term merge (for example, when it is
>>>>>>> discovered that two CHEBI terms denote the same types)
>>>>>>>
>>>>>>>> -Alan
>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Janna
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Alan Ruttenberg [mailto:ala...@gm...]
>>>>>>>>>> Sent: 02 August 2009 10:59
>>>>>>>>>> To: ChEBI; obo...@li...
>>>>>>>>>> Subject: [Chebi-ontology] 38636
>>>>>>>>>>
>>>>>>>>>> Just noticed (on review of some terms to import to OBI) that
>>>>>>>>>> http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:38636 shows the
>>>>>>>>>> entry for CHEBI:49603. On inspection of the OWL file, I see that
>>>>>>>>>> CHEBI:38636 is listed as an alternative id.
>>>>>>>>>>
>>>>>>>>>> The question is about consistency: I think the expectation is that any
>>>>>>>>>> class name that has been in circulation would either be active, or be
>>>>>>>>>> made a subclass of obsolete class. However this case (and presumably
>>>>>>>>>> others) violates this expectation.
>>>>>>>>>> I *think* that the obo should still include an entry for CHEBI:38636
>>>>>>>>>> with is_obsolete: true, yes?
>>>>>>>>>> -Alan
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
>>>>>>>>>> trial. Simplify your report design, integration and deployment - and focus on
>>>>>>>>>> what you do best, core application coding. Discover what's new with
>>>>>>>>>> Crystal Reports now. http://p.sf.net/sfu/bobj-july
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Chebi-ontology mailing list
>>>>>>>>>> Che...@li...
>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/chebi-ontology
>>>>>>
>>>>>> --
>>>>>> Paula de Matos
>>>>>> <pm...@eb...>
>>>>>> ChEBI & IntEnz Coordinator European Bioinformatics Institute - EMBL
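The two conventions discussed in this thread (alt_id for merged terms, is_obsolete for withdrawn ones) can be handled with a single lookup on the consumer side. The Python sketch below is illustrative only. It assumes the new id of an obsoleted term is recorded with the OBO `replaced_by` tag, and the sample stanzas are hypothetical apart from the CHEBI:49603 / CHEBI:38636 pair mentioned above. The parser covers only the few tags it needs and is not a full OBO reader.

```python
from typing import Dict, List, Optional

# Hypothetical sample data; only the 49603/38636 pair comes from the thread.
SAMPLE_OBO = """\
[Term]
id: CHEBI:49603
name: example merged term
alt_id: CHEBI:38636

[Term]
id: CHEBI:0000001
name: example obsolete term
is_obsolete: true
replaced_by: CHEBI:49603
"""


def parse_terms(text: str) -> List[Dict[str, List[str]]]:
    """Parse [Term] stanzas into dicts mapping each tag to its list of values."""
    terms: List[Dict[str, List[str]]] = []
    current: Optional[Dict[str, List[str]]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current.setdefault(tag, []).append(value.strip())
    return terms


def build_index(terms: List[Dict[str, List[str]]]) -> Dict[str, Optional[str]]:
    """Map every known id (primary, alternative, obsolete) to a current id.

    Obsolete ids without a recorded replacement map to None. Chains of
    replacements are not followed in this sketch.
    """
    index: Dict[str, Optional[str]] = {}
    for term in terms:
        primary = term["id"][0]
        if term.get("is_obsolete") == ["true"]:
            replacements = term.get("replaced_by", [])
            index[primary] = replacements[0] if replacements else None
        else:
            index[primary] = primary
            for alt in term.get("alt_id", []):
                index[alt] = primary
    return index


def resolve(index: Dict[str, Optional[str]], some_id: str) -> Optional[str]:
    """Return the current primary id for some_id, or None if there is none."""
    return index.get(some_id)


if __name__ == "__main__":
    idx = build_index(parse_terms(SAMPLE_OBO))
    print(resolve(idx, "CHEBI:38636"))
```

Under this scheme a querier resolves any id once and then queries only with the primary id, which is the "single sname" situation argued for above. The cost is that every consumer has to run a resolution step like this one.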