From: Chris M. <cj...@be...> - 2009-08-27 20:47:23
Traveling at the moment, so apologies for the cursory answer; more later.

Currently in GO you can ask questions of the form:

[1] What gene products are found in ANY/SOME <C>
(formally: for what P does there exist an instance p of P such that p is located in some C)

But you cannot ask questions of the form:

[2] What gene products are in ALL/EVERY <C>
(formally: for what P does it hold that all instances of C have as part some instance p of type P)

You can ask questions of type [1] and then interpret them as if you were asking questions of type [2], but this will often yield false positives, especially higher up the graph.

For example, here is the answer to the question "what genes have products that are found in SOME HDA1 complexes":

http://amigo.geneontology.org/cgi-bin/amigo/term-assoc.cgi?gptype=all&speciesdb=SGD&taxid=all&evcode=all&term_assocs=all&term=GO%3A0070823&action=filter

You can interpret this page as the answer to the question "what gene products are found in ALL HDA1 complexes", and in this case you may be correct. But if you start looking at less granular classes, the page will give the wrong answer to that question.

I think it would be good to be able to answer both kinds of questions. We will be some way toward being able to answer the second kind of question with the CC-PRO xp definitions. But to go all the way would require an extension of the GAF format and a change to annotation practice, so that an annotation can state explicitly that all instances of the complex in a species have the gene product as part.

I am interested in pursuing this but I'm not sure where this figures in everyone's priorities.

On Aug 26, 2009, at 3:13 PM, Karen Christie wrote:

> OK, I understand your explanation about "Why should "Queries for U2-type
> spliceosomal complex should ***not*** return gene products localized
> to U1 snRNP complex." But I think that is separate from the display
> issue I have with has_part.
>
> However, it also brings up a different issue.
> A question that a biologist might want to ask is "give me all the
> proteins in the major spliceosome". They would expect to get the
> proteins present in the 5 snRNPs (U1, U2, U4, U5, and U6). By replacing
> part_of relations with has_part, it seems that they might have
> difficulty getting a full answer to that question. In your example
> where g1 is annotated only to "U1 snRNP", you give the hypothetical
> result:
>
>> There are no genes with products known to be localized to 'U2-type
>> spliceosomal complex'.
>> However, every 'U2-type spliceosomal complex' has the following
>> parts:
>> a U1 snRNP -- genes: g1
>> a U2 snRNP -- genes: none
>
> However, if I wanted to get all proteins in the major spliceosome, I
> don't want just gene products that are in EVERY 'U2-type spliceosomal
> complex', I want gene products that are in ANY 'U2-type spliceosomal
> complex'. What do we need to do to make this a question that
> biologists get a reasonable answer to?
>
> I understand that GO needs to implement more rigorous logic, but if we
> do it in a way that prevents biologists from getting reasonable
> answers, we are not doing what we set out to do.
>
> -Karen
>
> On Mon, 24 Aug 2009, Chris Mungall wrote:
>
>> On Aug 24, 2009, at 10:36 AM, Karen Christie wrote:
>>
>> [snipped part of dialog for now to focus on one issue]
>>
>>>> I would strongly advocate that even given sufficient developer
>>>> hours it is better *not* to display the ontology in this way at
>>>> all. It perhaps looks more comforting, but people will make the
>>>> same comforting assumptions that no longer hold. For example, it
>>>> looks like there is some kind of transitive relationship between
>>>> U1 snRNP and U2-type spliceosomal complex ***which there is
>>>> not***. It looks like the true path rule might hold. ***it does
>>>> not*** . Queries for U2-type spliceosomal complex should
>>>> ***not*** return gene products localized to U1 snRNP complex.
>>> Why should "Queries for U2-type spliceosomal complex should
>>> ***not*** return gene products localized to U1 snRNP complex."?
>>
>> OK, good, I think we are circling in on the crux of the issue here.
>>
>> According to GO, it is not necessarily the case that a gp localized
>> to a U1 snRNP is localized to a U2-type spliceosomal complex. For
>> example, the gp may be localized to a particular U1 snRNP that is
>> part of a penta-snRNP complex. (I may not have chosen the best
>> example as we're lacking annotations to this new term, but I can
>> pull out an analogous example if you're not convinced.)
>>
>> This can be seen in the sub-graph which I reproduce at the end of
>> this email (let me know if this is not visible in some email
>> programs, I can make a wiki page with this all on it) - there is no
>> path following the arrows from U1 snRNP to U2-type spliceosomal
>> complex. Note that attempting to show the graph in an "intuitive"
>> way as I have attempted to do below, with the smaller entities at
>> the bottom, is actually *misleading* because it leads one to assume
>> that there is some inferred all-some relationship between these two
>> terms when in fact there is not.
>>
>> Biology is complex and logic is hard. There's no escaping this. I
>> don't believe we should simplify either to the point where we get
>> false positives. I do think we need better ways of displaying this
>> complex information, but I think we should focus resources on doing
>> this in end user-facing tools rather than oboedit, as we would hope
>> everyone using oboedit to edit the ontology would have an
>> understanding of the logic or a willingness to learn.
>>
>> Here is a roughly sketched out example of how this could work.
>>
>> Let's say PMID:123 describes an observation of a product of g1
>> being localized to a 'U1 snRNP' via an IDA. Let's say that's all we
>> know, either due to the resolution of the assay or because that's
>> all that the annotator specified.
>> The user queries for 'U2-type spliceosomal complex'.
>>
>> The query result screen could show something like:
>>
>> There are no genes with products known to be localized to 'U2-type
>> spliceosomal complex'.
>> However, every 'U2-type spliceosomal complex' has the following
>> parts:
>> a U1 snRNP -- genes: g1
>> a U2 snRNP -- genes: none
>>
>> A more advanced tool would be able to show even more:
>>
>> PMID:123 shows g1 in U1 snRNP.
>> prob('U2-type spliceosomal complex') = 0.83
>> prob('penta-snRNP complex') = 0.05
>> ...
>>
>> This leads the user to terms of relevance, shows what is known,
>> shows what might be the case, and does not show anything that is
>> false.
>>
>>> I would have thought that they should. If I wanted to know the
>>> parts of "U2-type spliceosomal complex" I would want to know all
>>> the things that compose the series of complexes that are all
>>> considered to be a "U2-type spliceosomal complex".
>>
>> I'm not sure I really understand the statement. It sounds
>> tautological, and I don't understand what adding "series of" adds.
>>
>> If the question is "what parts can be found in every U2-type
>> spliceosomal complex" then the answer is found via the has_part
>> relation and its closure.
>>
>> However, this is a different question from "what gene products have
>> been observed to be present in a U2-type spliceosomal complex".
>>
>>> [Note that we need to revise the defs of "U2-type spliceosomal
>>> complex ; GO:0005684" (and its sibling "U12-type spliceosomal
>>> complex ; GO:0005689") to be consistent with the def of the parent
>>> term "spliceosomal complex ; GO:0005681" and specify that these
>>> terms represent series of complexes. I'll submit a SF item for
>>> this.]
>>
>> https://sourceforge.net/tracker/index.php?func=detail&aid=2843718&group_id=36855&atid=440764
>>
>> I don't understand the motivation here. I think the definitions
>> should employ a consistent style, but I would change the parent from:
>>
>> GO:0005681 ! spliceosomal complex [DEF: "Any of a series of
>> ribonucleoprotein complexes that...
>>
>> to
>>
>> GO:0005681 ! spliceosomal complex [DEF: "A ribonucleoprotein
>> complex that...
>>
>> I'm not sure how adding "any of a series of..." changes the
>> meaning, it seems to just add extra verbiage that obfuscates the
>> definition.
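
The difference between question types [1] (ANY/SOME) and [2] (ALL/EVERY) discussed in this thread can be illustrated with a small Python sketch. It is a toy model under loud assumptions: it represents individual complex instances explicitly, which GO annotation data does not do, and the names `IS_A`, `INSTANCES`, `found_in_some`, `found_in_all`, and the gene products `g2` and `g3` are hypothetical. Only the three term names and `g1` come from the thread.

```python
# Toy model: all data and helper names here are hypothetical, for
# illustration only. Real GO annotations do not enumerate instances.

# Child class -> parent class (a tiny is_a hierarchy).
IS_A = {
    "U2-type spliceosomal complex": "spliceosomal complex",
    "U12-type spliceosomal complex": "spliceosomal complex",
}

# Instance id -> (class, gene products that are part of that instance).
INSTANCES = {
    "s1": ("U2-type spliceosomal complex", {"g1", "g2"}),
    "s2": ("U2-type spliceosomal complex", {"g1"}),
    "s3": ("U12-type spliceosomal complex", {"g3"}),
}


def is_instance_of(cls, target):
    """True if cls equals target or reaches it by following is_a."""
    while cls is not None:
        if cls == target:
            return True
        cls = IS_A.get(cls)
    return False


def part_sets(target):
    """Part sets of every instance belonging to the target class."""
    return [parts for cls, parts in INSTANCES.values()
            if is_instance_of(cls, target)]


def found_in_some(target):
    """Type [1]: gene products that are part of at least one instance."""
    return set().union(*part_sets(target))


def found_in_all(target):
    """Type [2]: gene products that are part of every instance."""
    sets = part_sets(target)
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    for term in ("U2-type spliceosomal complex", "spliceosomal complex"):
        print(term)
        print("  SOME:", sorted(found_in_some(term)))
        print("  ALL: ", sorted(found_in_all(term)))
```

In this toy data the SOME and ALL answers already differ for the more specific class, and for the parent class the ALL answer is empty while the SOME answer lists every gene product. Reading a SOME result as an ALL result would therefore give false positives, and more of them higher up the graph.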