pro-obo-discuss Mailing List for Protein Ontology (PRO)
Brought to you by:
darren_natale
| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2009 | | | | | | | | (20) | | (23) | | |
| 2010 | (1) | | | (1) | (1) | | | (2) | | | | (1) |
| 2011 | | | | | | | (1) | | | | | |
|
From: Darren N. <da...@ge...> - 2011-07-13 17:21:20
|
Just a reminder for anyone or any resource referencing Protein Ontology terms: PRO IDs are of the form PR:xxxxxxxxx (x=digit), NOT of the form PRO:xxxxxxxxx. That is, use PR: instead of PRO: (or, for URIs, PR_). Please check your ontology and modify accordingly. Best regards, Darren Natale on behalf of the PRO Consortium |
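For resources that still carry the old prefix, the rename described above can be sketched as a small text filter. This is only an illustration, not a PRO-provided tool: the function name `normalize_pro_ids` is hypothetical, and the pattern assumes that PRO identifiers always have exactly nine digits, as in the PR:xxxxxxxxx form given in the message. Strings with a different digit count, or non-numeric tags such as [PRO:DNx], are left untouched.

```python
import re

# Assumption: a PRO identifier is the old prefix, ':' or '_', then exactly nine digits.
_OLD_PRO_ID = re.compile(r"\bPRO([:_])(\d{9})\b")


def normalize_pro_ids(text):
    """Rewrite PRO:xxxxxxxxx as PR:xxxxxxxxx (and PRO_ as PR_ in URIs)."""
    return _OLD_PRO_ID.sub(r"PR\1\2", text)


if __name__ == "__main__":
    print(normalize_pro_ids("is_a: PRO:000000001 ! protein"))
    print(normalize_pro_ids("http://purl.obolibrary.org/obo/PRO_000000563"))
```

Requiring exactly nine digits is a deliberately conservative choice: anything that does not look like a PRO term ID passes through unchanged.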
|
From: Darren N. <da...@ge...> - 2010-12-22 16:56:39
|
Apologies for cross-posts. This message is to announce that the Protein Ontology will be changing its ID space from PRO: to PR: (e.g. PRO:000000563 will become PR:000000563) effective with release 16.0 (to occur in February 2011, barring unforeseen delays). The change will be implemented to avoid clashing with UniProtKB feature identifiers indicating processed protein subsequences (PRO_), as we would like to cross-reference these identifiers. Previews of this change can be found in the alternative PRO source files downloadable at ftp://ftp.pir.georgetown.edu/databases/ontology/pro_obo_new/ and at the alternative PRO web site http://pir.georgetown.edu/pro/pro_new.shtml. The contents of these alternatives will ultimately replace the current default views. Note that for now both the original PURL for individual PRO entries http://purl.obolibrary.org/obo/PRO_000000563 and the new PURL http://purl.obolibrary.org/obo/PR_000000563 resolve to the same view (displaying the current PRO: ID format). In the future the view will switch to the new ID format. Please do not hesitate to contact PRO for questions or comments. -- Darren Natale on behalf of the PRO Consortium |
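The relationship between the new ID form and the new PURL form in the announcement above can be written down as a one-line mapping. The sketch below is an assumption-laden illustration: the helper name `pr_purl` is hypothetical, the nine-digit check is inferred from the example ID, and the base URL is simply the one quoted in the message.

```python
import re

# Base taken from the PURL quoted in the announcement.
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"

# Assumption: new-style IDs are "PR:" followed by exactly nine digits.
_PR_ID = re.compile(r"PR:(\d{9})")


def pr_purl(term_id):
    """Return the PURL for a new-style PR: identifier."""
    match = _PR_ID.fullmatch(term_id)
    if match is None:
        raise ValueError("not a PR: identifier: %r" % term_id)
    return OBO_PURL_BASE + "PR_" + match.group(1)


if __name__ == "__main__":
    print(pr_purl("PR:000000563"))
```

Old-style PRO: identifiers are rejected rather than silently converted, so callers notice any IDs that still need updating.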
|
From: Darren N. <da...@ge...> - 2010-08-19 13:24:56
|
Hi Chris,

Thanks for pointing this out. We have made the correction and will soon upload the fixed obo file.

To answer your question: No, the obo file is not generated from a database. The obo file is modified in three ways. Most often, information from our RACE-PRO interface is collected and formatted to create the stanza, after which curators use OBO Edit to make any necessary modifications. Alternatively, OBO Edit is used as the editing interface. Other changes are made by large-scale population of terms (you can recognize these because they'll have [PRO:DNx] as the evidence). Finally, and least often, changes to the obo file are made directly using a text editor. I will guess that the error you indicated came from the last method.

All releases are loaded into OBO Edit one final time to make sure they'll work, but I recall you saying once before that this program does not enforce proper syntax. We look forward to the availability of the syntax checker.

Best regards,
Darren

Chris Mungall wrote:
> missing stanza header for axin:
>
> [Term]
> id: PRO:000004527
> name: axin-2
> def: "An axin that is a translation product of the AXIN2 gene or a 1:1 ortholog thereof." [PRO:DNx]
> comment: Category=gene.
> synonym: "Axil" EXACT [DNx]
> synonym: "Axin-like protein" EXACT []
> synonym: "AXIN2" RELATED []
> synonym: "axis inhibition protein 2" EXACT []
> synonym: "conductin" EXACT []
> is_a: PRO:000025824 ! axin
>
> id: PRO:000025824
> name: axin
> def: "A protein that contains a copy of the Regulator of G protein signaling domain (PF00615), followed by a copy of the Axin beta-catenin binding domain (PF08833) and a C-terminal copy of the DIX domain (PF00778). Axins play key roles in WNT signalling." [PMID:17404597, PMID:19909509, PMID:15067197, PRO:CNA]
> comment: Category=family.
> synonym: "axis inhibition protein" EXACT []
> xref: PIRSF:PIRSF038234
> is_a: PRO:000000001 ! protein
>
> is pro.obo generated automatically? from a relational database? is the code available?
>
> there will shortly be available a precise syntactic specification and java syntax checkers |
|
From: Chris M. <cjm...@lb...> - 2010-08-19 04:52:09
|
missing stanza header for axin:

[Term]
id: PRO:000004527
name: axin-2
def: "An axin that is a translation product of the AXIN2 gene or a 1:1 ortholog thereof." [PRO:DNx]
comment: Category=gene.
synonym: "Axil" EXACT [DNx]
synonym: "Axin-like protein" EXACT []
synonym: "AXIN2" RELATED []
synonym: "axis inhibition protein 2" EXACT []
synonym: "conductin" EXACT []
is_a: PRO:000025824 ! axin

id: PRO:000025824
name: axin
def: "A protein that contains a copy of the Regulator of G protein signaling domain (PF00615), followed by a copy of the Axin beta-catenin binding domain (PF08833) and a C-terminal copy of the DIX domain (PF00778). Axins play key roles in WNT signalling." [PMID:17404597, PMID:19909509, PMID:15067197, PRO:CNA]
comment: Category=family.
synonym: "axis inhibition protein" EXACT []
xref: PIRSF:PIRSF038234
is_a: PRO:000000001 ! protein

is pro.obo generated automatically? from a relational database? is the code available?

there will shortly be available a precise syntactic specification and java syntax checkers |
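The kind of error reported above (a second `id:` tag with no `[Term]` line of its own) can be detected with a few lines of code. The following is a minimal sketch, not the syntax checker mentioned in the message: the function name `find_missing_headers` is hypothetical, and it assumes one tag per line and that every stanza should contain exactly one `id:` tag.

```python
import re

# Stanza headers recognised by this sketch.
STANZA_HEADER = re.compile(r"^\[(Term|Typedef|Instance)\]$")


def find_missing_headers(obo_text):
    """Return (line_number, id) pairs for id: tags that lack their own stanza header."""
    problems = []
    in_stanza = False
    ids_in_stanza = 0
    for lineno, line in enumerate(obo_text.splitlines(), start=1):
        stripped = line.strip()
        if STANZA_HEADER.match(stripped):
            # A proper header starts a new stanza and resets the id counter.
            in_stanza = True
            ids_in_stanza = 0
        elif in_stanza and stripped.startswith("id:"):
            ids_in_stanza += 1
            if ids_in_stanza > 1:
                # A second id: in the same stanza means a header was dropped.
                problems.append((lineno, stripped[3:].strip()))
    return problems


if __name__ == "__main__":
    sample = "\n".join([
        "[Term]",
        "id: PRO:000004527",
        "name: axin-2",
        "",
        "id: PRO:000025824",
        "name: axin",
    ])
    for lineno, term_id in find_missing_headers(sample):
        print("line %d: %s has no stanza header" % (lineno, term_id))
```

Lines before the first stanza header (the file header) are ignored, so header tags are never flagged.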
|
From: Chris M. <cj...@be...> - 2010-05-01 01:44:52
|
single intersection_of tag:

[Term]
id: PRO:000025028
name: transcriptional regulatory protein dpiA isoform 1 phosphorylated 1
def: "A sensor kinase dpiB isoform 1 that has been phosphorylated by dpiB on the Asp residue within the receiver domain." [EcoCyc:PHOSPHO-DPIA]
comment: Category=modification.
synonym: "DpiA-Pasp" EXACT []
synonym: "CitB-Pasp" EXACT []
intersection_of: PRO:000025026 ! transcriptional regulatory protein dpiA isoform 1 |
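A lone `intersection_of` tag like the one reported above can also be caught mechanically. The sketch below assumes, in line with the OBO 1.2 flat-file format, that a stanza carrying any `intersection_of` tag needs at least two of them. The function names are hypothetical, and the parsing is deliberately naive (one tag per line, no wrapped values).

```python
import re

STANZA_HEADER = re.compile(r"^\[(Term|Typedef|Instance)\]$")


def stanzas(obo_text):
    """Yield one list of tag lines per stanza, skipping the file header."""
    current = None
    for line in obo_text.splitlines():
        stripped = line.strip()
        if STANZA_HEADER.match(stripped):
            if current is not None:
                yield current
            current = []
        elif current is not None and stripped:
            current.append(stripped)
    if current is not None:
        yield current


def single_intersection_ids(obo_text):
    """Return the ids of stanzas that have exactly one intersection_of tag."""
    flagged = []
    for tags in stanzas(obo_text):
        count = sum(1 for tag in tags if tag.startswith("intersection_of:"))
        if count == 1:
            term_id = next(
                (tag[3:].strip() for tag in tags if tag.startswith("id:")), None
            )
            flagged.append(term_id)
    return flagged
```

Stanzas with zero or with two or more `intersection_of` tags are not reported.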
|
From: Darren N. <da...@ge...> - 2010-04-20 18:17:55
|
Dear PRO users, Updated in-progress versions of the PRO OBO file will henceforth be available at http://pir.georgetown.edu/projects/pro/pro_wv.obo. The pro_wv.obo file contains the working version of PRO, which will appear between official releases. New working versions will be uploaded once per day whenever a change has been made. Syntax and semantics are expected to be correct, but there is no guarantee of content. We welcome comments, suggestions, and (especially) bug reports regarding the working version. Best regards, The PRO Consortium |
|
From: Darren N. <da...@ge...> - 2010-01-13 16:52:53
|
The PRO Consortium announces the latest release of the Protein Ontology (PRO) which features a vastly expanded coverage of proteins at the species-non-specific gene level (see figure on the PRO home page http://pir.georgetown.edu/pro/ and question four on the Q&A page http://pir.georgetown.edu/pro/PRO_Q&A.pdf for definition). Currently the majority of these added terms are directly under the root node "protein," giving the ontology a flattened look. Building an intermediate family level is ongoing.

Some stats:

18238 PRO terms
1093 terms are in the 'modification' category.
941 terms are in the 'sequence' category.
15933 terms are in the 'gene' category. <--- was ~1200
260 terms are in the 'family' category.
1233 terms are annotated, codifying the information from 842 papers.
1781 connections to GO (599 PRO terms).
235 connections to MOD (194 PRO terms).
599 connections to Pfam (358 PRO terms).
325 connections to SO (307 PRO terms).
296 annotations of a phenotype (289 PRO terms).

Interested parties can visit the PRO web page at http://pir.georgetown.edu/pro/

Best regards,
The PRO Consortium |
|
From: Darren N. <da...@ge...> - 2009-10-30 16:16:43
|
Comments inline.

Suzanna Lewis wrote:
> On Oct 21, 2009, at 11:24 AM, Darren Natale wrote:
>
>> Hi Chris,
>>
>> You are correct that things like "mouse Shh isoform 1" are covered by sequence databases such as UniProtKB. And PRO does indeed intend to cover them as well. Why? Because PRO provides not only the generic classes to which you referred, but also some quite-specific terms like "mouse Shh isoform 1 phosphorylated at XX and YY" which are not covered by such databases. The obvious overlap arises because it doesn't make sense to specifically exclude one (middle of the hierarchy) class of proteins because it is covered elsewhere. A subset of very specific terms to be covered by PRO (such as indicated above) are found scattered in various resources, but not in any one location. PRO intends to consolidate this information and add to it.
>
> Why not use XPs then? When you reach that point in the ontology, rather than duplicate with a redundant identifier, just use the external one. If you don't set any limits on the scope of PRO it could eventually be attempting to cover everything in the known universe. I exaggerate, but the point remains--it's just a division of labor and responsibilities. That, and redundancy hurts the community by creating mapping woes and general confusion with translation, data sharing, and semantic interpretation.

I can think of two reasons not to do this (which is not to say it shouldn't be done). The first is that not all of our users (some of whom are outside the ontology community) are comfortable with XPs. I'm not sure if the reasons are aesthetic or technical (I suspect the latter). The more important reason (in my mind, at least) is that it means we'll have a split namespace for a single domain. So, when a user wishes to refer to a protein, there will be two places to look for the identifier. This seems un-Foundry like.

>> A corollary of the above is that annotation providers might want to submit to PRO because UniProtKB IDs for the actual object of annotation do not exist. It really depends on how specific one wishes to be in specifying the object.
>
> I don't think so. The UniProtKB IDs for a particular isoform coupled with the appropriate protein modification IDs (from PRO) would be sufficient.

I assume you mean the modification IDs from PSI-MOD. So the GAF would contain the XPs?

>> The plans of the PRO Consortium were published, so to speak, in a message dated August 8, 2009 to obo-discuss. In it, the addition of species-specific terms was announced, as was the addition of complexes. The complexes to be added are more-specific versions of what are in GO, so there is no overlap there. In fact, the PRO complexes will have the GO complex terms (and IDs) as parents.
>
> Sorry to have missed seeing this, but I do think this needs more discussion. A widely broadcasted e-mail that didn't get any specific responses or questions isn't -really- quite the same thing as confirmed agreement on a shared understanding.

Agreed. There was some response regarding the species addition--but only that part--mostly to make sure we do it right.

>> Chris Mungall wrote:
>>> On Oct 21, 2009, at 9:22 AM, Alan Ruttenberg wrote:
>>>
>>>> On Wed, Oct 21, 2009 at 11:47 AM, Michael Ashburner <ma...@ge...> wrote:
>>>>> Alan
>>>>>
>>>>> Forgive me if I am wrong. PRO is an ONTOLOGY, it DONT DO instances which is what GO needs
>>>> I didn't know GO did anything with instances. Which instances does GO use? Certainly not instances of gene products - I don't know many experiments that are about specific single molecules.
>>>
>>> Strictly speaking you are correct, we care about things such as "mouse Shh" and "mouse Shh, isoform 1" which denote types. But these are tied to particular sequences, which are instances.
>>>
>>> Regardless of the instance/class and database/ontology distinctions, Michael's objection stands. A number of us thought that PRO was principally concerned with providing identifiers for *generic* protein types such as "interleukin", "CD4", "alpha-synuclein" and so on. I believe this is extremely useful.
>>>
>>> What is unclear is how far PRO intends to extend down into *specific* protein types, such as "mouse Shh isoform 1", a realm already covered by sequence databases such as UniProtKB, and what additional value PRO will provide balanced against the additional ID-mapping headaches caused by yet another contender in this overpopulated space (we already have Ensembl, The UniProtKB family of databases, NCBI/EMBL/DDBJ IDs, NCBI-specific IDs to name but a few - and that's leaving out MOD IDs).
>>>
>>> Going back to your original email:
>>>
>>>>>> Regarding the first benefit, submitting un-annotated gene products, it seems that the appropriate place to submit these would be PRO. I think GO would then retrieve PRO to get the estimates they are concerned with.
>>>>>>
>>>>>> Otherwise we risk having two places for such information, which reduces the effectiveness of both, and entailing costly synchronization efforts.
>>>
>>> I think you have to make a clear argument why annotation providers should have to go through the extremely time-consuming effort of submitting to PRO when UniProtKB IDs already exist. Version 6.0 of PRO *only* has the generic species-neutral proteins and isoforms. Whilst useful for other purposes, it's not clear that these are useful here. Your proposal at the least seems premature.
>>>
>>> It might be an idea for PRO to circulate an email regarding its intent to extend its scope to obo-discuss, and where that scope overlaps with other informatics resources to have some kind of plan or MOU regarding how to partition the work or at least synchronize what they are doing.
>>>
>>> I also heard recently that PRO intends to represent protein complexes, something I thought was in the domain of GO. I think this needs more open discussion as well.
>>>
>>> My personal preference would be to see PRO focus efforts on providing comprehensive coverage of generic proteins - from a selfish GO point of view, at least enough to complete the necessary logical definitions for BP, CC and CL.
>>>
>>>> -Alan
>>>>
>>>>> M
>>>>> On 21 Oct 2009, at 15:58, Alan Ruttenberg wrote:
>>>>>
>>>>>> On Wed, Oct 21, 2009 at 10:15 AM, Judith Blake <Jud...@ja...> wrote:
>>>>>>> Alan,
>>>>>>>
>>>>>>> We need to represent all proteins in GO for the reasons mentioned. These would carry PRO xrefs as well as xref to UniProt, NP.
>>>>>> PRO should not be used as a dbxref - it should be used as the actual ids for the gene products.
>>>>>>
>>>>>>> The point being that folks searching GO resources via protein IDs need to be able to enter not only IDs that have GO annotations, but IDs for proteins that don’t have GO annotations. So they can recover datasets +/- GO annotations for all proteins of interest
>>>>>> This is a tooling issue. All that means is that the GO resource tools should be able to search PRO.
>>>>>>
>>>>>> -Alan
>>>>>>> Also, while PRO will provide IDs for all mouse and human, the IDs and intersections for other organisms, now included in GO, will come later.
>>>>>>>
>>>>>>> Judy
>>>>>>>
>>>>>>> On 10/20/09 6:20 PM, "Alan Ruttenberg" <ala...@gm...> wrote:
>>>>>>>
>>>>>>> Regarding the first benefit, submitting un-annotated gene products, it seems that the appropriate place to submit these would be PRO. I think GO would then retrieve PRO to get the estimates they are concerned with.
>>>>>>>
>>>>>>> Otherwise we risk having two places for such information, which reduces the effectiveness of both, and entailing costly synchronization efforts.
>>>>>>>
>>>>>>> -Alan
>>>>>>>
>>>>>>> On Mon, Oct 19, 2009 at 12:25 PM, Amelia Ireland <aj...@eb...> wrote:
>>>>>>>> Hello annotators,
>>>>>>>>
>>>>>>>> I've written up a proposal for a new format for the annotation and gp2protein files which would separate gene product data from annotation data, thereby allowing unannotated gene products to be submitted to the GO database. It also incorporates the plans for col 17, for annotating spliceforms. Please have a look at the proposal here:
>>>>>>>>
>>>>>>>> http://wiki.geneontology.org/index.php/Annotation_File_Format_Proposal
>>>>>>>>
>>>>>>>> Feedback / questions / etc. happily received.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Amelia.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Amelia Ireland
>>>>>>>> GO Editorial Office
>>>>>>>> http://www.berkeleybop.org || http://www.ebi.ac.uk
>>>>>>>> Boycott Trader Joe's Red List seafood: http://traitorjoe.com/ |
|
From: Darren N. <da...@ge...> - 2009-10-30 15:39:44
|
Apologies for the delayed response. The slides you refer to I believe are from my poster presentation. These have been made available from the PRO wiki under Documents under discussion.

https://pir5.georgetown.edu/wiki/PRO#Documents_under_discussion

The pdf is entitled "PRO Complex Ontology Framework" and concentrates mostly on the complexes in PRO.

The first three priority work items listed (expanded ID space, species-specific terms, and complexes) all came from the November meeting, though the first one has been a long-time goal discussed and requested many times. The addition of general terms was not discussed at any meeting, but arose from discussions with a number of potential users (mostly reaction databases).

Suzanna Lewis wrote:
> BTW, Darren I just located Cathy's slides from the ICBO meeting and these were a big help to me because they had so much more information than the email message. While there are still things to discuss, perhaps you would send these slides out to everyone as they offer a more complete explanation. Also more information about the workshop itself, who was there and why those 4 priority work items for PRO were picked (like, what the "use case" was for each of them?). I was at the November meeting almost a year ago, and I'm fairly certain these didn't come out of that meeting, but I don't see anything on the mailing list between November 08 and last August.
>
> -S
>
> On Oct 21, 2009, at 11:24 AM, Darren Natale wrote:
>
>> Hi Chris,
>>
>> You are correct that things like "mouse Shh isoform 1" are covered by sequence databases such as UniProtKB. And PRO does indeed intend to cover them as well. Why? Because PRO provides not only the generic classes to which you referred, but also some quite-specific terms like "mouse Shh isoform 1 phosphorylated at XX and YY" which are not covered by such databases. The obvious overlap arises because it doesn't make sense to specifically exclude one (middle of the hierarchy) class of proteins because it is covered elsewhere. A subset of very specific terms to be covered by PRO (such as indicated above) are found scattered in various resources, but not in any one location. PRO intends to consolidate this information and add to it.
>>
>> A corollary of the above is that annotation providers might want to submit to PRO because UniProtKB IDs for the actual object of annotation do not exist. It really depends on how specific one wishes to be in specifying the object.
>>
>> The plans of the PRO Consortium were published, so to speak, in a message dated August 8, 2009 to obo-discuss. In it, the addition of species-specific terms was announced, as was the addition of complexes. The complexes to be added are more-specific versions of what are in GO, so there is no overlap there. In fact, the PRO complexes will have the GO complex terms (and IDs) as parents. |
|
From: Karen C. <kc...@ge...> - 2009-10-22 19:52:42
|
I assume Chris cc'd me on this mostly with respect to the section about complexes, as we've had previous discussions about TFIIH, so I've got some comments to add on that section, inserted inline, but have snipped the rest of the email.

-Karen

On Wed, 21 Oct 2009, Chris Mungall wrote:

> On Aug 4, 2009, at 4:08 PM, Darren Natale wrote:
>
[snip]
>> 4) PRO will add protein complexes. This will not replace the general complexes named in GO. However, within PRO it will be possible to capture the distinct complexes and sub-complexes as they exist in different species, defining them through their component proteins and annotating them through GO terms. It will also be possible to assign separate terms for active and inactive forms, whether they are derived from post-translational modification of one or more components or from addition/subtraction of one or more components. Work is proceeding cautiously on this front, so such terms will appear in the mid-term.
>
> Let's work through a specific example of this.
>
> Currently GO has:
>
> id: GO:0000439
> name: core TFIIH complex
> namespace: cellular_component
> def: "The 5 subunit core of TFIIH that has tightly associated subunits and is found in both the general transcription factor holo-TFIIH and in the nucleotide-excision repair factor 3 complex. In S. cerevisiae, it is composed of Rad3, Tfb1, Tfb2, Ssl1, Tfb4. In humans, it is composed of XPD, p62, p55, p44, p34." [GOC:krc, PMID:14500720, PMID:7813015]
>
> This complex class is fairly generic in that it appears to be instantiated across at least Fungi/Metazoa.
>
> We do not yet have a computable definition for this. There are gene product annotations to this complex, but:
>
> * they are species-specific
> * they are incomplete in some species
> * an annotation (currently) is weaker than a computable statement that "all instances of core TFIIH complex in yeast have Rad3, Tfb1, Tfb2, Ssl1, Tfb4", because without additional information we can infer nothing more from the annotation than the fact that *some* core TFIIH complexes have these proteins in these species.
>
> Ideally we would have a computable definition for this generic complex.
>
> One way to do it would be to define it as the mereological sum of 5 generic proteins, using PRO IDs. This would work in this particular case if the 5 proteins were conserved across eukaryotes, and PRO could provide the relevant IDs. But this definition would carry a stronger commitment than the current GO text definition, which does not state full necessary and sufficient conditions for all species, but rather states N+S conditions for two species only. I'm not sure if we'd find branches in which proteins are gained or lost but this is the kind of thing we will have to consider.

In this case, if I remember correctly, I think that it is clear that the five proteins in human and cerevisiae core TFIIH are equivalent to each other. However, over the years in writing GO definitions, I've learned that it seems to work best if I only specify the subunit compositions I specifically know. I have come across quite a number of complexes where many subunits are conserved, but some are not. I believe TFIID may fall into this class; most of the genes characterized as TFIID subunits (called TAFs) have a one-to-one correspondence across several species (human, worm, fly, two yeasts), but some species have two copies of a given TAF and other species are completely missing a given TAF. So the complex appears to be largely conserved with some species-specific differences. Going by my experiences in annotating, I think this situation is fairly common.

> Other complexes will have interesting evolutionary stories and will thus be even harder to define.

I have an interesting example for you here too, the FACT complex. I contacted several researchers who work on this for input in constructing the current GO definition, so it does seem clear that the research community considers both the mammalian and the yeast version of the complex to be a form of the FACT complex, despite some differences, which the def does a good job of explaining:

GO term: FACT complex (GO:0035101)
Definition: An abundant nuclear complex, which was originally identified in mammalian systems as a factor required for transcription elongation on chromatin templates. The FACT complex has been shown to destabilize the interaction between the H2A/H2B dimer and the H3/H4 tetramer of the nucleosome, thus reorganizing the structure of the nucleosome. In this way, the FACT complex may play a role in DNA replication and other processes that traverse the chromatin, as well as in transcription elongation. FACT is composed of two proteins that are evolutionarily conserved in all eukaryotes and homologous to mammalian Spt16 and SSRP1. In metazoans, the SSRP1 homolog contains an HMG domain; however in fungi and protists, it does not. For example, in S. cerevisiae the Pob3 protein is homologous to SSRP1, but lacks the HMG chromatin binding domain. Instead, the yFACT complex of Spt16p and Pob3p, binds to nucleosomes where multiple copies of the HMG-domain containing protein Nhp6p have already bound, but Nhp6p does not form a stable complex with the Spt16p/Pob3p heterodimer.

> Another approach would be to define either species or taxon-specific subclasses of the GO complex in PRO, and give mereological definitions to these using species or taxon-specific PRO IDs. This is weaker, but conversely you have less chance of accidentally stating something false.
>
> Either way, having has_part relationships between complexes and proteins would be very useful.
|
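Editorial sketch, not part of the original thread: the situation Karen describes for TFIID (most subunits one-to-one across species, some duplicated, some missing) can be illustrated with a small Python example. The subunit family names and copy counts below are hypothetical placeholders chosen for illustration; they are not taken from GO, PRO, or any sequence database.

```python
from typing import Dict

# Hypothetical copy counts per species for three made-up subunit families.
# These values are illustrative assumptions, not real annotation data.
SUBUNIT_COPIES: Dict[str, Dict[str, int]] = {
    "subunit_A": {"human": 1, "fly": 1, "yeast": 1},
    "subunit_B": {"human": 1, "fly": 2, "yeast": 1},
    "subunit_C": {"human": 1, "fly": 1, "yeast": 0},
}


def classify(copies: Dict[str, int]) -> str:
    """Label one subunit family by how it is distributed across species."""
    counts = copies.values()
    if all(c == 1 for c in counts):
        return "one-to-one"
    # A family absent from any species is reported as missing,
    # even if it is also duplicated elsewhere.
    if any(c == 0 for c in counts):
        return "missing in some species"
    return "duplicated in some species"


def summarize(table: Dict[str, Dict[str, int]]) -> Dict[str, str]:
    """Classify every subunit family in the table."""
    return {family: classify(copies) for family, copies in table.items()}


if __name__ == "__main__":
    for family, label in summarize(SUBUNIT_COPIES).items():
        print(f"{family}: {label}")
```

A table like this makes it easy to see why a single generic definition listing every subunit as required could be false for some species, which is the reason given above for only stating the compositions that are actually known.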
|
From: Judith B. <Jud...@ja...> - 2009-10-22 13:25:28
|
Thank you, Michael, for bringing us back to the core discussion.

Can you remind us why we are not yet loading full sets of gene products into the AmiGO db? We decided to do this long ago.

Thanks,
Judy

On 10/22/09 9:21 AM, "Mike Cherry" <ch...@st...> wrote:

It's all very nice that a new resource has been created and has a very valid justification of providing more information. However, the point of this discussion is not about PRO. It was about a file format used by the GOC. The need for IDs is a discussion of what is currently used by annotation projects. A campaign for the use of PRO should not be at the level of the GOC. Rather it must be available and used by curators. If they don't or cannot use it for annotation, GOC cannot require it in a file format that covers many annotation projects. Whether a MOD uses PRO is up to them, not the GOC. Any change in an annotation system is not trivial and will require a significant amount of time.

-Mike
|
|
From: Mike C. <ch...@st...> - 2009-10-22 13:21:34
|
It's all very nice that a new resource has been created and has a very valid justification of providing more information. However, the point of this discussion is not about PRO. It was about a file format used by the GOC. The need for IDs is a discussion of what is currently used by annotation projects. A campaign for the use of PRO should not be at the level of the GOC. Rather it must be available and used by curators. If they don't or cannot use it for annotation, GOC cannot require it in a file format that covers many annotation projects. Whether a MOD uses PRO is up to them, not the GOC. Any change in an annotation system is not trivial and will require a significant amount of time. -Mike |
|
From: Darren N. <da...@ge...> - 2009-10-22 12:40:19
|
You describe exactly our thinking for incorporating specific complexes in PRO and for defining the general complexes in GO. Chris Mungall wrote: > > On Aug 4, 2009, at 4:08 PM, Darren Natale wrote: > >> The scope of PRO is, as the name suggests, all proteins. However, the >> current focus of PRO has been on mostly specific forms of proteins found >> in humans and mice, and even then only the subset that has some impact >> on disease or that has been requested by users. Accordingly, PRO is >> relatively small and deep (where necessary), and lacks many terms that >> might be needed. In the past six months the PRO Consortium has had >> numerous discussions with current and potential users to determine how >> to best meet the needs of the community. These discussions have led to >> several new directions for the near, mid and long term. Herein these >> new directions will be described. >> >> 1) PRO will add species-specific terms. Currently the terms in PRO are >> species-neutral (though based on human and mouse proteins). For >> example, the term "smad2" actually refers to the translation products of >> any SMAD2 gene in any organism. Within the next month or two, PRO will >> add terms for "human smad2" and "mouse smad2" (for example). >> >> 2) PRO will greatly expand its coverage of human and mouse proteins. >> Within the next month or two, PRO will add terms for nearly all human >> and mouse proteins (the exceptions being uncharacterized/hypothetical >> proteins) at the gene level (see figure on the PRO home page >> http://pir.georgetown.edu/pro/ and question four on the Q&A page >> http://pir.georgetown.edu/pro/PRO_Q&A.pdf for definition). These will >> be the species-specific terms as described above. The non-specific >> terms (like "smad2") will begin to appear within a month or two beyond >> that. >> >> 3) PRO will expand it coverage of species. Within the next month or two >> PRO will also include proteins found in E. coli. 
In the mid-to-long >> term, PRO intends to cover all model organisms. >> >> 4) PRO will add protein complexes. This will not replace the general >> complexes named in GO. However, within PRO it will be possible to >> capture the distinct complexes and sub-complexes as they exist in >> different species, defining them through their component proteins and >> annotating them through GO terms. It will also be possible to assign >> separate terms for active and inactive forms, whether they are derived >> from post-translational modification of one or more components or from >> addition/subtraction of one or more components. Work is proceeding >> cautiously on this front, so such terms will appear in the mid-term. > > Let's work through a specific example of this. > > Currently GO has: > > id: GO:0000439 > name: core TFIIH complex > namespace: cellular_component > def: "The 5 subunit core of TFIIH that has tightly associated subunits > and is found in both the general transcription factor holo-TFIIH and in > the nucleotide-excision repair factor 3 complex. In S. cerevisiae, it is > composed of Rad3, Tfb1, Tfb2, Ssl1, Tfb4. In humans, it is composed of > XPD, p62, p55, p44, p34." [GOC:krc, PMID:14500720, PMID:7813015] > > This complex class is fairly generic in that it appears to be > instantiated across at least Fungi/Metazoa. > > We do not yet have a computable definition for this. There are gene > product annotations to this complex, but: > > * they are species-specific > * they are incomplete in some species > * an annotation (currently) is weaker than a computable statement that > "all instances of core TFIIH complex in yeast have Rad3, Tfb1, Tfb2, > Ssl1, Tfb4", because without additional information we can infer nothing > more from the annotation that the fact that *some* core TFIIH complexes > have these proteins in these species. > > Ideally we would have a computable definition for this generic complex. 
> > One way to do it would be define it as the mereological sum of 5 generic > proteins, using PRO IDs. This would work in this particular case if the > 5 proteins were conserved across eukaryotes, and PRO could provide the > relevant IDs. But this definition would carry a stronger commitment than > the current GO text definition, which does not state full necessary and > sufficient conditions for all species, but rather states N+S conditions > for two species only. I'm not sure if we'd find branches in which > proteins are gained or lost but this is the kind of thing we will have > to consider. Others complexes will have interesting evolutionary stories > and will thus be even harder to define. > > Another approach would be to define either species or taxon-specific > subclasses of the GO complex in PRO, and give mereological definitions > to these using species or taxon-specific PRO IDs. This is weaker, but > conversely you have less chance of accidentally stating something false. > > Either way, having has_part relationships between complexes and proteins > would be very useful. > >> 5) PRO will add general protein terms. Right now, PRO contains >> highly-specific terms for protein forms (such as "smad2 isoform 1 >> phosphorylated form" and its children). Such terms meet the needs of >> users that know something about the specific sequence of the protein of >> interest, but left a void for those that did not. Thus, something a bit >> more general, such as "smad2 phosphorylated form", will be introduced >> (along with very general terms such as "phosphoprotein"). These terms >> will appear within the next month or two. >> >> As described above, the bulk of the work will take place in the very >> near future. We believe the result will be of benefit to many potential >> users, especially those developing pathway-related and text-tagging >> resources. Details will be posted and discussed via the PRO mailing >> list (pro...@li...) in the weeks to come. 
>> >> Best regards, >> PRO Consortium > |
|
From: Chris M. <cj...@be...> - 2009-10-22 00:22:59
|
On Aug 4, 2009, at 4:08 PM, Darren Natale wrote:

> The scope of PRO is, as the name suggests, all proteins. However, the current focus of PRO has been on mostly specific forms of proteins found in humans and mice, and even then only the subset that has some impact on disease or that has been requested by users. Accordingly, PRO is relatively small and deep (where necessary), and lacks many terms that might be needed. In the past six months the PRO Consortium has had numerous discussions with current and potential users to determine how to best meet the needs of the community. These discussions have led to several new directions for the near, mid and long term. Herein these new directions will be described.
>
> 1) PRO will add species-specific terms. Currently the terms in PRO are species-neutral (though based on human and mouse proteins). For example, the term "smad2" actually refers to the translation products of any SMAD2 gene in any organism. Within the next month or two, PRO will add terms for "human smad2" and "mouse smad2" (for example).
>
> 2) PRO will greatly expand its coverage of human and mouse proteins. Within the next month or two, PRO will add terms for nearly all human and mouse proteins (the exceptions being uncharacterized/hypothetical proteins) at the gene level (see figure on the PRO home page http://pir.georgetown.edu/pro/ and question four on the Q&A page http://pir.georgetown.edu/pro/PRO_Q&A.pdf for definition). These will be the species-specific terms as described above. The non-specific terms (like "smad2") will begin to appear within a month or two beyond that.
>
> 3) PRO will expand its coverage of species. Within the next month or two PRO will also include proteins found in E. coli. In the mid-to-long term, PRO intends to cover all model organisms.
>
> 4) PRO will add protein complexes. This will not replace the general complexes named in GO. However, within PRO it will be possible to capture the distinct complexes and sub-complexes as they exist in different species, defining them through their component proteins and annotating them through GO terms. It will also be possible to assign separate terms for active and inactive forms, whether they are derived from post-translational modification of one or more components or from addition/subtraction of one or more components. Work is proceeding cautiously on this front, so such terms will appear in the mid-term.

Let's work through a specific example of this.

Currently GO has:

id: GO:0000439
name: core TFIIH complex
namespace: cellular_component
def: "The 5 subunit core of TFIIH that has tightly associated subunits and is found in both the general transcription factor holo-TFIIH and in the nucleotide-excision repair factor 3 complex. In S. cerevisiae, it is composed of Rad3, Tfb1, Tfb2, Ssl1, Tfb4. In humans, it is composed of XPD, p62, p55, p44, p34." [GOC:krc, PMID:14500720, PMID:7813015]

This complex class is fairly generic in that it appears to be instantiated across at least Fungi/Metazoa.

We do not yet have a computable definition for this. There are gene product annotations to this complex, but:

* they are species-specific
* they are incomplete in some species
* an annotation (currently) is weaker than a computable statement that "all instances of core TFIIH complex in yeast have Rad3, Tfb1, Tfb2, Ssl1, Tfb4", because without additional information we can infer nothing more from the annotation than the fact that *some* core TFIIH complexes have these proteins in these species.

Ideally we would have a computable definition for this generic complex.

One way to do it would be to define it as the mereological sum of 5 generic proteins, using PRO IDs. This would work in this particular case if the 5 proteins were conserved across eukaryotes, and PRO could provide the relevant IDs. But this definition would carry a stronger commitment than the current GO text definition, which does not state full necessary and sufficient conditions for all species, but rather states N+S conditions for two species only. I'm not sure if we'd find branches in which proteins are gained or lost but this is the kind of thing we will have to consider. Other complexes will have interesting evolutionary stories and will thus be even harder to define.

Another approach would be to define either species or taxon-specific subclasses of the GO complex in PRO, and give mereological definitions to these using species or taxon-specific PRO IDs. This is weaker, but conversely you have less chance of accidentally stating something false.

Either way, having has_part relationships between complexes and proteins would be very useful.

> 5) PRO will add general protein terms. Right now, PRO contains highly-specific terms for protein forms (such as "smad2 isoform 1 phosphorylated form" and its children). Such terms meet the needs of users that know something about the specific sequence of the protein of interest, but leave a void for those that do not. Thus, something a bit more general, such as "smad2 phosphorylated form", will be introduced (along with very general terms such as "phosphoprotein"). These terms will appear within the next month or two.
>
> As described above, the bulk of the work will take place in the very near future. We believe the result will be of benefit to many potential users, especially those developing pathway-related and text-tagging resources. Details will be posted and discussed via the PRO mailing list (pro...@li...) in the weeks to come.
>
> Best regards,
> PRO Consortium
|
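Editorial sketch, not part of the original thread: the second approach Chris describes (taxon-specific subclasses of the GO complex, each with its own has_part statements) could look roughly like the following Python example, which prints loosely OBO-style stanzas. The subunit names and the parent ID GO:0000439 come from the GO definition quoted above. Everything else is an assumption made for illustration: the `EXAMPLE:` term IDs are invented placeholders, the class and method names are hypothetical, and real stanzas would reference PRO identifiers for the parts rather than bare protein names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComplexSubclass:
    """A taxon-specific subclass of a generic GO complex (illustrative only)."""
    term_id: str      # hypothetical placeholder ID, not a real PRO term
    name: str
    parent_id: str    # the generic GO complex this specializes
    parts: List[str]  # protein names; real entries would use PRO IDs

    def to_stanza(self) -> str:
        """Render the term as a loosely OBO-style stanza."""
        lines = [
            "[Term]",
            f"id: {self.term_id}",
            f"name: {self.name}",
            f"is_a: {self.parent_id}",
        ]
        # One has_part line per component: every instance of this
        # species-specific class is asserted to contain each part.
        lines += [f"relationship: has_part {part}" for part in self.parts]
        return "\n".join(lines)


yeast = ComplexSubclass(
    term_id="EXAMPLE:0000001",
    name="core TFIIH complex (S. cerevisiae)",
    parent_id="GO:0000439",
    parts=["Rad3", "Tfb1", "Tfb2", "Ssl1", "Tfb4"],
)
human = ComplexSubclass(
    term_id="EXAMPLE:0000002",
    name="core TFIIH complex (human)",
    parent_id="GO:0000439",
    parts=["XPD", "p62", "p55", "p44", "p34"],
)

if __name__ == "__main__":
    print(yeast.to_stanza())
    print()
    print(human.to_stanza())
```

Because each subclass only commits to the composition in one species, the generic GO term itself carries no has_part claims here, which matches the "weaker, but less chance of stating something false" trade-off described in the message.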
|
From: Suzanna L. <su...@fr...> - 2009-10-21 23:55:46
|
BTW, Darren I just located Cathy's slides from the ICBO meeting and these were a big help to me because they had so much more information than the email message. While there are still things to discuss, perhaps you would send these slides out to everyone as they offer a more complete explanation. Also more information about the workshop itself, who was there and why those 4 priority work items for PRO were picked (like, what the "use case" was for each of them?). I was at the November meeting almost a year ago, and I'm fairly certain these didn't come out of that meeting, but I don't see anything on the mailing list between November 08 and last August. -S On Oct 21, 2009, at 11:24 AM, Darren Natale wrote: > Hi Chris, > > You are correct that things like "mouse Shh isoform 1" is covered by > sequence databases such as UniProtKB. And PRO does indeed intend to > cover it as well. Why? Because PRO provides not only the generic > classes to which you referred, but also some quite-specific terms like > "mouse Shh isoform 1 phosphorylated at XX and YY" which are not > covered > by such databases. The obvious overlap arises because it doesn't make > sense to specifically exclude one (middle of the hierarchy) class of > proteins because it is covered elsewhere. A subset of very specific > terms to be covered by PRO (such as indicated above) are found > scattered > in various resources, but not in any one location. PRO intends to > consolidate this information and add to it. > > A corollary of the above is that annotation providers might want to > submit to PRO because UniProtKB IDs for the actual object of > annotation > do not exist. It really depends on how specific one wishes to be in > specifying the object. > > The plans of the PRO Consortium were published, so to speak, in a > message dated August 8, 2009 to obo-discuss. In it, the addition of > species-specific terms was announced, as was the addition of > complexes. 
> The complexes to be added are more-specific versions of what are in > GO, so there is no overlap there. In fact, the PRO complexes will > have > the GO complex terms (and IDs) as parents. > > Chris Mungall wrote: >> On Oct 21, 2009, at 9:22 AM, Alan Ruttenberg wrote: >> >>> On Wed, Oct 21, 2009 at 11:47 AM, Michael Ashburner <ma...@ge... >>>> wrote: >>>> Alan >>>> >>>> Forgive me if I am wrong. PRO is an ONTOLOGY, it DONT DO instances >>>> which is >>>> what GO needs >>> I didn't know GO did anything with instances. Which instances does >>> GO >>> use? Certainly not instances of gene products - I don't know many >>> experiments that are about specific single molecules. >> >> Strictly speaking you are correct, we care about things such as >> ""mouse Shh" and "mouse Shh, isoform 1" which denote types. But these >> are tied to particular sequences, which are instances. >> >> Regardless of the instance/class and database/ontology distinctions, >> Michael's objection stands. A number of us thought that PRO was >> principally concerned with providing identifiers for *generic* >> protein >> types such as "interleukin", "CD4", "alpha-synuclein" and so on. I >> believe this is extremely useful. >> >> What is unclear is how far PRO intends to extend down into *specific* >> protein types, such as "mouse Shh isoform 1", a realm already covered >> by sequence databases such as UniProtKB, and what additional value >> PRO >> will provide balanced against the additional ID-mapping headaches >> caused by yet another contender in this overpopulated space (we >> already have Ensembl, The UniProtKB family of databases, NCBI/EMBL/ >> DDBJ IDs, NCBI-specific IDs to name but a few - and that's leaving >> out >> MOD IDs). >> >> Going back to your original email: >> >>>>> Regarding the first benefit, submitting un-annotated gene >>>>> products, it >>>>> seems that the appropriate place to submit these would be PRO. 
I >>>>> think >>>>> GO would then retrieve PRO to get the estimates they are concerned >>>>> with. >>>>> >>>>> Otherwise we risk having two places for such information, which >>>>> reduces the effectiveness of both, and entailing costly >>>>> synchronization efforts. >> >> I think you have to make a clear argument why annotation providers >> should have to go through the extremely time-consuming effort of >> submitting to PRO when UniProtKB IDs already exist. Version 6.0 of >> PRO >> *only* has the generic species-neutral proteins and isoforms. Whilst >> useful for other purposes, it's not clear that these are useful here. >> Your proposal at the least seems premature. >> >> It might be an idea for PRO to circulate an email regarding its >> intent >> to extend it's scope to obo-discuss, and where that scope overlaps >> with other informatics resources to have some kind of plan or MOU >> regarding how to partition the work or at least synchronize what they >> are doing. >> >> I also heard recently that PRO intends to represent protein >> complexes, >> something I thought was in the domain of GO. I think this needs more >> open discussion as well. >> >> My personal preference would be to see PRO focus efforts on providing >> comprehensive coverage of generic proteins - from a selfish GO point >> of view, at least enough to complete the necessary logical >> definitions >> for BP, CC and CL. >> >>> -Alan >>> >>> >>>> M >>>> On 21 Oct 2009, at 15:58, Alan Ruttenberg wrote: >>>> >>>>> On Wed, Oct 21, 2009 at 10:15 AM, Judith Blake >>>>> <Jud...@ja...> >>>>> wrote: >>>>>> Alan, >>>>>> >>>>>> We need to represent all proteins in GO for the reasons >>>>>> mentioned. These >>>>>> would carry PRO xrefs as well as xref to UniProt, NP. >>>>> PRO should not be used as a dbxref - it should be used as the >>>>> actual >>>>> ids for the gene products. 
>>>>> >>>>>> The point being that >>>>>> folks searching GO resources via protein IDs need to be able to >>>>>> enter not >>>>>> only IDs that have GO annotations, but IDs for proteins that >>>>>> don’t have >>>>>> GO >>>>>> annotations. So they can recover datasets +/- GO annotations for >>>>>> all >>>>>> proteins of interest >>>>> This is a tooling issue. All that means is that the GO resource >>>>> tools >>>>> should be able to search PRO. >>>>> >>>>> -Alan >>>>> . >>>>>> Also, while PRO will provide IDs for all mouse and human, the IDs >>>>>> and >>>>>> intersections for other organisms, now included in GO, will come >>>>>> later. >>>>>> >>>>>> Judy >>>>>> >>>>>> >>>>>> On 10/20/09 6:20 PM, "Alan Ruttenberg" <ala...@gm...> >>>>>> wrote: >>>>>> >>>>>> Regarding the first benefit, submitting un-annotated gene >>>>>> products, it >>>>>> seems that the appropriate place to submit these would be PRO. I >>>>>> think >>>>>> GO would then retrieve PRO to get the estimates they are >>>>>> concerned >>>>>> with. >>>>>> >>>>>> Otherwise we risk having two places for such information, which >>>>>> reduces the effectiveness of both, and entailing costly >>>>>> synchronization efforts. >>>>>> >>>>>> -Alan >>>>>> >>>>>> On Mon, Oct 19, 2009 at 12:25 PM, Amelia Ireland <aj...@eb...> >>>>>> wrote: >>>>>>> Hello annotators, >>>>>>> >>>>>>> I've written up a proposal for a new format for the annotation >>>>>>> and >>>>>>> gp2protein files which would separate gene product data from >>>>>>> annotation >>>>>>> data, thereby allowing unannotated gene products to be submitted >>>>>>> to the >>>>>>> GO >>>>>>> database. It also incorporates the plans for col 17, for >>>>>>> annotating >>>>>>> spliceforms. Please have a look at the proposal here: >>>>>>> >>>>>>> http://wiki.geneontology.org/index.php/Annotation_File_Format_Proposal >>>>>>> >>>>>>> Feedback / questions / etc. happily received. >>>>>>> >>>>>>> Cheers, >>>>>>> Amelia. 
>>>>>>> >>>>>>> -- >>>>>>> Amelia Ireland >>>>>>> GO Editorial Office >>>>>>> http://www.berkeleybop.org || http://www.ebi.ac.uk >>>>>>> Boycott Trader Joe's Red List seafood: http://traitorjoe.com/ > |
|
From: Chris M. <cj...@be...> - 2009-10-21 23:14:51
|
On Oct 21, 2009, at 11:24 AM, Darren Natale wrote:

> Hi Chris,
>
> You are correct that things like "mouse Shh isoform 1" are covered by sequence databases such as UniProtKB. And PRO does indeed intend to cover it as well. Why? Because PRO provides not only the generic classes to which you referred, but also some quite-specific terms like "mouse Shh isoform 1 phosphorylated at XX and YY" which are not covered by such databases. The obvious overlap arises because it doesn't make sense to specifically exclude one (middle of the hierarchy) class of proteins because it is covered elsewhere. A subset of very specific terms to be covered by PRO (such as indicated above) are found scattered in various resources, but not in any one location. PRO intends to consolidate this information and add to it.
>
> A corollary of the above is that annotation providers might want to submit to PRO because UniProtKB IDs for the actual object of annotation do not exist. It really depends on how specific one wishes to be in specifying the object.
>
> The plans of the PRO Consortium were published, so to speak, in a message dated August 8, 2009 to obo-discuss. In it, the addition of species-specific terms was announced, as was the addition of complexes. The complexes to be added are more-specific versions of what are in GO, so there is no overlap there. In fact, the PRO complexes will have the GO complex terms (and IDs) as parents.

Apologies, I forgot about this. You did indeed post a very clear message. Surprisingly little discussion! I'll go back to the original email and respond on obo-discuss and pro-obo-discuss.

Annotators, it would be great to have your opinions on this. You can find subscription details for all obo lists on:
http://www.obofoundry.org/cgi-bin/discussion.cgi
|
|
From: Suzanna L. <su...@fr...> - 2009-10-21 22:33:55
|
On Oct 21, 2009, at 11:24 AM, Darren Natale wrote:

> Hi Chris,
>
> You are correct that things like "mouse Shh isoform 1" are covered by sequence databases such as UniProtKB. And PRO does indeed intend to cover it as well. Why? Because PRO provides not only the generic classes to which you referred, but also some quite-specific terms like "mouse Shh isoform 1 phosphorylated at XX and YY" which are not covered by such databases. The obvious overlap arises because it doesn't make sense to specifically exclude one (middle of the hierarchy) class of proteins because it is covered elsewhere. A subset of very specific terms to be covered by PRO (such as indicated above) are found scattered in various resources, but not in any one location. PRO intends to consolidate this information and add to it.

Why not use XPs then? When you reach that point in the ontology, rather than duplicate with a redundant identifier, just use the external one. If you don't set any limits on the scope of PRO it could eventually be attempting to cover everything in the known universe. I exaggerate, but the point remains--it's just a division of labor and responsibilities. That, and redundancy hurts the community by creating mapping woes and general confusion with translation, data sharing, and semantic interpretation.

> A corollary of the above is that annotation providers might want to submit to PRO because UniProtKB IDs for the actual object of annotation do not exist. It really depends on how specific one wishes to be in specifying the object.

I don't think so. The UniProtKB IDs for a particular isoform coupled with the appropriate protein modification IDs (from PRO) would be sufficient.

> The plans of the PRO Consortium were published, so to speak, in a message dated August 8, 2009 to obo-discuss. In it, the addition of species-specific terms was announced, as was the addition of complexes.
> The complexes to be added are more-specific versions of what are in > GO, so there is no overlap there. In fact, the PRO complexes will > have > the GO complex terms (and IDs) as parents. Sorry to have missed seeing this, but I do think this needs more discussion. A widely broadcasted e-mail that didn't get any specific responses or questions isn't -really- quite the same thing as confirmed agreement on a shared understanding. -S > > Chris Mungall wrote: >> On Oct 21, 2009, at 9:22 AM, Alan Ruttenberg wrote: >> >>> On Wed, Oct 21, 2009 at 11:47 AM, Michael Ashburner <ma...@ge... >>>> wrote: >>>> Alan >>>> >>>> Forgive me if I am wrong. PRO is an ONTOLOGY, it DONT DO instances >>>> which is >>>> what GO needs >>> I didn't know GO did anything with instances. Which instances does >>> GO >>> use? Certainly not instances of gene products - I don't know many >>> experiments that are about specific single molecules. >> >> Strictly speaking you are correct, we care about things such as >> ""mouse Shh" and "mouse Shh, isoform 1" which denote types. But these >> are tied to particular sequences, which are instances. >> >> Regardless of the instance/class and database/ontology distinctions, >> Michael's objection stands. A number of us thought that PRO was >> principally concerned with providing identifiers for *generic* >> protein >> types such as "interleukin", "CD4", "alpha-synuclein" and so on. I >> believe this is extremely useful. >> >> What is unclear is how far PRO intends to extend down into *specific* >> protein types, such as "mouse Shh isoform 1", a realm already covered >> by sequence databases such as UniProtKB, and what additional value >> PRO >> will provide balanced against the additional ID-mapping headaches >> caused by yet another contender in this overpopulated space (we >> already have Ensembl, The UniProtKB family of databases, NCBI/EMBL/ >> DDBJ IDs, NCBI-specific IDs to name but a few - and that's leaving >> out >> MOD IDs). 
>> >> Going back to your original email: >> >>>>> Regarding the first benefit, submitting un-annotated gene >>>>> products, it >>>>> seems that the appropriate place to submit these would be PRO. I >>>>> think >>>>> GO would then retrieve PRO to get the estimates they are concerned >>>>> with. >>>>> >>>>> Otherwise we risk having two places for such information, which >>>>> reduces the effectiveness of both, and entailing costly >>>>> synchronization efforts. >> >> I think you have to make a clear argument why annotation providers >> should have to go through the extremely time-consuming effort of >> submitting to PRO when UniProtKB IDs already exist. Version 6.0 of >> PRO >> *only* has the generic species-neutral proteins and isoforms. Whilst >> useful for other purposes, it's not clear that these are useful here. >> Your proposal at the least seems premature. >> >> It might be an idea for PRO to circulate an email regarding its >> intent >> to extend it's scope to obo-discuss, and where that scope overlaps >> with other informatics resources to have some kind of plan or MOU >> regarding how to partition the work or at least synchronize what they >> are doing. >> >> I also heard recently that PRO intends to represent protein >> complexes, >> something I thought was in the domain of GO. I think this needs more >> open discussion as well. >> >> My personal preference would be to see PRO focus efforts on providing >> comprehensive coverage of generic proteins - from a selfish GO point >> of view, at least enough to complete the necessary logical >> definitions >> for BP, CC and CL. >> >>> -Alan >>> >>> >>>> M >>>> On 21 Oct 2009, at 15:58, Alan Ruttenberg wrote: >>>> >>>>> On Wed, Oct 21, 2009 at 10:15 AM, Judith Blake >>>>> <Jud...@ja...> >>>>> wrote: >>>>>> Alan, >>>>>> >>>>>> We need to represent all proteins in GO for the reasons >>>>>> mentioned. These >>>>>> would carry PRO xrefs as well as xref to UniProt, NP. 
>>>>> PRO should not be used as a dbxref - it should be used as the >>>>> actual >>>>> ids for the gene products. >>>>> >>>>>> The point being that >>>>>> folks searching GO resources via protein IDs need to be able to >>>>>> enter not >>>>>> only IDs that have GO annotations, but IDs for proteins that >>>>>> don’t have >>>>>> GO >>>>>> annotations. So they can recover datasets +/- GO annotations for >>>>>> all >>>>>> proteins of interest >>>>> This is a tooling issue. All that means is that the GO resource >>>>> tools >>>>> should be able to search PRO. >>>>> >>>>> -Alan >>>>> . >>>>>> Also, while PRO will provide IDs for all mouse and human, the IDs >>>>>> and >>>>>> intersections for other organisms, now included in GO, will come >>>>>> later. >>>>>> >>>>>> Judy >>>>>> >>>>>> >>>>>> On 10/20/09 6:20 PM, "Alan Ruttenberg" <ala...@gm...> >>>>>> wrote: >>>>>> >>>>>> Regarding the first benefit, submitting un-annotated gene >>>>>> products, it >>>>>> seems that the appropriate place to submit these would be PRO. I >>>>>> think >>>>>> GO would then retrieve PRO to get the estimates they are >>>>>> concerned >>>>>> with. >>>>>> >>>>>> Otherwise we risk having two places for such information, which >>>>>> reduces the effectiveness of both, and entailing costly >>>>>> synchronization efforts. >>>>>> >>>>>> -Alan >>>>>> >>>>>> On Mon, Oct 19, 2009 at 12:25 PM, Amelia Ireland <aj...@eb...> >>>>>> wrote: >>>>>>> Hello annotators, >>>>>>> >>>>>>> I've written up a proposal for a new format for the annotation >>>>>>> and >>>>>>> gp2protein files which would separate gene product data from >>>>>>> annotation >>>>>>> data, thereby allowing unannotated gene products to be submitted >>>>>>> to the >>>>>>> GO >>>>>>> database. It also incorporates the plans for col 17, for >>>>>>> annotating >>>>>>> spliceforms. 
Please have a look at the proposal here: >>>>>>> >>>>>>> http://wiki.geneontology.org/index.php/Annotation_File_Format_Proposal >>>>>>> >>>>>>> Feedback / questions / etc. happily received. >>>>>>> >>>>>>> Cheers, >>>>>>> Amelia. >>>>>>> >>>>>>> -- >>>>>>> Amelia Ireland >>>>>>> GO Editorial Office >>>>>>> http://www.berkeleybop.org || http://www.ebi.ac.uk >>>>>>> Boycott Trader Joe's Red List seafood: http://traitorjoe.com/ |
|
From: Suzanna L. <su...@fr...> - 2009-10-21 21:51:12
|
Since what started as a single discussion on the write-up of the format changes that we agreed upon in Cambridge has split into 2 separate threads (one on PRO and one sticking to format), I've changed the subject line to reflect this. Mostly I wanted to second everything that Chris has said here: 1. A PRO focus on *generic* protein types, such as "interleukin", "CD4", "alpha-synuclein" and so on, would certainly be useful, particularly for cross-products to GO terms where these apply. Applause. 2. We are already overwhelmed with IDs for specific proteins; adding more of these only makes a bad situation worse and must be avoided. 3. For a wealth of reasons, including time, effort, and avoiding confusion, it does not make any sense for the MODs to submit requests for additions to PRO when the existing accessions from the sequence databases serve this purpose quite well. 4. If, as rumored, PRO intends to begin representing protein complexes, then this should be fully coordinated with the GO project. We're all open to practical discussions to find a good division of labor, but it does require a dialog beforehand (I'm agnostic at the moment on which ontology would be most appropriate to cover it). On Oct 21, 2009, at 11:01 AM, Chris Mungall wrote: > > On Oct 21, 2009, at 9:22 AM, Alan Ruttenberg wrote: > >> On Wed, Oct 21, 2009 at 11:47 AM, Michael Ashburner <ma...@ge... >> > wrote: >>> Alan >>> >>> Forgive me if I am wrong. PRO is an ONTOLOGY, it DONT DO instances >>> which is >>> what GO needs >> >> I didn't know GO did anything with instances. Which instances does GO >> use? Certainly not instances of gene products - I don't know many >> experiments that are about specific single molecules. > > Strictly speaking you are correct, we care about things such as > "mouse Shh" and "mouse Shh, isoform 1" which denote types. But > these are tied to particular sequences, which are instances. 
> > Regardless of the instance/class and database/ontology distinctions, > Michael's objection stands. A number of us thought that PRO was > principally concerned with providing identifiers for *generic* > protein types such as "interleukin", "CD4", "alpha-synuclein" and so > on. I believe this is extremely useful. > > What is unclear is how far PRO intends to extend down into > *specific* protein types, such as "mouse Shh isoform 1", a realm > already covered by sequence databases such as UniProtKB, and what > additional value PRO will provide balanced against the additional ID- > mapping headaches caused by yet another contender in this > overpopulated space (we already have Ensembl, The UniProtKB family > of databases, NCBI/EMBL/DDBJ IDs, NCBI-specific IDs to name but a > few - and that's leaving out MOD IDs). > > Going back to your original email: > >>>> Regarding the first benefit, submitting un-annotated gene >>>> products, it >>>> seems that the appropriate place to submit these would be PRO. I >>>> think >>>> GO would then retrieve PRO to get the estimates they are concerned >>>> with. >>>> >>>> Otherwise we risk having two places for such information, which >>>> reduces the effectiveness of both, and entailing costly >>>> synchronization efforts. > > I think you have to make a clear argument why annotation providers > should have to go through the extremely time-consuming effort of > submitting to PRO when UniProtKB IDs already exist. Version 6.0 of > PRO *only* has the generic species-neutral proteins and isoforms. > Whilst useful for other purposes, it's not clear that these are > useful here. Your proposal at the least seems premature. > > It might be an idea for PRO to circulate an email regarding its > intent to extend it's scope to obo-discuss, and where that scope > overlaps with other informatics resources to have some kind of plan > or MOU regarding how to partition the work or at least synchronize > what they are doing. 
> > I also heard recently that PRO intends to represent protein > complexes, something I thought was in the domain of GO. I think this > needs more open discussion as well. > > My personal preference would be to see PRO focus efforts on > providing comprehensive coverage of generic proteins - from a > selfish GO point of view, at least enough to complete the necessary > logical definitions for BP, CC and CL. > >> -Alan >> >> >>> >>> M >>> On 21 Oct 2009, at 15:58, Alan Ruttenberg wrote: >>> >>>> On Wed, Oct 21, 2009 at 10:15 AM, Judith Blake <Jud...@ja... >>>> > >>>> wrote: >>>>> >>>>> Alan, >>>>> >>>>> We need to represent all proteins in GO for the reasons >>>>> mentioned. These >>>>> would carry PRO xrefs as well as xref to UniProt, NP. >>>> >>>> PRO should not be used as a dbxref - it should be used as the >>>> actual >>>> ids for the gene products. >>>> >>>>> The point being that >>>>> folks searching GO resources via protein IDs need to be able to >>>>> enter not >>>>> only IDs that have GO annotations, but IDs for proteins that >>>>> don’t have >>>>> GO >>>>> annotations. So they can recover datasets +/- GO annotations >>>>> for all >>>>> proteins of interest >>>> >>>> This is a tooling issue. All that means is that the GO resource >>>> tools >>>> should be able to search PRO. >>>> >>>> -Alan >>>> . >>>>> >>>>> Also, while PRO will provide IDs for all mouse and human, the >>>>> IDs and >>>>> intersections for other organisms, now included in GO, will come >>>>> later. >>>>> >>>>> Judy >>>>> >>>>> >>>>> On 10/20/09 6:20 PM, "Alan Ruttenberg" >>>>> <ala...@gm...> wrote: >>>>> >>>>> Regarding the first benefit, submitting un-annotated gene >>>>> products, it >>>>> seems that the appropriate place to submit these would be PRO. I >>>>> think >>>>> GO would then retrieve PRO to get the estimates they are concerned >>>>> with. 
>>>>> >>>>> Otherwise we risk having two places for such information, which >>>>> reduces the effectiveness of both, and entailing costly >>>>> synchronization efforts. >>>>> >>>>> -Alan >>>>> >>>>> On Mon, Oct 19, 2009 at 12:25 PM, Amelia Ireland <aj...@eb...> >>>>> wrote: >>>>>> >>>>>> Hello annotators, >>>>>> >>>>>> I've written up a proposal for a new format for the annotation >>>>>> and >>>>>> gp2protein files which would separate gene product data from >>>>>> annotation >>>>>> data, thereby allowing unannotated gene products to be >>>>>> submitted to the >>>>>> GO >>>>>> database. It also incorporates the plans for col 17, for >>>>>> annotating >>>>>> spliceforms. Please have a look at the proposal here: >>>>>> >>>>>> http://wiki.geneontology.org/index.php/Annotation_File_Format_Proposal >>>>>> >>>>>> Feedback / questions / etc. happily received. >>>>>> >>>>>> Cheers, >>>>>> Amelia. >>>>>> >>>>>> -- >>>>>> Amelia Ireland >>>>>> GO Editorial Office >>>>>> http://www.berkeleybop.org || http://www.ebi.ac.uk >>>>>> Boycott Trader Joe's Red List seafood: http://traitorjoe.com/ |
|
From: Darren N. <da...@ge...> - 2009-10-21 18:26:18
|
Hi Chris, You are correct that things like "mouse Shh isoform 1" are covered by sequence databases such as UniProtKB. And PRO does indeed intend to cover them as well. Why? Because PRO provides not only the generic classes to which you referred, but also some quite-specific terms like "mouse Shh isoform 1 phosphorylated at XX and YY" which are not covered by such databases. The obvious overlap arises because it doesn't make sense to specifically exclude one (middle of the hierarchy) class of proteins because it is covered elsewhere. A subset of the very specific terms to be covered by PRO (such as the one indicated above) is found scattered across various resources, but not in any one location. PRO intends to consolidate this information and add to it. A corollary of the above is that annotation providers might want to submit to PRO because UniProtKB IDs for the actual object of annotation do not exist. It really depends on how specific one wishes to be in specifying the object. The plans of the PRO Consortium were published, so to speak, in a message dated August 8, 2009 to obo-discuss. In it, the addition of species-specific terms was announced, as was the addition of complexes. The complexes to be added are more-specific versions of what are in GO, so there is no overlap there. In fact, the PRO complexes will have the GO complex terms (and IDs) as parents. Chris Mungall wrote: > On Oct 21, 2009, at 9:22 AM, Alan Ruttenberg wrote: > >> On Wed, Oct 21, 2009 at 11:47 AM, Michael Ashburner <ma...@ge... >>> wrote: >>> Alan >>> >>> Forgive me if I am wrong. PRO is an ONTOLOGY, it DONT DO instances >>> which is >>> what GO needs >> I didn't know GO did anything with instances. Which instances does GO >> use? Certainly not instances of gene products - I don't know many >> experiments that are about specific single molecules. > > Strictly speaking you are correct, we care about things such as > "mouse Shh" and "mouse Shh, isoform 1" which denote types. 
But these > are tied to particular sequences, which are instances. > > Regardless of the instance/class and database/ontology distinctions, > Michael's objection stands. A number of us thought that PRO was > principally concerned with providing identifiers for *generic* protein > types such as "interleukin", "CD4", "alpha-synuclein" and so on. I > believe this is extremely useful. > > What is unclear is how far PRO intends to extend down into *specific* > protein types, such as "mouse Shh isoform 1", a realm already covered > by sequence databases such as UniProtKB, and what additional value PRO > will provide balanced against the additional ID-mapping headaches > caused by yet another contender in this overpopulated space (we > already have Ensembl, The UniProtKB family of databases, NCBI/EMBL/ > DDBJ IDs, NCBI-specific IDs to name but a few - and that's leaving out > MOD IDs). > > Going back to your original email: > >>>> Regarding the first benefit, submitting un-annotated gene >>>> products, it >>>> seems that the appropriate place to submit these would be PRO. I >>>> think >>>> GO would then retrieve PRO to get the estimates they are concerned >>>> with. >>>> >>>> Otherwise we risk having two places for such information, which >>>> reduces the effectiveness of both, and entailing costly >>>> synchronization efforts. > > I think you have to make a clear argument why annotation providers > should have to go through the extremely time-consuming effort of > submitting to PRO when UniProtKB IDs already exist. Version 6.0 of PRO > *only* has the generic species-neutral proteins and isoforms. Whilst > useful for other purposes, it's not clear that these are useful here. > Your proposal at the least seems premature. 
> > It might be an idea for PRO to circulate an email regarding its intent > to extend it's scope to obo-discuss, and where that scope overlaps > with other informatics resources to have some kind of plan or MOU > regarding how to partition the work or at least synchronize what they > are doing. > > I also heard recently that PRO intends to represent protein complexes, > something I thought was in the domain of GO. I think this needs more > open discussion as well. > > My personal preference would be to see PRO focus efforts on providing > comprehensive coverage of generic proteins - from a selfish GO point > of view, at least enough to complete the necessary logical definitions > for BP, CC and CL. > >> -Alan >> >> >>> M >>> On 21 Oct 2009, at 15:58, Alan Ruttenberg wrote: >>> >>>> On Wed, Oct 21, 2009 at 10:15 AM, Judith Blake >>>> <Jud...@ja...> >>>> wrote: >>>>> Alan, >>>>> >>>>> We need to represent all proteins in GO for the reasons >>>>> mentioned. These >>>>> would carry PRO xrefs as well as xref to UniProt, NP. >>>> PRO should not be used as a dbxref - it should be used as the actual >>>> ids for the gene products. >>>> >>>>> The point being that >>>>> folks searching GO resources via protein IDs need to be able to >>>>> enter not >>>>> only IDs that have GO annotations, but IDs for proteins that >>>>> don’t have >>>>> GO >>>>> annotations. So they can recover datasets +/- GO annotations for >>>>> all >>>>> proteins of interest >>>> This is a tooling issue. All that means is that the GO resource >>>> tools >>>> should be able to search PRO. >>>> >>>> -Alan >>>> . >>>>> Also, while PRO will provide IDs for all mouse and human, the IDs >>>>> and >>>>> intersections for other organisms, now included in GO, will come >>>>> later. 
>>>>> >>>>> Judy >>>>> >>>>> >>>>> On 10/20/09 6:20 PM, "Alan Ruttenberg" <ala...@gm...> >>>>> wrote: >>>>> >>>>> Regarding the first benefit, submitting un-annotated gene >>>>> products, it >>>>> seems that the appropriate place to submit these would be PRO. I >>>>> think >>>>> GO would then retrieve PRO to get the estimates they are concerned >>>>> with. >>>>> >>>>> Otherwise we risk having two places for such information, which >>>>> reduces the effectiveness of both, and entailing costly >>>>> synchronization efforts. >>>>> >>>>> -Alan >>>>> >>>>> On Mon, Oct 19, 2009 at 12:25 PM, Amelia Ireland <aj...@eb...> >>>>> wrote: >>>>>> Hello annotators, >>>>>> >>>>>> I've written up a proposal for a new format for the annotation and >>>>>> gp2protein files which would separate gene product data from >>>>>> annotation >>>>>> data, thereby allowing unannotated gene products to be submitted >>>>>> to the >>>>>> GO >>>>>> database. It also incorporates the plans for col 17, for >>>>>> annotating >>>>>> spliceforms. Please have a look at the proposal here: >>>>>> >>>>>> http://wiki.geneontology.org/index.php/Annotation_File_Format_Proposal >>>>>> >>>>>> Feedback / questions / etc. happily received. >>>>>> >>>>>> Cheers, >>>>>> Amelia. >>>>>> >>>>>> -- >>>>>> Amelia Ireland >>>>>> GO Editorial Office >>>>>> http://www.berkeleybop.org || http://www.ebi.ac.uk >>>>>> Boycott Trader Joe's Red List seafood: http://traitorjoe.com/ |
|
From: Judith B. <Jud...@ja...> - 2009-10-21 18:21:20
|
Chris, all The PRO extension to complexes is meant to represent specific complexes, defined by their specific protein membership at the taxon and variant level. These complexes would have a relationship to the generic GO complexes. We are increasingly seeing experimental literature that documents changes in the functioning of complexes based on the construction of the complex and on the cellular milieu in which it is present. To document this, we need to represent the specific complex. This work is now funded as a collaboration between MGI, PRO, and Reactome. It is designed to complement, not duplicate, the work of GO. Regarding the relationship between PRO and UniProt: yes, PRO provides generic IDs, and it does so within the context of an ontological structure. From the start, PRO has been dedicated to representing the relationships between isoforms and variants. UniProt does not provide an ontological structure for organizing the relationships between protein forms. At the moment, UniProt collects isoforms and variants within one UniProt record. PRO will be able to supply an ID for forms not represented elsewhere, for example a phosphorylated protein, or a functional fragment such as the Notch receptor C-terminus, that we want to functionally annotate. PRO in this context is a structured ID set that has intersections with GO and UniProt. From my perspective, PRO complements the work of UniProt and increases the ability to understand the complexities of protein forms and their relationships to each other in an ontological context. Judy On 10/21/09 2:01 PM, "Chris Mungall" <cj...@be...> wrote: On Oct 21, 2009, at 9:22 AM, Alan Ruttenberg wrote: > On Wed, Oct 21, 2009 at 11:47 AM, Michael Ashburner <ma...@ge... > > wrote: >> Alan >> >> Forgive me if I am wrong. PRO is an ONTOLOGY, it DONT DO instances >> which is >> what GO needs > > I didn't know GO did anything with instances. Which instances does GO > use? 
Certainly not instances of gene products - I don't know many > experiments that are about specific single molecules. Strictly speaking you are correct, we care about things such as ""mouse Shh" and "mouse Shh, isoform 1" which denote types. But these are tied to particular sequences, which are instances. Regardless of the instance/class and database/ontology distinctions, Michael's objection stands. A number of us thought that PRO was principally concerned with providing identifiers for *generic* protein types such as "interleukin", "CD4", "alpha-synuclein" and so on. I believe this is extremely useful. What is unclear is how far PRO intends to extend down into *specific* protein types, such as "mouse Shh isoform 1", a realm already covered by sequence databases such as UniProtKB, and what additional value PRO will provide balanced against the additional ID-mapping headaches caused by yet another contender in this overpopulated space (we already have Ensembl, The UniProtKB family of databases, NCBI/EMBL/ DDBJ IDs, NCBI-specific IDs to name but a few - and that's leaving out MOD IDs). Going back to your original email: >>> Regarding the first benefit, submitting un-annotated gene >>> products, it >>> seems that the appropriate place to submit these would be PRO. I >>> think >>> GO would then retrieve PRO to get the estimates they are concerned >>> with. >>> >>> Otherwise we risk having two places for such information, which >>> reduces the effectiveness of both, and entailing costly >>> synchronization efforts. I think you have to make a clear argument why annotation providers should have to go through the extremely time-consuming effort of submitting to PRO when UniProtKB IDs already exist. Version 6.0 of PRO *only* has the generic species-neutral proteins and isoforms. Whilst useful for other purposes, it's not clear that these are useful here. Your proposal at the least seems premature. 
It might be an idea for PRO to circulate an email regarding its intent to extend it's scope to obo-discuss, and where that scope overlaps with other informatics resources to have some kind of plan or MOU regarding how to partition the work or at least synchronize what they are doing. I also heard recently that PRO intends to represent protein complexes, something I thought was in the domain of GO. I think this needs more open discussion as well. My personal preference would be to see PRO focus efforts on providing comprehensive coverage of generic proteins - from a selfish GO point of view, at least enough to complete the necessary logical definitions for BP, CC and CL. > -Alan > > >> >> M >> On 21 Oct 2009, at 15:58, Alan Ruttenberg wrote: >> >>> On Wed, Oct 21, 2009 at 10:15 AM, Judith Blake >>> <Jud...@ja...> >>> wrote: >>>> >>>> Alan, >>>> >>>> We need to represent all proteins in GO for the reasons >>>> mentioned. These >>>> would carry PRO xrefs as well as xref to UniProt, NP. >>> >>> PRO should not be used as a dbxref - it should be used as the actual >>> ids for the gene products. >>> >>>> The point being that >>>> folks searching GO resources via protein IDs need to be able to >>>> enter not >>>> only IDs that have GO annotations, but IDs for proteins that >>>> don't have >>>> GO >>>> annotations. So they can recover datasets +/- GO annotations for >>>> all >>>> proteins of interest >>> >>> This is a tooling issue. All that means is that the GO resource >>> tools >>> should be able to search PRO. >>> >>> -Alan >>> . >>>> >>>> Also, while PRO will provide IDs for all mouse and human, the IDs >>>> and >>>> intersections for other organisms, now included in GO, will come >>>> later. >>>> >>>> Judy >>>> >>>> >>>> On 10/20/09 6:20 PM, "Alan Ruttenberg" <ala...@gm...> >>>> wrote: >>>> >>>> Regarding the first benefit, submitting un-annotated gene >>>> products, it >>>> seems that the appropriate place to submit these would be PRO. 
I >>>> think >>>> GO would then retrieve PRO to get the estimates they are concerned >>>> with. >>>> >>>> Otherwise we risk having two places for such information, which >>>> reduces the effectiveness of both, and entailing costly >>>> synchronization efforts. >>>> >>>> -Alan >>>> >>>> On Mon, Oct 19, 2009 at 12:25 PM, Amelia Ireland <aj...@eb...> >>>> wrote: >>>>> >>>>> Hello annotators, >>>>> >>>>> I've written up a proposal for a new format for the annotation and >>>>> gp2protein files which would separate gene product data from >>>>> annotation >>>>> data, thereby allowing unannotated gene products to be submitted >>>>> to the >>>>> GO >>>>> database. It also incorporates the plans for col 17, for >>>>> annotating >>>>> spliceforms. Please have a look at the proposal here: >>>>> >>>>> http://wiki.geneontology.org/index.php/Annotation_File_Format_Proposal >>>>> >>>>> Feedback / questions / etc. happily received. >>>>> >>>>> Cheers, >>>>> Amelia. >>>>> >>>>> -- >>>>> Amelia Ireland >>>>> GO Editorial Office >>>>> http://www.berkeleybop.org || http://www.ebi.ac.uk >>>>> Boycott Trader Joe's Red List seafood: http://traitorjoe.com/ |
|
From: Chris M. <cj...@be...> - 2009-10-21 18:01:52
|
On Oct 21, 2009, at 9:22 AM, Alan Ruttenberg wrote: > On Wed, Oct 21, 2009 at 11:47 AM, Michael Ashburner <ma...@ge... > > wrote: >> Alan >> >> Forgive me if I am wrong. PRO is an ONTOLOGY, it DONT DO instances >> which is >> what GO needs > > I didn't know GO did anything with instances. Which instances does GO > use? Certainly not instances of gene products - I don't know many > experiments that are about specific single molecules. Strictly speaking you are correct, we care about things such as "mouse Shh" and "mouse Shh, isoform 1" which denote types. But these are tied to particular sequences, which are instances. Regardless of the instance/class and database/ontology distinctions, Michael's objection stands. A number of us thought that PRO was principally concerned with providing identifiers for *generic* protein types such as "interleukin", "CD4", "alpha-synuclein" and so on. I believe this is extremely useful. What is unclear is how far PRO intends to extend down into *specific* protein types, such as "mouse Shh isoform 1", a realm already covered by sequence databases such as UniProtKB, and what additional value PRO will provide balanced against the additional ID-mapping headaches caused by yet another contender in this overpopulated space (we already have Ensembl, the UniProtKB family of databases, NCBI/EMBL/DDBJ IDs, NCBI-specific IDs to name but a few - and that's leaving out MOD IDs). Going back to your original email: >>> Regarding the first benefit, submitting un-annotated gene >>> products, it >>> seems that the appropriate place to submit these would be PRO. I >>> think >>> GO would then retrieve PRO to get the estimates they are concerned >>> with. >>> >>> Otherwise we risk having two places for such information, which >>> reduces the effectiveness of both, and entailing costly >>> synchronization efforts.
I think you have to make a clear argument why annotation providers should have to go through the extremely time-consuming effort of submitting to PRO when UniProtKB IDs already exist. Version 6.0 of PRO *only* has the generic species-neutral proteins and isoforms. Whilst useful for other purposes, it's not clear that these are useful here. Your proposal at the least seems premature. It might be an idea for PRO to circulate an email regarding its intent to extend its scope to obo-discuss, and where that scope overlaps with other informatics resources to have some kind of plan or MOU regarding how to partition the work or at least synchronize what they are doing. I also heard recently that PRO intends to represent protein complexes, something I thought was in the domain of GO. I think this needs more open discussion as well. My personal preference would be to see PRO focus efforts on providing comprehensive coverage of generic proteins - from a selfish GO point of view, at least enough to complete the necessary logical definitions for BP, CC and CL. > -Alan > > >> >> M >> On 21 Oct 2009, at 15:58, Alan Ruttenberg wrote: >> >>> On Wed, Oct 21, 2009 at 10:15 AM, Judith Blake >>> <Jud...@ja...> >>> wrote: >>>> >>>> Alan, >>>> >>>> We need to represent all proteins in GO for the reasons >>>> mentioned. These >>>> would carry PRO xrefs as well as xref to UniProt, NP. >>> >>> PRO should not be used as a dbxref - it should be used as the actual >>> ids for the gene products. >>> >>>> The point being that >>>> folks searching GO resources via protein IDs need to be able to >>>> enter not >>>> only IDs that have GO annotations, but IDs for proteins that >>>> don’t have >>>> GO >>>> annotations. So they can recover datasets +/- GO annotations for >>>> all >>>> proteins of interest >>> >>> This is a tooling issue. All that means is that the GO resource >>> tools >>> should be able to search PRO. >>> >>> -Alan >>> .
>>>> >>>> Also, while PRO will provide IDs for all mouse and human, the IDs >>>> and >>>> intersections for other organisms, now included in GO, will come >>>> later. >>>> >>>> Judy >>>> >>>> >>>> On 10/20/09 6:20 PM, "Alan Ruttenberg" <ala...@gm...> >>>> wrote: >>>> >>>> Regarding the first benefit, submitting un-annotated gene >>>> products, it >>>> seems that the appropriate place to submit these would be PRO. I >>>> think >>>> GO would then retrieve PRO to get the estimates they are concerned >>>> with. >>>> >>>> Otherwise we risk having two places for such information, which >>>> reduces the effectiveness of both, and entailing costly >>>> synchronization efforts. >>>> >>>> -Alan >>>> >>>> On Mon, Oct 19, 2009 at 12:25 PM, Amelia Ireland <aj...@eb...> >>>> wrote: >>>>> >>>>> Hello annotators, >>>>> >>>>> I've written up a proposal for a new format for the annotation and >>>>> gp2protein files which would separate gene product data from >>>>> annotation >>>>> data, thereby allowing unannotated gene products to be submitted >>>>> to the >>>>> GO >>>>> database. It also incorporates the plans for col 17, for >>>>> annotating >>>>> spliceforms. Please have a look at the proposal here: >>>>> >>>>> http://wiki.geneontology.org/index.php/Annotation_File_Format_Proposal >>>>> >>>>> Feedback / questions / etc. happily received. >>>>> >>>>> Cheers, >>>>> Amelia. >>>>> >>>>> -- >>>>> Amelia Ireland >>>>> GO Editorial Office >>>>> http://www.berkeleybop.org || http://www.ebi.ac.uk >>>>> Boycott Trader Joe's Red List seafood: http://traitorjoe.com/ >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Annotation mailing list >>>>> Ann...@ge... >>>>> http://fafner.stanford.edu/mailman/listinfo/annotation >>>>> >>>> _______________________________________________ >>>> Annotation mailing list >>>> Ann...@ge... 
>>>> http://fafner.stanford.edu/mailman/listinfo/annotation >>>> >>>> >>> _______________________________________________ >>> Obo-coordinators mailing list >>> Obo...@ob... >>> http://mail.fruitfly.org/mailman/listinfo/obo-coordinators >> >> > > _______________________________________________ > pro-obo-discuss mailing list > pro...@li... > https://lists.sourceforge.net/lists/listinfo/pro-obo-discuss > |
|
From: Judith B. <Jud...@ja...> - 2009-10-21 17:56:21
|
Another good point. We are including functional RNAs in our MGI file. Judy On 10/21/09 1:41 PM, "Midori Harris" <mi...@eb...> wrote: On Tue, 20 Oct 2009, Alan Ruttenberg wrote: > Regarding the first benefit, submitting un-annotated gene products, it > seems that the appropriate place to submit these would be PRO. I think > GO would then retrieve PRO to get the estimates they are concerned > with. GO annotation files need to record data for all gene products, not just proteins. m > Otherwise we risk having two places for such information, which > reduces the effectiveness of both, and entailing costly > synchronization efforts. > > -Alan > > On Mon, Oct 19, 2009 at 12:25 PM, Amelia Ireland <aj...@eb...> wrote: >> Hello annotators, >> >> I've written up a proposal for a new format for the annotation and >> gp2protein files which would separate gene product data from annotation >> data, thereby allowing unannotated gene products to be submitted to the GO >> database. It also incorporates the plans for col 17, for annotating >> spliceforms. Please have a look at the proposal here: >> >> http://wiki.geneontology.org/index.php/Annotation_File_Format_Proposal >> >> Feedback / questions / etc. happily received. >> >> Cheers, >> Amelia. >> >> -- >> Amelia Ireland >> GO Editorial Office >> http://www.berkeleybop.org || http://www.ebi.ac.uk >> Boycott Trader Joe's Red List seafood: http://traitorjoe.com/ >> ============================ Midori A. Harris, Ph.D. GO Editor EMBL - EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UK Tel: +44 (0) 1223 494667 Fax: +44 (0) 1223 494468 Email: mi...@eb... _______________________________________________ Annotation mailing list Ann...@ge... http://fafner.stanford.edu/mailman/listinfo/annotation |
|
From: Midori H. <mi...@eb...> - 2009-10-21 17:42:00
|
On Tue, 20 Oct 2009, Alan Ruttenberg wrote: > Regarding the first benefit, submitting un-annotated gene products, it > seems that the appropriate place to submit these would be PRO. I think > GO would then retrieve PRO to get the estimates they are concerned > with. GO annotation files need to record data for all gene products, not just proteins. m > Otherwise we risk having two places for such information, which > reduces the effectiveness of both, and entailing costly > synchronization efforts. > > -Alan > > On Mon, Oct 19, 2009 at 12:25 PM, Amelia Ireland <aj...@eb...> wrote: >> Hello annotators, >> >> I've written up a proposal for a new format for the annotation and >> gp2protein files which would separate gene product data from annotation >> data, thereby allowing unannotated gene products to be submitted to the GO >> database. It also incorporates the plans for col 17, for annotating >> spliceforms. Please have a look at the proposal here: >> >> http://wiki.geneontology.org/index.php/Annotation_File_Format_Proposal >> >> Feedback / questions / etc. happily received. >> >> Cheers, >> Amelia. >> >> -- >> Amelia Ireland >> GO Editorial Office >> http://www.berkeleybop.org || http://www.ebi.ac.uk >> Boycott Trader Joe's Red List seafood: http://traitorjoe.com/ >> ============================ Midori A. Harris, Ph.D. GO Editor EMBL - EBI Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD UK Tel: +44 (0) 1223 494667 Fax: +44 (0) 1223 494468 Email: mi...@eb... |
|
From: Darren N. <da...@ge...> - 2009-10-21 16:38:17
|
PRO will represent only those proteins that are known to exist (corresponding to UniProt PE [protein existence] tag level 1 or 2). Furthermore, we will (at least at first) only contain proteins that are likely to be referenced somewhere (literature, pathway ontology, etc.), meaning that many unannotated proteins will not be within PRO (unless requested). The reason for this is to reduce the number of terms that will require revision or obsoleting upon characterization. Judith Blake wrote: > It’s not just PRO IDs, it’s UniProt, NP, XP, all protein IDs need to be > included in GO lookup. > > The actual ID for human / mouse might be PRO at some point. At this > point, priority is UniProt, NP, XP. These are associated with the MOD > ID. It is to the MOD ID that the GO annotations are provided. This is > all a work in progress. We are not now able to provide PRO IDs for all > mouse proteins, although that should come soon. > > Judy > > > On 10/21/09 10:58 AM, "Alan Ruttenberg" <ala...@gm...> wrote: > > On Wed, Oct 21, 2009 at 10:15 AM, Judith Blake > <Jud...@ja...> wrote: > > Alan, > > > > We need to represent all proteins in GO for the reasons mentioned. > These > > would carry PRO xrefs as well as xref to UniProt, NP. > > PRO should not be used as a dbxref - it should be used as the actual > ids for the gene products. > > > The point being that > > folks searching GO resources via protein IDs need to be able to > enter not > > only IDs that have GO annotations, but IDs for proteins that don’t > have GO > > annotations. So they can recover datasets +/- GO annotations for all > > proteins of interest > > This is a tooling issue. All that means is that the GO resource tools > should be able to search PRO. > > -Alan > . > > > > Also, while PRO will provide IDs for all mouse and human, the IDs and > > intersections for other organisms, now included in GO, will come > later. 
> > > > Judy > > > > > > On 10/20/09 6:20 PM, "Alan Ruttenberg" <ala...@gm...> > wrote: > > > > Regarding the first benefit, submitting un-annotated gene products, it > > seems that the appropriate place to submit these would be PRO. I think > > GO would then retrieve PRO to get the estimates they are concerned > > with. > > > > Otherwise we risk having two places for such information, which > > reduces the effectiveness of both, and entailing costly > > synchronization efforts. > > > > -Alan > > > > On Mon, Oct 19, 2009 at 12:25 PM, Amelia Ireland <aj...@eb...> > wrote: > > > Hello annotators, > > > > > > I've written up a proposal for a new format for the annotation and > > > gp2protein files which would separate gene product data from > annotation > > > data, thereby allowing unannotated gene products to be submitted > to the GO > > > database. It also incorporates the plans for col 17, for annotating > > > spliceforms. Please have a look at the proposal here: > > > > > > > http://wiki.geneontology.org/index.php/Annotation_File_Format_Proposal > > > > > > Feedback / questions / etc. happily received. > > > > > > Cheers, > > > Amelia. > > > > > > -- > > > Amelia Ireland > > > GO Editorial Office > > > http://www.berkeleybop.org || http://www.ebi.ac.uk > > > Boycott Trader Joe's Red List seafood: http://traitorjoe.com/ > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > Annotation mailing list > > > Ann...@ge... > > > http://fafner.stanford.edu/mailman/listinfo/annotation > > > > > _______________________________________________ > > Annotation mailing list > > Ann...@ge... > > http://fafner.stanford.edu/mailman/listinfo/annotation > > > > > |
|
From: Alan R. <ala...@gm...> - 2009-10-21 16:23:22
|
On Wed, Oct 21, 2009 at 11:47 AM, Michael Ashburner <ma...@ge...> wrote: > Alan > > Forgive me if I am wrong. PRO is an ONTOLOGY, it DONT DO instances which is > what GO needs I didn't know GO did anything with instances. Which instances does GO use? Certainly not instances of gene products - I don't know many experiments that are about specific single molecules. -Alan > > M > On 21 Oct 2009, at 15:58, Alan Ruttenberg wrote: > >> On Wed, Oct 21, 2009 at 10:15 AM, Judith Blake <Jud...@ja...> >> wrote: >>> >>> Alan, >>> >>> We need to represent all proteins in GO for the reasons mentioned. These >>> would carry PRO xrefs as well as xref to UniProt, NP. >> >> PRO should not be used as a dbxref - it should be used as the actual >> ids for the gene products. >> >>> The point being that >>> folks searching GO resources via protein IDs need to be able to enter not >>> only IDs that have GO annotations, but IDs for proteins that don’t have >>> GO >>> annotations. So they can recover datasets +/- GO annotations for all >>> proteins of interest >> >> This is a tooling issue. All that means is that the GO resource tools >> should be able to search PRO. >> >> -Alan >> . >>> >>> Also, while PRO will provide IDs for all mouse and human, the IDs and >>> intersections for other organisms, now included in GO, will come later. >>> >>> Judy >>> >>> >>> On 10/20/09 6:20 PM, "Alan Ruttenberg" <ala...@gm...> wrote: >>> >>> Regarding the first benefit, submitting un-annotated gene products, it >>> seems that the appropriate place to submit these would be PRO. I think >>> GO would then retrieve PRO to get the estimates they are concerned >>> with. >>> >>> Otherwise we risk having two places for such information, which >>> reduces the effectiveness of both, and entailing costly >>> synchronization efforts. 
>>> >>> -Alan >>> >>> On Mon, Oct 19, 2009 at 12:25 PM, Amelia Ireland <aj...@eb...> wrote: >>>> >>>> Hello annotators, >>>> >>>> I've written up a proposal for a new format for the annotation and >>>> gp2protein files which would separate gene product data from annotation >>>> data, thereby allowing unannotated gene products to be submitted to the >>>> GO >>>> database. It also incorporates the plans for col 17, for annotating >>>> spliceforms. Please have a look at the proposal here: >>>> >>>> http://wiki.geneontology.org/index.php/Annotation_File_Format_Proposal >>>> >>>> Feedback / questions / etc. happily received. >>>> >>>> Cheers, >>>> Amelia. >>>> >>>> -- >>>> Amelia Ireland >>>> GO Editorial Office >>>> http://www.berkeleybop.org || http://www.ebi.ac.uk >>>> Boycott Trader Joe's Red List seafood: http://traitorjoe.com/ >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Annotation mailing list >>>> Ann...@ge... >>>> http://fafner.stanford.edu/mailman/listinfo/annotation >>>> >>> _______________________________________________ >>> Annotation mailing list >>> Ann...@ge... >>> http://fafner.stanford.edu/mailman/listinfo/annotation >>> >>> >> _______________________________________________ >> Obo-coordinators mailing list >> Obo...@ob... >> http://mail.fruitfly.org/mailman/listinfo/obo-coordinators > > |