From: Chris M. <cj...@fr...> - 2007-04-16 23:52:40
Hi Jim

You're pointing out a lot of the weak points in the use of SO in combination with data models. In an ideal world most of the semantics could be derived entirely from SO, but in practice there are a lot of conventions that have to be documented and implemented and transformed.

The rationale for using polypeptide and not CDS was that it's often desirable to attach properties to the polypeptide object that apply only to the actual molecule - its localisation, weight, etc - and not to the region that encodes the polypeptide. Representing both CDS and polypeptide seemed like overkill so the pp was used, with the understanding that it implicitly represented a CDS, since all pps (according to SO) are derived_from a CDS.

Unfortunately these kinds of conventions can't be purely derived in a declarative manner from first principles from the ontology; they have to be implemented in annoyingly messy and imperative code.

(In actual fact it is possible to represent polycistronic genes using pps and not CDSs - FlyBase has many polycistronic genes represented)

On Apr 15, 2007, at 5:15 PM, Jim Hu wrote:

> Hi Karen,
>
> I actually started from the GFF3 docs (I'm on sabbatical in Lincoln's lab right now and have also been discussing this with people there and on the GMOD conf calls) so this kind of brings me full circle. I just wasn't sure from that if the SO representation was the same as the GFF3 representation (esp. wrt Chado) or if there is a data transformation between GFF3 and SO/Chado. For example, the Chado docs suggest that CDS is not explicitly in the feature graph for Chado, but is inferred from polypeptides + exons. But I see that for polycistronic genes we need explicit CDS features because otherwise we don't get the intergenic bits...unless we use a CDS only gene model.
>
> Thanks!
>
> Jim
>
> On Apr 15, 2007, at 7:00 PM, Karen Eilbeck wrote:
>
>> Hi Jim,
>> I'm sorry, I should have pointed you to the gff3 documentation in the first place.
>> http://www.sequenceontology.org/gff3.shtml
>>
>> f) An operon
>>
>> A classic operon occurs when the genes in a polycistronic transcript are co-regulated by cis-regulatory element(s):
>>
>> regulatory element
>> * ================================================> operon
>> ----->XXXXXXX*-->BBBBBB*--->ZZZZ*-->AAAAAA*-----
>>
>> It can be indicated in GFF3 in this way:
>>
>> ChrX . operon XXXX YYYY . + . ID=operon01;name=my_operon
>> ChrX . promoter XXXX YYYY . + . Parent=operon01
>> ChrX . gene XXXX YYYY . + . ID=gene01;Parent=operon01;name=resA
>> ChrX . gene XXXX YYYY . + . ID=gene02;Parent=operon01;name=resB
>> ChrX . gene XXXX YYYY . + . ID=gene03;Parent=operon01;name=resX
>> ChrX . gene XXXX YYYY . + . ID=gene04;Parent=operon01;name=resZ
>> ChrX . mRNA XXXX YYYY . + . ID=tran01;Parent=gene01,gene02,gene03,gene04
>> ChrX . exon XXXX YYYY . + . ID=exon00001;Parent=tran01
>> ChrX . CDS XXXX YYYY . + . Parent=tran01;Derives_from=gene01
>> ChrX . CDS XXXX YYYY . + . Parent=tran01;Derives_from=gene02
>> ChrX . CDS XXXX YYYY . + . Parent=tran01;Derives_from=gene03
>> ChrX . CDS XXXX YYYY . + . Parent=tran01;Derives_from=gene04
>>
>> Hope this helps,
>> Karen
>>
>> On Apr 15, 2007, at 1:36 PM, Jim Hu wrote:
>>
>>> Thanks Karen,
>>>
>>> The part I read in the meeting notes suggested that consensus was not actually reached on how to handle the polycistronic situation. Am I misreading? I also looked at your 2004 summary of the Cambridge meeting
>>>
>>> http://www.sequenceontology.org/meetings/SO-meeting-Cambridge.pdf
>>>
>>> From that, and poking around the ontology, it looks like your first solution (on page 6 of the pdf, which is the one that makes more sense to me, fwiw) was not adopted, since associated_with isn't in the SO relationships I see and transcribed_region is obsoleted. But then, I don't see feature_collection either.
>>>
>>> As an aside, it seems to me that the assertion that gene is not part_of polycistronic transcript is related to a difference in prokaryotic and eukaryotic usage of what a gene is (I would not assign the lac promoter/operator to lacZ; some of the older maps show the lac region as I P O Z Y A). Was there ever discussion of splitting gene into different sensu forms?
>>>
>>> I read the other reply as a yes. mRNA member_of gene is correct within SO and SOFA. But in my limited looking at sample data from FlyBase it seems like it's common to use part_of when mRNAs are recorded in Chado. Perhaps that's part of what people mean by level 1 vs level 2 compliance between Chado and SO?
>>>
>>> Again, sorry if this is a rehash of ancient history and I'm just being obtuse. My basic problem is that I'm working on mapping various E. coli annotation sets to Chado but I'm new enough to both SO and Chado that I'm worrying about unforeseen problems arising from how I do it. So I'm trying to get as solid a foundation as I can...which means I'm probably overreading any ambiguity in the various sources of documentation.
>>>
>>> Jim
>>>
>>> On Apr 12, 2007, at 4:18 PM, Karen Eilbeck wrote:
>>>
>>>> Hi Jim,
>>>> The polycistronic/monocistronic debate happened at a SO meeting.
>>>> http://www.sequenceontology.org/meetings/meeting-doc.html
>>>>
>>>> Basically a gene can be thought of as a collection of transcripts. That is what the member relationship is trying to represent.
>>>>
>>>> --Karen
>>>>
>>>> On Apr 12, 2007, at 10:07 AM, Jim Hu wrote:
>>>>
>>>>> I was browsing the ontology to try to understand the representation of polycistronic mRNAs and found myself wondering about monocistronic genes
>>>>>
>>>>> Is the relationship
>>>>>
>>>>> mRNA member_of gene
>>>>>
>>>>> via
>>>>>
>>>>> mRNA is_a processed_transcript
>>>>> is_a transcript
>>>>> is_a gene_member_region
>>>>> member_of gene
>>>>>
>>>>> or is there somewhere else that would give
>>>>>
>>>>> mRNA derives from gene
>>>>>
>>>>> or something along those lines?
>>>>>
>>>>> Presumably, how bacterial genes, mRNAs, and polypeptides should be represented in SO is an old discussion. Is there a FAQ or archived list discussion anyone can point me to? Thanks.
>>>>>
>>>>> Jim
>>>>> =====================================
>>>>> Jim Hu
>>>>> Associate Professor
>>>>> Dept. of Biochemistry and Biophysics
>>>>> 2128 TAMU
>>>>> Texas A&M Univ.
>>>>> College Station, TX 77843-2128
>>>>> 979-862-4054
>>>>>
>>>>> SOng-devel mailing list
>>>>> SOn...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/song-devel
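
A note on the GFF3 operon example quoted in this thread: the Derives_from attribute on each CDS row is what ties a coding region back to one gene when a single mRNA has several gene parents. The following is only a minimal Python sketch of how a consumer might group those rows, not how GFF3 loaders or Chado actually do it. The function names (parse_attributes, parse_line, cds_by_gene) are hypothetical, the XXXX/YYYY coordinates are the placeholders from the example and are kept as strings, and the parser splits on whitespace only because the columns appear space-separated here; real GFF3 files are tab-delimited.

```python
from collections import defaultdict

# The operon example from the thread, with wrapped lines rejoined.
# XXXX / YYYY are the placeholder coordinates used in the original example.
GFF3_OPERON = """\
ChrX . operon XXXX YYYY . + . ID=operon01;name=my_operon
ChrX . promoter XXXX YYYY . + . Parent=operon01
ChrX . gene XXXX YYYY . + . ID=gene01;Parent=operon01;name=resA
ChrX . gene XXXX YYYY . + . ID=gene02;Parent=operon01;name=resB
ChrX . gene XXXX YYYY . + . ID=gene03;Parent=operon01;name=resX
ChrX . gene XXXX YYYY . + . ID=gene04;Parent=operon01;name=resZ
ChrX . mRNA XXXX YYYY . + . ID=tran01;Parent=gene01,gene02,gene03,gene04
ChrX . exon XXXX YYYY . + . ID=exon00001;Parent=tran01
ChrX . CDS XXXX YYYY . + . Parent=tran01;Derives_from=gene01
ChrX . CDS XXXX YYYY . + . Parent=tran01;Derives_from=gene02
ChrX . CDS XXXX YYYY . + . Parent=tran01;Derives_from=gene03
ChrX . CDS XXXX YYYY . + . Parent=tran01;Derives_from=gene04
"""


def parse_attributes(column):
    """Turn 'key=v1,v2;key2=v' into {'key': ['v1', 'v2'], 'key2': ['v']}."""
    attrs = {}
    for pair in column.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attrs[key] = value.split(",")
    return attrs


def parse_line(line):
    """Split one feature row into its nine columns (whitespace-separated
    here as an assumption; real GFF3 uses tabs)."""
    seqid, source, ftype, start, end, score, strand, phase, attr_col = line.split(None, 8)
    return {
        "seqid": seqid,
        "type": ftype,
        "start": start,  # kept as strings: the example uses placeholders
        "end": end,
        "strand": strand,
        "attrs": parse_attributes(attr_col),
    }


def cds_by_gene(text):
    """Group CDS features by the gene named in their Derives_from attribute."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        feature = parse_line(line)
        if feature["type"] == "CDS":
            for gene_id in feature["attrs"].get("Derives_from", []):
                groups[gene_id].append(feature)
    return dict(groups)


if __name__ == "__main__":
    for gene_id, cds_list in sorted(cds_by_gene(GFF3_OPERON).items()):
        parents = cds_list[0]["attrs"]["Parent"]
        print(gene_id, "has", len(cds_list), "CDS on transcript", ",".join(parents))
```

Grouping on Derives_from rather than Parent is the point of the sketch: every CDS row shares the same Parent (tran01), so Parent alone cannot say which gene a coding region belongs to.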