From: Lincoln S. <ls...@cs...> - 2007-04-11 15:40:50
Yes, interbase sites are to the right of the indicated base, regardless of the strandedness of the feature that you are mapping.

Lincoln

On 4/11/07, Cook, Malcolm <ME...@st...> wrote:
>
> Lincoln and Jannick and fellow GFF/SO wanderers,
>
> Just recently, I've been going through this exact set of issues myself
> as applied to correctly inferring SO-compliant features for
> splice_donor_site and splice_acceptor_site given a gene model.
>
> I hope the following simplified example is useful for understanding the
> issue, and I would appreciate your comments as to whether you agree with
> how I interpret the GFF3 and SO specifications in this regard.
>
>
> EXAMPLE
> ============================
>
> Given this simplified gene model containing two exons, each 3 bp long:
>
>     123456789
>     EEEIIIEEE
>     >>>--->>>
>
> and given these SO definitions:
>
>     splice_donor_site: The junction between the 3 prime end of an
>     exon and the following intron.
>     <http://www.broad.mit.edu/annotation/genome/ontology/Sequence_Ontology/Term.html?sp=SSO%3A0000163>
>
>     splice_acceptor_site: The junction between the 3 prime end of an
>     intron and the following exon.
>     <http://www.broad.mit.edu/annotation/genome/ontology/Sequence_Ontology/Term.html?sp=SSO%3A0000164>
>
> ...we should encode the gene as:
>
>     exon(1,3,+)
>     splice_donor_site(3,3,+)
>     intron(4,6,+)
>     splice_acceptor_site(6,6,+)
>     exon(7,9,+)
>
> HOWEVER, if the gene codes the other way, viz.
>
>     123456789
>     EEEIIIEEE
>     <<<---<<<
>
> ...we should encode it as:
>
>     exon(7,9,-)
>     splice_donor_site(6,6,-)
>     intron(4,6,-)
>     splice_acceptor_site(3,3,-)
>     exon(1,3,-)
>
> Note that the coordinates of the exon and intron are the same in both
> encodings; only the strand differs. AND the coordinates of the splice
> sites are also the same between encodings, because I understand
> "to the right of the indicated base in the direction of the landmark"
> as "1 plus the indicated base, in interbase coordinates".
>
> It is this understanding that I am trying to clarify with this example,
> and I would in particular appreciate confirmation that the splice sites
> should NOT be encoded in the second model as:
>
>     splice_donor_site(7,7,-)
>     splice_acceptor_site(4,4,-)
>
> Thanks,
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
> ________________________________
>
> From: son...@li...
> [mailto:son...@li...] On Behalf Of Lincoln Stein
> Sent: Wednesday, April 11, 2007 8:46 AM
> To: Jannick D. Bendtsen
> Cc: SO developers; ls...@cs...
> Subject: Re: [SO-devel] GFF 3
>
> Hi,
>
> Zero-length and one-length features are both represented using
> coordinates start==end. One uses the ontology to determine whether this
> is a zero-length feature or a one-length feature. A zero-length feature
> will inherit from the "junction" parent class, while a one-length
> feature will inherit from the "region" parent class.
>
> I would much rather use interbase coordinates (in which the numbers
> refer to the positions between bases), but legacy requires GFF3 to use
> base coordinates.
>
> Lincoln
>
>
> On 4/11/07, Jannick D. Bendtsen <jbe...@cl...> wrote:
>
> Dear Lincoln,
>
> I'm trying to parse a GFF file, and reading
> http://www.sequenceontology.org/gff3.shtml left me with a few questions
> which I hope you will take the time to answer.
>
> -- snip --
> Columns 4 & 5: "start" and "end" The start and end of the feature, in
> 1-based integer coordinates, relative to the landmark given in column 1.
> Start is always less than or equal to end. For zero-length features,
> such as insertion sites, start equals end and the implied site is to the
> right of the indicated base in the direction of the landmark.
> -- snip --
>
> From this it is clear that insertion sites, cleavage sites etc. can be
> mapped onto a sequence simply by this:
>
>     ctg123 . gene 1000 1000 . + . ID=gene00001;Name=EDEN
>
> But what is the region syntax for just covering one position?
>
> 1000 1001 will cover two positions??
>
> Thanks for your help.
>
> Best wishes
> Jannick
>
> --
> ______________________________
>
> Jannick D. Bendtsen
> Senior scientific officer
>
> CLC bio
> Gustav Wieds Vej 10
> 8000 Aarhus C
> Denmark
>
> www.clcbio.com
>
> jbe...@cl...
>
> Contact numbers:
> Telephone: +45 70 22 32 44
> Fax: +45 70 22 55 19
> Mobile: +45 51 20 96 94
>
> CLC bio A/S Disclaimer:
>
> -------------------------------------------------------------------
> Any information contained in this e-mail and/or any attachments
> is confidential, and only intended for reception and use by the
> specified person.
> If you are not the intended recipient, please return the email to
> the sender and delete it afterwards. In this case any copying,
> forwarding, printing, disclosure and use is strictly prohibited.
> -------------------------------------------------------------------
>
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT mic...@cs...
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT mic...@cs...
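The convention confirmed in this thread can be illustrated with a minimal Python sketch. GFF3 writes a zero-length junction as start == end == n, and the site is the gap to the right of base n on the landmark whatever the strand. Whether start == end means zero length or one base depends on the feature's SO type. The exact interbase number for that gap depends on where interbase numbering starts. This sketch assumes 0-based interbase coordinates (gap k lies between base k and base k+1), under which the site to the right of base n is gap n. The names `gff_to_interbase`, `splice_sites` and the hard-coded `JUNCTION_TYPES` set are hypothetical. A real parser would decide zero-length versus one-length by checking whether the SO term inherits from "junction", as Lincoln describes, rather than consulting a fixed list.

```python
# Hypothetical stand-in for an ontology lookup: SO terms treated as
# zero-length "junction" features. A real tool would test is_a junction.
JUNCTION_TYPES = {"splice_donor_site", "splice_acceptor_site"}


def gff_to_interbase(start, end, so_type):
    """Convert GFF3 1-based start/end to 0-based interbase (lo, hi).

    Assumes gap k lies between base k and base k+1 (gap 0 precedes base 1).
    Strand is deliberately not an argument: a junction written as
    start == end == n is taken to be to the right of base n on either strand.
    """
    if start > end:
        raise ValueError("GFF3 requires start <= end")
    if so_type in JUNCTION_TYPES:
        if start != end:
            raise ValueError("zero-length features are written with start == end")
        return (start, start)  # empty interval: the gap right of base `start`
    return (start - 1, end)  # region: covers bases start..end inclusive


def splice_sites(exons, strand):
    """Infer GFF3-style splice-site rows from exon coordinates.

    exons: (start, end) pairs, 1-based inclusive, in any order.
    Returns (so_type, start, end, strand) tuples in landmark order.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    ordered = sorted(exons)
    sites = []
    for (_, left_end), (right_start, _) in zip(ordered, ordered[1:]):
        intron_end = right_start - 1
        # Left junction: gap right of the last exon base before the intron.
        # Right junction: gap right of the last intron base.
        left_gap, right_gap = left_end, intron_end
        if strand == "+":
            # Transcription runs left to right, so exon|intron is the donor.
            sites.append(("splice_donor_site", left_gap, left_gap, "+"))
            sites.append(("splice_acceptor_site", right_gap, right_gap, "+"))
        else:
            # Transcription runs right to left, so the left junction is
            # intron|exon in transcript order, i.e. the acceptor.
            sites.append(("splice_acceptor_site", left_gap, left_gap, "-"))
            sites.append(("splice_donor_site", right_gap, right_gap, "-"))
    return sites


if __name__ == "__main__":
    for so_type, start, end, strand in splice_sites([(7, 9), (1, 3)], "-"):
        lo, hi = gff_to_interbase(start, end, so_type)
        print(f"{so_type}({start},{end},{strand}) -> interbase [{lo}, {hi})")
```

For the two models in Malcolm's example this yields splice_donor_site(3,3,+) and splice_acceptor_site(6,6,+) on the plus strand, and splice_acceptor_site(3,3,-) and splice_donor_site(6,6,-) on the minus strand. The junction coordinates are the same, and only the feature types and strand swap, which is the encoding Lincoln's answer supports. It also answers Jannick's question. A one-base region such as an exon written as 1000 1000 maps to [999, 1000), while a junction written the same way maps to the empty interval [1000, 1000).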