Thread: [Transdecoder-users] ORFs without ATG as start
Extracting likely coding regions from transcript sequences
Brought to you by:
bhaas
From: Andre M. <And...@cr...> - 2014-05-14 18:57:04
Hi,

First of all, thanks for providing a tool to predict ORFs.

I am surprised to see that so many of my ORFs predicted with transdecoder do not start with ATG (5prime_partial, 650 out of 1807).

To me it seems like a malfunction of the software.

Is there an easy way to trim the ORFs to the nearest start codon?

Thanks,
André

--
André Minoche, Postdoc
Centre for Genomic Regulation (CRG)
Doctor Aiguader, 88, 4th floor
08003 Barcelona
http://seq.crg.es
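As for trimming to the nearest start codon, here is a minimal sketch. It assumes you already have each ORF's CDS and peptide as plain strings, with the CDS in frame from its first base. The function names `first_inframe_atg` and `trim_partial_orf` are hypothetical, and FASTA parsing is left out. It only considers ATGs in the ORF's own reading frame and trims the peptide by the matching number of residues.

```python
from typing import Optional, Tuple


def first_inframe_atg(cds: str) -> Optional[int]:
    """Return the 0-based offset of the first in-frame ATG in cds, or None."""
    seq = cds.upper()
    # Step codon by codon (0, 3, 6, ...) so only ATGs in the ORF's own frame count.
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] == "ATG":
            return i
    return None


def trim_partial_orf(cds: str, protein: str) -> Optional[Tuple[str, str]]:
    """Trim a 5'-partial CDS and its translation to the first in-frame ATG.

    Returns (trimmed_cds, trimmed_protein), or None if the CDS has no in-frame ATG.
    """
    offset = first_inframe_atg(cds)
    if offset is None:
        return None
    # One codon encodes one residue, so the protein is trimmed at offset // 3.
    return cds[offset:], protein[offset // 3:]


# Example: a 5'-partial ORF whose first in-frame ATG is the third codon.
print(trim_partial_orf("GCCAAGATGGCTTAA", "AKMA*"))  # ('ATGGCTTAA', 'MA*')
```

This is a heuristic. The first in-frame ATG is not guaranteed to be the true start codon, and ORFs with no in-frame ATG are returned as `None` rather than guessed.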
From: Matthew M. <mat...@un...> - 2014-05-14 19:45:29
Andre,

Alternatively, this could be a metric of assembly quality (e.g. http://journal.frontiersin.org/Journal/10.3389/fgene.2014.00013/abstract) which may be a function of the input data. Can you tell us a little bit about the type and quantity of data used in this assembly?

Matt

__________________________________
*Matthew MacManes*, Ph.D.
University of New Hampshire I Assistant Professor
Department of Molecular, Cellular, & Biomedical Sciences
Durham, NH 03824
Phone: 603-862-4052 I Twitter: @PeroMHC <https://twitter.com/PeroMHC>
Web: genomebio.org
Office: 189 Rudman Hall I Lab: 145 Rudman Hall
From: Andre M. <And...@cr...> - 2014-05-14 21:57:24
Dear Matt,

Thank you for your quick reply.

The quality of the reference sequence from which I derived the CDS is not an issue.

I have full-length transcript cDNA sequences that I aligned to a high-quality genome assembly. From the alignment coordinates, and using the genome assembly as the source, I derived high-quality mRNA sequences, which I used as input for transdecoder.

Deep RNA-seq data confirms the full-length nature of my cDNA sequences. At the locations where my full-length transcripts align, I also have Augustus gene models supported by RNA-seq data. The ORFs of the predicted gene models generally correspond to the transdecoder ORFs, with the exception that in many cases the transdecoder ORFs start upstream of the Augustus start codon and not with an ATG (5prime_partial ORFs).

Best regards,
André
From: Brian H. <bh...@br...> - 2014-05-14 22:05:13
Hi Andre,

Transdecoder doesn't have a start-codon finding function. It finds the longest ORFs, and when there isn't an intervening in-frame stop codon between your optimal start codon and the 5' end of the sequence, it's going to simply assume that the transcript is not full-length and give you the full translation from the beginning of the transcript.

Usually, there's an in-frame stop codon in the 5' UTR that prevents most full-length transcripts from being reported as 5' partials. I suppose if you have a GC-rich target (where random stop codons are rarer), this could definitely be more of an issue.

The high prevalence of 5' partials tends to be a general indicator of not having full-length transcripts, resulting from the 3' coverage bias typical of poly-A-captured cDNA sequencing. I'm sure there are occasions where a genuine full-length transcript has an in-frame translatable 5' UTR, but in my experience these have generally been the rare exceptions.

cheers,
~brian

--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas
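The sketch below illustrates the 5'-end behavior Brian describes. It is a reconstruction from his description, not TransDecoder's actual code. The function names are hypothetical, and the sketch assumes an uppercase, plus-strand transcript and a 0-based position for the candidate ATG. The second function assumes a random sequence with equal A/T and equal G/C frequencies. It shows why a GC-rich target makes this worse: all three stop codons (TAA, TAG, TGA) are AT-rich, so an in-frame stop in the 5' UTR becomes less likely by chance as GC content rises.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def five_prime_boundary(transcript: str, atg_pos: int) -> Tuple[int, str]:
    """Walk upstream from a candidate ATG in its reading frame.

    Returns (orf_start, status). If an in-frame stop is found, the ORF starts at
    the most upstream ATG downstream of that stop ("complete"). If the 5' end is
    reached first, the ORF is extended to the first full codon of the frame
    ("5prime_partial").
    """
    seq = transcript.upper()
    best_start = atg_pos
    for i in range(atg_pos - 3, -1, -3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return best_start, "complete"
        if codon == "ATG":
            best_start = i  # an upstream in-frame ATG gives a longer ORF
    # No in-frame stop before the 5' end: assume the transcript is truncated.
    return atg_pos % 3, "5prime_partial"


def random_stop_prob(gc: float) -> float:
    """Probability that a random codon is a stop, given GC fraction gc.

    Assumes independent bases with P(A) = P(T) = (1 - gc) / 2
    and P(G) = P(C) = gc / 2.
    """
    at = (1 - gc) / 2
    g = gc / 2
    # TAA = at * at * at; TAG and TGA each = at * at * g
    return at ** 3 + 2 * at * at * g


# Stop found upstream -> complete ORF starting at the ATG after the stop.
print(five_prime_boundary("TAGATGCCCATGGCC", 9))  # (3, 'complete')
# No upstream stop -> ORF extended to the transcript's 5' end.
print(five_prime_boundary("GCCATGCCCATGGCC", 9))  # (0, '5prime_partial')
for gc in (0.4, 0.5, 0.6, 0.7):
    p = random_stop_prob(gc)
    print(f"GC={gc:.1f}  P(stop)={p:.4f}  ~{1 / p:.0f} codons per random stop")
```

Under these assumptions, a random codon is a stop with probability 3/64 (about 4.7%) at 50% GC, but only about 1.9% at 70% GC. A short GC-rich 5' UTR can therefore easily lack an in-frame stop, and a genuinely full-length transcript is then reported as 5' partial.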