Re: [Transdecoder-users] ORFs without ATG as start
Extracting likely coding regions from transcript sequences
From: Brian H. <bh...@br...> - 2014-05-14 22:05:13
Hi Andre,

Transdecoder doesn't have a start-codon finding function. It finds the longest ORFs, and in the case where there isn't an intervening in-frame stop-codon between your optimal start codon and the 5' end of the sequence, it's going to simply assume that the transcript is not full-length and give you the full translation from the beginning of the transcript.

Usually, there's an in-frame stop codon in the 5' UTR that prevents most full-length transcripts from being reported as 5' partials. I suppose, if you have a GC-rich target (rarer random stop codons), this could definitely be more of an issue.

The high prevalence of 5' partials tends to be a general indicator of not having full-length transcripts, resulting from the 3' coverage bias typical of poly-A captured cDNA sequencing. I'm sure there are occasions where a genuine full-length transcript has an in-frame translatable 5' UTR, but in my experience, these have generally been the rare exceptions.

cheers,

~brian

On Wed, May 14, 2014 at 5:57 PM, Andre Minoche <And...@cr...> wrote:

> Dear Matt,
>
> Thank you for your quick reply.
>
> The quality of the reference sequence from which I derived the cds is not
> an issue.
>
> I have full length transcript cDNA sequences I aligned to a high quality
> genome assembly. From the alignment coordinates and using the genome
> assembly as source, I derived high quality mRNA sequences I used as input
> for transdecoder.
>
> Deep RNA-seq data confirms the full length nature of my cDNA sequences. At
> the locations where my full length transcripts align I also have Augustus
> gene models supported by RNA-seq data. The ORFs of the predicted gene
> models generally correspond to the ORFs of transdecoder, with the
> exception, that in many cases transdecoder ORFs start upstream of the
> Augustus start codon and not with an ATG (5prime_partial, ORFs).
>
> Best regards
> André
>
> On 14 May 2014, at 21:44, Matthew MacManes <mat...@un...> wrote:
>
> Andre,
>
> Alternatively, this could be a metric of assembly quality (e.g.
> http://journal.frontiersin.org/Journal/10.3389/fgene.2014.00013/abstract)
> which may be a function of the input data. Can you tell us a little bit
> about the type and quantity of data used in this assembly?
>
> Matt
>
> __________________________________
> *Matthew MacManes*, Ph.D.
> University of New Hampshire | Assistant Professor
> Department of Molecular, Cellular, & Biomedical Sciences
> Durham, NH 03824
> Phone: 603-862-4052 | Twitter: @PeroMHC <https://twitter.com/PeroMHC>
> Web: genomebio.org
> Office: 189 Rudman Hall | Lab: 145 Rudman Hall
>
>
> On Wed, May 14, 2014 at 2:44 PM, Andre Minoche <And...@cr...> wrote:
>
>> Hi,
>>
>> First of all thanks for providing a tool to predict ORFs.
>>
>> I am surprised to see, that so many of my ORFs predicted with
>> transdecoder do not start with ATG (5prime_partial, 650 out of 1807).
>>
>> To me it seems like a malfunction of the software.
>>
>> Is there an easy way to trim the ORFs to the nearest start codon?
>>
>> Thanks
>> André
>>
>> --
>> André Minoche, Postdoc
>>
>> Centre for Genomic Regulation (CRG)
>> Doctor Aiguader, 88, 4th floor
>> 08003 Barcelona
>>
>> http://seq.crg.es
>>
>> _______________________________________________
>> Transdecoder-users mailing list
>> Tra...@li...
>> https://lists.sourceforge.net/lists/listinfo/transdecoder-users

--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas
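
For readers following this thread, here is a minimal Python sketch of the two ideas discussed above: Brian's rule that an ORF ends up reported as a 5' partial when no in-frame stop codon lies between the candidate start and the 5' end of the transcript, and Andre's request to trim such an ORF to the nearest start codon. This is not TransDecoder's code. The function names and example sequences are hypothetical, and the sketch assumes forward-strand DNA with the standard stop codons TAA, TAG, and TGA.

```python
# Illustrative sketch only; not part of TransDecoder.
# Assumes forward-strand DNA and the standard stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_in_frame_stop(transcript: str, start: int) -> bool:
    """Return True if an in-frame stop codon lies 5' of position `start`.

    `start` is the 0-based index of the candidate start codon. If this
    returns False, a longest-ORF finder can extend the ORF to the 5' end
    of the transcript, which is the situation described as a 5' partial.
    """
    seq = transcript.upper()
    pos = start - 3
    while pos >= 0:
        if seq[pos:pos + 3] in STOP_CODONS:
            return True
        pos -= 3
    return False


def trim_to_first_atg(cds: str):
    """Trim a 5'-partial CDS to its first in-frame ATG.

    Returns the trimmed sequence, or None if the reading frame
    contains no ATG at all.
    """
    seq = cds.upper()
    for pos in range(0, len(seq) - 2, 3):
        if seq[pos:pos + 3] == "ATG":
            return seq[pos:]
    return None


if __name__ == "__main__":
    # Made-up transcript: GCC GCC ATG AAA TTT TAA (ATG at index 6).
    open_upstream = "GCCGCCATGAAATTTTAA"
    # Same transcript with an in-frame TAG placed in the 5' UTR.
    closed_upstream = "TAGGCCATGAAATTTTAA"

    print(upstream_in_frame_stop(open_upstream, 6))    # False
    print(upstream_in_frame_stop(closed_upstream, 6))  # True
    print(trim_to_first_atg(open_upstream))            # ATGAAATTTTAA
```

Note that the first in-frame ATG is only a guess at the true start codon. Where independent evidence exists, such as the RNA-seq-supported Augustus models Andre mentions, it is worth comparing the trimmed start against it.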