Re: [Transdecoder-users] transdecoder cd-hit-est
Extracting likely coding regions from transcript sequences
From: Brian H. <bh...@br...> - 2014-02-26 14:26:29
Hi Mun Hua,

responses below:

On Wed, Feb 26, 2014 at 4:26 AM, mun hua <mh....@gm...> wrote:

> Hi Brian,
>
> I am looking to:
>
> 1) cluster spliced transcript variants from assembled transcriptome (via
> Trinity or other programs) and select the best representative as a gene.
>

We tend to run the abundance estimation procedure and then filter based on abundance:

http://trinityrnaseq.sourceforge.net/analysis/abundance_estimation.html

see towards bottom of page.

> 2) translate the representative transcript and pick the longest
> orf/protein sequence.
>
> Will TransDecoder be able to carry out both tasks or just task (2)? I
> noticed there is an option to include cd-hit-est executable. Does this do
> task (1) at the same time or does this serve another purpose?
>

TransDecoder should pull all 'good' ORFs, rather than just the single longest one. CD-HIT is included and used as part of first identifying long orfs that are used for training a Markov model, where cd-hit removes redundant orfs from that set.

You'd have to pull the longest orf for your representative transcript separately.

best,

~brian

> Thanks,
> Mun Hua
>

--
--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas
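
For the last step mentioned above (pulling the longest ORF per representative transcript separately), here is a minimal sketch in plain Python. It is not part of TransDecoder. It assumes the predicted ORFs are in a FASTA file and that each record ID has the form `<transcript_id>|<orf_label>`; that ID layout, and the names `read_fasta`, `transcript_of`, and `longest_orf_per_transcript`, are assumptions for illustration. Check the headers in your own output and adjust `transcript_of` to match.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def transcript_of(header):
    """Map an ORF header to its parent transcript ID.

    ASSUMPTION: the ID is the first whitespace-delimited token and looks
    like '<transcript_id>|<orf_label>'. Adapt this to your real headers.
    """
    return header.split()[0].split("|")[0]


def orf_length(seq):
    """Length of a peptide, ignoring a trailing stop-codon asterisk."""
    return len(seq.rstrip("*"))


def longest_orf_per_transcript(records, key=transcript_of):
    """Return {transcript_id: (header, sequence)} keeping the longest ORF.

    Ties are resolved in favour of the record seen first.
    """
    best = {}
    for header, seq in records:
        group = key(header)
        if group not in best or orf_length(seq) > orf_length(best[group][1]):
            best[group] = (header, seq)
    return best


if __name__ == "__main__":
    # Made-up toy records, only to show the call pattern.
    demo = StringIO(">t1|m.1\nMKV\nLL*\n>t1|m.2\nMK*\n>t2|m.3\nMAAAA\n")
    for tid, (header, seq) in longest_orf_per_transcript(read_fasta(demo)).items():
        print(tid, header, orf_length(seq))
```

To work with real data, replace the `StringIO` demo with `open("your_orfs.pep")` (file name hypothetical). If you first filtered transcripts by abundance as suggested above, restrict the records to the retained transcript IDs before calling `longest_orf_per_transcript`.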