Re: [Transdecoder-users] transdecoder cd-hit-est


Hi Mun Hua,

responses below:

On Wed, Feb 26, 2014 at 4:26 AM, mun hua <mh....@gm...> wrote:

> Hi Brian,
>
> I am looking to:
>
> 1) cluster spliced transcript variants from assembled transcriptome (via
> Trinity or other programs) and select the best representative as a gene.
>

We tend to run the abundance estimation procedure and then filter the
transcripts based on their abundance:

http://trinityrnaseq.sourceforge.net/analysis/abundance_estimation.html

See toward the bottom of the page.
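As a rough sketch of this kind of abundance-based selection (not necessarily the exact procedure on the page above), the Python below assumes you already have a per-transcript abundance value, such as FPKM from the abundance estimation step, loaded into a dict. It also assumes gene IDs can be derived by stripping a Trinity-style isoform suffix (`_seq1` or `_i1`). Both the ID convention and the helper names are hypothetical. The code computes each isoform's percentage of its gene's total, lets you drop low-percentage isoforms, and picks the most abundant isoform per gene as the representative.

```python
import re
from typing import Dict, List

# Assumption: Trinity-style IDs where the gene ID is the transcript ID minus
# its final isoform suffix, e.g. "comp0_c0_seq1" -> "comp0_c0" or
# "TRINITY_DN10_c0_g1_i2" -> "TRINITY_DN10_c0_g1".
_ISOFORM_SUFFIX = re.compile(r"_(?:seq|i)\d+$")


def gene_id(transcript_id: str) -> str:
    """Strip the isoform suffix to get the (assumed) gene ID."""
    return _ISOFORM_SUFFIX.sub("", transcript_id)


def isoform_percent(abundance: Dict[str, float]) -> Dict[str, float]:
    """Percentage of its gene's total abundance contributed by each isoform."""
    totals: Dict[str, float] = {}
    for tid, value in abundance.items():
        gid = gene_id(tid)
        totals[gid] = totals.get(gid, 0.0) + value
    result: Dict[str, float] = {}
    for tid, value in abundance.items():
        total = totals[gene_id(tid)]
        result[tid] = 100.0 * value / total if total > 0 else 0.0
    return result


def filter_by_isoform_percent(abundance: Dict[str, float], min_pct: float) -> List[str]:
    """Keep transcripts that make up at least min_pct % of their gene's expression."""
    pct = isoform_percent(abundance)
    return [tid for tid in abundance if pct[tid] >= min_pct]


def pick_representatives(abundance: Dict[str, float]) -> Dict[str, str]:
    """Map each gene ID to its most abundant transcript (first seen wins ties)."""
    best: Dict[str, tuple] = {}
    for tid, value in abundance.items():
        gid = gene_id(tid)
        if gid not in best or value > best[gid][0]:
            best[gid] = (value, tid)
    return {gid: tid for gid, (_, tid) in best.items()}


fpkm = {"comp0_c0_seq1": 2.5, "comp0_c0_seq2": 10.0, "comp1_c0_seq1": 4.0}
print(pick_representatives(fpkm))
# {'comp0_c0': 'comp0_c0_seq2', 'comp1_c0': 'comp1_c0_seq1'}
```

Collapsing each gene to its single most abundant isoform is only one heuristic. A percentage cutoff alone keeps every reasonably expressed isoform if you'd rather not reduce each gene to one transcript.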

> 2) translate the representative transcript and pick the longest
> orf/protein sequence.
>
> Will TransDecoder be able to carry out both tasks or just task (2)? I
> noticed there is an option to include cd-hit-est executable. Does this do
> task (1) at the same time or does this serve another purpose?
>
>
TransDecoder should report all 'good' ORFs rather than just the single
longest one. CD-HIT is included because it's used in the first step, where
long ORFs are identified for training a Markov model: cd-hit removes
redundant ORFs from that training set. It doesn't cluster your transcripts.
You'd have to pull the longest ORF for your representative transcript
separately.
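To pull the longest ORF from a representative transcript yourself, a minimal Python sketch could scan all six reading frames for complete ATG-to-stop ORFs using the standard genetic code. This is a plain longest-ORF scan, not TransDecoder's Markov-model scoring, and it ignores 5'/3' partial ORFs. The function names are hypothetical.

```python
from typing import Optional, Tuple

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def translate(cds: str) -> str:
    """Translate codon by codon; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def longest_orf(seq: str) -> Optional[Tuple[str, str]]:
    """Return (strand, protein) for the longest complete ATG..stop ORF, or None."""
    seq = seq.upper()
    best: Optional[Tuple[str, str]] = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    protein = translate(s[start:i + 3])  # includes trailing '*'
                    if best is None or len(protein) > len(best[1]):
                        best = (strand, protein)
                    start = None
    return best


print(longest_orf("CCATGAAATTTTAGCC"))  # ('+', 'MKF*')
```

Using the first ATG after each stop gives the longest ORF in that stretch, since any downstream ATG would only yield a shorter, nested ORF.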

best,

~brian

> Thanks,
> Mun Hua
>

-- 
--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas
