Re: [Transdecoder-users] Protein found in domtbl file miss in .pep file

Hi David,

All ORFs found to have Pfam hits should be included in the final
transdecoder.pep output file. If that is not the case, there may be
a bug that we aren't aware of.
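
To pin down exactly which ORFs are affected, the IDs in the domain table can be compared against the headers of the .pep file. Here is a minimal Python sketch (not part of TransDecoder). It assumes the column layout of the record quoted below, with the domain name in column 1 and the ORF ID in column 4, and it assumes that the first whitespace-delimited token of each .pep header is the ORF ID. The function names are made up for illustration.

```python
def domtbl_orf_ids(lines, domain=None):
    """Collect ORF IDs (4th column) from domtbl-style lines.

    Assumption: column 1 is the domain name and column 4 is the ORF ID,
    as in the example record quoted in this thread.
    """
    ids = set()
    for line in lines:
        fields = line.split()
        # Skip comment lines and lines too short to hold an ORF ID.
        if line.startswith("#") or len(fields) < 4:
            continue
        if domain is None or fields[0] == domain:
            ids.add(fields[3])
    return ids


def fasta_ids(lines):
    """Collect the first token of every FASTA header line.

    Assumption: that token is the ORF ID used in the domain table.
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def missing_from_pep(domtbl_lines, pep_lines, domain=None):
    """Return ORF IDs that have a domain hit but no entry in the .pep file."""
    return sorted(domtbl_orf_ids(domtbl_lines, domain) - fasta_ids(pep_lines))
```

Both arguments can be open file handles, since a text file iterates line by line. Passing domain="DUF640" restricts the check to a single domain.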

If you take your 'missing' transcripts and run them through TransDecoder
separately, does it still fail to report them in the final output?
If there's a bug, we'll need some example data to help
troubleshoot it.
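
For the separate run, the affected transcripts can be pulled into their own FASTA file. The following is a rough Python sketch under two assumptions: that the transcript ID is the part of the ORF ID before the last "|" (so "Unigene0069328|m.26647" comes from "Unigene0069328"), and that the first token of each header in the transcript FASTA is the transcript ID. The helper names are hypothetical.

```python
def transcript_id(orf_id):
    """Strip the ORF suffix, e.g. 'Unigene0069328|m.26647' -> 'Unigene0069328'.

    Assumption: everything before the last '|' is the transcript ID.
    """
    return orf_id.rsplit("|", 1)[0]


def subset_fasta(lines, wanted):
    """Return the FASTA lines (header plus sequence) for the wanted IDs."""
    out = []
    keep = False
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            # Keep this record only if its first header token is wanted.
            keep = bool(tokens) and tokens[0] in wanted
        if keep:
            out.append(line.rstrip("\n"))
    return out
```

Writing the returned lines to a new file and pointing the -t option of the same TransDecoder command at it would give a small test case that is easy to share.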

many thanks,

~brian

On Tue, Jun 10, 2014 at 9:42 AM, 卢 汉斌 <lh...@gm...> wrote:

> Hello everyone,
>
> I used the following command to find coding regions in my Trinity assembly:
>
> TransDecoder -t target_transcripts.fasta --reuse  --search_pfam /path_to_transdecoder/pfam/Pfam-AB.hmm.bin --CPU 5
>
>
> It generates several output files. Next, I want to find the transcripts that contain the domain I am interested in. I searched the target_transcripts.transdecorder.pfam.dat (.domtbl) file for lines that contain the name of that domain. A typical record looks like this:
>
>
> DUF640               PF04852.7    133 Unigene0069328|m.26647 -            127   1.2e-50  172.2   0.1   1   1   5.6e-55     2e-50  171.6   0.1    39   133     1    95     1    95 0.99 Protein of unknown function (DUF640)
>
> I take the IDs ("Unigene0069328|m.26647" in the example line) and pick out the corresponding protein sequences from the target_transcripts.transdecorder.pep file output by TransDecoder. However, many of the records I found in pfam.dat cannot be found in the .pep file.
>
> I selected several of the "missing" sequences and predicted their coding regions at NCBI. They all have ORFs and contain the domain I am interested in. So why were these sequences not translated into peptide sequences and recorded in the TransDecoder output file, target_transcripts.transdecorder.pep?
>
> Thank you for your help.
>
> Best,
> David Lu

-- 
--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas
