[Transdecoder-users] Protein found in domtbl file miss in .pep file

Hello everyone,

I use the following command to find coding regions in my Trinity assembly:

TransDecoder -t target_transcripts.fasta --reuse  --search_pfam /path_to_transdecoder/pfam/Pfam-AB.hmm.bin --CPU 5

It generates several output files. Next, I want to find the transcripts that contain the domain I am interested in. I searched the target_transcripts.transdecoder.pfam.dat (.domtbl) file for lines containing the name of that domain. A typical record looks like this:

DUF640               PF04852.7    133 Unigene0069328|m.26647 -            127   1.2e-50  172.2   0.1   1   1   5.6e-55     2e-50  171.6   0.1    39   133     1    95     1    95 0.99 Protein of unknown function (DUF640)
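For reference, the record above appears to follow HMMER's `--domtblout` layout. When proteins are searched against Pfam with hmmscan, the first column is the domain (target) name and the fourth column is the query, i.e. the predicted ORF id. The minimal Python sketch below pulls out the fields relevant to this question. It assumes the .pfam.dat file is standard domtblout output, and the `DomainHit` class and `parse_domtbl_line` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    domain: str       # target name (the Pfam domain when hmmscan searches proteins against Pfam)
    accession: str    # target accession, e.g. PF04852.7
    query: str        # query name (the ORF id, e.g. Unigene0069328|m.26647)
    qlen: int         # query (ORF) length
    i_evalue: float   # independent E-value of this domain hit
    ali_from: int     # alignment start on the query
    ali_to: int       # alignment end on the query
    description: str  # free-text description of the domain


def parse_domtbl_line(line: str) -> DomainHit:
    """Parse one non-comment line of HMMER --domtblout output (a sketch)."""
    # The first 22 columns are whitespace-separated; the remainder is the description.
    fields = line.split(maxsplit=22)
    return DomainHit(
        domain=fields[0],
        accession=fields[1],
        query=fields[3],
        qlen=int(fields[5]),
        i_evalue=float(fields[12]),
        ali_from=int(fields[17]),
        ali_to=int(fields[18]),
        description=fields[22].strip() if len(fields) > 22 else "",
    )
```

Lines starting with `#` are HMMER header comments and should be skipped before calling this function.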

I take the IDs ("Unigene0069328|m.26647" in the example line) and pick out the corresponding protein sequences from target_transcripts.transdecoder.pep, the peptide output of TransDecoder. However, many records found in the pfam.dat file cannot be found in the .pep file.
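The lookup described above can be sketched in Python as a set difference between the ORF ids hit by a given domain and the ids present in the .pep file. This is only an illustration of the check itself, not a diagnosis of why ids are missing. It assumes the ORF id is the first whitespace-delimited token of each .pep header, which is worth confirming in your own file, because an id-format mismatch would make every hit look "missing". The function names are hypothetical.

```python
from typing import Iterable, Set


def domain_hit_ids(domtbl_lines: Iterable[str], domain: str) -> Set[str]:
    """Collect query (ORF) ids from domtblout lines whose target name equals `domain`."""
    ids = set()
    for line in domtbl_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HMMER comment/header lines
        fields = line.split()
        # Column 1 is the Pfam domain name; column 4 is the query (ORF) id.
        if len(fields) >= 4 and fields[0] == domain:
            ids.add(fields[3])
    return ids


def fasta_ids(fasta_lines: Iterable[str]) -> Set[str]:
    """Return the first whitespace-delimited token of each non-empty FASTA header."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    }


def missing_from_pep(domtbl_path: str, pep_path: str, domain: str) -> Set[str]:
    """ORF ids with a hit to `domain` in the domtbl file but no record in the .pep file."""
    with open(domtbl_path) as dom, open(pep_path) as pep:
        return domain_hit_ids(dom, domain) - fasta_ids(pep)
```

For example, `sorted(missing_from_pep("target_transcripts.transdecoder.pfam.dat", "target_transcripts.transdecoder.pep", "DUF640"))` would list the DUF640-containing ORF ids that have no .pep record.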

I selected several of these "missing" sequences and predicted their coding regions on NCBI. They all have ORFs containing the domain I am interested in. So why are these sequences not translated into peptides and recorded in the TransDecoder output file, target_transcripts.transdecoder.pep?

Thank you for your help.

Best,
David Lu