[Transdecoder-users] Protein found in domtbl file miss in .pep file
From: 卢 汉斌 <lh...@gm...> - 2014-06-10 13:42:21
Hello everyone,

I used the following command to find coding regions in my Trinity assembly:

    TransDecoder -t target_transcripts.fasta --reuse --search_pfam /path_to_transdecoder/pfam/Pfam-AB.hmm.bin --CPU 5

It generates several output files. Next, I want to find the transcripts that contain the domain I am interested in. I searched the target_transcripts.transdecoder.pfam.dat (.domtbl) file for lines containing the name of that domain. A typical record looks like this:

    DUF640 PF04852.7 133 Unigene0069328|m.26647 - 127 1.2e-50 172.2 0.1 1 1 5.6e-55 2e-50 171.6 0.1 39 133 1 95 1 95 0.99 Protein of unknown function (DUF640)

I took the IDs ("Unigene0069328|m.26647" in the example line) and pulled the corresponding protein sequences from target_transcripts.transdecoder.pep, the TransDecoder protein output. However, many of the records I found in pfam.dat cannot be found in the .pep file. I picked several of these "missing" sequences and predicted their coding regions at NCBI. All of them have an ORF containing the domain I am interested in.

So why were these sequences not translated into peptide sequences and recorded in the TransDecoder output file, target_transcripts.transdecoder.pep?

Thank you for your help.

Best,
David Lu
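The lookup described in the post (collect the ORF IDs for one domain from the domtbl file, then check which of them appear in the .pep file) can be scripted. Below is a minimal Python sketch. It rests on assumptions that the post does not state. First, the domtbl file uses HMMER's whitespace-separated `--domtblout` layout shown in the example record, where column 1 is the Pfam name and column 4 is the ORF ID. Second, each .pep header begins with the ORF ID as its first whitespace-delimited token. The script name and argument order are hypothetical.

```python
import sys
from typing import Dict, Iterable, List, Set


def domtbl_query_ids(lines: Iterable[str], domain: str) -> List[str]:
    """Return unique ORF IDs (column 4) from rows whose Pfam name (column 1) is `domain`.

    Assumes the whitespace-separated hmmscan --domtblout layout of the example record.
    """
    ids: List[str] = []
    seen: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HMMER comment headers
        fields = line.split()
        if len(fields) < 4:
            continue  # not a data row
        target, query = fields[0], fields[3]
        if target == domain and query not in seen:
            seen.add(query)
            ids.append(query)
    return ids


def pep_ids(lines: Iterable[str]) -> Set[str]:
    """Collect FASTA IDs, assuming the ID is the first token after '>'."""
    ids: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def report(hits: List[str], pep: Set[str]) -> Dict[str, List[str]]:
    """Split hit IDs into found/missing, and flag missing IDs whose transcript
    prefix (text before '|') appears in the .pep under a different ORF ID."""
    transcripts_in_pep = {i.split("|")[0] for i in pep}
    out: Dict[str, List[str]] = {
        "found": [],
        "missing": [],
        "missing_but_transcript_present": [],
    }
    for h in hits:
        if h in pep:
            out["found"].append(h)
        else:
            out["missing"].append(h)
            if h.split("|")[0] in transcripts_in_pep:
                out["missing_but_transcript_present"].append(h)
    return out


if __name__ == "__main__":
    # Hypothetical usage: python check_hits.py <domtbl file> <pep file> <domain name>
    domtbl_path, pep_path, domain = sys.argv[1:4]
    with open(domtbl_path) as fh:
        hits = domtbl_query_ids(fh, domain)
    with open(pep_path) as fh:
        pep = pep_ids(fh)
    for key, ids in report(hits, pep).items():
        print(f"{key}: {len(ids)}")
        for i in ids:
            print(f"  {i}")
```

Matching column 1 exactly, rather than any line that contains the domain name, avoids picking up rows where the name appears only in the free-text description. The third list is only a starting point for checking whether a transcript was kept under a different ORF ID. It does not by itself explain why a given ORF was left out of the .pep file.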