Re: [Transdecoder-users] Protein found in domtbl file miss in .pep file
Extracting likely coding regions from transcript sequences
From: Brian H. <bh...@br...> - 2014-06-11 01:58:01
Hi David,

All ORFs found to have Pfam hits should be included in the final transdecoder.pep output file. If this is not the case, then there could be a bug that we weren't aware of.

If you take your 'missing' transcripts and run them through TransDecoder separately, are they still not picked up and reported in the final output?

If there's a bug, we'll need some example data to help troubleshoot it.

Many thanks,

~brian

On Tue, Jun 10, 2014 at 9:42 AM, 卢 汉斌 <lh...@gm...> wrote:

> Hello everyone,
>
> I use the following command to find coding regions in my Trinity assembly:
>
> TransDecoder -t target_transcripts.fasta --reuse --search_pfam /path_to_transdecoder/pfam/Pfam-AB.hmm.bin --CPU 5
>
> It generates several output files. Next, I want to find the transcripts that contain the domain I am interested in. I search the target_transcripts.transdecoder.pfam.dat (.domtbl) file for lines that contain the name of that domain. A typical record looks like this:
>
> DUF640 PF04852.7 133 Unigene0069328|m.26647 - 127 1.2e-50 172.2 0.1 1 1 5.6e-55 2e-50 171.6 0.1 39 133 1 95 1 95 0.99 Protein of unknown function (DUF640)
>
> I take the IDs ("Unigene0069328|m.26647" in the example line) and look up the corresponding protein sequences in target_transcripts.transdecoder.pep, the TransDecoder output file. However, many of the records I found in pfam.dat cannot be found in the .pep file.
>
> I selected several of these "missing sequences" and predicted their coding regions at NCBI. They all have ORFs and contain the domain I am interested in. So why were these sequences not translated to peptide sequences and recorded in the TransDecoder output file, target_transcripts.transdecoder.pep?
>
> Thank you for your help.
>
> Best,
> David Lu
>
> _______________________________________________
> Transdecoder-users mailing list
> Tra...@li...
> https://lists.sourceforge.net/lists/listinfo/transdecoder-users

--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas
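
For anyone wanting to reproduce the comparison described in the quoted message, here is a minimal Python sketch. It is not part of TransDecoder. It assumes the pfam.dat file follows the whitespace-delimited layout of the example record (domain name in column 1, ORF ID in column 4) and that the first token of each .pep header is the same ID that appears in the domain table; if the .pep headers carry a prefix or use a different ID, the comparison would need adjusting. The function names, and the idea that the transcript name is the part of the ID before "|", are illustrative assumptions.

```python
def domain_hit_ids(domtbl_lines, domain):
    """Collect ORF IDs (column 4) from rows whose domain name (column 1) equals `domain`."""
    ids = set()
    for line in domtbl_lines:
        # Skip blank lines and '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) > 3 and fields[0] == domain:
            ids.add(fields[3])
    return ids


def fasta_ids(fasta_lines):
    """Return the first whitespace-delimited token of every FASTA header line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def missing_orfs(domtbl_lines, pep_lines, domain):
    """ORF IDs that have a hit to `domain` but no record in the .pep file."""
    return sorted(domain_hit_ids(domtbl_lines, domain) - fasta_ids(pep_lines))


def transcript_id(orf_id):
    """Assumption: the transcript name is the part of the ORF ID before '|'."""
    return orf_id.split("|")[0]


# Example usage (file names taken from the message above):
# with open("target_transcripts.transdecoder.pfam.dat") as dom, \
#         open("target_transcripts.transdecoder.pep") as pep:
#     for orf in missing_orfs(dom, pep, "DUF640"):
#         print(orf, transcript_id(orf))
```

The transcript names printed this way could be used to pull the 'missing' transcripts into a small FASTA file and rerun them separately, as suggested in the reply.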