[Transdecoder-users] File for Training


Hello,

I am trying to find coding regions within transcripts using TransDecoder. I want to use a closely related species (with a relatively detailed genome annotation) to train the Markov model for protein identification. I don’t quite understand what kind of file I should pass to the “--train” option: the annotated protein FASTA file or the annotated CDS FASTA file of the related species?

Also, the annotated proteins of this related species still include a portion of low-confidence genes. Should I filter out those low-confidence genes, or pick a set of high-confidence ORFs, for training the Markov model?

Thank you for your advice.

Best,
David
