Re: [Transdecoder-users] File for Training
Extracting likely coding regions from transcript sequences
From: <Ale...@cs...> - 2014-05-03 07:25:11
Hello,

I have never used it with another species. Are you sure you want to do this? If you do, then certainly filter out the low-confidence ORFs.

With regard to the option:

  --train <string>   FASTA file with ORFs to train Markov Mod for protein identification; otherwise longest non-redundant ORFs used

So basically, give it a FASTA file with open reading frames (ORFs, i.e. CDS, nucleotide sequences made of codons). If you don't provide --train, then the longest (non-redundant) ORFs are used. If your assembly is from RNA-Seq and is decent, then I think that default should be good.

If you do decide to try with a different species, I'd be keen to know what the results are. I guess one metric is how many unique PFAM hits you get from a protein translation (unique so that isoforms/alleles don't exaggerate the stats). Compare that with the official annotation of a related species...

a

--
Dr. Alexie Papanicolaou
Phone: +61 (0) 2 6246 4511 | Mobile: +61 (0) 46 85 81 247
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra 2601, ACT, Australia
-- CSIRO profile <http://www.csiro.au/Organisation-Structure/Divisions/Ecosystem-Sciences/AlexiePapanicolaou.aspx>
-- ResearcherID <http://www.researcherid.com/rid/A-1618-2011>
-- Vision without action is dreaming
-- Action without vision is waste

________________________________
From: Brian Haas [bh...@br...]
Sent: Friday, 2 May 2014 10:04 PM
To: 卢 汉斌
Cc: tra...@li...; Papanicolaou, Alexie (CES, Black Mountain)
Subject: Re: [Transdecoder-users] File for Training

Alexie - can you respond to this? It's one of the options you incorporated.

thanks,
~brian

On Fri, May 2, 2014 at 5:37 AM, 卢 汉斌 <lh...@gm...> wrote:

Hello,

I am trying to find coding regions within transcripts using TransDecoder. I want to use a closely related species (with a relatively detailed genome annotation) to train the Markov model for protein identification. I don't quite understand what kind of file I should pass to the "--train" option: the annotated protein FASTA file or the annotated CDS FASTA file of the related species?
Also, the annotated proteins of this related species still include a portion of low-confidence genes. Should I filter out those low-confidence genes, or pick only some high-confidence ORFs for training the Markov model?

Thank you for your advice.

Best,
David

--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas
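
Editorial sketch, not part of the original thread: the advice above is to give --train a FASTA file of ORFs (CDS) and to drop low-confidence entries first. One possible pre-filter is shown below in plain Python. The criteria (ATG start, terminal stop codon, no internal stops, length a multiple of three, a minimum length, exact-duplicate removal) and the names `read_fasta`, `looks_like_complete_cds`, and `filter_training_cds` are illustrative assumptions. They are not TransDecoder's own rules, and exact-duplicate removal is weaker than whatever "non-redundant" means inside TransDecoder.

```python
# Sketch: keep only complete-looking CDS records from FASTA lines.
# All thresholds and criteria here are assumptions for illustration.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def looks_like_complete_cds(seq, min_len=300):
    """Return True if seq has a start codon, a terminal stop, and no internal stop."""
    seq = seq.upper()
    if not seq or len(seq) < min_len or len(seq) % 3 != 0:
        return False
    if set(seq) - set("ACGT"):
        return False  # reject ambiguous bases such as N
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


def filter_training_cds(lines, min_len=300):
    """Return (header, sequence) pairs that pass the checks, without exact duplicates."""
    seen, kept = set(), []
    for header, seq in read_fasta(lines):
        seq = seq.upper()
        if seq in seen:
            continue
        if looks_like_complete_cds(seq, min_len):
            seen.add(seq)
            kept.append((header, seq))
    return kept
```

The kept records could be written back out as FASTA and that file passed to --train. Annotation-specific confidence labels, where the related species provides them, would be a better filter than these sequence-only checks.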