Re: [Transdecoder-users] File for Training

Hello

I have never used it with another species. Are you sure you want to do this? If you do, then certainly filter out the low-confidence ORFs.
WRT the option:
--train <string>                       FASTA file with ORFs to train Markov Mod for protein identification; otherwise longest non-redundant ORFs used

So basically, give it a FASTA file of open reading frames (ORFs), i.e. the CDS nucleotide sequences (codons), rather than the protein FASTA.
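As a rough illustration of preparing such a file, here is a minimal Python sketch that keeps only complete-looking ORFs from a CDS FASTA. The criteria (ATG start, terminal stop codon, length a multiple of three, no in-frame internal stop, a minimum length) and the function names are my own assumptions for what "high confidence" might mean; they are not TransDecoder's internal rules, and an annotation's own confidence flags may be a better filter if it has them.

```python
from typing import Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def is_complete_orf(seq: str, min_len: int = 300) -> bool:
    """Hypothetical confidence check: a full ORF from ATG to a stop codon."""
    # Require at least a start and a stop codon, and whole codons only.
    if len(seq) < max(min_len, 6) or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # Reject sequences with an in-frame stop before the final codon.
    return not any(c in STOP_CODONS for c in codons[1:-1])


def filter_training_orfs(fasta_text: str, min_len: int = 300) -> str:
    """Return FASTA text containing only the ORFs that pass the check."""
    kept = [f">{h}\n{s}" for h, s in read_fasta(fasta_text)
            if is_complete_orf(s, min_len)]
    return "\n".join(kept) + ("\n" if kept else "")
```

The filtered text would be written to a file, and that file is what gets passed to the --train option quoted above.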

If you don't provide --train, then the longest (non-redundant) ORFs are used. If your assembly is RNA-Seq and is decent, then I think that should be good.

If you do decide to try with a different species, I'd be keen to know what the results are. I guess one metric is how many unique PFAM hits you get from a protein translation (unique so that isoforms/alleles don't exaggerate the stats). Compare that with the official annotation of a related species...
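The unique-hit count could be tallied along these lines. This is only a sketch: it assumes you already have (protein ID, PFAM accession) pairs from whatever domain search you ran, plus a hypothetical isoform-to-gene mapping; the function name and the input layout are illustrative and not part of TransDecoder or any PFAM tool.

```python
from typing import Dict, Iterable, Tuple


def summarize_pfam_hits(
    hits: Iterable[Tuple[str, str]],
    isoform_to_gene: Dict[str, str],
) -> Dict[str, int]:
    """Count PFAM hits after collapsing isoforms/alleles onto their gene.

    hits: (protein_id, pfam_accession) pairs (assumed input layout).
    isoform_to_gene: maps a protein/isoform ID to a gene ID; IDs that are
    missing from the mapping are treated as their own gene.
    """
    # A set drops repeated (gene, domain) pairs contributed by isoforms.
    pairs = {(isoform_to_gene.get(pid, pid), pfam) for pid, pfam in hits}
    return {
        "unique_gene_domain_pairs": len(pairs),
        "distinct_domains": len({pfam for _, pfam in pairs}),
        "genes_with_hit": len({gene for gene, _ in pairs}),
    }
```

Running the same summary on the official annotation of the related species gives the numbers to compare against.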
a

--
Dr. Alexie Papanicolaou

Phone: +61(0) 2 6246 4511| Mobile: +61 (0) 46 85 81 247
CSIRO Ecosystem Sciences, GPO Box 1700, Canberra 2601, ACT, Australia

-- CSIRO profile<http://www.csiro.au/Organisation-Structure/Divisions/Ecosystem-Sciences/AlexiePapanicolaou.aspx>
-- ResearcherID<http://www.researcherid.com/rid/A-1618-2011>
-- Vision without action is dreaming
-- Action without vision is waste
________________________________
From: Brian Haas [bh...@br...]
Sent: Friday, 2 May 2014 10:04 PM
To: 卢 汉斌
Cc: tra...@li...; Papanicolaou, Alexie (CES, Black Mountain)
Subject: Re: [Transdecoder-users] File for Training

Alexie - can you respond to this?  It's one of the options you incorporated.

thanks,

~brian

On Fri, May 2, 2014 at 5:37 AM, 卢 汉斌 <lh...@gm...<mailto:lh...@gm...>> wrote:
Hello,

I am trying to find coding regions within transcripts using TransDecoder. I want to use a closely related species (with a relatively detailed genome annotation) to train the Markov model for protein identification. I don't quite understand what kind of file I should pass to the "--train" option: the annotated protein FASTA file or the annotated CDS FASTA file of the related species?

Also, the annotated proteins of this related species still include a portion of low-confidence genes. Should I filter out those low-confidence genes, or pick only some high-confidence ORFs for training the Markov model?

Thank you for your advice.

Best,
David
_______________________________________________
Transdecoder-users mailing list
Tra...@li...<mailto:Tra...@li...>
https://lists.sourceforge.net/lists/listinfo/transdecoder-users

--
Brian J. Haas
The Broad Institute
http://broad.mit.edu/~bhaas
