From: Lee K. <ls...@gm...> - 2013-07-02 15:08:21

Sure. When developing the strategy for gene prediction, we decided to accept the longest ORF possible. Therefore, we always capture the whole gene (provided the locus is correctly identified as a gene; there are undoubtedly some false positives), but the 5' region might be a false positive. In other words, a downstream start codon might be the true start in CGP-predicted genes. I believe in our publication, we correctly identified genes 95% of the time, but we predicted the correct start 85% of the time (and the true start site would have been downstream in 15% of the predictions).

On Tue, Jul 2, 2013 at 5:14 AM, kokwei <kok...@gm...> wrote:

> Thanks for your information. I am thinking the reference proteins I used
> as database are conserved in their gene model, so I am hoping that the
> blast result against the database will predict the "real start" (with
> reference) instead of relying on ab initio, which might give a false
> prediction for the start codon. With the current protein blast alignment,
> it will be pinned to the closest start codon upstream, but the "real
> start" might be even more upstream.
>
> Can I have your opinion on this based on your experience with the
> optimization? Thanks.
>
>
> On 2/7/2013 3:36 AM, Lee Katz wrote:
>
> It looks like you are correct. Over the last couple of years, I optimized
> a few different things and I guess I obviated that subroutine!
>
> Yes, run_prediction will run GeneMark and Glimmer3, and combine with
> blast, making a high-confidence set of genes. To see a graphical
> representation and explanation of our combining strategy, please see our
> 2010 paper:
> http://bioinformatics.oxfordjournals.org/content/26/15/1819.full
>
>
> On Mon, Jul 1, 2013 at 12:49 PM, kokwei <kok...@gm...> wrote:
>
>>
>> Hi,
>>
>> I noticed that during execution of the pipeline with the command
>> <run_pipeline predict -p project>, the module "sub blastSeqs($$$) {" at
>> line 1817 is not executed. Is that normal, or does it only happen in my
>> case? Attached is the log of the prediction for your reference.
>>
>> I guess that module will blast the contig/scaffold sequences against
>> the selected database and output the homology-based gene model to
>> combine with the other ab initio gene models from glimmer and genemark,
>> which is what I need, and I hope that this can be solved.
>>
>> Thanks.
>>
>
>
> --
> Lee Katz, Ph.D.


--
Lee Katz, Ph.D.
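
P.S. To illustrate the longest-ORF choice described at the top of this message, here is a minimal Python sketch. It is not code from the CGP pipeline: the function names, the start-codon set, and the toy sequence are all assumptions made for the example. For a given stop codon it collects every in-frame start codon back to the previous in-frame stop, reports the most upstream one as the "longest ORF" start, and lists the downstream starts that could be the true start instead.

```python
# Hypothetical illustration only; names and codon sets are assumptions,
# not taken from the CGP pipeline.
START_CODONS = {"ATG", "GTG", "TTG"}  # commonly used prokaryotic starts (assumed)
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_starts(seq, stop_index):
    """Return positions of in-frame start codons upstream of the stop codon
    at stop_index, scanning back until the previous in-frame stop codon.
    Positions are returned most-upstream first."""
    starts = []
    pos = stop_index - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # the previous in-frame stop bounds the ORF
        if codon in START_CODONS:
            starts.append(pos)
        pos -= 3
    return sorted(starts)


def longest_orf(seq, stop_index):
    """Pick the most upstream start, i.e. the longest ORF ending at this stop.
    Returns (start, end, alternative_starts) or None if there is no start."""
    starts = candidate_starts(seq, stop_index)
    if not starts:
        return None
    return starts[0], stop_index + 3, starts[1:]


# Toy sequence: ATG AAA ATG CCC TAA (stop codon begins at index 12)
toy = "ATGAAAATGCCCTAA"
result = longest_orf(toy, 12)
print(result)
```

In the toy sequence the longest ORF starts at position 0, while the ATG at position 6 is a downstream alternative; that second kind of start is the case where the predicted 5' region would be a false positive.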