From: Gmail <kok...@gm...> - 2013-07-02 15:16:23
Okay, it's probably the incompleteness of my draft genome that caused the predicted start site to be downstream of the true start site. Thanks, your replies are highly appreciated.

On 02/07/2013, at 23:07, Lee Katz <ls...@gm...> wrote:

> Sure. When developing the strategy for gene prediction, we decided to accept the longest ORF possible. Therefore, we always capture the whole gene (provided the locus is correctly identified as a gene; there are undoubtedly some false positives), but the 5' region might be a false positive. In other words, a downstream start codon might be the true start in CGP-predicted genes. I believe that in our publication we correctly identified genes 95% of the time, but predicted the correct start only 85% of the time (in the other 15% of predictions, the true start site would have been downstream).
>
> On Tue, Jul 2, 2013 at 5:14 AM, kokwei <kok...@gm...> wrote:
>> Thanks for the information. I think the gene models of the reference proteins I used as the database are conserved, so I am hoping that the BLAST results against that database will predict the "real start" (based on the reference) rather than relying on ab initio prediction, which might call the wrong start codon. With the current protein BLAST alignment, the start will be pinned to the closest start codon upstream, but the "real start" might be even further upstream.
>>
>> Can I have your opinion on this, based on your experience with the optimization? Thanks.
>>
>> On 2/7/2013 3:36 AM, Lee Katz wrote:
>>> It looks like you are correct. Over the last couple of years, I optimized a few different things and I guess I obviated that subroutine!
>>>
>>> Yes, run_prediction will run GeneMark and Glimmer3 and combine their results with BLAST, making a high-confidence set of genes. To see a graphical representation and explanation of our combining strategy, please see our 2010 paper: http://bioinformatics.oxfordjournals.org/content/26/15/1819.full
>>>
>>> On Mon, Jul 1, 2013 at 12:49 PM, kokwei <kok...@gm...> wrote:
>>>> Hi,
>>>>
>>>> I noticed that when running the pipeline with the command <run_pipeline predict -p project>, the subroutine "sub blastSeqs($$$) {" at line 1817 is not executed. Is that normal, or does it only happen in my case? I have attached the prediction log for your reference.
>>>>
>>>> I guess that subroutine BLASTs the contig/scaffold sequences against the selected database and outputs a homology-based gene model to combine with the ab initio gene models from Glimmer and GeneMark. That is what I need, so I hope this can be solved.
>>>>
>>>> Thanks.
>>>
>>> --
>>> Lee Katz, Ph.D.
>
> --
> Lee Katz, Ph.D.
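For readers following the start-codon discussion, here is a minimal Python sketch of the "longest ORF possible" rule Lee describes: for a given in-frame stop codon, walk upstream one codon at a time and keep the furthest start codon reached before another in-frame stop. This is not CG-Pipeline code; the function name `longest_orf_start`, the ATG/GTG/TTG start set, and the forward-strand-only handling are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical helper, not taken from CG-Pipeline.
# Assumed bacterial start codons and the standard stop codons.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_start(seq: str, stop_pos: int) -> Optional[int]:
    """Return the index of the most upstream in-frame start codon for the
    stop codon that begins at stop_pos, or None if no start is found.

    Walks upstream in the same reading frame until it meets another stop
    codon or runs off the beginning of the sequence (e.g. a contig edge).
    Forward strand only.
    """
    seq = seq.upper()
    best = None
    pos = stop_pos - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # an upstream in-frame stop bounds the ORF
        if codon in START_CODONS:
            best = pos  # keep walking: a further-upstream start gives a longer ORF
        pos -= 3
    return best


if __name__ == "__main__":
    # CC | ATG AAA ATG TTT TAA: two candidate starts (index 2 and 8), stop at 14.
    example = "CCATGAAAATGTTTTAA"
    print(longest_orf_start(example, 14))  # 2
```

In the example the rule picks the ATG at index 2, even though the downstream ATG at index 8 could be the true start, which is the kind of 5' overcall Lee mentions. Conversely, if the walk runs off the beginning of the sequence without meeting a stop, as can happen at a contig edge in a draft assembly, the start it returns may lie downstream of a true start that was never assembled.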