From: Lee K. <ls...@gm...> - 2014-03-19 18:05:02
Hi Kok,

I am revisiting this issue right now. I am trying gene prediction on a
plasmid that we have in-house because it is faster to do so. My table looks
like the following:

condition      evidence   CDS
w/o prodigal   1          194
w prodigal     1          203
w prodigal     2          178
w prodigal     3          164
w prodigal     4          5

So using all four predictors is ridiculous (not that anyone was suggesting
that). I also agree that two predictors is sufficient and is in line with
what I expected. I'll add that as a default option in the conf file in
cgpipelinerc. The option will be listed as

prediction_min_predictors_to_call_orf = 2

I know it has been a long time, but did I answer the whole issue?

On Tue, Oct 1, 2013 at 10:11 AM, Lee Katz <ls...@gm...> wrote:

> This looks like something that I need to understand further.
> Unfortunately I am on furlough and so I am wrapping up everything right
> now. Please keep this project folder on-hand though because this is a very
> relevant analysis that will help CG-Pipeline in the future.
>
>
> On Sun, Sep 29, 2013 at 11:26 PM, kok <kok...@gm...> wrote:
>
>> Dear Lee Katz,
>>
>> condition      evidence   CDS
>> w/o prodigal   1          6048
>> w prodigal     1          8263
>> w prodigal     2          6192
>> w prodigal     3          5683
>> w prodigal     4          875
>>
>> Table above shows the statistics of the results that I have got for the
>> test on several evidence value. It seems that evidence 2 is the one that I
>> would like to proceed for as it returns with the most CDS which by manual
>> check I can see some are collected from just prodigal and BLAST evidence
>> that missed from previous CGP without prodigal included.
>>
>> However, I find out that the coordinates of the CDS from BLAST and
>> prodigal recorded with infinity "complement(1617923..inf)".
>> From the run_prediction scripts I see that the prodigal is not used for
>> reconcile prediction which I am not sure whether it's the cause of the
>> error, but including prodigal for the start prediction during prediction
>> reconciliation will be good as I have seen cases that prodigal are doing
>> better with the start prediction compared to the other predictors. Hope
>> this is easy to be implemented and thanks for your time.
>>
>> Regards,
>> Kok
>>
>>
>>
>> On 12/8/2013 7:25 PM, Lee Katz wrote:
>>
>> I know! Prodigal is just so easy to use, and so it was really easy to
>> make a wrapper around it.
>>
>> 2/4 might be ok too, but I do not have enough time to perform any
>> rigorous tests to see which way is better. If you have time, please let me
>> and the community know which gives you better results. I think it would be
>> informative to know what 1/4, 2/4, 3/4, and 4/4 gives you for each genome.
>> There is an interesting table that the Georgia Tech compgenomics class
>> created this year, at
>> http://compgenomics2013.biology.gatech.edu/index.php/Gene_Prediction_Group#Gene_Prediction_Pipeline.
>>
>>
>> The way to change the minimum number of predictors is to alter the
>> variable $$settings{min_predictors_to_call_orf} in run_prediction.
>>
>> Around line 161 in run_prediction, where it says something like
>> # Categorize and reconcile predictions
>> Set it back to 2 so that you can have 2/4 predictors.
>>
>> $$settings{min_predictors_to_call_orf}=2;
>>
>>
>> On Sun, Aug 11, 2013 at 11:57 PM, kok wei <kok...@gm...> wrote:
>>
>>> Wow, that's very great and it's faster than planned. I will certainly
>>> try out the pipeline on my genome and update you with the results. I'm
>>> thinking of probably having 2/4 evidence will be good enough as false
>>> positive is preferred over false negative for the gene prediction, any
>>> opinion?
>>>
>>> Thanks for your efforts and helps.
>>>
>>>
>>>
>>> On Sun, Aug 11, 2013 at 7:35 PM, Lee Katz <ls...@gm...> wrote:
>>>
>>>> Hi Kok, I added a script run_prediction_prodigal.pl into source
>>>> control. It outputs a GFF file of CDS predictions. I also made sure it
>>>> outputs the training file to the temporary directory because it seems like
>>>> you are interested in the training files.
>>>>
>>>> I also modified run_prediction and the CGPipelineUtils module so that
>>>> it predicts alongside the other predictors. Lastly, I added an option
>>>> prediction_use_prodigal = 1 under the config file so that you can enable it
>>>> for run_prediction. With Prodigal, each gene must have 2/3 or 3/4 majority
>>>> to be called (depending on whether you use genemark too).
>>>>
>>>> I'm new to prodigal, so please let me know if it all looks correct.
>>>> The command seems simple enough but I don't know if there are
>>>> any idiosyncrasies to be aware of.
>>>>
>>>>
>>>> On Fri, Aug 2, 2013 at 11:32 PM, Gmail <kok...@gm...> wrote:
>>>>
>>>>> Thanks Jay and Lee. It will be great if the option is added. I like
>>>>> prodigal for their better start prediction (from what i get for my test
>>>>> genome) and less false prediction for bacterial genome as claimed. Looking
>>>>> forward to the update, thanks!
>>>>>
>>>>> On 02/08/2013, at 23:51, Lee Katz <ls...@gm...> wrote:
>>>>>
>>>>> I'm returning to the US and back to work on Aug 12. It sounds like
>>>>> a worthy addition.
>>>>>
>>>>> I like prodigal but never bothered to put it in as an option. I
>>>>> think it could be something optional like genemark and would be preferred
>>>>> if not using genemark. In this way, CGP would still be able to have a
>>>>> majority for gene calling even if you don't have genemark.
>>>>>
>>>>> On Aug 2, 2013, at 15:43, Jay <jhu...@gm...> wrote:
>>>>>
>>>>> As far as I know, there is no convenient way of doing this. The
>>>>> run_prediction script would have to be modified to support running it and
>>>>> parsing the results.
>>>>>
>>>>> On 02/08/2013 22:52, kok wrote:
>>>>>
>>>>> Is it possible for cg pipeline to include the results of other *ab-initio*
>>>>> predictor (eg. prodigal)? Is there any development for this
>>>>> function?
>>>>> Or if I would like to use prodigal in place of genemark (if only two
>>>>> predictors allowed), can I convert the results of prodigal into
>>>>> genemark-like gm_out.lst file for cg pipeline's run_predict as a simple
>>>>> modification?
>>>>>
>>>>> - kok -
>>>>>
>>>>> _______________________________________________
>>>>> Cg-pipeline-users mailing list
>>>>> Cg-...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/cg-pipeline-users

--
Lee Katz, Ph.D.
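
The "minimum number of predictors" threshold discussed in this thread can be illustrated with a small sketch. It is written in Python rather than the Perl of run_prediction, and it is not CG-Pipeline's code: the function name call_orfs, the (stop, strand) key used to decide that two predictors agree, and the toy data are all assumptions made for illustration. It only shows why the CDS count drops as the required evidence goes from 1 to 4.

```python
from collections import defaultdict


def call_orfs(predictions, min_predictors=2):
    """Keep loci supported by at least `min_predictors` distinct predictors.

    `predictions` maps a predictor name to an iterable of (stop, strand)
    keys. Keying on stop coordinate and strand is an assumption of this
    sketch, not necessarily how run_prediction matches predictions.
    """
    support = defaultdict(set)
    for predictor, loci in predictions.items():
        for locus in loci:
            support[locus].add(predictor)
    # A locus is called only if enough different predictors reported it.
    return {
        locus: sorted(names)
        for locus, names in support.items()
        if len(names) >= min_predictors
    }


# Made-up toy predictions; the predictor names are labels only.
toy = {
    "predictor_a": [(900, "+"), (2400, "-")],
    "predictor_b": [(900, "+"), (5100, "+")],
    "predictor_c": [(900, "+"), (2400, "-")],
    "prodigal": [(900, "+"), (2400, "-"), (7300, "+")],
}

# Number of called loci for each evidence threshold, 1/4 through 4/4.
counts = {n: len(call_orfs(toy, n)) for n in (1, 2, 3, 4)}
print(counts)  # {1: 4, 2: 2, 3: 2, 4: 1}
```

Requiring all four predictors keeps only the single locus that every predictor reported, which mirrors the sharp drop at evidence 4 in both tables above.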