Re: [Cg-pipeline-users] additional predictor

Dear Lee Katz,

Thanks for revisiting the issue. I did change the option in the conf 
file during my earlier test, but the other issue I pointed out was that 
CDSs supported only by BLAST and prodigal are recorded with an 
infinite coordinate, e.g. "complement(1617923..inf)".

From the run_prediction scripts I see that prodigal is not used when 
reconciling predictions; I am not sure whether that is the cause of the 
error (or maybe you have changed this as well?). I hope this is clear, 
and thanks again for your time.

- kok -
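
Until the cause is found, affected records can at least be located automatically. The following is a minimal sketch in Python, assuming the CDS locations have already been extracted as plain strings in the "start..end" / "complement(start..end)" form quoted above. The function name has_bad_coordinate and the sample list are hypothetical and are not part of CG-Pipeline.

```python
import re

# Matches a simple location such as "1617923..inf" or "complement(12..345)".
# The fuzzy-end markers "<" and ">" are tolerated but ignored.
_LOCATION = re.compile(r"^(complement\()?<?(\w+)\.\.>?(\w+)\)?$")


def has_bad_coordinate(location):
    """Return True if either end of a simple location is not a valid position."""
    match = _LOCATION.match(location.strip())
    if match is None:
        # Not a simple start..end location (e.g. a join); flag for manual review.
        return True
    start, end = match.group(2), match.group(3)
    # Both ends must be positive integers, with start not after end.
    return not (start.isdecimal() and end.isdecimal()
                and 0 < int(start) <= int(end))


# Hypothetical sample; the first entry is the location reported in this thread.
locations = ["complement(1617923..inf)", "complement(12..345)", "100..2500"]
bad = [loc for loc in locations if has_bad_coordinate(loc)]
print(bad)
```

This only flags the symptom. It does not say which step of run_prediction produced the non-numeric end coordinate.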

> Hi Kok, I am revisiting this issue right now.  I am trying gene 
> prediction on a plasmid that we have in-house because it is faster to 
> do so.  My table looks like the following:
>
> condition      evidence   CDS
> w/o prodigal   1          194
> w prodigal     1          203
> w prodigal     2          178
> w prodigal     3          164
> w prodigal     4          5
>
>
> So using all four predictors is ridiculous (not that anyone was 
> suggesting that).  I also agree that two predictors is sufficient and 
> is in line with what I expected.  I'll add that as a default option in 
> the conf file in cgpipelinerc.  The option will be listed as
>
> prediction_min_predictors_to_call_orf = 2
>
> I know it has been a long time, but did I answer the whole issue?
>
>
>
> On Tue, Oct 1, 2013 at 10:11 AM, Lee Katz <ls...@gm...> wrote:
>
>     This looks like something that I need to understand further.
>      Unfortunately I am on furlough and so I am wrapping up everything
>     right now.  Please keep this project folder on-hand though because
>     this is a very relevant analysis that will help CG-Pipeline in the
>     future.
>
>
>     On Sun, Sep 29, 2013 at 11:26 PM, kok <kok...@gm...> wrote:
>
>         Dear Lee Katz,
>
>         condition      evidence   CDS
>         w/o prodigal   1          6048
>         w prodigal     1          8263
>         w prodigal     2          6192
>         w prodigal     3          5683
>         w prodigal     4          875
>
>
>         The table above shows the results of my tests with several
>         evidence values. It seems that evidence 2 is the setting I
>         would like to proceed with: it returns more CDSs than the run
>         without prodigal, and a manual check shows that some of them
>         come from just prodigal and BLAST evidence and were missed by
>         the previous CGP run without prodigal.
>
>         However, I found that the coordinates of CDSs supported by
>         BLAST and prodigal are recorded with infinity, e.g.
>         "complement(1617923..inf)". From the run_prediction scripts I
>         see that prodigal is not used when reconciling predictions; I
>         am not sure whether that is the cause of the error, but
>         including prodigal in the start prediction during
>         reconciliation would be good, as I have seen cases where
>         prodigal does better at start prediction than the other
>         predictors. I hope this is easy to implement, and thanks for
>         your time.
>
>         Regards,
>         Kok
>
>
>
>         On 12/8/2013 7:25 PM, Lee Katz wrote:
>>         I know!  Prodigal is just so easy to use, and so it was
>>         really easy to make a wrapper around it.
>>
>>         2/4 might be ok too, but I do not have enough time to perform
>>         any rigorous tests to see which way is better.  If you have
>>         time, please let me and the community know which gives you
>>         better results.  I think it would be informative to know what
>>         1/4, 2/4, 3/4, and 4/4 gives you for each genome.  There is
>>         an interesting table that the Georgia Tech compgenomics class
>>         created this year, at
>>         http://compgenomics2013.biology.gatech.edu/index.php/Gene_Prediction_Group#Gene_Prediction_Pipeline.
>>
>>
>>         The way to change the minimum number of predictors is to
>>         alter the variable $$settings{min_predictors_to_call_orf} in
>>         run_prediction.
>>
>>         Around line 161 in run_prediction, where it says something like
>>         # Categorize and reconcile predictions
>>         Set it back to 2 so that you can have 2/4 predictors.
>>
>>         $$settings{min_predictors_to_call_orf}=2;
>>
>>
>>         On Sun, Aug 11, 2013 at 11:57 PM, kok wei <kok...@gm...> wrote:
>>
>>             Wow, that's great, and faster than planned. I will
>>             certainly try out the pipeline on my genome and update
>>             you with the results. I think 2/4 evidence will probably
>>             be good enough, as false positives are preferable to
>>             false negatives in gene prediction. Any opinion?
>>
>>             Thanks for your effort and help.
>>
>>
>>
>>             On Sun, Aug 11, 2013 at 7:35 PM, Lee Katz <ls...@gm...> wrote:
>>
>>                 Hi Kok, I added a script run_prediction_prodigal.pl
>>                 into source control.  It outputs a GFF file of CDS
>>                 predictions.  I also made sure it outputs the
>>                 training file to the temporary directory because it
>>                 seems like you are interested in the training files.
>>
>>                 I also modified run_prediction and the
>>                 CGPipelineUtils module so that it predicts alongside
>>                 the other predictors.  Lastly, I added an option
>>                 prediction_use_prodigal = 1 under the config file so
>>                 that you can enable it for run_prediction.  With
>>                 Prodigal, each gene must have 2/3 or 3/4 majority to
>>                 be called (depending on whether you use genemark too).
>>
>>                 I'm new to prodigal, so please let me know if it all
>>                 looks correct.  The command seems simple enough but I
>>                 don't know if there are any idiosyncrasies to be
>>                 aware of.
>>
>>
>>                 On Fri, Aug 2, 2013 at 11:32 PM, Gmail <kok...@gm...> wrote:
>>
>>                     Thanks Jay and Lee. It will be great if the
>>                     option is added. I like prodigal for its better
>>                     start prediction (from what I get for my test
>>                     genome) and fewer false predictions for bacterial
>>                     genomes, as claimed. Looking forward to the update,
>>                     thanks!
>>
>>                     On 02/08/2013, at 23:51, Lee Katz <ls...@gm...> wrote:
>>
>>>                     I'm returning to the US and back to work on Aug
>>>                     12. It sounds like a worthy addition.
>>>
>>>                      I like prodigal but never bothered to put it in
>>>                     as an option. I think it could be something
>>>                     optional like genemark and would be preferred if
>>>                     not using genemark. In this way, CGP would still
>>>                     be able to have a majority for gene calling even
>>>                     if you don't have genemark.
>>>
>>>                     On Aug 2, 2013, at 15:43, Jay <jhu...@gm...> wrote:
>>>
>>>>                     As far as I know, there is no convenient way of
>>>>                     doing this. The run_prediction script would
>>>>                     have to be modified to support running it and
>>>>                     parsing the results.
>>>>
>>>>                     On 02/08/2013 22:52, kok wrote:
>>>>>                     Is it possible for cg pipeline to include the
>>>>>                     results of another ab initio predictor (e.g.
>>>>>                     prodigal)? Is there any development of this
>>>>>                     function?
>>>>>                     Or, if I would like to use prodigal in place of
>>>>>                     genemark (if only two predictors are allowed), can
>>>>>                     I convert the results of prodigal into a
>>>>>                     genemark-like gm_out.lst file for cg
>>>>>                     pipeline's run_prediction as a simple modification?
>>>>>
>>>>>                     - kok -
>>>>>
>>>>>
>>>>>                     _______________________________________________
>>>>>                     Cg-pipeline-users mailing list
>>>>>                     Cg-...@li...
>>>>>                     https://lists.sourceforge.net/lists/listinfo/cg-pipeline-users
>>>>
>>
>>
>>
>>
>>                 -- 
>>                 Lee Katz, Ph.D.
>>
>>
>>
>>
>>
>>         -- 
>>         Lee Katz, Ph.D.
>
>
>
>
>     -- 
>     Lee Katz, Ph.D.
>
>
>
>
> -- 
> Lee Katz, Ph.D.
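
The "minimum number of predictors" rule discussed in this thread can be illustrated with a short Python sketch. It is not the Perl code in run_prediction. The function call_orfs, the tuple layout, and the sample predictions are hypothetical, and grouping calls by (contig, strand, stop) is an assumption made here so that predictions differing only in their start coordinate count as the same ORF.

```python
from collections import defaultdict


def call_orfs(predictions, min_predictors=2):
    """Keep ORFs supported by at least `min_predictors` distinct predictors.

    `predictions` is an iterable of (predictor, contig, strand, start, stop).
    Start coordinates are deliberately ignored when grouping (see lead-in).
    """
    support = defaultdict(set)
    for predictor, contig, strand, start, stop in predictions:
        support[(contig, strand, stop)].add(predictor)
    return {orf: sorted(names) for orf, names in support.items()
            if len(names) >= min_predictors}


# Hypothetical calls: three predictors agree on one ORF, one predictor
# alone reports a second ORF.
predictions = [
    ("prodigal", "contig1", "+", 100, 400),
    ("blast",    "contig1", "+", 130, 400),
    ("genemark", "contig1", "+", 100, 400),
    ("prodigal", "contig1", "-", 900, 1500),
]

# Count the ORFs that survive at each threshold, as in the tables above.
for threshold in (1, 2, 3, 4):
    print(threshold, len(call_orfs(predictions, threshold)))
```

Under this scheme, raising the threshold can only remove ORFs, which matches the falling CDS counts reported for evidence 1 through 4 with prodigal enabled. Choosing which start coordinate to report for a retained ORF is a separate reconciliation step that the sketch leaves out.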
