Re: [Cg-pipeline-users] additional predictor

Hi Kok, I am revisiting this issue right now.  I am trying gene prediction
on a plasmid that we have in-house because it is faster to do so.  My table
looks like the following:

condition      evidence   CDS
w/o prodigal   1          194
w prodigal     1          203
w prodigal     2          178
w prodigal     3          164
w prodigal     4          5

So requiring all four predictors to agree is ridiculous (not that anyone was
suggesting that).  I also agree that two predictors are sufficient, which is in
line with what I expected.  I'll add that as a default option in the conf file in
cgpipelinerc.  The option will be listed as

prediction_min_predictors_to_call_orf = 2
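
As a rough illustration of what that threshold does, here is a minimal sketch
in Python. It is not the actual Perl in run_prediction: the function name, the
data layout, the "other" predictor name, and the choice to key each ORF by
(strand, stop coordinate) are all assumptions made for the example.

```python
from collections import defaultdict


def call_orfs(predictions, min_predictors=2):
    """Keep ORFs reported by at least `min_predictors` distinct predictors.

    `predictions` maps a predictor name to a list of ORF keys.  Here a key is
    a hypothetical (strand, stop_coordinate) tuple, on the assumption that
    predictors tend to agree on the stop even when they disagree on the start.
    """
    support = defaultdict(set)
    for predictor, orfs in predictions.items():
        for orf in orfs:
            support[orf].add(predictor)
    # Return each surviving ORF with the sorted list of predictors behind it.
    return {
        orf: sorted(names)
        for orf, names in support.items()
        if len(names) >= min_predictors
    }


# Toy input; coordinates and the "other" predictor are invented.
example = {
    "genemark": [("+", 1200), ("-", 5400)],
    "other": [("+", 1200)],
    "blast": [("+", 1200), ("+", 9000)],
    "prodigal": [("-", 5400), ("+", 9000), ("+", 300)],
}

for orf, names in sorted(call_orfs(example, min_predictors=2).items()):
    print(orf, names)
```

Raising `min_predictors` to 4 on this toy input leaves nothing, which mirrors
the sharp drop in the last row of the table.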

I know it has been a long time, but did I answer the whole issue?

On Tue, Oct 1, 2013 at 10:11 AM, Lee Katz <ls...@gm...> wrote:

> This looks like something that I need to understand further.
>  Unfortunately I am on furlough and so I am wrapping up everything right
> now.  Please keep this project folder on-hand though because this is a very
> relevant analysis that will help CG-Pipeline in the future.
>
>
> On Sun, Sep 29, 2013 at 11:26 PM, kok <kok...@gm...> wrote:
>
>>  Dear Lee Katz,
>>
>> condition      evidence   CDS
>> w/o prodigal   1          6048
>> w prodigal     1          8263
>> w prodigal     2          6192
>> w prodigal     3          5683
>> w prodigal     4          875
>>
>> The table above shows the results of my tests with several evidence
>> values. Evidence 2 seems to be the setting I would like to proceed with, as
>> it returns the most CDS, and by manual check I can see that some of them are
>> supported only by prodigal and BLAST evidence and were missed by the
>> previous CGP run without prodigal.
>>
>> However, I found that the coordinates of the CDS from BLAST and prodigal
>> are recorded with infinity, e.g. "complement(1617923..inf)". From the
>> run_prediction script I see that prodigal is not used when reconciling
>> predictions. I am not sure whether that is the cause of the error, but
>> including prodigal for start prediction during prediction reconciliation
>> would be good, as I have seen cases where prodigal does better at start
>> prediction than the other predictors. I hope this is easy to implement,
>> and thanks for your time.
>>
>> Regards,
>> Kok
>>
>>
>>
>> On 12/8/2013 7:25 PM, Lee Katz wrote:
>>
>> I know!  Prodigal is just so easy to use, and so it was really easy to
>> make a wrapper around it.
>>
>>  2/4 might be ok too, but I do not have enough time to perform any
>> rigorous tests to see which way is better.  If you have time, please let me
>> and the community know which gives you better results.  I think it would be
>> informative to know what 1/4, 2/4, 3/4, and 4/4 gives you for each genome.
>>  There is an interesting table that the Georgia Tech compgenomics class
>> created this year, at
>> http://compgenomics2013.biology.gatech.edu/index.php/Gene_Prediction_Group#Gene_Prediction_Pipeline.
>>
>>
>>  The way to change the minimum number of predictors is to alter the
>> variable $$settings{min_predictors_to_call_orf} in run_prediction.
>>
>>  Around line 161 in run_prediction, where it says something like
>> # Categorize and reconcile predictions
>> Set it back to 2 so that you can have 2/4 predictors.
>>
>>  $$settings{min_predictors_to_call_orf}=2;
>>
>>
>> On Sun, Aug 11, 2013 at 11:57 PM, kok wei <kok...@gm...> wrote:
>>
>>> Wow, that's great, and faster than planned. I will certainly try out the
>>> pipeline on my genome and update you with the results. I think requiring
>>> 2/4 evidence will probably be good enough, since a false positive is
>>> preferable to a false negative in gene prediction. Any opinion?
>>>
>>> Thanks for your effort and help.
>>>
>>>
>>>
>>> On Sun, Aug 11, 2013 at 7:35 PM, Lee Katz <ls...@gm...> wrote:
>>>
>>>> Hi Kok, I added a script run_prediction_prodigal.pl into source
>>>> control.  It outputs a GFF file of CDS predictions.  I also made sure it
>>>> outputs the training file to the temporary directory because it seems like
>>>> you are interested in the training files.
>>>>
>>>>  I also modified run_prediction and the CGPipelineUtils module so that
>>>> it predicts alongside the other predictors.  Lastly, I added an option
>>>> prediction_use_prodigal = 1 under the config file so that you can enable it
>>>> for run_prediction.  With Prodigal, each gene must have 2/3 or 3/4 majority
>>>> to be called (depending on whether you use genemark too).
>>>>
>>>>  I'm new to prodigal, so please let me know if it all looks correct.
>>>>  The command seems simple enough but I don't know if there are
>>>> any idiosyncrasies to be aware of.
>>>>
>>>>
>>>> On Fri, Aug 2, 2013 at 11:32 PM, Gmail <kok...@gm...> wrote:
>>>>
>>>>>  Thanks Jay and Lee. It will be great if the option is added. I like
>>>>> prodigal for its better start prediction (from what I get for my test
>>>>> genome) and, as claimed, fewer false predictions for bacterial genomes.
>>>>> Looking forward to the update, thanks!
>>>>>
>>>>>  On 02/08/2013, at 23:51, Lee Katz <ls...@gm...> wrote:
>>>>>
>>>>>   I'm returning to the US and back to work on Aug 12. It sounds like
>>>>> a worthy addition.
>>>>>
>>>>>   I like prodigal but never bothered to put it in as an option. I
>>>>> think it could be something optional like genemark and would be preferred
>>>>> if not using genemark. In this way, CGP would still be able to have a
>>>>> majority for gene calling even if you don't have genemark.
>>>>>
>>>>> On Aug 2, 2013, at 15:43, Jay <jhu...@gm...> wrote:
>>>>>
>>>>>   As far as I know, there is no convenient way of doing this. The
>>>>> run_prediction script would have to be modified to support running it and
>>>>> parsing the results.
>>>>>
>>>>> On 02/08/2013 22:52, kok wrote:
>>>>>
>>>>> Is it possible for cg pipeline to include the results of another *ab initio*
>>>>> predictor (e.g. prodigal)? Is there any development on this function?
>>>>> Or, if I would like to use prodigal in place of genemark (if only two
>>>>> predictors are allowed), can I convert the prodigal results into a
>>>>> genemark-like gm_out.lst file for cg pipeline's run_prediction as a simple
>>>>> modification?
>>>>>
>>>>> - kok -
>>>>>
>>>>
>>>>
>>>>   --
>>>> Lee Katz, Ph.D.
>>>>
>>>
>>>
>>
>>
>>  --
>> Lee Katz, Ph.D.
>>
>>
>>
>
>
> --
> Lee Katz, Ph.D.
>

-- 
Lee Katz, Ph.D.
