From: Lee K. <ls...@gm...> - 2015-03-31 19:41:04
Hi everyone,

I haven't said or done much with CGP recently, but I wanted to let you know a few things:

1) I have moved the project over to GitHub: https://github.com/lskatz/CG-Pipeline/. I am learning about pull requests and am open to constructive code changes. Please email me before you dive into changing code, so that I am aware of it.
2) I have recovered the documentation (although it is older at this point) at http://cg-pipeline.sourceforge.net/wiki/
3) I'll keep the mailing list, wiki, and older code at SourceForge.
4) If you haven't used it for a while, there are at least a few new assembly/reads scripts, and a bit of optimization in the annotation stage.

Thank you for using CGP. I apologize for the severe lack of updates, but I just don't have enough time to keep up with it as much as I want to.

--
Lee Katz, Ph.D.
From: kok <kok...@gm...> - 2014-03-25 04:05:30
Dear Lee Katz,

Thanks for revisiting the issue. I did change the option in the conf file during my earlier test, but the other issue I pointed out was that the coordinates of the CDS supported only by BLAST and prodigal are recorded with an infinite coordinate, "complement(1617923..inf)". From the run_prediction scripts I see that prodigal is not used when reconciling predictions, and I am not sure whether that is the cause of the error (or maybe you have changed this as well?). I hope you get what I mean here, and thanks again for your time.

- kok -
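A minimal sketch, assuming the CDS locations are available as plain strings like the one quoted in this message, of how such records could be flagged before they reach the final annotation. It is written in Python for illustration only and is not part of CGP; the function name `bad_coordinates` and the simple location pattern are assumptions, and compound locations such as `join(...)` are reported as unparsed rather than handled.

```python
import re

# Matches simple locations such as "12..345" or "complement(12..inf)".
# This pattern is an illustrative assumption, not CGP's own parser.
LOCATION = re.compile(
    r"^(complement\()?(?P<start>[^.()]+)\.\.(?P<end>[^.()]+)\)?$"
)


def bad_coordinates(location):
    """Return the names of the coordinates that are not plain integers."""
    match = LOCATION.match(location.strip())
    if match is None:
        # Anything more complex than start..end is left for manual review.
        return ["unparsed"]
    return [name for name in ("start", "end")
            if not match.group(name).isdigit()]


if __name__ == "__main__":
    for loc in ("12..345", "complement(1617923..inf)"):
        print(loc, bad_coordinates(loc))
```

Running a check like this over every CDS location would at least list which features carry the `inf` value, which may help narrow down which combination of predictors produces it.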
From: Lee K. <ls...@gm...> - 2014-03-19 18:05:02
Hi Kok,

I am revisiting this issue right now. I am trying gene prediction on a plasmid that we have in-house because it is faster to do so. My table looks like the following:

condition      evidence  CDS
w/o prodigal   1         194
w prodigal     1         203
w prodigal     2         178
w prodigal     3         164
w prodigal     4         5

So requiring all four predictors is ridiculous (not that anyone was suggesting that). I also agree that two predictors is sufficient and is in line with what I expected. I'll add that as a default option in the conf file in cgpipelinerc. The option will be listed as

prediction_min_predictors_to_call_orf = 2

I know it has been a long time, but did I answer the whole issue?

--
Lee Katz, Ph.D.
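The effect of the threshold discussed in this message can be illustrated with a small sketch. It is written in Python for illustration and is not CGP's code (run_prediction is a Perl script); the function `call_orfs`, the `support` layout, and the predictor name "other" are hypothetical. It only shows the idea of keeping an ORF when at least a minimum number of predictors support it.

```python
def call_orfs(support, min_predictors=2):
    """Keep ORFs supported by at least min_predictors distinct predictors.

    support maps an ORF identifier to the set of predictor names that
    predicted it (a hypothetical layout, not CGP's internal one).
    """
    return sorted(orf for orf, predictors in support.items()
                  if len(predictors) >= min_predictors)


# Toy data: three ORFs with decreasing support.
support = {
    "orf1": {"genemark", "blast", "prodigal", "other"},
    "orf2": {"blast", "prodigal"},
    "orf3": {"prodigal"},
}

if __name__ == "__main__":
    for threshold in (1, 2, 3, 4):
        print(threshold, call_orfs(support, min_predictors=threshold))
```

Raising the threshold can only shrink the set of called ORFs, which matches the falling CDS counts in the tables in this thread as the evidence value goes from 1 to 4.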
From: kok <kok...@gm...> - 2013-10-02 03:25:56
Sure, I hope the government issue is over soon and everything is fine. Thanks.
From: Lee K. <ls...@gm...> - 2013-10-01 14:12:03
This looks like something that I need to understand further. Unfortunately I am on furlough, so I am wrapping up everything right now. Please keep this project folder on hand, though, because this is a very relevant analysis that will help CG-Pipeline in the future.

--
Lee Katz, Ph.D.
From: kok <kok...@gm...> - 2013-09-30 03:26:49
|
Dear Lee Katz,

condition      evidence   CDS
w/o prodigal   1          6048
w prodigal     1          8263
w prodigal     2          6192
w prodigal     3          5683
w prodigal     4           875

The table above shows the results I got when testing several evidence values. It seems that evidence 2 is the one I would like to proceed with, as it returns the most CDS; by manual check I can see that some of them come from just prodigal and BLAST evidence and were missed by the previous CGP run without prodigal.

However, I found that the coordinates of some CDS supported by BLAST and prodigal are recorded with an infinite end, e.g. "complement(1617923..inf)". From the run_prediction script I see that prodigal is not used when reconciling predictions; I am not sure whether that is the cause of the error. Either way, including prodigal in the start prediction during reconciliation would be good, as I have seen cases where prodigal predicts the start better than the other predictors.

I hope this is easy to implement, and thanks for your time.

Regards,
Kok |
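One way to locate the affected records is to scan the feature locations for bounds that are not plain integers. The following is a minimal sketch in Python (not the pipeline's own Perl), assuming the locations are available as plain strings such as "complement(1617923..inf)"; the helper name `has_open_bound` and the sample list are hypothetical and not part of CGP.

```python
import re

# Captures the two bounds of a simple location such as "12..345"
# or "complement(12..inf)".
_BOUNDS = re.compile(r"(\w+)\.\.(\w+)")


def has_open_bound(location: str) -> bool:
    """Return True if either bound of a start..end location is not an integer."""
    match = _BOUNDS.search(location)
    if match is None:
        return True  # locations that cannot be parsed are flagged as well
    return not all(bound.isdigit() for bound in match.groups())


# Made-up sample locations; only the first one mirrors the reported problem.
locations = ["complement(1617923..inf)", "1200..2300", "complement(500..900)"]
flagged = [loc for loc in locations if has_open_bound(loc)]
print(flagged)
```

A check like this only finds the broken records; it does not explain where the "inf" end comes from during reconciliation.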
From: Lee K. <ls...@gm...> - 2013-08-12 11:25:59
|
I know! Prodigal is just so easy to use, and so it was really easy to make a wrapper around it.

2/4 might be OK too, but I do not have enough time to perform any rigorous tests to see which way is better. If you have time, please let me and the community know which gives you better results. I think it would be informative to know what 1/4, 2/4, 3/4, and 4/4 give you for each genome. There is an interesting table that the Georgia Tech compgenomics class created this year, at http://compgenomics2013.biology.gatech.edu/index.php/Gene_Prediction_Group#Gene_Prediction_Pipeline.

The way to change the minimum number of predictors is to alter the variable $$settings{min_predictors_to_call_orf} in run_prediction. Around line 161 in run_prediction, where it says something like

    # Categorize and reconcile predictions

set it back to 2 so that you can have 2/4 predictors:

    $$settings{min_predictors_to_call_orf}=2;

-- Lee Katz, Ph.D. |
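The effect of the minimum-predictor setting can be illustrated with a small voting filter. This is only a sketch in Python under stated assumptions, not the logic in run_prediction: it assumes each candidate ORF comes with the set of predictors that reported it, and the function name `call_orfs`, the ORF IDs, and the toy data are all made up.

```python
from typing import Dict, List, Set


def call_orfs(support: Dict[str, Set[str]], min_predictors: int) -> List[str]:
    """Keep ORFs reported by at least min_predictors distinct predictors."""
    return sorted(
        orf for orf, predictors in support.items()
        if len(predictors) >= min_predictors
    )


# Toy data: which predictors reported each candidate ORF (hypothetical IDs).
support = {
    "orf_a": {"glimmer", "genemark", "blast", "prodigal"},
    "orf_b": {"blast", "prodigal"},
    "orf_c": {"glimmer"},
}

# Compare thresholds 1/4 through 4/4, as suggested in the message above.
for threshold in (1, 2, 3, 4):
    print(threshold, call_orfs(support, threshold))
```

Raising the threshold can only shrink the set of called ORFs, which is why the CDS counts reported in this thread fall as the evidence value rises.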
From: kok w. <kok...@gm...> - 2013-08-12 03:57:46
|
Wow, that's great, and faster than planned. I will certainly try out the pipeline on my genome and update you with the results. I'm thinking that requiring 2/4 evidence will probably be good enough, as a false positive is preferred over a false negative for gene prediction. Any opinion?

Thanks for your efforts and help. |
From: Lee K. <ls...@gm...> - 2013-08-12 02:35:37
|
Hi Kok,

I added a script, run_prediction_prodigal.pl, into source control. It outputs a GFF file of CDS predictions. I also made sure it outputs the training file to the temporary directory because it seems like you are interested in the training files.

I also modified run_prediction and the CGPipelineUtils module so that prodigal predicts alongside the other predictors. Lastly, I added an option

    prediction_use_prodigal = 1

to the config file so that you can enable it for run_prediction. With Prodigal, each gene must have a 2/3 or 3/4 majority to be called (depending on whether you use genemark too).

I'm new to prodigal, so please let me know if it all looks correct. The command seems simple enough but I don't know if there are any idiosyncrasies to be aware of.

-- Lee Katz, Ph.D. |
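Because the wrapper writes GFF, its CDS calls can be inspected with a few lines of code. The sketch below is Python, not part of CGP, and assumes standard nine-column, tab-separated GFF; the `Cds` type, the `parse_cds` helper, and the sample lines are illustrative and are not real output from run_prediction_prodigal.pl.

```python
from typing import List, NamedTuple


class Cds(NamedTuple):
    seqid: str
    start: int
    end: int
    strand: str


def parse_cds(gff_text: str) -> List[Cds]:
    """Collect CDS features from GFF text, skipping comments and other types."""
    features = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        features.append(Cds(cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return features


# Made-up GFF lines for illustration only.
sample = (
    "##gff-version 3\n"
    "contig1\texample\tCDS\t101\t400\t.\t+\t0\tID=cds1\n"
    "contig1\texample\tgene\t101\t400\t.\t+\t.\tID=gene1\n"
    "contig1\texample\tCDS\t900\t1500\t.\t-\t0\tID=cds2\n"
)
for cds in parse_cds(sample):
    print(cds)
```

Comparing these start coordinates with those from the other predictors is one way to check the start-prediction differences discussed in this thread.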
From: Gmail <kok...@gm...> - 2013-08-03 03:33:35
|
Thanks Jay and Lee. It will be great if the option is added. I like prodigal for its better start prediction (from what I get for my test genome) and, as claimed, fewer false predictions for bacterial genomes. Looking forward to the update, thanks! |
From: Lee K. <ls...@gm...> - 2013-08-02 15:52:09
|
I'm returning to the US and back to work on Aug 12. It sounds like a worthy addition.

I like prodigal but never bothered to put it in as an option. I think it could be something optional like genemark and would be preferred if not using genemark. In this way, CGP would still be able to have a majority for gene calling even if you don't have genemark. |
From: Jay <jhu...@gm...> - 2013-08-02 13:43:15
|
As far as I know, there is no convenient way of doing this. The run_prediction script would have to be modified to support running it and parsing the results. |
From: kok <kok...@gm...> - 2013-08-02 06:53:14
|
Is it possible for cg pipeline to include the results of another ab initio predictor (e.g. prodigal)? Is there any development of this function?

Or, if I would like to use prodigal in place of genemark (if only two predictors are allowed), can I convert the prodigal results into a genemark-like gm_out.lst file for cg pipeline's run_prediction as a simple modification?

- kok - |
From: kok <kok...@gm...> - 2013-08-02 06:19:45
|
Hi Jay,

Thanks for pointing me to the location; I hadn't noticed that page before. |
From: Jay H. <jhu...@gm...> - 2013-08-01 17:22:21
|
Hi Kok,

Lee means he modified the source code in the script run_prediction, so you can download this version here:
http://sourceforge.net/p/cg-pipeline/code/583/tree/cg_pipeline/branches/lkatz/scripts/run_prediction

-Jay |
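The linker problem discussed in this thread (gene coordinates shifted by the 36 bp linker that joins contigs for prediction) comes down to a coordinate mapping. Below is a rough Python sketch of that mapping, not the fix that went into run_prediction. It assumes the joined sequence is laid out as linker + contig + linker + contig and so on, which is a guess based on the reported 5' offset; the function name `to_contig_coords` and the contig lengths are hypothetical.

```python
from typing import List, Optional, Tuple

# Linker sequence as quoted in this thread; it is 36 bp long.
LINKER = "NNNNNCACACACTTAATTAATTAAGTGTGTGNNNNN"


def to_contig_coords(pos: int, contig_lengths: List[int],
                     linker_len: int = len(LINKER)) -> Optional[Tuple[int, int]]:
    """Map a 1-based position on the joined sequence to (contig index, position).

    Assumes the layout linker + contig0 + linker + contig1 + ...; returns None
    when the position falls inside a linker or beyond the last contig.
    """
    offset = 0
    for index, length in enumerate(contig_lengths):
        offset += linker_len  # skip the linker in front of this contig
        if offset < pos <= offset + length:
            return index, pos - offset
        offset += length
    return None


print(len(LINKER))
print(to_contig_coords(37, [1000, 500]))    # first base of the first contig
print(to_contig_coords(1073, [1000, 500]))  # first base of the second contig
print(to_contig_coords(1040, [1000, 500]))  # inside the second linker
```

A gene model whose start or end maps to None overlaps a linker, which would correspond to the cases reported here where linker residues appear at the ends of gene models.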
From: kok <kok...@gm...> - 2013-07-31 12:16:17
|
Just noticed this; sorry, I don't quite get the meaning of "source control" here. Does that mean you tried removing the assembly linker and the prediction ran without giving you any problems? On 22/7/2013 11:56 AM, Lee Katz wrote: > Hi, just to let you know, I think that this is fixed in the source > control now. I fixed it last week, and so far it hasn't given me any > problems. |
From: Lee K. <ls...@gm...> - 2013-07-22 18:56:58
|
Hi, just to let you know, I think that this is fixed in the source control now. I fixed it last week, and so far it hasn't given me any problems. On Tue, Jul 16, 2013 at 11:10 AM, Gmail <kok...@gm...> wrote: > Sure, thanks for reminding. -- Lee Katz, Ph.D. |
From: Gmail <kok...@gm...> - 2013-07-16 15:10:58
|
Sure, thanks for reminding. On 16/07/2013, at 20:54, Lee Katz <ls...@gm...> wrote: > Oh, one more thing: remember to clear out project/build/prediction before running it again, so that it doesn't read from those files. > rm -rf build/prediction/* #but keep the prediction directory there |
From: Lee K. <ls...@gm...> - 2013-07-16 12:55:07
|
Oh, one more thing: remember to clear out project/build/prediction before running it again, so that it doesn't read from those files. rm -rf build/prediction/* #but keep the prediction directory there On Tue, Jul 16, 2013 at 8:46 AM, kokwei <kok...@gm...> wrote: > Thanks for your reply. > > I understand the needs to add in the assembly linker as it helps to > truncate the genes at the contig end instead of mistakenly extended into > previous contig during pseudochromosome generation (I am not sure CGP do > generate pseudomolecule or not though). > > I will try with the empty string to see how it goes and patiently wait for > better features in CGP v0.5 by you. Thanks! -- Lee Katz, Ph.D. |
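For anyone scripting the cleanup step mentioned in this thread, here is a minimal Python sketch of the same idea as `rm -rf build/prediction/*`: empty the directory but leave the directory itself in place. The helper name `clear_directory` is hypothetical and not part of CG-Pipeline, and the path in the usage note is only the one named in the message. Unlike the shell glob, this version also removes hidden (dot) files.

```python
import shutil
from pathlib import Path


def clear_directory(directory: Path) -> int:
    """Delete everything inside `directory` but keep the directory itself.

    Returns the number of top-level entries removed. Hypothetical helper,
    not part of CG-Pipeline.
    """
    removed = 0
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            # Real subdirectory: remove it and everything below it.
            shutil.rmtree(entry)
        else:
            # Regular file or symlink: remove just this entry.
            entry.unlink()
        removed += 1
    return removed
```

Assuming the project layout from the message, a call would look like `clear_directory(Path("project/build/prediction"))`, run before restarting the prediction stage.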
From: kokwei <kok...@gm...> - 2013-07-16 12:46:17
|
Thanks for your reply. I understand the need for the assembly linker, as it helps to truncate genes at the contig ends instead of letting them mistakenly extend into the previous contig during pseudochromosome generation (I am not sure whether CGP generates a pseudomolecule, though). I will try the empty string to see how it goes, and will patiently wait for better features in CGP v0.5. Thanks! On 16/7/2013 8:16 PM, Lee Katz wrote: > Thank you for pointing that out. I might not have caught that on my > own. I added the linker sequence so that BLAST results would be more > accurate but didn't think about how it might affect GeneMark or > Glimmer. I need to find a happy medium, but in the meantime, CGP v0.3 > does not have this linker and you can try that for now. > > Also, but untested, is that you can alter this line so that the linker > is an empty string in run_prediction > > $$settings{assembly_linker} ||= "NNNNNCACACACTTAATTAATTAAGTGTGTGNNNNN"; |
From: Lee K. <ls...@gm...> - 2013-07-16 12:17:05
|
Thank you for pointing that out. I might not have caught that on my own. I added the linker sequence so that BLAST results would be more accurate but didn't think about how it might affect GeneMark or Glimmer. I need to find a happy medium, but in the meantime, CGP v0.3 does not have this linker and you can try that for now. Also, although this is untested, you can alter this line in run_prediction so that the linker is an empty string: $$settings{assembly_linker} ||= "NNNNNCACACACTTAATTAATTAAGTGTGTGNNNNN"; -- Lee Katz, Ph.D. |
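To see why this linker interacts with gene callers, it can help to inspect the string itself. The Python sketch below is not CG-Pipeline code (the pipeline is written in Perl). It takes the linker exactly as quoted in the message and checks its length, whether it equals its own reverse complement, and whether every one of the six reading frames contains a stop codon. The function names and the choice of the standard stop codons TAA, TAG and TGA are assumptions made for illustration.

```python
# Linker string copied from the message above.
LINKER = "NNNNNCACACACTTAATTAATTAAGTGTGTGNNNNN"
# Assumption: standard stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def frames_with_stop(seq: str) -> dict:
    """Map each of the six reading frames to True if it contains a stop codon."""
    result = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            codons = (s[i:i + 3] for i in range(frame, len(s) - 2, 3))
            result[f"{strand}{frame + 1}"] = any(c in STOP_CODONS for c in codons)
    return result


if __name__ == "__main__":
    print(len(LINKER))                           # linker length in bp
    print(reverse_complement(LINKER) == LINKER)  # is it its own reverse complement?
    print(frames_with_stop(LINKER))              # stop codon present per frame
```

Running it reports the linker length and the per-frame stop status, which can be compared with the 36 bp coordinate offset reported in this thread.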
From: kokwei <kok...@gm...> - 2013-07-16 08:20:58
|
Hello, I noticed that in CG-Pipeline, the assembly linker used for joining the contigs/scaffolds during prediction is included in the gene models instead of just acting as a linker (I found VXX at the 5' end and XXHT at the 3' end of some gene models). All the coordinates of the gene models also have an offset of 36 bp due to the inclusion of the linker at the 5' end of the assembly sequence. Is it possible to remove the linker from the gene models during pipeline execution and output the correct locations? Thanks. |
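As a workaround for the offset described above, coordinates on the concatenated sequence can be mapped back to per-contig coordinates after the fact. The Python sketch below assumes that a copy of the linker precedes every contig (the message only confirms a linker at the 5' end of the assembly, so the layout is an assumption) and that positions are 1-based. `contig_offsets` and `to_contig_coords` are hypothetical helpers, not CG-Pipeline functions.

```python
from bisect import bisect_right

# Linker string as quoted elsewhere in this thread.
LINKER = "NNNNNCACACACTTAATTAATTAAGTGTGTGNNNNN"


def contig_offsets(contig_lengths, linker_len=len(LINKER)):
    """0-based start of each contig on the concatenated sequence.

    Assumes the layout linker + contig1 + linker + contig2 + ...
    """
    offsets = []
    pos = 0
    for length in contig_lengths:
        pos += linker_len      # skip the linker in front of this contig
        offsets.append(pos)
        pos += length
    return offsets


def to_contig_coords(position, contig_lengths, linker_len=len(LINKER)):
    """Map a 1-based position on the concatenated sequence to
    (contig_index, 1-based position within that contig).

    Returns None when the position falls inside a linker or past the end.
    """
    offsets = contig_offsets(contig_lengths, linker_len)
    zero_based = position - 1
    idx = bisect_right(offsets, zero_based) - 1
    if idx < 0:
        return None            # inside the leading linker
    local = zero_based - offsets[idx]
    if local >= contig_lengths[idx]:
        return None            # inside a later linker or past the end
    return idx, local + 1
```

With two contigs of 100 bp and 50 bp, for example, `to_contig_coords(37, [100, 50])` maps the first base after the leading linker to position 1 of the first contig. A gene whose start or end maps to `None` overlaps a linker and would need trimming rather than a simple shift.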
From: Lee K. <ls...@gm...> - 2013-07-12 14:18:25
|
Hi everyone! I have been actively working on CG-Pipeline with a little help from my colleagues, and it's obvious that there is a need for documentation. I have been hosting it on my personal site, which is not very professional, and so I finally migrated it over to the sourceforge site. Please check it out, and I appreciate any constructive feedback you may have. If you find any typos or missing links, or other minor things, please aggregate those thoughts and send them to me in bulk. https://sourceforge.net/apps/mediawiki/cg-pipeline Thank you everyone for using CGP! -- Lee Katz, Ph.D. |
From: Lee K. <ls...@gm...> - 2013-07-02 16:23:17
|
No problem! Good luck! On Tue, Jul 2, 2013 at 11:16 AM, Gmail <kok...@gm...> wrote: > Okay, probably it's the incompleteness of my draft genome that caused the > predicted start site to be downstream from the true start site. Thanks and > your replies are highly appreciated. -- Lee Katz, Ph.D. |
From: Gmail <kok...@gm...> - 2013-07-02 15:16:23
|
Okay, probably it's the incompleteness of my draft genome that caused the predicted start site to be downstream from the true start site. Thanks and your replies are highly appreciated. On 02/07/2013, at 23:07, Lee Katz <ls...@gm...> wrote: > Sure. When developing the strategy for gene prediction, we decided to accept the longest ORF possible. Therefore, we always capture the whole gene (if the locus is correctly identified as a gene, because there are undoubtedly some false positives), but the 5' region might be a false positive. In other words, a downstream start codon might be the true start in CGP-predicted genes. I believe in our publication, we correctly identified genes 95% of the time, but we predicted the correct start 85% of the time (and the true start site would have been downstream in 15% of the predictions). > > On Tue, Jul 2, 2013 at 5:14 AM, kokwei <kok...@gm...> wrote: >> Thanks for your information. I think the reference proteins I used as the database are conserved in their gene models, so I was hoping that the BLAST results against the database would predict the "real start" (with a reference) instead of relying on ab initio prediction, which might call a false start codon. With the current protein BLAST alignment, the start will be pinned to the closest start codon upstream, but the "real start" might be even further upstream. >> >> Can I have your opinion on this based on your experience with the optimization? Thanks. >> >> On 2/7/2013 3:36 AM, Lee Katz wrote: >>> It looks like you are correct. Over the last couple of years, I optimized a few different things and I guess I obviated that subroutine! >>> >>> Yes, run_prediction will run GeneMark and Glimmer3, and combine with blast, making a high-confidence set of genes. To see a graphical representation and explanation of our combining strategy, please see our 2010 paper: http://bioinformatics.oxfordjournals.org/content/26/15/1819.full >>> >>> On Mon, Jul 1, 2013 at 12:49 PM, kokwei <kok...@gm...> wrote: >>>> Hi, >>>> >>>> I noticed that during execution of the pipeline with the command <run_pipeline predict -p project>, the subroutine "sub blastSeqs($$$) {" at line 1817 is not executed. Is that normal, or does it only happen in my case? I have attached the log of the prediction for your reference. >>>> >>>> I guess that subroutine BLASTs the contig/scaffold sequences against the selected database and outputs the homology-based gene models to combine with the ab initio gene models from Glimmer and GeneMark, which is what I need, and I hope that this can be solved. >>>> >>>> Thanks. |
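The "longest ORF" rule discussed in this thread can be illustrated with a small Python sketch. This is not CG-Pipeline's implementation. It only shows the general idea: from a known stop codon, walk upstream in frame and keep the most upstream start codon found before another stop appears. The start codon set (ATG, GTG, TTG) is an assumption typical for bacterial genomes, and both function names are hypothetical.

```python
# Assumptions for illustration: standard stops, common bacterial starts.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}


def in_frame_starts(seq: str, stop_index: int) -> list:
    """List 0-based positions of candidate start codons upstream of the stop
    codon beginning at `stop_index`, in the same frame, stopping at the
    first upstream in-frame stop codon. `seq` must be uppercase."""
    starts = []
    pos = stop_index - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break              # the ORF cannot extend past another stop
        if codon in START_CODONS:
            starts.append(pos)
        pos -= 3
    return sorted(starts)


def longest_orf_start(seq: str, stop_index: int):
    """Return the most upstream candidate start (longest ORF), or None."""
    starts = in_frame_starts(seq, stop_index)
    return starts[0] if starts else None
```

For the toy sequence `ATGAAAATGCCCTAA` with the stop codon at index 12, the candidate starts are at positions 0 and 6, and the longest-ORF rule picks 0. If the true start were the ATG at position 6, the prediction would cover the whole gene but carry a false 5' extension, which is the trade-off described in the quoted reply.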