From: Hilmar L. <hl...@ne...> - 2010-03-17 14:05:17
|
Bill & Youjun - there are currently 5 mission-critical bugs listed in the tracker as open. Two of these are assigned to Youjun.

Youjun - can you update us on where they stand? Specifically, will you be able to resolve the PhyloWidget-related bug, which seems like a show-stopper for release?

Rutger - where do the consistency tests stand (#2899240). Vladimir is going to try to run those which exist, but I'm not sure about the coverage - is it enough to give us any confidence?

There are two priority 9 bugs which are unassigned. Youjun - would you be able to take them on and resolve them today?

I am going to try to schedule testing by appointment with a few NESCent scholars this afternoon.

Can we have a conference call before noon to review our status, for example at 11am EDT? (We are scheduled to flip the switch tomorrow for public release.)

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
=========================================================== |
From: Vladimir G. <vla...@du...> - 2010-03-17 14:29:28
|
On Mar 17, 2010, at 10:05 AM, Hilmar Lapp wrote:

> Rutger - where do the consistency tests stand (#2899240). Vladimir is
> going to try to run those which exist, but I'm not sure about the
> coverage - is it enough to give us any confidence?

To add, these are the only things I detected that I guess might have relevance to data consistency checking:

treebase-core/src/main/perl/bin/check
treebase-core/src/main/perl/check/check
treebase-core/src/main/perl/lib/CIPRES/TreeBase |
From: William P. <wil...@ya...> - 2010-03-17 15:16:16
|
On Mar 17, 2010, at 10:05 AM, Hilmar Lapp wrote:

> Can we have a conference call before noon to review our status, for example at 11am EDT?

Whoops. We just got your email and it's already after 11. We can call in at 11:30, for example.

bp |
From: Vladimir G. <vla...@du...> - 2010-03-17 20:43:25
|
On Mar 17, 2010, at 10:29 AM, Vladimir Gapeyev wrote:

> On Mar 17, 2010, at 10:05 AM, Hilmar Lapp wrote:
>
>> Rutger - where do the consistency tests stand (#2899240). Vladimir is
>> going to try to run those which exist, but I'm not sure about the
>> coverage - is it enough to give us any confidence?
>
> To add, these are the only things I detected that I guess might have
> relevance to data consistency checking:
> treebase-core/src/main/perl/bin/check
> treebase-core/src/main/perl/check/check
> treebase-core/src/main/perl/lib/CIPRES/TreeBase

Here is what I got.

The two check scripts are actually the same. The only thing I could get out of them is printing out contents of an object specified by its class/table name and an ID.

There is another script, perl/bin/gc. The wiki description for it is "Garbage collector, prints out orphaned objects (e.g. trees without studies), presumably candidates for deletion." A few excerpts from its printout are below -- I am not sure how to interpret them.

Anyone in the know, please point me in the correct direction.

--Vladimir

[vg34@treebasedb-dev ConsistencyChecks]$ perl/bin/gc
Database contains 5392 Analysis items
Database contains 5397 AnalysisStep items
Database contains 12378 AnalyzedData items
Database contains 4579 Matrix items
Database contains 236604 MatrixRow items
Database contains 6613 PhyloTree items
Database contains 557909 PhyloTreeNode items
Database contains 2454 Study items
Database contains 168318 TaxonLabel items
S127 8/8
S1801 3/3
S71 2/2
S1648 2/2
S1481 2/2
S10309 4/4
S10122 2/2
S1178 4/4
..... // I suspect it prints out *all* the studies
* Analysis 4762
* Analysis 4764
* Analysis 4821
* Analysis 4842
* AnalysisStep 4821
* Matrix 181
* Matrix 182
* Matrix 183
* Matrix 184
* Matrix 185
* Matrix 186
* Matrix 355
* Matrix 367
* Matrix 990
* Matrix 992
* Matrix 993
* Matrix 994
* Matrix 997
* Matrix 998
* Matrix 999
* Matrix 1000
* Matrix 1001
* Matrix 1617
* Matrix 1618
* Matrix 1903
* Matrix 2146
* Matrix 3702
* Matrix 4070
* Matrix 4110
* Matrix 4130
* Matrix 4150
* Matrix 4227
* Matrix 4280
* Matrix 4456
* Matrix 4528
* Matrix 4778
* Matrix 4893
* MatrixRow 4091
* MatrixRow 4092
* MatrixRow 4093
* MatrixRow 4094
* MatrixRow 4095
* MatrixRow 4096
* MatrixRow 4097
.... //Are these the orphans? These are all Analyses and Matrices from the output, but I skip most MatrixRows, as there are many
* MatrixRow 234956
* MatrixRow 234957
* MatrixRow 234958
* MatrixRow 234959
* PhyloTree 85
* PhyloTree 86
* PhyloTree 88
* PhyloTree 181
.... /// It prints out a lot of PhyloTrees, likely all of them
* PhyloTree 6978
* PhyloTree 6979
* PhyloTree 6980
* PhyloTree 6981
* PhyloTreeNode 76327
* PhyloTreeNode 76328
* PhyloTreeNode 76329
* PhyloTreeNode 76330
* PhyloTreeNode 76331
* PhyloTreeNode 76332
.....
* PhyloTreeNode 76488
* PhyloTreeNode 76489
* PhyloTreeNode 76490
* PhyloTreeNode 153706 //a sharp jump
* PhyloTreeNode 153707
* PhyloTreeNode 153708
* PhyloTreeNode 153709
* PhyloTreeNode 153710
.....
* PhyloTreeNode 559205
* PhyloTreeNode 559206
* PhyloTreeNode 559207
* TaxonLabel 1288
* TaxonLabel 1289
* TaxonLabel 1290
* TaxonLabel 1291
* TaxonLabel 1292
.......
* TaxonLabel 276777
* TaxonLabel 276778
* TaxonLabel 276779
* TaxonLabel 276780
* TaxonLabel 276781 |
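A printout of this size is easier to read as counts than as a list. The following is only a rough sketch, assuming the complete gc output uses the two line formats visible in the excerpt above ("Database contains N Class items" and "* Class ID"). The function name summarize_gc is made up for illustration and is not part of the TreeBASE scripts. It prints, per class, how many objects were listed against the total reported for that class, which would show directly whether, for example, nearly all PhyloTrees are being listed.

```shell
#!/bin/bash
# Hypothetical helper (not part of the TreeBASE code base): tally gc output.
# Assumed input formats, taken from the excerpt in this thread:
#   "Database contains 4579 Matrix items"   -> total per class
#   "* Matrix 181"                          -> one listed object
# Reads the files given as arguments, or standard input if none are given.
summarize_gc() {
    awk '
        # Remember the reported total for each class.
        $1 == "Database" && $2 == "contains" { total[$4] = $3; next }
        # Count every listed object by its class name.
        $1 == "*" && NF >= 3                 { listed[$2]++ }
        END {
            for (cls in listed) {
                printf "%s %d/%s\n", cls, listed[cls], (cls in total ? total[cls] : "?")
            }
        }
    ' "$@" | sort
}
```

For example, `perl/bin/gc | summarize_gc` would print one line per class in the form "Class listed/total". Lines that match neither format, such as the per-study "S127 8/8" lines, are ignored.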
From: Rutger V. <rut...@gm...> - 2010-03-18 11:31:01
|
Hi all,

sorry about the late response. Here's how it works (to the extent that I've managed to understand MJD's code): there is a "check" script. This script needs two arguments: a table name (out of which MJD's code creates a perl ORM object) and an ID in that table. The script then tries to construct the logically expected subtended object hierarchy starting from the focal object. Anything unexpected is written to STDERR. The most useful way to use this is to say "check Study $studyID". What I've done in the past is to dump all study IDs to a file "STUDIES", and then run the following shell script:

#!/bin/bash
# Run the consistency check for every study ID listed in STUDIES.
studies=`cat STUDIES`
for study in $studies; do
    # Inconsistencies are reported on STDERR; capture them per study.
    check Study $study 2> $study.err
    logfilesize=`wc -l < $study.err`
    # -gt compares numerically (inside [[ ]], > compares strings).
    if [[ $logfilesize -gt 0 ]]
    then
        gzip -9 $study.err
    else
        rm $study.err
    fi
done

This will create a $studyID.err.gz file for every inconsistent study. On closer examination of these, most inconsistencies lead back to only a handful of problems, mostly related to incomplete repatriation of objects from dummy study 22 to their destination study. It's therefore more informative to bin the inconsistencies by category as opposed to by study. For this, MJD has written a "digester" script. Assuming you have a directory full of gzipped study reports, you can then run the following shell script to categorize the reports:

#!/bin/bash
zips=`ls *.gz`
for zip in $zips; do
    gunzip $zip
    # S123.err.gz -> S123.err (report) -> S123 (per-study directory)
    base=`echo $zip | sed -e 's/\.gz$//'`
    dir=`echo $base | sed -e 's/\.err$//'`
    # Feed the flagged lines to the digester, which bins them into $dir.
    grep '\*' $base | digester -d $dir
    gzip -9 $base
    # Append each per-study category file to the combined file one level up.
    cd $dir
    logs=`ls *`
    for log in $logs; do
        cat $log >> ../$log
    done
    cd ../
done

This will create files such as "tree_references_tls_but_its_no", which lists the PhyloTree objects that reference TaxonLabelSet X, whereas some of their nodes reference a TaxonLabel that is in TaxonLabelSet Y. In all these cases, X is still linked to Study 22 (so not repatriated correctly) while the individual labels and their Y are in the right place.
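The two loops above can also be written somewhat more defensively. The following is a sketch under stated assumptions, not a replacement for MJD's tooling: the names check_all_studies, merge_category_logs and CHECK_CMD are invented here for illustration, the checker is assumed to take "Study <id>" and report problems on STDERR as described above, and the per-study directories are assumed to hold one file per category as written by "digester -d". Unlike the script above, this version discards the checker's standard output, reads the ID list line by line, and avoids changing directory or parsing ls output.

```shell
#!/bin/bash
# CHECK_CMD is a hypothetical knob: it defaults to "check", the script
# discussed in this thread, and can be pointed at another command or function.
CHECK_CMD=${CHECK_CMD:-check}

# Usage: check_all_studies STUDIES_FILE OUTPUT_DIR
# Keeps OUTPUT_DIR/<id>.err.gz for every study that produced a report.
check_all_studies() {
    local studies_file=$1 outdir=$2 study errfile
    mkdir -p "$outdir"
    while read -r study || [ -n "$study" ]; do
        [ -n "$study" ] || continue          # skip blank lines
        errfile="$outdir/$study.err"
        # STDERR carries the inconsistencies; STDOUT (the object dump) is
        # dropped, and STDIN is detached so the ID list is not consumed.
        "$CHECK_CMD" Study "$study" 2> "$errfile" > /dev/null < /dev/null || true
        if [ -s "$errfile" ]; then           # non-empty report: keep it
            gzip -9 -f "$errfile"
        else                                 # nothing reported: discard
            rm -f "$errfile"
        fi
    done < "$studies_file"
}

# Usage: merge_category_logs REPORTS_DIR COMBINED_DIR
# Appends every category file found in REPORTS_DIR/<study>/ to a file of
# the same name in COMBINED_DIR.
merge_category_logs() {
    local reports_dir=$1 combined_dir=$2 dir log
    mkdir -p "$combined_dir"
    for dir in "$reports_dir"/*/; do
        [ -d "$dir" ] || continue
        # Never merge the combined directory into itself.
        if [[ "$dir" -ef "$combined_dir" ]]; then continue; fi
        for log in "$dir"*; do
            [ -f "$log" ] || continue
            cat "$log" >> "$combined_dir/$(basename "$log")"
        done
    done
}
```

A run might then look like `check_all_studies STUDIES reports`, followed by the digester step exactly as in the script above, followed by `merge_category_logs reports combined`. Testing the report with `[ -s file ]` avoids counting lines at all, so the string-versus-number comparison issue does not arise.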
By the way, the "gc" script is to be ignored. The idea was that this would be a garbage collector that could automatically figure out all inconsistencies and fix them. MJD never quite completed it and/or worked up the confidence and courage to let it loose on a live database.

Hope this helps,

Rutger

--
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading
RG6 6BX
United Kingdom
Tel: +44 (0) 118 378 7535
http://www.nexml.org
http://rutgervos.blogspot.com |
From: Hilmar L. <hl...@ne...> - 2010-03-18 14:28:25
|
Thanks Rutger! Vladimir - can you make sure these scripts and Rutger's documentation get committed to svn?

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
=========================================================== |