From: Vladimir G. <vla...@du...> - 2010-03-17 20:43:25
On Mar 17, 2010, at 10:29 AM, Vladimir Gapeyev wrote:

> On Mar 17, 2010, at 10:05 AM, Hilmar Lapp wrote:
>
>> Rutger - where do the consistency tests stand (#2899240). Vladimir is
>> going to try to run those which exist, but I'm not sure about the
>> coverage - is it enough to give us any confidence?
>
> To add, these are the only things I detected that I guess might have
> relevance to data consistency checking:
>   treebase-core/src/main/perl/bin/check
>   treebase-core/src/main/perl/check/check
>   treebase-core/src/main/perl/lib/CIPRES/TreeBase

Here is what I got.

The two check scripts are actually the same. The only thing I could get out of them is printing out contents of an object specified by its class/table name and an ID.

There is another script, perl/bin/gc. The wiki description for it is "Garbage collector, prints out orphaned objects (e.g. trees without studies), presumably candidates for deletion."

A few excerpts from its printout are below -- I am not sure how to interpret them. Anyone in the know, please point me in the correct direction.

--Vladimir

[vg34@treebasedb-dev ConsistencyChecks]$ perl/bin/gc
Database contains 5392 Analysis items
Database contains 5397 AnalysisStep items
Database contains 12378 AnalyzedData items
Database contains 4579 Matrix items
Database contains 236604 MatrixRow items
Database contains 6613 PhyloTree items
Database contains 557909 PhyloTreeNode items
Database contains 2454 Study items
Database contains 168318 TaxonLabel items
S127 8/8
S1801 3/3
S71 2/2
S1648 2/2
S1481 2/2
S10309 4/4
S10122 2/2
S1178 4/4
.....
// I suspect it prints out *all* the studies
* Analysis 4762
* Analysis 4764
* Analysis 4821
* Analysis 4842
* AnalysisStep 4821
* Matrix 181
* Matrix 182
* Matrix 183
* Matrix 184
* Matrix 185
* Matrix 186
* Matrix 355
* Matrix 367
* Matrix 990
* Matrix 992
* Matrix 993
* Matrix 994
* Matrix 997
* Matrix 998
* Matrix 999
* Matrix 1000
* Matrix 1001
* Matrix 1617
* Matrix 1618
* Matrix 1903
* Matrix 2146
* Matrix 3702
* Matrix 4070
* Matrix 4110
* Matrix 4130
* Matrix 4150
* Matrix 4227
* Matrix 4280
* Matrix 4456
* Matrix 4528
* Matrix 4778
* Matrix 4893
* MatrixRow 4091
* MatrixRow 4092
* MatrixRow 4093
* MatrixRow 4094
* MatrixRow 4095
* MatrixRow 4096
* MatrixRow 4097
....
//Are these the orphans? These are all Analyses and Matrices from the output, but I skip most MatrixRows, as there are many
* MatrixRow 234956
* MatrixRow 234957
* MatrixRow 234958
* MatrixRow 234959
* PhyloTree 85
* PhyloTree 86
* PhyloTree 88
* PhyloTree 181
....
/// It prints out a lot of PhyloTrees, likely all of them
* PhyloTree 6978
* PhyloTree 6979
* PhyloTree 6980
* PhyloTree 6981
* PhyloTreeNode 76327
* PhyloTreeNode 76328
* PhyloTreeNode 76329
* PhyloTreeNode 76330
* PhyloTreeNode 76331
* PhyloTreeNode 76332
.....
* PhyloTreeNode 76488
* PhyloTreeNode 76489
* PhyloTreeNode 76490
* PhyloTreeNode 153706 //a sharp jump
* PhyloTreeNode 153707
* PhyloTreeNode 153708
* PhyloTreeNode 153709
* PhyloTreeNode 153710
.....
* PhyloTreeNode 559205
* PhyloTreeNode 559206
* PhyloTreeNode 559207
* TaxonLabel 1288
* TaxonLabel 1289
* TaxonLabel 1290
* TaxonLabel 1291
* TaxonLabel 1292
.......
* TaxonLabel 276777
* TaxonLabel 276778
* TaxonLabel 276779
* TaxonLabel 276780
* TaxonLabel 276781
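
One way to make a printout like this easier to read would be to condense it: count the "* Class ID" lines per class, compare each count with the "Database contains N Class items" total, and collapse consecutive IDs into ranges so that jumps stand out. The sketch below is not part of the TreeBASE scripts. It assumes only the two line shapes visible in the excerpt above, and the names `summarize`, `ITEM` and `TOTAL` are made up for illustration. Whether the starred lines really are the orphans is still an open question that this does not answer.

```python
import re
from collections import defaultdict

# Line shapes assumed from the excerpt only, not from the gc source:
#   "Database contains 4579 Matrix items"  -> per-class total
#   "* Matrix 181"                         -> one listed object
TOTAL = re.compile(r"Database contains (\d+) ([A-Za-z]+) items")
ITEM = re.compile(r"\*\s+([A-Za-z]+)\s+(\d+)")


def summarize(text):
    """Return {class: {"listed", "total", "ranges"}} for a saved gc printout."""
    totals = {cls: int(n) for n, cls in TOTAL.findall(text)}

    ids = defaultdict(set)
    for cls, num in ITEM.findall(text):
        ids[cls].add(int(num))

    summary = {}
    for cls, values in ids.items():
        # Collapse consecutive IDs into [first, last] ranges.
        ranges = []
        for n in sorted(values):
            if ranges and n == ranges[-1][1] + 1:
                ranges[-1][1] = n
            else:
                ranges.append([n, n])
        summary[cls] = {
            "listed": len(values),
            "total": totals.get(cls),  # None if no total line was seen
            "ranges": [tuple(r) for r in ranges],
        }
    return summary


# A few lines copied from the excerpt, for illustration.
sample = """
Database contains 4579 Matrix items
Database contains 557909 PhyloTreeNode items
* Matrix 181
* Matrix 182
* Matrix 183
* Matrix 355
* PhyloTreeNode 76490
* PhyloTreeNode 153706
"""

for cls, info in summarize(sample).items():
    print(cls, info["listed"], "of", info["total"], info["ranges"])
```

Run against the complete output saved to a file, a class whose listed count equals its total would support the guess that everything is being printed (as suspected for PhyloTrees), while a much smaller listed count would be more consistent with a list of orphans.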