[Treebase-guts] SF.net SVN: treebase:[676] trunk/treebase-core/src/main/perl/check/ README_consiste

Revision: 676
          http://treebase.svn.sourceforge.net/treebase/?rev=676&view=rev
Author:   vgapeyev
Date:     2010-04-05 19:40:55 +0000 (Mon, 05 Apr 2010)

Log Message:
-----------
Rutger's email on how to use 'check' and 'digester' scripts. 

Added Paths:
-----------
    trunk/treebase-core/src/main/perl/check/README_consistency_checks.txt

Added: trunk/treebase-core/src/main/perl/check/README_consistency_checks.txt
===================================================================

--- trunk/treebase-core/src/main/perl/check/README_consistency_checks.txt	                        (rev 0)
+++ trunk/treebase-core/src/main/perl/check/README_consistency_checks.txt	2010-04-05 19:40:55 UTC (rev 676)
@@ -0,0 +1,73 @@
+	From: 	rut...@gm...
+	Subject: 	Re: [Treebase-devel] Consistency tests...
+	Date: 	March 18, 2010 7:30:58 AM EDT
+	To: 	vla...@du...
+	Cc: 	tre...@li...
+	
+Hi all,
+
+Sorry about the late response. Here's how it works (to the extent
+that I've managed to understand MJD's code): there is a "check"
+script. This script needs two arguments: a table name (from which
+MJD's code creates a Perl ORM object) and an ID in that table. The
+script then tries to construct the logically expected subtended object
+hierarchy starting from the focal object. Anything unexpected is
+written to STDERR. The most useful way to use this is to say "check
+Study $studyID". What I've done in the past is to dump all study IDs
+to a file "STUDIES", and then run the following shell script:
+
+#!/bin/bash
+# Check every study listed in STUDIES; keep only non-empty reports.
+for study in $(< STUDIES); do
+	check Study "$study" 2> "$study.err"
+	# -s is true if the file exists and is non-empty
+	if [[ -s "$study.err" ]]
+	then
+		gzip -9 "$study.err"
+	else
+		rm "$study.err"
+	fi
+done
+
+This will create a $studyID.err.gz file for every inconsistent study. On
+closer examination of these, most inconsistencies lead back to only a
+handful of problems, mostly related to incomplete repatriation of
+objects from dummy study 22 to their destination study. It's therefore
+more informative to bin the inconsistencies by category as opposed to
+by study. For this, MJD has written a "digester" script. Assuming you
+have a directory full of gzipped study reports, you can then run the
+following shell script to categorize the reports:
+
+#!/bin/bash
+# Bin the gzipped per-study reports by inconsistency category.
+for zip in *.gz; do
+	gunzip "$zip"
+	base=${zip%.gz}     # e.g. 1234.err
+	dir=${base%.err}    # e.g. 1234
+	grep '\*' "$base" | digester -d "$dir"
+	gzip -9 "$base"
+	cd "$dir" || continue
+	# append this study's category files to the combined ones in ..
+	for log in *; do
+		cat "$log" >> "../$log"
+	done
+	cd ..
+done
+
+This will create files such as "tree_references_tls_but_its_no", which
+lists the PhyloTree objects that reference TaxonLabelSet X, whereas
+some of its nodes reference a TaxonLabel that is in TaxonLabelSet Y.
+In all these cases, X is still linked to Study 22 (so not repatriated
+correctly) while the individual labels and their Y are in the right
+place.
+
+By the way, the "gc" script is to be ignored. The idea was that this
+would be a garbage collector that could automatically figure out all
+inconsistencies and fix them. MJD never quite completed it and/or
+worked up the confidence and courage to let it loose on a live
+database.
+
+Hope this helps,
+
+Rutger
+	
\ No newline at end of file
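
Once the two scripts in the README have run, a short summary helps rank the categories by how often they occur. The sketch below is a hypothetical bash helper, `summarize_checks`, and not one of MJD's scripts. It assumes the run directory holds the STUDIES file (study IDs separated by whitespace), the per-study `.err.gz` reports and the merged category files written by the digester loop, and that each line of a category file is one reported object; the digester's actual output format may differ.

```shell
#!/bin/bash
# summarize_checks: hypothetical helper, not one of MJD's scripts.
# Prints how many studies were checked and flagged, then one
# "count category" line per merged digester category file, largest first.
# Assumptions: DIR (default: current directory) holds STUDIES (study IDs
# separated by whitespace), the <studyID>.err.gz reports and the merged
# category files; each line of a category file is one reported object.
summarize_checks() {
	local dir=${1:-.}
	(
		cd "$dir" || exit 1
		shopt -s nullglob
		total=$(( $(wc -w < STUDIES) ))
		flagged=(*.err.gz)
		echo "studies checked: $total"
		echo "studies inconsistent: ${#flagged[@]}"
		for f in *; do
			# skip the ID list, the gzipped reports and per-study directories
			[[ -f $f && $f != STUDIES && $f != *.gz ]] || continue
			printf '%d %s\n' "$(( $(wc -l < "$f") ))" "$f"
		done | sort -k1,1nr -k2
	)
}
```

For example, `summarize_checks /path/to/run` prints the totals followed by lines such as `N tree_references_tls_but_its_no`. Defining it as a shell function rather than saving it as a script in the run directory keeps the script itself from being counted as a category file.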


This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.