From: William P. <wil...@ya...> - 2010-03-19 15:11:29
On Mar 19, 2010, at 10:34 AM, Hilmar Lapp wrote:
> A machine will not benefit from this, and I'm not sure how the human user would.

Our current Web UI explicitly advertises the purl-phylows URI only to the submitter, and only for the study_id. This means that some people who want to link to other objects in TreeBASE will use right-click-copy as a lazy shortcut instead of reading the documentation. To prevent synonymous /phylows/ links from proliferating, we decided to use purl.org links wherever /phylows/ links appear. So the lazy "human user" benefits: right-click-copy gives them a quick way to bookmark objects in TreeBASE without having to edit the copied link to match our purl.org standard.

bp
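Bill's rationale is that any /phylows/ link a user copies should already be in canonical purl.org form, so that only one URI per object circulates. A minimal sketch of that normalization, assuming instance links contain a "/phylows/<object path>" segment and that the production purl base is "http://purl.org/phylo/treebase/phylows/" (both are assumptions, not confirmed in this thread); the helper name to_canonical_purl and the sample object path are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: normalize a copied /phylows/ link from any TreeBASE
# instance to the canonical purl.org form. The purl base below is an
# assumption for illustration, not a value taken from the TreeBASE code.
PURL_BASE="http://purl.org/phylo/treebase/phylows/"

to_canonical_purl() {
    local url="$1"
    case "$url" in
        */phylows/*)
            # Keep only the object path after "/phylows/" (e.g.
            # "study/TB2:S10051") and prepend the canonical base.
            printf '%s%s\n' "$PURL_BASE" "${url#*/phylows/}"
            ;;
        *)
            # Not a /phylows/ link (e.g. a search page): print it unchanged.
            printf '%s\n' "$url"
            ;;
    esac
}
```

For example, to_canonical_purl 'http://treebase-dev.nescent.org/treebase-web/phylows/study/TB2:S10051' prints http://purl.org/phylo/treebase/phylows/study/TB2:S10051. A link that is already a purl comes back unchanged, so the rewrite is safe to apply twice.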
From: Rutger V. <rut...@gm...> - 2010-03-19 15:04:41
> The question is less clear whether they should be used throughout by the
> application wherever it hyperlinks within the UI to a representation (in
> HTML, NeXML, or RDF) of a data item. A machine will not benefit from this,
> and I'm not sure how the human user would.

My worry is that, by placing ugly webapp download links in the public GUI, the Rod Pages of this world will copy & paste those, distribute them, build hacks around them, etc. To discourage that, anything that looks like a context-free link to a resource (as opposed to something reasonably associated with search or submission) should be a purl, in my opinion.
From: William P. <wil...@ya...> - 2010-03-19 15:02:11
On Mar 19, 2010, at 10:42 AM, Ryan Scherle wrote:
> IMHO, the Purl should be used anywhere you can reasonably expect a user to copy-and-paste a URL. Definitely in the citations. The Purl should at least appear on each page that a user might link to.

The only real "Havoc from purls" comes from people using treebase-stage and treebase-dev. One solution is to have new purl.org/phylo/treebase entries for these instances too (e.g. purl.org/phylo/treebase-stage and purl.org/phylo/treebase-dev), and then, for each build, use a config file to specify which purl to use for that particular build.

bp
From: Ryan S. <rsc...@ne...> - 2010-03-19 14:43:04
> (3) Havoc from purls. Since sometime recently, *some* links that are crucial for the functionality of the application contain purls instead of urls pointing to the current application instance.

IMHO, the Purl should be used anywhere you can reasonably expect a user to copy-and-paste a URL. Definitely in the citations. The Purl should at least appear on each page that a user might link to.

One tactic we use with Dryad is to prominently display which server is in use (for example, compare the logo at top-left for http://datadryad.org and http://demo.datadryad.org ). It doesn't keep you from bouncing between servers, but at least you have an idea where you are. A better solution is to replace the Purls with some other URL on the non-production servers, but this adds some complexity to the code.

--- Ryan
From: Hilmar L. <hl...@ne...> - 2010-03-19 14:34:32
The PURLs need to be thought of as the canonical, globally unique, and resolvable (yes!) identifiers for data items in TreeBASE. I think we should look at them as if they were DOIs, Handles, or LSIDs. We are dispensing GUIDs for data items in TreeBASE, and rather than DOIs or Handles they happen to be HTTP URIs that resolve. This is the Right Thing (tm) from a SemWeb and Linked Data perspective.

A GUID for a data item should resolve to one and only one location, so the fact that they resolve to production is also the Right Thing, and in line with the behavior of DOIs. Publishers cannot test their article DOIs solely on development systems either: DOIs always resolve to their production sites. Dryad faces the same problem.

What's important here is not to use canonical GUIDs frivolously. They must be advertised to the user in appropriate locations, and they must be used consistently in output that we emit to be consumed by machines. The question is less clear whether they should be used throughout by the application wherever it hyperlinks within the UI to a representation (in HTML, NeXML, or RDF) of a data item. A machine will not benefit from this, and I'm not sure how the human user would. Publishers don't link to article downloads through the DOI. Dryad does link through the DOI or Handle to data files that are part of a data package. However, that's primarily because the UI for this is just an HTML rendering of the data package's metadata, and the IDs of the constituent data files are their Handles (or DOIs).

Bottom line from my perspective:
1) we shouldn't confuse data object GUID HTTP URIs with URLs that the application uses to allow a human user to navigate the web-app;
2) regardless, we need to be able to test our HTTP URIs that serve as GUIDs.

Re: #1, even if we currently overuse canonical GUIDs in the UI, I don't think we need to address this before release. Re: #2, we already have a configuration parameter to control the base URL for the GUIDs.
Eventually I would like us to have our own PURL server at http://purl.treebase.org , but we're not going to accomplish that today or shortly. So instead I created the following alternative base URLs, with obvious redirection: http://purl.org/phylo/treebase/dev/phylows/ http://purl.org/phylo/treebase/stage/phylows/ -hilmar On Mar 19, 2010, at 3:47 AM, Rutger Vos wrote: >> (3) Havoc from purls. Since sometime recently, *some* links that >> are crucial for the functionality of the application contain purls >> instead of urls pointing to the current application instance. Due >> to the current resolution of the purls, this makes all instances to >> eventually drift to the treebase.nescent.org instance. >> For example, in search results > Matrices tab: the columns ID, >> Download NeXML, Download RDF, and Download Reconstructed, and >> Matrix Row List contain purl links (Wrong!), while the columns >> Taxa and Download Original have links to the current instance >> (Right!) Interestingly, in the Trees tab all the similarly >> appearing links point to the current instance. (Right!) > > I implemented this. Since the re-configuration of tomcat to run > multiple instances of the webapp some links had been pointing to > localhost. After discussion on this list it was decided that it would > be OK to have these links be purls instead, and I personally believe > this is not just OK but actually important, especially if these > download links point to RDF. So not "Wrong!", but just not to your > liking. You're welcome to reopen the relevant ticket (2960909), > restart the discussion (I believe agreement was reached roughly here: > https://sourceforge.net/mailarchive/message.php?msg_name=04E...@ne...) > and wait for me to fix the issue again, this time to your > specification. > > ------------------------------------------------------------------------------ > Download Intel® Parallel Studio Eval > Try the new software tools for yourself. 
Speed compiling, find bugs > proactively, and fine-tune applications for parallel performance. > See why Intel Parallel Studio got high marks during beta. > http://p.sf.net/sfu/intel-sw-dev > _______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : ===========================================================
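Hilmar's alternative base URLs, combined with Bill's suggestion to choose the purl per build through a config file, could be wired up roughly as follows. This is only a sketch: the function name guid_base_url is hypothetical, and the production value is assumed to be the same path without the dev/stage segment.

```bash
#!/usr/bin/env bash
# Hypothetical build-time helper: pick the GUID base URL for one deployment.
# The dev and stage values are the purl.org URLs listed in this thread; the
# production value is an assumption (same path without an instance segment).
guid_base_url() {
    case "$1" in
        production) echo "http://purl.org/phylo/treebase/phylows/" ;;
        stage)      echo "http://purl.org/phylo/treebase/stage/phylows/" ;;
        dev)        echo "http://purl.org/phylo/treebase/dev/phylows/" ;;
        *)
            echo "guid_base_url: unknown instance '$1'" >&2
            return 1
            ;;
    esac
}
```

A build script could then write the output of guid_base_url "$INSTANCE" into the existing configuration parameter for the GUID base URL. That parameter's name is not given in the thread, so it is left out here. Failing on an unknown instance name, rather than silently falling back to production, keeps a mistyped build from pointing dev links back at treebase.nescent.org.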
From: William P. <wil...@ya...> - 2010-03-19 13:27:32
On Mar 19, 2010, at 12:13 AM, William Piel wrote:
> On Mar 18, 2010, at 11:10 PM, William Piel wrote:
>
>> only by the time I reached S2256 did the mapping work again.
>
> It looks to me like the mis-mapping of taxa is concentrated between S2257 and S10215. Before S2257, the data look okay. After S10215, they also perform okay.

Vladimir -- are you investigating this issue?

bp
From: William P. <wil...@ya...> - 2010-03-19 13:26:19
On Mar 19, 2010, at 4:47 AM, Rutger Vos wrote:
> After discussion on this list it was decided that it would
> be OK to have these links be purls instead

It certainly helps prevent the proliferation of synonymous URIs if all /phylows/ URLs have the same domain syntax.

bp
From: Rutger V. <rut...@gm...> - 2010-03-19 08:47:45
> (3) Havoc from purls. Since sometime recently, *some* links that are crucial for the functionality of the application contain purls instead of urls pointing to the current application instance. Due to the current resolution of the purls, this makes all instances eventually drift to the treebase.nescent.org instance.
> For example, in search results > Matrices tab: the columns ID, Download NeXML, Download RDF, Download Reconstructed, and Matrix Row List contain purl links (Wrong!), while the columns Taxa and Download Original have links to the current instance (Right!). Interestingly, in the Trees tab all the similarly appearing links point to the current instance (Right!).

I implemented this. Since the re-configuration of tomcat to run multiple instances of the webapp some links had been pointing to localhost. After discussion on this list it was decided that it would be OK to have these links be purls instead, and I personally believe this is not just OK but actually important, especially if these download links point to RDF. So not "Wrong!", but just not to your liking. You're welcome to reopen the relevant ticket (2960909), restart the discussion (I believe agreement was reached roughly here: https://sourceforge.net/mailarchive/message.php?msg_name=04E...@ne...) and wait for me to fix the issue again, this time to your specification.
From: William P. <wil...@ya...> - 2010-03-19 04:13:28
On Mar 18, 2010, at 11:10 PM, William Piel wrote:
> only by the time I reached S2256 did the mapping work again.

It looks to me like the mis-mapping of taxa is concentrated between S2257 and S10215. Before S2257, the data look okay. After S10215, they also perform okay.

bp
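Bill's observation narrows the mis-mapping to a contiguous range of study accessions, which is useful when picking studies to spot-check. A small sketch that flags whether an accession falls in that range. The name in_suspect_range is hypothetical, and treating both S2257 and S10215 as inclusive bounds is an assumption drawn from "before S2257 ... after S10215".

```bash
#!/usr/bin/env bash
# Hypothetical triage helper: succeed (exit status 0) if a study accession
# such as "S9931" lies in the range reported as mis-mapped. Treating both
# S2257 and S10215 as inclusive bounds is an assumption.
in_suspect_range() {
    local id="${1#S}"              # drop a leading "S"
    case "$id" in
        ''|*[!0-9]*) return 2 ;;   # not a numeric study ID
    esac
    # test's -ge/-le compare as decimal integers
    [ "$id" -ge 2257 ] && [ "$id" -le 10215 ]
}
```

For instance, S9931 (where the March migration starts) is flagged, while S2256, the study at which Bill found the mapping working again, is not. Returning a distinct status (2) for malformed input keeps typos from being mistaken for "not suspect".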
From: William P. <wil...@ya...> - 2010-03-19 03:36:44
On Mar 18, 2010, at 11:10 PM, William Piel wrote:
> This is affecting too many records, and I'm worried that it is a harbinger of other problems. I think we need to resolve this.

Does anyone know the function of the field called "linked" in the taxonlabel table?

select count(*) from taxonlabel where linked = 'f';   -- 167,090
select count(*) from taxonlabel where linked = 't';   -- 1,228

On the one hand, this seems like a suspicious candidate for the TI mapping failure. On the other hand, *too* many records have "f", so it seemingly can't explain the problem.

bp
From: William P. <wil...@ya...> - 2010-03-19 03:26:08
On Mar 18, 2010, at 9:06 PM, Vladimir Gapeyev wrote:
> (2) An ominous message when Phylowidget loads: "An applet requires access to your computer. The digital signature could not be verified."

It does sound a little scary. Is there a workaround, or would we have to purchase something from Verisign (or whatever)?

> (3) Havoc from purls. Since sometime recently, *some* links that are crucial for the functionality of the application contain purls instead of urls pointing to the current application instance. Due to the current resolution of the purls, this makes all instances eventually drift to the treebase.nescent.org instance.

This should not be a problem when we serve from production, right?

> Consequently, I think the only legitimate purl occurrences are the places where the purls are spelled out in text

Good point. Let's make a mental note of this -- but for now it's probably not a show-stopper.

> - The temporary study 10215, which was used for migration uploads, is associated with 492 files in the study_nexusfile table. This might be the missing files, but isn't the number too low?

There were only 99 studies in the last migration, so 492 files (some matrices, others blocks of trees) sounds about right.

> - On the other hand, migration dumps only contained tree or matrix files, while original files are expected to be mixed in general, right? Then the migration was not supposed to put original files into the DB; how could it?

For TB1 data, we don't have the real original files, so we intended to use the migration files instead. We should fix this, but it's not a show-stopper IMO.

> - I checked a few studies that do have files in study_nexusfile, e.g., #823. They do return these files through the "Download original file" link, but the tree and matrix entries for these studies do not offer links to download NeXML and RDF.

I see the NeXML and RDF links: http://treebase.nescent.org/treebase-web/search/study/matrices.html?id=823

> With (3), we can go to production tomorrow by just re-pointing treebase.nescent.org to the production database. However, after that we will not have an instance on which to test newer versions of the application (since many links on treebase-dev.nescent.org resolve, via purls, to treebase.nescent.org). If it were up to me, I'd prefer the purls issue resolved before Treebase goes live. I can probably track down and correct the wrong links myself, within about a day of work. However, my understanding of the role of purls was not orthodox in the past.

I'm with you regarding fixing the purls, but maybe that's not a show-stopper, seeing as the main benefit is to make it easier for us to debug the dev deployment.

bp
From: William P. <wil...@ya...> - 2010-03-19 03:10:28
On Mar 18, 2010, at 9:06 PM, Vladimir Gapeyev wrote:
> (1) Unexpected different results on the Taxa tab -- a feature or a bug? E.g., find a single study, e.g. 10051.
> -- Click the study (which goes to Citation tab), then to Taxa tab ==> "Nothing to display"
> -- Go to Matrices tab; click on "View Taxa" in the table ==> it goes back to the Taxa tab, showing lots of stuff
> -- Go back to Citation tab; then Taxa tab ==> "Nothing found to display"
> -- Go to Trees tab; click on "View Taxa" in the table ==> it goes back to the Taxa tab, showing lots of stuff.

It looks as though the taxon intel data did not get uploaded properly. Study 10051 has the taxon label "Polystichum acutidens DQ202419", which in my latest TI dump maps to the taxon variant "Polystichum acutidens" (namebankid 5977443), and that, in turn, maps to the taxon "Polystichum acutidens" (taxid 265680). But going through the browser shows no results under the Taxa tab.

10052 has no taxon mappings (even though my TI tables do).
10053 has no taxon mappings (even though my TI tables do).
etc.

I queried treebase-stage with this:

select * from taxonlabel join taxonvariant using (taxonvariant_id) where taxonlabel = 'Coniothyrium clematidis-rectae CBS 507.63'

because 'Coniothyrium clematidis-rectae CBS 507.63' is one of the taxon labels in 10053 that fails to map to a taxonvariant. To my surprise, the results of the query show that the mapping is in place: taxonlabel 'Coniothyrium clematidis-rectae CBS 507.63' is mapped to taxonvariant_id 562599 in the database. So something is quite wrong here.

It looks like the March migration starts at S9931. I examined that one, and the browser shows no taxon mapping. So then I looked at the studies prior to that, and only by the time I reached S2256 did the mapping work again.

This is affecting too many records, and I'm worried that it is a harbinger of other problems. I think we need to resolve this.

bp
From: Vladimir G. <vga...@ne...> - 2010-03-19 01:07:07
I had a Treebase session with a NESCent postdoc, Eric Schuettpelz. Besides several usability issues that might be interesting for the future, here are the more prominent issues that might bear on the release decisions, in increasing order of my perceived importance.

A note: treebase.nescent.org is currently the only properly operating instance of the application, see (3). It is currently connected to the development database at treebase-dev.nescent.org/treebasedev.

(1) Unexpected different results on the Taxa tab -- a feature or a bug? E.g., find a single study, e.g. 10051.
-- Click the study (which goes to Citation tab), then to Taxa tab ==> "Nothing to display"
-- Go to Matrices tab; click on "View Taxa" in the table ==> it goes back to the Taxa tab, showing lots of stuff
-- Go back to Citation tab; then Taxa tab ==> "Nothing found to display"
-- Go to Trees tab; click on "View Taxa" in the table ==> it goes back to the Taxa tab, showing lots of stuff.

(2) An ominous message when Phylowidget loads: "An applet requires access to your computer. The digital signature could not be verified."

(3) Havoc from purls. Since sometime recently, *some* links that are crucial for the functionality of the application contain purls instead of urls pointing to the current application instance. Due to the current resolution of the purls, this makes all instances eventually drift to the treebase.nescent.org instance.
For example, in search results > Matrices tab: the columns ID, Download NeXML, Download RDF, Download Reconstructed, and Matrix Row List contain purl links (Wrong!), while the columns Taxa and Download Original have links to the current instance (Right!). Interestingly, in the Trees tab all the similarly appearing links point to the current instance (Right!).

My wrong/right assessment above derives from this:
- A purl is just a canonical unique identifier for a data object. Resolvability of a purl is a bonus, but should not be required.
- A purl occurrence on a page of an application is for informational purposes only, i.e. operation of an application instance should not depend on the purl being resolvable.
Consequently, I think the only legitimate purl occurrences are the places where the purls are spelled out in text, as e.g., the "Canonical resource URI" for the study on the Citation tab.

(4) We noticed that many tree and matrix entries had original files missing. It seemed like a big data problem to me initially, but the more I look into it, the less I understand how things are intended to be, so I'll just enumerate the bothersome issues.
- In some studies, e.g. #10065, the "Download original file" link, both for matrices and trees, brings up a text file that contains something like "File Not Found. File Name is: M4552.nex". This study does not have an entry in the study_nexusfile table (which, I think, is supposed to contain the original files), and there are many such studies.
- The temporary study 10215, which was used for migration uploads, is associated with 492 files in the study_nexusfile table. This might be the missing files, but isn't the number too low?
- On the other hand, migration dumps only contained tree or matrix files, while original files are expected to be mixed in general, right? Then the migration was not supposed to put original files into the DB; how could it?
- I checked a few studies that do have files in study_nexusfile, e.g., #823. They do return these files through the "Download original file" link, but the tree and matrix entries for these studies do not offer links to download NeXML and RDF.

Of these, (1) and (2) might be passable for the release. (4) is either a huge problem or not a big deal. With (3), we can go to production tomorrow by just re-pointing treebase.nescent.org to the production database. However, after that we will not have an instance on which to test newer versions of the application (since many links on treebase-dev.nescent.org resolve, via purls, to treebase.nescent.org). If it were up to me, I'd prefer the purls issue resolved before Treebase goes live. I can probably track down and correct the wrong links myself, within about a day of work. However, my understanding of the role of purls was not orthodox in the past.

--Vladimir
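To track down the wrong links described in (3), a saved search-results page can be scanned for hrefs that point at the purl domain rather than the current instance. A rough sketch, assuming the purl prefix "http://purl.org/phylo/treebase/" and simple double-quoted href attributes; audit_links is a hypothetical helper, and grep/sed pattern matching is only an approximation of real HTML parsing.

```bash
#!/usr/bin/env bash
# Hypothetical audit helper for issue (3): list every href in an HTML page
# read from stdin and tag it PURL or LOCAL, so that links which would send a
# dev/stage user to production stand out. The purl prefix is an assumption.
PURL_PREFIX="http://purl.org/phylo/treebase/"

audit_links() {
    # Extract href="..." values (double-quoted only), one per line.
    grep -o 'href="[^"]*"' \
        | sed -e 's/^href="//' -e 's/"$//' \
        | while IFS= read -r url; do
            case "$url" in
                "$PURL_PREFIX"*) echo "PURL $url" ;;
                *)               echo "LOCAL $url" ;;
            esac
        done
}
```

For example, saving the Matrices tab of a dev search result as matrices.html and running audit_links < matrices.html | grep '^PURL' would list the links that leave the current instance, which is exactly the set Vladimir classifies as "Wrong!".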
From: Jon A. <jon...@ne...> - 2010-03-18 17:29:06
I'm starting the data dump from treebasestage now. When that is done, I will start restoring to production. Please note that any data changes on treebasestage after now will not be reflected on production.

-Jon

On Mar 18, 2010, at 1:05 PM, William Piel wrote:
>
> On Mar 18, 2010, at 9:51 AM, Hilmar Lapp wrote:
>
>> Bill -
>>
>> are you finished with the help test items in the database? Jon and
>> Vladimir are waiting for a signal to move staging to production.
>>
>> -hilmar
>
>
> Sorry -- I've been in meetings all morning. The help items are not all done, but I don't see some missing help items as show stoppers. I vote that we go ahead with a new build for the production side and a dump-ingest data movement from stage to production. The more critical (ie show stopper) items include updating our web page verbiage and solving the phylowidget problem -- those efforts must continue in parallel.
>
> bp

-------------------------------------------------------
Jon Auman
Systems Administrator
National Evolutionary Synthesis Center
Duke University
http://www.nescent.org
jon...@ne...
------------------------------------------------------
From: William P. <wil...@ya...> - 2010-03-18 17:05:42
On Mar 18, 2010, at 9:51 AM, Hilmar Lapp wrote:
> Bill -
>
> are you finished with the help test items in the database? Jon and
> Vladimir are waiting for a signal to move staging to production.
>
> -hilmar

Sorry -- I've been in meetings all morning. The help items are not all done, but I don't see some missing help items as show stoppers. I vote that we go ahead with a new build for the production side and a dump-ingest data movement from stage to production. The more critical (ie show stopper) items include updating our web page verbiage and solving the phylowidget problem -- those efforts must continue in parallel.

bp
From: Hilmar L. <hl...@ne...> - 2010-03-18 14:28:25
Thanks Rutger! Vladimir - can you make sure these scripts and Rutger's documentation below get committed to svn?

-hilmar

On Mar 18, 2010, at 7:30 AM, Rutger Vos wrote:
> Hi all,
>
> sorry about the late response. Here's how it works (to the extent
> that I've managed to understand MJD's code): there is a "check"
> script. This script needs two arguments: a table name (out of which
> MJD's code creates a perl ORM object) and an ID in that table. The
> script then tries to construct the logically expected subtended object
> hierarchy starting from the focal object. Anything unexpected is
> written to STDERR. The most useful way to use this is to say "check
> Study $studyID". What I've done in the past is to dump all study IDs
> to a file "STUDIES", and then run the following shell script:
>
>     #!/bin/bash
>     # Run "check" on every study ID listed in STUDIES and keep a gzipped
>     # report only for studies whose check wrote something to STDERR.
>     for study in $(cat STUDIES); do
>       check Study "$study" 2> "$study.err"
>       # -s is true if the report exists and is non-empty (replaces the
>       # string comparison of `wc -l` output against 0)
>       if [[ -s $study.err ]]; then
>         gzip -9 "$study.err"
>       else
>         rm "$study.err"
>       fi
>     done
>
> This will create a $studyID.err.gz file for every inconsistent study. On
> closer examination of these, most inconsistencies lead back to only a
> handful of problems, mostly related to incomplete repatriation of
> objects from dummy study 22 to their destination study. It's therefore
> more informative to bin the inconsistencies by category as opposed to
> by study. For this, MJD has written a "digester" script. Assuming you
> have a directory full of gzipped study reports, you can then run the
> following shell script to categorize the reports:
>
>     #!/bin/bash
>     # For each gzipped report <studyID>.err.gz: bin its "*" lines by
>     # category with digester (output goes into directory <studyID>), then
>     # append each per-study category file to a combined file of the same
>     # name in the current directory.
>     for zip in *.gz; do
>       gunzip "$zip"
>       base=${zip%.gz}     # <studyID>.err
>       dir=${base%.err}    # <studyID>
>       grep '\*' "$base" | digester -d "$dir"
>       gzip -9 "$base"
>       for log in "$dir"/*; do
>         cat "$log" >> "${log##*/}"
>       done
>     done
>
> This will create files such as "tree_references_tls_but_its_no", which
> lists the PhyloTree objects that reference TaxonLabelSet X, whereas
> some of its nodes reference a TaxonLabel that is in TaxonLabelSet Y.
> In all these cases, X is still linked to Study 22 (so not repatriated
> correctly) while the individual labels and their Y are in the right
> place.
>
> By the way, the "gc" script is to be ignored. The idea was that this
> would be a garbage collector that could automatically figure out all
> inconsistencies and fix them. MJD never quite completed it and/or
> worked up the confidence and courage to let it loose on a live
> database.
>
> Hope this helps,
>
> Rutger
>
> On Wed, Mar 17, 2010 at 8:43 PM, Vladimir Gapeyev
> <vla...@du...> wrote:
>>
>> On Mar 17, 2010, at 10:29 AM, Vladimir Gapeyev wrote:
>>
>>> On Mar 17, 2010, at 10:05 AM, Hilmar Lapp wrote:
>>>
>>>> Rutger - where do the consistency tests stand (#2899240). Vladimir is
>>>> going to try to run those which exist, but I'm not sure about the
>>>> coverage - is it enough to give us any confidence?
>>>
>>> To add, these are the only things I detected that I guess might have
>>> relevance to data consistency checking:
>>> treebase-core/src/main/perl/bin/check
>>> treebase-core/src/main/perl/check/check
>>> treebase-core/src/main/perl/lib/CIPRES/TreeBase
>>
>>
>> Here is what I got.
>>
>> The two check scripts are actually the same. The only thing I could
>> get out of them is printing out contents of an object specified by its
>> class/table name and an ID.
>>
>> There is another script, perl/bin/gc. The wiki description for it is
>> "Garbage collector, prints out orphaned objects (e.g. trees without
>> studies), presumably candidates for deletion." A few excerpts from
>> its printout are below -- I am not sure how to interpret them.
>>
>> Anyone in the know, please point me in the correct direction.
>>
>> --Vladimir
>>
>>
>> [vg34@treebasedb-dev ConsistencyChecks]$ perl/bin/gc
>> Database contains 5392 Analysis items
>> Database contains 5397 AnalysisStep items
>> Database contains 12378 AnalyzedData items
>> Database contains 4579 Matrix items
>> Database contains 236604 MatrixRow items
>> Database contains 6613 PhyloTree items
>> Database contains 557909 PhyloTreeNode items
>> Database contains 2454 Study items
>> Database contains 168318 TaxonLabel items
>> S127 8/8
>> S1801 3/3
>> S71 2/2
>> S1648 2/2
>> S1481 2/2
>> S10309 4/4
>> S10122 2/2
>> S1178 4/4
>> ..... // I suspect it prints out *all* the studies
>> * Analysis 4762
>> * Analysis 4764
>> * Analysis 4821
>> * Analysis 4842
>> * AnalysisStep 4821
>> * Matrix 181
>> * Matrix 182
>> * Matrix 183
>> * Matrix 184
>> * Matrix 185
>> * Matrix 186
>> * Matrix 355
>> * Matrix 367
>> * Matrix 990
>> * Matrix 992
>> * Matrix 993
>> * Matrix 994
>> * Matrix 997
>> * Matrix 998
>> * Matrix 999
>> * Matrix 1000
>> * Matrix 1001
>> * Matrix 1617
>> * Matrix 1618
>> * Matrix 1903
>> * Matrix 2146
>> * Matrix 3702
>> * Matrix 4070
>> * Matrix 4110
>> * Matrix 4130
>> * Matrix 4150
>> * Matrix 4227
>> * Matrix 4280
>> * Matrix 4456
>> * Matrix 4528
>> * Matrix 4778
>> * Matrix 4893
>> * MatrixRow 4091
>> * MatrixRow 4092
>> * MatrixRow 4093
>> * MatrixRow 4094
>> * MatrixRow 4095
>> * MatrixRow 4096
>> * MatrixRow 4097
>> .... //Are these the orphans? These are all Analyses and Matrices
>> from the output, but I skip most MatrixRows, as there are many
>> * MatrixRow 234956
>> * MatrixRow 234957
>> * MatrixRow 234958
>> * MatrixRow 234959
>> * PhyloTree 85
>> * PhyloTree 86
>> * PhyloTree 88
>> * PhyloTree 181
>> .... /// It prints out a lot of PhyloTrees, likely all of them
>> * PhyloTree 6978
>> * PhyloTree 6979
>> * PhyloTree 6980
>> * PhyloTree 6981
>> * PhyloTreeNode 76327
>> * PhyloTreeNode 76328
>> * PhyloTreeNode 76329
>> * PhyloTreeNode 76330
>> * PhyloTreeNode 76331
>> * PhyloTreeNode 76332
>> .....
>> * PhyloTreeNode 76488
>> * PhyloTreeNode 76489
>> * PhyloTreeNode 76490
>> * PhyloTreeNode 153706 //a sharp jump
>> * PhyloTreeNode 153707
>> * PhyloTreeNode 153708
>> * PhyloTreeNode 153709
>> * PhyloTreeNode 153710
>> .....
>> * PhyloTreeNode 559205
>> * PhyloTreeNode 559206
>> * PhyloTreeNode 559207
>> * TaxonLabel 1288
>> * TaxonLabel 1289
>> * TaxonLabel 1290
>> * TaxonLabel 1291
>> * TaxonLabel 1292
>> .......
>> * TaxonLabel 276777
>> * TaxonLabel 276778
>> * TaxonLabel 276779
>> * TaxonLabel 276780
>> * TaxonLabel 276781
>
> --
> Dr. Rutger A. Vos
> School of Biological Sciences
> Philip Lyle Building, Level 4
> University of Reading
> Reading
> RG6 6BX
> United Kingdom
> Tel: +44 (0) 118 378 7535
> http://www.nexml.org
> http://rutgervos.blogspot.com

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
===========================================================
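Because the raw gc listing Vladimir posted is long, a per-class count may be easier to read. A minimal sketch, assuming only the "* <Class> <id>" line format visible in that excerpt (tally_gc_output is a hypothetical name). Given Rutger's warning that gc was never finished, these counts only summarize what gc printed; they are not confirmed orphans.

```bash
#!/usr/bin/env bash
# Hypothetical summary helper: read gc output on stdin and print one
# "<Class> <count>" line per object class, sorted by class name.
# Only lines of the form "* <Class> <id> ..." are counted; header lines
# ("Database contains ..."), per-study lines ("S127 8/8") and the "....."
# elisions are ignored.
tally_gc_output() {
    awk '$1 == "*" && NF >= 3 { n[$2]++ }
         END { for (c in n) printf "%s %d\n", c, n[c] }' | sort
}
```

Usage would be along the lines of perl/bin/gc 2>&1 | tally_gc_output. The 2>&1 is there because the thread does not say whether gc writes its listing to STDOUT or STDERR.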
From: Hilmar L. <hl...@ne...> - 2010-03-18 14:08:09
Youjun:

On Mar 17, 2010, at 10:13 PM, youjun guo wrote:
> The new Phylowidget.jar was submitted to sourceforge under treebase-web/src/webapp/main/test/phylowidget/

Maybe I'm missing some context here, but why under test/ ?

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
===========================================================
From: Hilmar L. <hl...@du...> - 2010-03-18 13:51:13
Bill -

are you finished with the help test items in the database? Jon and Vladimir are waiting for a signal to move staging to production.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================
From: Rutger V. <rut...@gm...> - 2010-03-18 11:31:01
Hi all,

sorry about the late response. Here's how it works (to the extent that I've managed to understand MJD's code): there is a "check" script. This script needs two arguments: a table name (out of which MJD's code creates a perl ORM object) and an ID in that table. The script then tries to construct the logically expected subtended object hierarchy starting from the focal object. Anything unexpected is written to STDERR. The most useful way to use this is to say "check Study $studyID". What I've done in the past is to dump all study IDs to a file "STUDIES", and then run the following shell script:

    #!/bin/bash
    # Run "check" on every study ID listed in STUDIES and keep a gzipped
    # report only for studies whose check wrote something to STDERR.
    for study in $(cat STUDIES); do
      check Study "$study" 2> "$study.err"
      # -s is true if the report exists and is non-empty (replaces the
      # string comparison of `wc -l` output against 0)
      if [[ -s $study.err ]]; then
        gzip -9 "$study.err"
      else
        rm "$study.err"
      fi
    done

This will create a $studyID.err.gz file for every inconsistent study. On closer examination of these, most inconsistencies lead back to only a handful of problems, mostly related to incomplete repatriation of objects from dummy study 22 to their destination study. It's therefore more informative to bin the inconsistencies by category as opposed to by study. For this, MJD has written a "digester" script. Assuming you have a directory full of gzipped study reports, you can then run the following shell script to categorize the reports:

    #!/bin/bash
    # For each gzipped report <studyID>.err.gz: bin its "*" lines by
    # category with digester (output goes into directory <studyID>), then
    # append each per-study category file to a combined file of the same
    # name in the current directory.
    for zip in *.gz; do
      gunzip "$zip"
      base=${zip%.gz}     # <studyID>.err
      dir=${base%.err}    # <studyID>
      grep '\*' "$base" | digester -d "$dir"
      gzip -9 "$base"
      for log in "$dir"/*; do
        cat "$log" >> "${log##*/}"
      done
    done

This will create files such as "tree_references_tls_but_its_no", which lists the PhyloTree objects that reference TaxonLabelSet X, whereas some of its nodes reference a TaxonLabel that is in TaxonLabelSet Y. In all these cases, X is still linked to Study 22 (so not repatriated correctly) while the individual labels and their Y are in the right place.
By the way, the "gc" script is to be ignored. The idea was that this would be a garbage collector that could automatically figure out all inconsistencies and fix them. MJD never quite completed it and/or worked up the confidence and courage to let it loose on a live database.

Hope this helps,

Rutger

On Wed, Mar 17, 2010 at 8:43 PM, Vladimir Gapeyev <vla...@du...> wrote:
> Here is what I got.
>
> The two check scripts are actually the same. The only thing I could
> get out of them is printing out contents of an object specified by its
> class/table name and an ID.
>
> There is another script, perl/bin/gc. The wiki description for it is
> "Garbage collector, prints out orphaned objects (e.g. trees without
> studies), presumably candidates for deletion."
>
> Anyone in the know, please point me in the correct direction.

-- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com |
From: Vladimir G. <vla...@du...> - 2010-03-17 20:43:25
|
On Mar 17, 2010, at 10:29 AM, Vladimir Gapeyev wrote:
> On Mar 17, 2010, at 10:05 AM, Hilmar Lapp wrote:
>
>> Rutger - where do the consistency tests stand (#2899240). Vladimir is
>> going to try to run those which exist, but I'm not sure about the
>> coverage - is it enough to give us any confidence?
>
> To add, these are the only things I detected that I guess might have
> relevance to data consistency checking:
> treebase-core/src/main/perl/bin/check
> treebase-core/src/main/perl/check/check
> treebase-core/src/main/perl/lib/CIPRES/TreeBase

Here is what I got.

The two check scripts are actually the same. The only thing I could get out of them is printing out contents of an object specified by its class/table name and an ID.

There is another script, perl/bin/gc. The wiki description for it is "Garbage collector, prints out orphaned objects (e.g. trees without studies), presumably candidates for deletion." A few excerpts from its printout are below -- I am not sure how to interpret them.

Anyone in the know, please point me in the correct direction.

--Vladimir

    [vg34@treebasedb-dev ConsistencyChecks]$ perl/bin/gc
    Database contains 5392 Analysis items
    Database contains 5397 AnalysisStep items
    Database contains 12378 AnalyzedData items
    Database contains 4579 Matrix items
    Database contains 236604 MatrixRow items
    Database contains 6613 PhyloTree items
    Database contains 557909 PhyloTreeNode items
    Database contains 2454 Study items
    Database contains 168318 TaxonLabel items
    S127 8/8
    S1801 3/3
    S71 2/2
    S1648 2/2
    S1481 2/2
    S10309 4/4
    S10122 2/2
    S1178 4/4
    ..... // I suspect it prints out *all* the studies
    * Analysis 4762
    * Analysis 4764
    * Analysis 4821
    * Analysis 4842
    * AnalysisStep 4821
    * Matrix 181
    * Matrix 182
    * Matrix 183
    * Matrix 184
    * Matrix 185
    * Matrix 186
    * Matrix 355
    * Matrix 367
    * Matrix 990
    * Matrix 992
    * Matrix 993
    * Matrix 994
    * Matrix 997
    * Matrix 998
    * Matrix 999
    * Matrix 1000
    * Matrix 1001
    * Matrix 1617
    * Matrix 1618
    * Matrix 1903
    * Matrix 2146
    * Matrix 3702
    * Matrix 4070
    * Matrix 4110
    * Matrix 4130
    * Matrix 4150
    * Matrix 4227
    * Matrix 4280
    * Matrix 4456
    * Matrix 4528
    * Matrix 4778
    * Matrix 4893
    * MatrixRow 4091
    * MatrixRow 4092
    * MatrixRow 4093
    * MatrixRow 4094
    * MatrixRow 4095
    * MatrixRow 4096
    * MatrixRow 4097
    .... //Are these the orphans? These are all Analyses and Matrices from the output, but I skip most MatrixRows, as there are many
    * MatrixRow 234956
    * MatrixRow 234957
    * MatrixRow 234958
    * MatrixRow 234959
    * PhyloTree 85
    * PhyloTree 86
    * PhyloTree 88
    * PhyloTree 181
    .... /// It prints out a lot of PhyloTrees, likely all of them
    * PhyloTree 6978
    * PhyloTree 6979
    * PhyloTree 6980
    * PhyloTree 6981
    * PhyloTreeNode 76327
    * PhyloTreeNode 76328
    * PhyloTreeNode 76329
    * PhyloTreeNode 76330
    * PhyloTreeNode 76331
    * PhyloTreeNode 76332
    .....
    * PhyloTreeNode 76488
    * PhyloTreeNode 76489
    * PhyloTreeNode 76490
    * PhyloTreeNode 153706 //a sharp jump
    * PhyloTreeNode 153707
    * PhyloTreeNode 153708
    * PhyloTreeNode 153709
    * PhyloTreeNode 153710
    .....
    * PhyloTreeNode 559205
    * PhyloTreeNode 559206
    * PhyloTreeNode 559207
    * TaxonLabel 1288
    * TaxonLabel 1289
    * TaxonLabel 1290
    * TaxonLabel 1291
    * TaxonLabel 1292
    .......
    * TaxonLabel 276777
    * TaxonLabel 276778
    * TaxonLabel 276779
    * TaxonLabel 276780
    * TaxonLabel 276781

|
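One way to answer the "are these the orphans?" question from gc's own output is to count the starred lines per class and set them against the "Database contains N <Class> items" totals gc prints first: a class whose flagged count equals its total (as suspected for PhyloTree) is being flagged wholesale rather than for genuine orphans. The awk sketch below assumes gc writes its report to stdout in exactly the line formats excerpted above; `count_flagged` is a hypothetical name, not an existing TreeBASE script.

```bash
#!/bin/bash
# Hypothetical helper (not in the TreeBASE tree): read gc's report on stdin
# and print "<Class> <flagged>/<total>" per object class ("?" if no total).
# Assumed line formats, taken from the excerpt:
#   "Database contains <N> <Class> items"  -> total for <Class>
#   "* <Class> <id>"                       -> one flagged object
count_flagged() {
    awk '
        $1 == "Database" && $2 == "contains" { total[$4] = $3; next }
        $1 == "*" && NF >= 3                 { flagged[$2]++ }
        END {
            for (c in flagged)
                printf "%s %d/%s\n", c, flagged[c], ((c in total) ? total[c] : "?")
        }
    ' | LC_ALL=C sort
}
```

Usage: `perl/bin/gc | count_flagged`. Comparing against gc's own totals avoids a second database query, at the cost of trusting gc's counts.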
From: Hilmar L. <hl...@ne...> - 2010-03-17 19:38:05
|
I think I found the links from the analysis tab and commented them out in the analyses.jsp and analysis.jsp pages. This doesn't comment out the link from the summary page (it looks like there is one there too). I think we should leave that one in, as apparently it works for most analyses and doesn't scramble the taxon labels? Would you be OK with that, Bill? BTW, to see the effect, the code needs to be rebuilt and redeployed first. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== |
From: William P. <wil...@ya...> - 2010-03-17 15:16:16
|
On Mar 17, 2010, at 10:05 AM, Hilmar Lapp wrote:
> Can we have a conference call before noon to review our status, for example at 11am EDT?

Whoops. We just got your email and it's already after 11. We can all call in at 11:30, for example.

bp |
From: Vladimir G. <vla...@du...> - 2010-03-17 14:29:28
|
On Mar 17, 2010, at 10:05 AM, Hilmar Lapp wrote:
> Rutger - where do the consistency tests stand (#2899240). Vladimir is
> going to try to run those which exist, but I'm not sure about the
> coverage - is it enough to give us any confidence?

To add, these are the only things I detected that I guess might have relevance to data consistency checking:

    treebase-core/src/main/perl/bin/check
    treebase-core/src/main/perl/check/check
    treebase-core/src/main/perl/lib/CIPRES/TreeBase

|
From: Hilmar L. <hl...@ne...> - 2010-03-17 14:05:17
|
Bill & Youjun - there are currently 5 mission-critical bugs listed in the tracker as open. Two of these are assigned to Youjun. Youjun - can you update us on where they stand? Specifically, will you be able to resolve the PhyloWidget-related bug, which seems like a show-stopper for release? Rutger - where do the consistency tests stand (#2899240)? Vladimir is going to try to run those which exist, but I'm not sure about the coverage - is it enough to give us any confidence? There are two priority 9 bugs which are unassigned. Youjun - would you be able to take them on and resolve them today? I am going to try to schedule testing by appointment with a few NESCent scholars this afternoon. Can we have a conference call before noon to review our status, for example at 11am EDT? (We are scheduled to flip the switch tomorrow for public release.) -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== |
From: Vladimir G. <vla...@du...> - 2010-03-16 20:20:17
|
On Mar 16, 2010, at 3:54 PM, William Piel wrote:
>
> On Mar 16, 2010, at 3:44 PM, Vladimir Gapeyev wrote:
>
>> There were 1002 trees with the null treetype, now there are only 33
>> (below).
>
> For the remaining 33, can you give them "single" trees (which is
> kind of our default)

done --vg |