From: Hilmar L. <hl...@ne...> - 2010-01-22 23:24:38
|
On Jan 22, 2010, at 5:06 PM, Vladimir Gapeyev wrote: > If these tables are supposed to be involved in any functionality to be > offered by TB2, this functionality has probably not been tested. Well, ideally unit tests clean up after themselves, so lack of data could also mean that ... -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== |
From: Vladimir G. <vla...@du...> - 2010-01-22 22:07:05
|
As part of the exercise of finding out the largest ID used in treebasedev tables, I noticed that about 1/4 of the tables have no data in them: cstreenode distancematrixelement geneticcode geneticcoderecord geneticcodeset itemvalue nodeattribute statechangeset statemodifier taxonauthority taxonlabelgroup taxonlabelpartition taxonlink taxonset stateset ancestralstate ancstateset treeattribute treegroup treenodeedge treepartition treeset typeset usertype usertyperecord (These are among tables that use sequences; I did not check others.) If these tables are supposed to be involved in any functionality to be offered by TB2, this functionality has probably not been tested. --Vladimir |
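The check Vladimir describes — scanning the schema for tables with no rows — can be scripted. A minimal editorial sketch, using Python's sqlite3 module with an in-memory database as a stand-in for the treebasedev Postgres instance; the table names below are illustrative only, not the real schema:

```python
# Sketch of an "which tables are empty?" scan. SQLite stands in for
# Postgres here; on Postgres you would query information_schema.tables
# instead of sqlite_master. Table names are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE study (id INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE treeset (id INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE usertype (id INTEGER PRIMARY KEY)")
cur.execute("INSERT INTO study (id) VALUES (1)")

# List every user table, then count rows in each one.
tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
empty = [t for t in tables
         if cur.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] == 0]
print(sorted(empty))  # → ['treeset', 'usertype']
```

An empty-table report like this is only suggestive, of course: as Hilmar notes in his reply, well-behaved unit tests clean up after themselves, so an empty table does not by itself prove the functionality is untested.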
From: Vladimir G. <vla...@du...> - 2010-01-22 21:49:06
|
It appears that someone at Yale is currently running a couple of transactions against treebasedev. I need to run ALTER commands on most of the tables (to redirect them to use hibernate_sequence), which requires brief undisturbed access to the whole DB. Please let us know if any of these transactions are critical and should not be interrupted. Jon will restart the server shortly, once it's ok with everyone. Thanks, --Vladimir |
From: William P. <wil...@ya...> - 2010-01-22 19:58:18
|
Hi all, I'm a bit confused by an issue that has cropped up. The data model contains two tables, study and submission in which the submission table contains a study_id column as FK. I cannot fathom why one study could have two or more submissions, so I think we are to assume a one-to-one connection between these tables. I'm not clear why we need a separate study and submission table at all -- possibly it is because an early iteration envisioned a scenario where new submissions are tracked with a submission_id, and the submitting person is not informed of the study_id until the data get vetted. Or perhaps we envisioned that the editing of a new submission gets taken over by a co-author on the same existing study_id record. At any rate, for whatever reason, we have study_ids and submission_ids to contend with. To date, all migrated data, and all new test submissions created at SDSC, have their study_id equal to their submission_id. So up until now, I hadn't really noticed an issue because study number and submission number were completely interchangeable. I had assumed that when a new submission is created, and the submission_id says "1234", then if it ever gets published, the study_id will also be "1234". Now, however, this has changed -- the two ids have become out of sync. And once we change over to a common sequence table, there is no hope of the two being in sync ever again. In terms of database integrity, there's probably no problem. In terms of confusion with our users, however, we may have problems. For example, I think it was Rutger who kindly added a radio button to the searchBySubmissionID.html page so that an Admin person can find a submission/study to edit it (see attached image). The radio button seems to say "search by TreeBASE2 Id" -- but in fact, it is really searching by submission_id, which happens to work for 99% of studies (because the two ids are in sync), but will not work from hereafter. 
For this to be useful, the dialog box should be changed to offer three choices: TreeBASE2 study_id, TreeBASE2 submission_id, and TreeBASE1 legacy study_id. When a user creates a new submission record, a new study record is also instantly created. The submission summary page lists the submission_id (e.g. 10050, below) as well as the study_id imbedded in the "right-click and copy me" link (in this case "S10060" -- out of sync by a small but significant value from 10050). When an editor clicks the "Contact Submitter" link, an email with the subject header "TreeBASE Submission S10050" is generated on a local email client. So, in all email discussions between submitter and editor, the submission_id is being used to make it clear what the discussion is about. The risk here is that some submitters will go ahead and cite the meaningless "S10050" number in their manuscripts instead of the "S10060" number (or a phylows URL containing that number). I guess we have a couple of solutions: 1. mechanically make it so that submission_id and study_id are always in sync, thereby making them interchangeable and the whole issue moot. The problem is that this may be difficult to fix without having to change lots of code. 2. hide the submission_id from *everyone* -- submitters, users, and editors alike. Wherever the submission_id is shown, have it show the study_id instead. In email discussions between submitter and editor, only the study_id is cited. The downside is that this will probably take a lot of work. 3. fix the searchBySubmissionID.html page so that there are three radio buttons to choose from (study_id, submission_id, and legacy_id). Then add a "Study Accession URL" to the submission summary page (in addition to the Reviewer Access URL) so that the user knows exactly what to cite and does not confuse submission_id with study_id. I guess I vote for solution 3, since that requires the least amount of coding. 
The risk is that users will contact editors with questions like "what happened to the data in S1234" and we won't know if they mean submission S1234 or study S1234 -- the result is an additional back-and-forth of emails to clarify. What might mitigate this possibility, is if everywhere that the submission_id appears on our web pages, we write it like "Sub1234" -- making it more likely that in future communication we know what the integer means. Bill This is the Search Submission page which is used by editors and admin users. Currently the "TreeBASE2 Id" actually searches the submission_id. Either both ids should be totally in sync or we should add another radio button to distinguish the two. Summary for current study page for submitting users and for admin editors. Here the reviewer access URL correctly uses the study_id, whereas the submission_id is presented to the user. Either both ids should be totally in sync, or we should be clear on what the final "Study Accession URL" will look like (with the study_id embedded therein). |
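The divergence Bill describes is a direct consequence of moving both tables onto one shared sequence. A toy editorial simulation (plain Python, hypothetical function names — not TreeBASE's actual code) of why study_id and submission_id cannot stay in sync once every insert draws its id from a single counter:

```python
from itertools import count

# Toy model: a single shared sequence (like hibernate_sequence) feeds
# id generation for BOTH the submission and study tables. The starting
# value 10050 echoes Bill's example and is purely illustrative.
shared_seq = count(10050)

def new_submission_with_study(seq):
    """Creating a submission also creates its study row; each insert
    draws a fresh value from the shared sequence."""
    submission_id = next(seq)
    study_id = next(seq)
    return submission_id, study_id

sub_id, study_id = new_submission_with_study(shared_seq)
print(sub_id, study_id)  # → 10050 10051: out of sync after one submission
```

With per-table sequences the two counters advance in lockstep as long as every submission creates exactly one study; with one shared counter they diverge on the very first insert, which is why solution 1 (keeping the ids mechanically in sync) would require changing how ids are allocated, not just the display code.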
From: William P. <wil...@ya...> - 2010-01-22 18:49:30
|
Thanks, Jon. On Jan 22, 2010, at 1:23 PM, Jon Auman wrote: > To recap, we now have three instances of the application located at: > > http://treebase-dev.nescent.org \\ connects to current dev database, contains junk data > http://treebase-stage.nescent.org \\ connects to staging database, which is a restore of the database on Dec. 8, 2009. > http://treebase-stage.nescent.org \\ currently connects to staging database, but will connect to production once ready For the last one, did you mean something like "treebase-production.nescent.org"? > 1) verify that we've got a clean database. > * Bill, can you verify that the staging database is clean and/or let us know what we need to do to clean it? The data behind http://treebase-stage.nescent.org/treebase-web/ looks the same as the data behind http://treebasedb-dev.nescent.org:6666/treebase-web/ . (e.g. the tb1 account contains a number of auto-generated test records.) > 3) along the matter of persistent urls, do we want to redirect treebase.org to the new server? That will happen when we shut down TreeBASE1. bp |
From: Jon A. <jon...@du...> - 2010-01-22 18:27:33
|
Production Treebase server has been setup with Tomcat, Apache, and Postgresql. All we need now is a cleaned version of the database. To recap, we now have three instances of the application located at: http://treebase-dev.nescent.org \\ connects to current dev database, contains junk data http://treebase-stage.nescent.org \\ connects to staging database, which is a restore of the database on Dec. 8, 2009. http://treebase-stage.nescent.org \\ currently connects to staging database, but will connect to production once ready We are using the same war file on all instances that Vladimir successfully built on his developer laptop. As I see it, the remaining issues before bringing treebase into live production are: 1) verify that we've got a clean database. * Bill, can you verify that the staging database is clean and/or let us know what we need to do to clean it? 2) settle the matter on persistent url. I saw a lot of discussion about this while I was out, but am not sure if there was a conclusion. 3) along the matter of persistent urls, do we want to redirect treebase.org to the new server? -Jon ------------------------------------------------------- Jon Auman Systems Administrator National Evolutionary Synthesis Center Duke University http:www.nescent.org jon...@ne... ------------------------------------------------------ |
From: Rutger V. <rut...@gm...> - 2010-01-19 18:43:55
|
No it isn't. The download controller doesn't know it's dealing with an analysis step (as opposed to a tree/matrix/treeblock/study) so it doesn't know how to traverse the object hierarchy to get to the study. On Tue, Jan 19, 2010 at 6:09 PM, Hilmar Lapp <hl...@dr...> wrote: > Isn't the study ID implied in the analysisStep or analysis? I.e., why > does this need to be passed along (in which case it can be tampered > with), when the server-side application should be able to figure it > out by itself? > > Or am I missing something? > > -hilmar > > Begin forwarded message: > >> From: rv...@us... >> Date: January 19, 2010 12:54:20 PM EST >> To: tre...@li... >> Subject: [Treebase-guts] SF.net SVN: treebase:[472] trunk/treebase- >> web/src/main/webapp/WEB-INF/pages /algorithm.jsp >> >> Revision: 472 >> http://treebase.svn.sourceforge.net/treebase/? >> rev=472&view=rev >> Author: rvos >> Date: 2010-01-19 17:54:19 +0000 (Tue, 19 Jan 2010) >> >> Log Message: >> ----------- >> Added study id so that reviewer study-level access can be verified. >> >> Modified Paths: >> -------------- >> trunk/treebase-web/src/main/webapp/WEB-INF/pages/algorithm.jsp >> >> Modified: trunk/treebase-web/src/main/webapp/WEB-INF/pages/ >> algorithm.jsp >> =================================================================== >> --- trunk/treebase-web/src/main/webapp/WEB-INF/pages/algorithm.jsp >> 2010-01-19 16:24:47 UTC (rev 471) >> +++ trunk/treebase-web/src/main/webapp/WEB-INF/pages/algorithm.jsp >> 2010-01-19 17:54:19 UTC (rev 472) >> @@ -44,7 +44,7 @@ >> title="<fmt:message key="analysis.notvalidated"/>" >> alt="<fmt:message key="analysis.notvalidated"/>"/> >> </span> >> - <a href="/treebase-web/search/downloadAnAnalysisStep.html? >> analysisid=${analysisStepCommand.id}"> >> + <a href="/treebase-web/search/downloadAnAnalysisStep.html? 
>> analysisid=${analysisStepCommand.id}&id=$ >> {analysisStepCommand.analysis.study.id}"> >> <img >> class="iconButton" >> src="<fmt:message key="icons.download.reconstructed"/>" >> >> >> This was sent by the SourceForge.net collaborative development >> platform, the world's largest Open Source development site. >> >> ------------------------------------------------------------------------------ >> Throughout its 18-year history, RSA Conference consistently attracts >> the >> world's best and brightest in the field, creating opportunities for >> Conference >> attendees to learn about information security's most important >> issues through >> interactions with peers, luminaries and emerging and established >> companies. >> http://p.sf.net/sfu/rsaconf-dev2dev >> _______________________________________________ >> Treebase-guts mailing list >> Tre...@li... >> https://lists.sourceforge.net/lists/listinfo/treebase-guts > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > > > ------------------------------------------------------------------------------ > Throughout its 18-year history, RSA Conference consistently attracts the > world's best and brightest in the field, creating opportunities for Conference > attendees to learn about information security's most important issues through > interactions with peers, luminaries and emerging and established companies. > http://p.sf.net/sfu/rsaconf-dev2dev > _______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel > -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com |
From: Hilmar L. <hl...@dr...> - 2010-01-19 18:37:07
|
Isn't the study ID implied in the analysisStep or analysis? I.e., why does this need to be passed along (in which case it can be tampered with), when the server-side application should be able to figure it out by itself? Or am I missing something? -hilmar Begin forwarded message: > From: rv...@us... > Date: January 19, 2010 12:54:20 PM EST > To: tre...@li... > Subject: [Treebase-guts] SF.net SVN: treebase:[472] trunk/treebase- > web/src/main/webapp/WEB-INF/pages /algorithm.jsp > > Revision: 472 > http://treebase.svn.sourceforge.net/treebase/? > rev=472&view=rev > Author: rvos > Date: 2010-01-19 17:54:19 +0000 (Tue, 19 Jan 2010) > > Log Message: > ----------- > Added study id so that reviewer study-level access can be verified. > > Modified Paths: > -------------- > trunk/treebase-web/src/main/webapp/WEB-INF/pages/algorithm.jsp > > Modified: trunk/treebase-web/src/main/webapp/WEB-INF/pages/ > algorithm.jsp > =================================================================== > --- trunk/treebase-web/src/main/webapp/WEB-INF/pages/algorithm.jsp > 2010-01-19 16:24:47 UTC (rev 471) > +++ trunk/treebase-web/src/main/webapp/WEB-INF/pages/algorithm.jsp > 2010-01-19 17:54:19 UTC (rev 472) > @@ -44,7 +44,7 @@ > title="<fmt:message key="analysis.notvalidated"/>" > alt="<fmt:message key="analysis.notvalidated"/>"/> > </span> > - <a href="/treebase-web/search/downloadAnAnalysisStep.html? > analysisid=${analysisStepCommand.id}"> > + <a href="/treebase-web/search/downloadAnAnalysisStep.html? > analysisid=${analysisStepCommand.id}&id=$ > {analysisStepCommand.analysis.study.id}"> > <img > class="iconButton" > src="<fmt:message key="icons.download.reconstructed"/>" > > > This was sent by the SourceForge.net collaborative development > platform, the world's largest Open Source development site. 
> > ------------------------------------------------------------------------------ > Throughout its 18-year history, RSA Conference consistently attracts > the > world's best and brightest in the field, creating opportunities for > Conference > attendees to learn about information security's most important > issues through > interactions with peers, luminaries and emerging and established > companies. > http://p.sf.net/sfu/rsaconf-dev2dev > _______________________________________________ > Treebase-guts mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-guts -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== |
From: youjun g. <you...@ya...> - 2010-01-19 16:20:35
|
> > do find some classes refer to this type. The classes are located in "test" > directory, but do not look like test classes. The package name is > org.cipres.treebase.migtation. > > Also a AbstractPhyloDataSet and all classes extends this class will not > work anymore > Given the effort we have already put in, if no one complains about the two issues above, I will keep working and expect to finish this week. Youjun |
From: Hilmar L. <hl...@ne...> - 2010-01-19 16:10:30
|
Hi Youjun, thanks for this investigation - this makes things a lot clearer. With this understanding below, I would have actually been OK with postponing fixing it to after the release, but I see you've forged ahead already, which is great too. -hilmar On Jan 15, 2010, at 2:51 PM, youjun guo wrote: > Ok, here is my test report: > > A group of TreeBase classes use cipres framework to establish the > gateway to access NCL facility. Those classes need to delete if we > delete the framework. > > NCL is a nexus file parsing tool similar to the Mesquite. According > to Bill NCL is not used by TreeBase anymore(It's hard to tell from > the code), since we had Mesquite. > > There are 7 NCL related classes in main code need be deleted. In > order to get the test result I already deleted them from my eclipse > > But there are code take both NCL and Mesquite as nexus parser > choices. Those code need to be fix if we take NCL option off. > > Besides, deleting cipres framework and NCL stuff cause 100 complains > across about 20 test classes . > > Youjun -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== |
From: Rutger V. <rut...@gm...> - 2010-01-19 15:09:32
|
OK, I notice right now some test classes fail to compile in eclipse. I assume you're not done with your commits yet. On Tue, Jan 19, 2010 at 3:06 PM, youjun guo <you...@ya...> wrote: > Dear All, > > I am working on the deletion of cipres framework, please be noted that after > this deletion all the TB code which refer to the type PhyloDataset > (org.cipres.datatypes) will not work and have to be deleted also. > > I do find some classes refer to this type. The classes are located in > "test" directory, but do not look like test classes. The package name is > org.cipres.treebase.migtation. > > Also a AbstractPhyloDataSet and all classes extends this class will not work > anymore. > > Youjun. -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com |
From: youjun g. <you...@ya...> - 2010-01-19 15:06:53
|
Dear All, I am working on the deletion of the cipres framework. Please note that after this deletion, all of the TB code that refers to the type PhyloDataset (org.cipres.datatypes) will no longer work and will have to be deleted as well. I did find some classes that refer to this type. The classes are located in the "test" directory, but do not look like test classes. The package name is org.cipres.treebase.migtation. Also, AbstractPhyloDataSet and all classes that extend it will no longer work. Youjun. |
From: Rutger V. <rut...@gm...> - 2010-01-18 11:10:10
|
Actually, I do seem to recall that the import program attached more and more authors over incremental runs. On Tue, Jan 12, 2010 at 8:14 PM, William Piel <wil...@ya...> wrote: > > On Jan 12, 2010, at 2:57 PM, Hilmar Lapp wrote: > >> Wouldn't that complicate the data import in some way? I'm OK if it doesn't, but if it does I think we may need to defer until after the release. > > If the citation-update-scirpt can take the Endnote output and update 284 records, I don't see why it can't take a bigger Endnote output and update 2,350 records (which is total number of citations now in TreeBASE1). I don't recall that this is a slow process, and it should have been designed for multi- and incremental updates. > > If Mark didn't design it for multiple & incremental updating (e.g. if each time you run it it creates more and more duplicate author records), than yes, we should only do the 284-record update. Perhaps Vladimir can examine the code and confirm that it is designed for multiple runs of the same script on the same set of records. > > bp > > > > ------------------------------------------------------------------------------ > This SF.Net email is sponsored by the Verizon Developer Community > Take advantage of Verizon's best-in-class app development support > A streamlined, 14 day to market process makes app distribution fast and easy > Join now and get one step closer to millions of Verizon customers > http://p.sf.net/sfu/verizon-dev2dev > _______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel > -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com |
From: Hilmar L. <hl...@ne...> - 2010-01-15 21:25:45
|
On Jan 15, 2010, at 4:17 PM, Vladimir Gapeyev wrote: > > On Jan 14, 2010, at 4:38 PM, Hilmar Lapp wrote: > >> I don't see a problem if for the time being PURL URIs always go to >> production regardless of origin. That is not different, for example, >> for Dryad, where all handle URIs resolve to production, whether they >> originate from development or production, or if we used DOIs. > > This would resolve my concerns. (That is, in all instances, > treebase.domain.name / treebase.purl.domain always contains the same > value, "purl.org" or "treebase.org"). Right. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== |
From: Vladimir G. <vla...@du...> - 2010-01-15 21:17:32
|
On Jan 14, 2010, at 4:38 PM, Hilmar Lapp wrote: > I don't see a problem if for the time being PURL URIs always go to > production regardless of origin. That is not different, for example, > for Dryad, where all handle URIs resolve to production, whether they > originate from development or production, or if we used DOIs. This would resolve my concerns. (That is, in all instances, treebase.domain.name / treebase.purl.domain always contains the same value, "purl.org" or "treebase.org"). --VG |
From: youjun g. <you...@ya...> - 2010-01-15 19:51:14
|
Ok, here is my test report: A group of TreeBase classes use the cipres framework to establish the gateway for accessing the NCL facility. Those classes need to be deleted if we delete the framework. NCL is a nexus file parsing tool similar to Mesquite. According to Bill, NCL is not used by TreeBase anymore (it's hard to tell from the code), since we have Mesquite. There are 7 NCL-related classes in the main code that need to be deleted. In order to get the test result, I already deleted them from my Eclipse workspace. But there is code that takes both NCL and Mesquite as nexus parser choices; that code will need to be fixed if we take the NCL option off. Besides, deleting the cipres framework and the NCL classes causes 100 compiler complaints across about 20 test classes. Youjun |
From: William P. <wil...@ya...> - 2010-01-15 17:26:20
|
On Jan 15, 2010, at 10:55 AM, Vladimir Gapeyev wrote: > Pardon me for being so incremental. > I does not look like taxonlabels were pruned - the table has 232770 rows at the moment. > So, Bill, could you send me the pruning script, as well as the null->tb1 script (if you recorded it). > --Vladimir For the pruning script, follow the link that was in the message. (this one: http://www.treebase.org/~piel/taxlabels_fix.zip) For the other one, I no longer have it -- but it's a simple UPDATE statement that we can redo in a jiffy. bp |
From: Vladimir G. <vla...@du...> - 2010-01-15 15:56:05
|
Pardon me for being so incremental. It does not look like taxonlabels were pruned - the table has 232770 rows at the moment. So, Bill, could you send me the pruning script, as well as the null->tb1 script (if you recorded it). --Vladimir On Jan 15, 2010, at 10:28 AM, Vladimir Gapeyev wrote: > Thanks! Then Jon and I will try to find and restore from a backup > between Nov 30 and Dec 8-10, when I may have accidentally ran the > tests while setting up my environment. > --Vladimir > > > On Jan 14, 2010, at 11:14 PM, William Piel wrote: > >> >> On Jan 14, 2010, at 4:44 PM, Hilmar Lapp wrote: > >> >>> Vladimir - I think you can use the first backup that there is. Or >>> one >>> before the first tests were run. >> >> Maybe the one before the first tests were run would be best. The >> reasoning being that I have made some ad-hoc changes to the data >> (but very few -- it not a big deal if those edits were lost). But >> more importantly, we did make some bigger changes to the data: >> >> - One change we made (some verbiage below about it) occurred around >> Nov 30 of 2009. We had discovered tons of duplicate taxonlabel >> records. I wrote a script to prune these, which drops the number of >> records in taxonlabel from 232,766 to 131,357. (actually, I'm not >> positive that this script was run on the data -- we could still >> have the duplicate records -- I think Rutger said he'd run it, but >> I'm not sure that it happened). >> >> - Another change we made was to take all migrated studies (i.e. >> almost all studies currently in TreeBASE2) in and give them a phony >> "owner" (I think the username "tb1"). These were missing an owner >> because they were migrated in instead of created by a submitter. >> The absence of an owner id was causing (I think) a unit test >> failure. Hence I did a massive update, giving tb1's ID to all >> submissions with a user_id of NULL. >> >> I guess this is just to say that we have, at times, been fixing
So the safest backup to resort to is the most recent >> one that does not include the test stuff. >> >> bp >> >> >> >> >> Begin forwarded message: >> >>> I slapped together a script that I think works. I've tested it on >>> a synopsis of TB2 that only includes tables that use taxonlabel_id >>> as a FK. The following zip file contains the perl script and the >>> "before" and "after" versions of the TB2 synopsis. You can find it >>> here: >>> >>> http://www.treebase.org/~piel/taxlabels_fix.zip >>> >>> You can test it by creating a local database and executing the >>> originalsynops.sql file to populate it, then running the >>> fixtaxonlabels.pl script on it. It should remove all duplicate >>> taxonlabels. The number of records in the taxonlabel table drop >>> from 232,766 to 131,357. >>> >>> You'll need to run it on the treebasedev database because I cannot >>> make a backup myself (pg_dump will not work with the username I'm >>> using because the auto-sequences are owned by rvos, which the >>> treebase_app user cannot access). The fixtaxonlabels.pl script >>> was not written using transactions, so you'll definitely want to >>> have a pg_dump backup handy. >>> >>> I think the steps are the following: >>> >>> 1. create an index for the taxonlabel field: >>> >>> CREATE INDEX taxonlabel_i ON taxonlabel USING btree (taxonlabel); >>> >>> 2. make a backup (e.g. with pg_dump) >>> >>> 3. uncomment lines 26-28 and 142 in the fixtaxonlabels.pl script >>> >>> 4. run the fixtaxonlabels.pl script, probably with nohup because >>> it takes about 6 hours on my macbook. >>> >>> I think this should work -- at least I have not noticed any >>> glaring problems. Course, I can't actually test it on real data, >>> only the synopsis. After running this on treebasedev, we should >>> check that the database has not been corrupted -- any problems and >>> we just drop the database and reload from the pg_dump backup. >>> >>> I'm not fully clear why we have a sub_taxonlabel table. 
At any >>> rate, I ended up deleting all records in sub_taxonlabel that match >>> duplicate records in taxonlabel that require deleting. >>> >>> take care, >>> >>> Bill >> >> ------------------------------------------------------------------------------ >> Throughout its 18-year history, RSA Conference consistently >> attracts the >> world's best and brightest in the field, creating opportunities for >> Conference >> attendees to learn about information security's most important >> issues through >> interactions with peers, luminaries and emerging and established >> companies. >> http://p.sf.net/sfu/rsaconf-dev2dev_______________________________________________ >> Treebase-devel mailing list >> Tre...@li... >> https://lists.sourceforge.net/lists/listinfo/treebase-devel > > ------------------------------------------------------------------------------ > Throughout its 18-year history, RSA Conference consistently attracts > the > world's best and brightest in the field, creating opportunities for > Conference > attendees to learn about information security's most important > issues through > interactions with peers, luminaries and emerging and established > companies. > http://p.sf.net/sfu/rsaconf-dev2dev_______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel |
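The pruning that Bill's fixtaxonlabels.pl performs — keep one canonical row per duplicated label string, repoint foreign-key references to it, then delete the orphaned duplicates — can be sketched compactly. The following is an editorial Python rendering of that idea, not the actual Perl script; the rows and the "lowest id wins" canonicalization rule are assumptions for illustration:

```python
# Hedged sketch of duplicate-taxonlabel pruning: collapse rows that share
# a label string onto the row with the lowest id, remap FK references
# (here, sub_taxonlabel), and drop the duplicates. Data is made up.
taxonlabel = {1: "Homo sapiens", 2: "Pan troglodytes", 3: "Homo sapiens"}
sub_taxonlabel = [(101, 1), (102, 3), (103, 2)]  # (submission_id, taxonlabel_id)

canonical = {}  # label string -> lowest taxonlabel id seen for it
for tid in sorted(taxonlabel):
    canonical.setdefault(taxonlabel[tid], tid)

# Build an old-id -> canonical-id map, rewrite the FKs, prune duplicates.
remap = {tid: canonical[label] for tid, label in taxonlabel.items()}
sub_taxonlabel = [(sid, remap[tid]) for sid, tid in sub_taxonlabel]
taxonlabel = {tid: lbl for tid, lbl in taxonlabel.items() if remap[tid] == tid}

print(sorted(taxonlabel))  # → [1, 2]
print(sub_taxonlabel)      # → [(101, 1), (102, 1), (103, 2)]
```

As the thread stresses, a non-transactional rewrite like this over every FK-bearing table is exactly why a pg_dump backup beforehand is essential: a partial run leaves dangling references with no easy way back.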
From: youjun g. <you...@ya...> - 2010-01-15 15:55:28
|
Hilmar, The difficult part is identifying the "dead code" and the "living code". In most cases, "dead code" and "living code" are mixed in the same java file. Every deletion can break some other java file that refers to the deleted code. The "investigation" is actually almost the same effort as deleting and fixing. I would prefer deletion over investigation. I will try to delete an nclXXX class that Bill identified as definitely dead and see how it goes. Youjun |
From: Rutger V. <rut...@gm...> - 2010-01-15 15:32:10
|
OK, let's hear first of all which classes are affected. I've never seen them in action so I doubt they're exposed through the UI - in which case spending a week on it isn't time we should be spending *right now*. I understand what you're saying, but my gut feeling is simply that they don't affect the web app or we would have seen bugs that involve these classes earlier. On Fri, Jan 15, 2010 at 2:46 PM, Hilmar Lapp <hl...@ne...> wrote: > Maybe I haven't made myself clear. Code that uses those classes *will not > work*. We know that. It may compile, but it *will not work*. > > So I don't understand what the difference can be for the running application > if we delete all code segments that we know *will not work*. How can it be > better to leave that in? > > I'm willing to be educated here, but what I'm hearing is that there is a > potentially large part of the TB2 web-application that we know does not > work. Or is it a small part only? It seems to me that we don't know. I don't > understand what is the benefit of going live with this. This is a > show-stopper in my eyes. > > Either the code that doesn't work isn't needed to begin with, and then we > might as well delete it, or it is needed. If it is needed, nobody has been > able so far to state which functions are affected so we can't make a > criticality assessment. > > So my rather strong vote is to investigate this to the point that we at > least understand what is affected. If the problem is primarily that removing > that guaranteed to be dysfunctional code simply takes up time to make > everything compile again but none of it is exposed through the UI, then I > can totally live with that for the release. But based on what I'm hearing > we're not even close to understanding that this is the situation we are > having. > > -hilmar > > On Jan 15, 2010, at 8:17 AM, Rutger Vos wrote: > >> I'm against doing that now, given the time constraints we're under. 
>> >> On Friday, January 15, 2010, youjun guo <you...@ya...> wrote: >>> >>> Hilmar, >>> >>> This code refactoring may take a week or a bit longer, should I proceed? >>> >>> >>> >>> Youjun >>> >>> >>> >>> On Thu, Jan 14, 2010 at 4:43 PM, Hilmar Lapp <hl...@ne...> wrote: >>> Can we give those 7 java classes some more attention as to why they are >>> needing the CIPRES framework to compile? Let's remember, we know for a fact >>> that code that relies on that framework is guaranteed not to function. >>> >>> We should at least know what it is in those 7 classes that we know can't >>> be functional because it needs the CIPRES framework. Leaving that code in >>> there because we need it is the same as saying, there is code in there that >>> we need but that we know doesn't work. I'm not comfortable with this. >>> >>> -hilmar >>> >>> On Jan 14, 2010, at 3:46 PM, youjun guo wrote: >>> >>> I checked into TreeBase code and found 7 java classes still need cipres >>> framework 1.1 to compile. I suggest we keep it for now. >>> >>> Youjun >>> >>> ------------------------------------------------------------------------------ >>> Throughout its 18-year history, RSA Conference consistently attracts the >>> world's best and brightest in the field, creating opportunities for >>> Conference >>> attendees to learn about information security's most important issues >>> through >>> interactions with peers, luminaries and emerging and established >>> companies. >>> >>> http://p.sf.net/sfu/rsaconf-dev2dev_______________________________________________ >>> Treebase-devel mailing list >>> Tre...@li... >>> https://lists.sourceforge.net/lists/listinfo/treebase-devel >>> >>> >>> -- =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org >>> :=========================================================== >>> >>> >>> >>> >>> >> >> -- >> Dr. Rutger A. 
Vos >> School of Biological Sciences >> Philip Lyle Building, Level 4 >> University of Reading >> Reading >> RG6 6BX >> United Kingdom >> Tel: +44 (0) 118 378 7535 >> http://www.nexml.org >> http://rutgervos.blogspot.com > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : > =========================================================== > > > > -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com |
From: Vladimir G. <vla...@du...> - 2010-01-15 15:28:29
|
Thanks! Then Jon and I will try to find and restore from a backup between Nov 30 and Dec 8-10, when I may have accidentally run the tests while setting up my environment. --Vladimir On Jan 14, 2010, at 11:14 PM, William Piel wrote: > > On Jan 14, 2010, at 4:44 PM, Hilmar Lapp wrote: > >> Vladimir - I think you can use the first backup that there is. Or one >> before the first tests were run. > > Maybe the one before the first tests were run would be best. The > reasoning being that I have made some ad-hoc changes to the data > (but very few -- it not a big deal if those edits were lost). But > more importantly, we did make some bigger changes to the data: > > - One change we made (some verbiage below about it) occurred around > Nov 30 of 2009. We had discovered tons of duplicate taxonlabel > records. I wrote a script to prune these, which drops the number of > records in taxonlabel from 232,766 to 131,357. (actually, I'm not > positive that this script was run on the data -- we could still have > the duplicate records -- I think Rutger said he'd run it, but I'm > not sure that it happened). > > - Another change we made was to take all migrated studies (i.e. > almost all studies currently in TreeBASE2) in and give them a phony > "owner" (I think the username "tb1"). These were missing an owner > because they were migrated in instead of created by a submitter. The > absence of an owner id was causing (I think) a unit test failure. > Hence I did a massive update, giving tb1's ID to all submissions > with a user_id of NULL. > > I guess this is just to say that we have, at times, been fixing data > problems. So the safest backup to resort to is the most recent one > that does not include the test stuff. > > bp > > > > > Begin forwarded message: > >> I slapped together a script that I think works. I've tested it on a >> synopsis of TB2 that only includes tables that use taxonlabel_id as >> a FK. 
The following zip file contains the perl script and the >> "before" and "after" versions of the TB2 synopsis. You can find it >> here: >> >> http://www.treebase.org/~piel/taxlabels_fix.zip >> >> You can test it by creating a local database and executing the >> originalsynops.sql file to populate it, then running the >> fixtaxonlabels.pl script on it. It should remove all duplicate >> taxonlabels. The number of records in the taxonlabel table drop >> from 232,766 to 131,357. >> >> You'll need to run it on the treebasedev database because I cannot >> make a backup myself (pg_dump will not work with the username I'm >> using because the auto-sequences are owned by rvos, which the >> treebase_app user cannot access). The fixtaxonlabels.pl script was >> not written using transactions, so you'll definitely want to have a >> pg_dump backup handy. >> >> I think the steps are the following: >> >> 1. create an index for the taxonlabel field: >> >> CREATE INDEX taxonlabel_i ON taxonlabel USING btree (taxonlabel); >> >> 2. make a backup (e.g. with pg_dump) >> >> 3. uncomment lines 26-28 and 142 in the fixtaxonlabels.pl script >> >> 4. run the fixtaxonlabels.pl script, probably with nohup because it >> takes about 6 hours on my macbook. >> >> I think this should work -- at least I have not noticed any glaring >> problems. Course, I can't actually test it on real data, only the >> synopsis. After running this on treebasedev, we should check that >> the database has not been corrupted -- any problems and we just >> drop the database and reload from the pg_dump backup. >> >> I'm not fully clear why we have a sub_taxonlabel table. At any >> rate, I ended up deleting all records in sub_taxonlabel that match >> duplicate records in taxonlabel that require deleting. 
>> >> take care, >> >> Bill > > ------------------------------------------------------------------------------ > Throughout its 18-year history, RSA Conference consistently attracts > the > world's best and brightest in the field, creating opportunities for > Conference > attendees to learn about information security's most important > issues through > interactions with peers, luminaries and emerging and established > companies. > http://p.sf.net/sfu/rsaconf-dev2dev_______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel |
From: Hilmar L. <hl...@ne...> - 2010-01-15 14:46:23
|
Maybe I haven't made myself clear. Code that uses those classes *will not work*. We know that. It may compile, but it *will not work*. So I don't understand what the difference can be for the running application if we delete all code segments that we know *will not work*. How can it be better to leave that in? I'm willing to be educated here, but what I'm hearing is that there is a potentially large part of the TB2 web-application that we know does not work. Or is it a small part only? It seems to me that we don't know. I don't understand what is the benefit of going live with this. This is a show-stopper in my eyes. Either the code that doesn't work isn't needed to begin with, and then we might as well delete it, or it is needed. If it is needed, nobody has been able so far to state which functions are affected so we can't make a criticality assessment. So my rather strong vote is to investigate this to the point that we at least understand what is affected. If the problem is primarily that removing that guaranteed to be dysfunctional code simply takes up time to make everything compile again but none of it is exposed through the UI, then I can totally live with that for the release. But based on what I'm hearing we're not even close to understanding that this is the situation we are having. -hilmar On Jan 15, 2010, at 8:17 AM, Rutger Vos wrote: > I'm against doing that now, given the time constraints we're under. > > On Friday, January 15, 2010, youjun guo <you...@ya...> wrote: >> Hilmar, >> >> This code refactoring may take a week or a bit longer, should I >> proceed? >> >> >> >> Youjun >> >> >> >> On Thu, Jan 14, 2010 at 4:43 PM, Hilmar Lapp <hl...@ne...> >> wrote: >> Can we give those 7 java classes some more attention as to why they >> are needing the CIPRES framework to compile? Let's remember, we >> know for a fact that code that relies on that framework is >> guaranteed not to function. 
>> >> We should at least know what it is in those 7 classes that we know >> can't be functional because it needs the CIPRES framework. Leaving >> that code in there because we need it is the same as saying, there >> is code in there that we need but that we know doesn't work. I'm >> not comfortable with this. >> >> -hilmar >> >> On Jan 14, 2010, at 3:46 PM, youjun guo wrote: >> >> I checked into TreeBase code and found 7 java classes still need >> cipres framework 1.1 to compile. I suggest we keep it for now. >> >> Youjun >> ------------------------------------------------------------------------------ >> Throughout its 18-year history, RSA Conference consistently >> attracts the >> world's best and brightest in the field, creating opportunities for >> Conference >> attendees to learn about information security's most important >> issues through >> interactions with peers, luminaries and emerging and established >> companies. >> http://p.sf.net/sfu/rsaconf-dev2dev_______________________________________________ >> Treebase-devel mailing list >> Tre...@li... >> https://lists.sourceforge.net/lists/listinfo/treebase-devel >> >> >> -- =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- >> informatics >> .nescent >> .org :=========================================================== >> >> >> >> >> > > -- > Dr. Rutger A. Vos > School of Biological Sciences > Philip Lyle Building, Level 4 > University of Reading > Reading > RG6 6BX > United Kingdom > Tel: +44 (0) 118 378 7535 > http://www.nexml.org > http://rutgervos.blogspot.com -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== |
From: Rutger V. <rut...@gm...> - 2010-01-15 13:17:12
|
I'm against doing that now, given the time constraints we're under. On Friday, January 15, 2010, youjun guo <you...@ya...> wrote: > Hilmar, > > This code refactoring may take a week or a bit longer, should I proceed? > > > > Youjun > > > > On Thu, Jan 14, 2010 at 4:43 PM, Hilmar Lapp <hl...@ne...> wrote: > Can we give those 7 java classes some more attention as to why they are needing the CIPRES framework to compile? Let's remember, we know for a fact that code that relies on that framework is guaranteed not to function. > > We should at least know what it is in those 7 classes that we know can't be functional because it needs the CIPRES framework. Leaving that code in there because we need it is the same as saying, there is code in there that we need but that we know doesn't work. I'm not comfortable with this. > > -hilmar > > On Jan 14, 2010, at 3:46 PM, youjun guo wrote: > > I checked into TreeBase code and found 7 java classes still need cipres framework 1.1 to compile. I suggest we keep it for now. > > Youjun > ------------------------------------------------------------------------------ > Throughout its 18-year history, RSA Conference consistently attracts the > world's best and brightest in the field, creating opportunities for Conference > attendees to learn about information security's most important issues through > interactions with peers, luminaries and emerging and established companies. > http://p.sf.net/sfu/rsaconf-dev2dev_______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel > > > -- =========================================================== > : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :=========================================================== > > > > > -- Dr. Rutger A. 
Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com |
From: youjun g. <you...@ya...> - 2010-01-15 12:59:00
|
Hilmar, This code refactoring may take a week or a bit longer, should I proceed? Youjun On Thu, Jan 14, 2010 at 4:43 PM, Hilmar Lapp <hl...@ne...> wrote: > Can we give those 7 java classes some more attention as to why they are > needing the CIPRES framework to compile? Let's remember, we know for a fact > that code that relies on that framework is guaranteed not to function. > > We should at least know what it is in those 7 classes that we know can't be > functional because it needs the CIPRES framework. Leaving that code in there > because we need it is the same as saying, there is code in there that we > need but that we know doesn't work. I'm not comfortable with this. > > -hilmar > > > On Jan 14, 2010, at 3:46 PM, youjun guo wrote: > > I checked into TreeBase code and found 7 java classes still need cipres > framework 1.1 to compile. I suggest we keep it for now. > > Youjun > > ------------------------------------------------------------------------------ > Throughout its 18-year history, RSA Conference consistently attracts the > world's best and brightest in the field, creating opportunities for > Conference > attendees to learn about information security's most important issues > through > interactions with peers, luminaries and emerging and established companies. > > http://p.sf.net/sfu/rsaconf-dev2dev_______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : > =========================================================== > > > > |
From: William P. <wil...@ya...> - 2010-01-15 04:14:46
|
On Jan 14, 2010, at 4:44 PM, Hilmar Lapp wrote: > Vladimir - I think you can use the first backup that there is. Or one > before the first tests were run. Maybe the one before the first tests were run would be best. The reasoning being that I have made some ad-hoc changes to the data (but very few -- it's not a big deal if those edits were lost). But more importantly, we did make some bigger changes to the data: - One change we made (some verbiage below about it) occurred around Nov 30 of 2009. We had discovered tons of duplicate taxonlabel records. I wrote a script to prune these, which drops the number of records in taxonlabel from 232,766 to 131,357. (actually, I'm not positive that this script was run on the data -- we could still have the duplicate records -- I think Rutger said he'd run it, but I'm not sure that it happened). - Another change we made was to take all migrated studies (i.e. almost all studies currently in TreeBASE2) and give them a phony "owner" (I think the username "tb1"). These were missing an owner because they were migrated in instead of created by a submitter. The absence of an owner id was causing (I think) a unit test failure. Hence I did a massive update, giving tb1's ID to all submissions with a user_id of NULL. I guess this is just to say that we have, at times, been fixing data problems. So the safest backup to resort to is the most recent one that does not include the test stuff. bp Begin forwarded message: > I slapped together a script that I think works. I've tested it on a synopsis of TB2 that only includes tables that use taxonlabel_id as a FK. The following zip file contains the perl script and the "before" and "after" versions of the TB2 synopsis. You can find it here: > > http://www.treebase.org/~piel/taxlabels_fix.zip > > You can test it by creating a local database and executing the originalsynops.sql file to populate it, then running the fixtaxonlabels.pl script on it. It should remove all duplicate taxonlabels. 
The number of records in the taxonlabel table drops from 232,766 to 131,357. > > You'll need to run it on the treebasedev database because I cannot make a backup myself (pg_dump will not work with the username I'm using because the auto-sequences are owned by rvos, which the treebase_app user cannot access). The fixtaxonlabels.pl script was not written using transactions, so you'll definitely want to have a pg_dump backup handy. > > I think the steps are the following: > > 1. create an index for the taxonlabel field: > > CREATE INDEX taxonlabel_i ON taxonlabel USING btree (taxonlabel); > > 2. make a backup (e.g. with pg_dump) > > 3. uncomment lines 26-28 and 142 in the fixtaxonlabels.pl script > > 4. run the fixtaxonlabels.pl script, probably with nohup because it takes about 6 hours on my macbook. > > I think this should work -- at least I have not noticed any glaring problems. Course, I can't actually test it on real data, only the synopsis. After running this on treebasedev, we should check that the database has not been corrupted -- any problems and we just drop the database and reload from the pg_dump backup. > > I'm not fully clear why we have a sub_taxonlabel table. At any rate, I ended up deleting all records in sub_taxonlabel that match duplicate records in taxonlabel that require deleting. > > take care, > > Bill |
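The two data fixes discussed in this thread -- pruning duplicate taxonlabel rows after repointing their foreign keys, and assigning the phony "tb1" owner to migrated submissions -- can be sketched end-to-end on a toy schema. Everything below is an assumption for illustration: the real TreeBASE2 DDL, the set of FK-bearing tables, and tb1's numeric ID are not shown in the thread, and SQLite stands in for the Postgres treebasedev database.

```python
import sqlite3

# Hypothetical miniature schema; column and table names are assumptions,
# not the real TreeBASE2 DDL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE taxonlabel (
    taxonlabel_id INTEGER PRIMARY KEY,
    taxonlabel    TEXT NOT NULL
);
CREATE TABLE phylotreenode (          -- stand-in for the FK-bearing tables
    node_id       INTEGER PRIMARY KEY,
    taxonlabel_id INTEGER REFERENCES taxonlabel(taxonlabel_id)
);
CREATE TABLE submission (
    submission_id INTEGER PRIMARY KEY,
    user_id       INTEGER              -- NULL for migrated TB1 studies
);
""")
# Seed a duplicate: rows 1 and 2 spell the same label.
cur.executemany("INSERT INTO taxonlabel VALUES (?, ?)",
                [(1, "Homo sapiens"), (2, "Homo sapiens"), (3, "Pan")])
cur.executemany("INSERT INTO phylotreenode VALUES (?, ?)", [(10, 2), (11, 3)])
cur.executemany("INSERT INTO submission VALUES (?, ?)", [(100, None), (101, 7)])

# Step 1 of Bill's recipe: index the label column so the duplicate
# scan is not a quadratic table walk.
cur.execute("CREATE INDEX taxonlabel_i ON taxonlabel (taxonlabel)")

# Repoint every FK at the lowest-numbered row per label, then delete
# the now-unreferenced duplicates.
cur.execute("""
UPDATE phylotreenode SET taxonlabel_id = (
    SELECT MIN(t2.taxonlabel_id)
    FROM taxonlabel t1 JOIN taxonlabel t2 ON t2.taxonlabel = t1.taxonlabel
    WHERE t1.taxonlabel_id = phylotreenode.taxonlabel_id)
""")
cur.execute("""
DELETE FROM taxonlabel WHERE taxonlabel_id NOT IN (
    SELECT MIN(taxonlabel_id) FROM taxonlabel GROUP BY taxonlabel)
""")

# The second fix: give migrated studies the phony "tb1" owner.
# tb1's actual ID is not given in the thread; 1 is a placeholder.
TB1_USER_ID = 1
cur.execute("UPDATE submission SET user_id = ? WHERE user_id IS NULL",
            (TB1_USER_ID,))
conn.commit()
```

As in the real run, the label table shrinks (here 3 rows to 2) while every node still resolves to a surviving label, and no submission is left ownerless. Bill's point about transactions applies equally here: the real script did not use them, so a pg_dump backup before running is the only rollback path.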