From: William P. <wil...@ya...> - 2010-03-20 12:57:38
|
Last night I added a bug report about taxon labels that get duplicated, triplicated, etc.: http://sourceforge.net/tracker/?func=detail&aid=2973661&group_id=248804&atid=1126676 I gave it priority 8 so as not to slow us down. But I wanted to mention that this is a new problem that did not happen even as recently as the build before last. For example, the bug does not exist if you go here: http://treebasedb-dev.nescent.org:6666/treebase-web/ But it does exist on production and on port 80 dev. In the attached example, you can see that each taxon label is repeated four times, while the very same data viewed through port 6666 shows each taxon label once (as it should be). So while this is not a show-stopper, I wanted to mention it because maybe this is an easy, quick fix -- perhaps someone can recall a very recent change he made to the code that might affect the taxon labels page? Or perhaps some configuration file changed? bp |
From: William P. <wil...@ya...> - 2010-03-19 19:48:27
|
On Mar 19, 2010, at 3:40 PM, Hilmar Lapp wrote:
> Sounds good to me.

Okay, I just did it. I checked back with study_id 10051, and it seems to have fixed the problem. Unless Vladimir has other placeholder study ids used in the migration scripts, this fixes it. Below is the query for Jon to run on production.

bp

begin work;

UPDATE taxonlabelset SET study_id = mx.study_id
FROM matrix mx JOIN taxonlabelset tls USING (taxonlabelset_id)
WHERE tls.study_id = 2264
AND mx.study_id <> 2264
AND taxonlabelset.taxonlabelset_id = tls.taxonlabelset_id;
-- should result in 570 changes

commit; |
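For anyone re-running this kind of fix against another placeholder id, here is a minimal sketch of the same UPDATE, parameterized on the placeholder and exercised on a throwaway in-memory SQLite database. Table and column names follow the query above; the two-table schema and all rows are hypothetical toy data. Because SQLite only gained `UPDATE ... FROM` fairly recently, the sketch swaps in a correlated subquery. It illustrates the idea and is not the script that was run on production.

```python
import sqlite3

# Hypothetical minimal schema: only the columns the UPDATE touches.
SCHEMA = """
CREATE TABLE taxonlabelset (
    taxonlabelset_id INTEGER PRIMARY KEY,
    title            TEXT,
    study_id         INTEGER
);
CREATE TABLE matrix (
    matrix_id        INTEGER PRIMARY KEY,
    study_id         INTEGER,
    taxonlabelset_id INTEGER
);
"""

# SQLite stand-in for the PostgreSQL UPDATE ... FROM: copy the study_id
# of a (non-placeholder) matrix that uses this TLS.
FIX_SQL = """
UPDATE taxonlabelset
SET study_id = (
    SELECT mx.study_id FROM matrix mx
    WHERE mx.taxonlabelset_id = taxonlabelset.taxonlabelset_id
      AND mx.study_id <> :ph
    ORDER BY mx.matrix_id
    LIMIT 1
)
WHERE study_id = :ph
  AND EXISTS (
    SELECT 1 FROM matrix mx
    WHERE mx.taxonlabelset_id = taxonlabelset.taxonlabelset_id
      AND mx.study_id <> :ph
  )
"""


def fix_placeholder(conn: sqlite3.Connection, placeholder_id: int) -> int:
    """Re-point TLSs stamped with placeholder_id; return rows changed."""
    with conn:  # one transaction, like BEGIN WORK ... COMMIT
        return conn.execute(FIX_SQL, {"ph": placeholder_id}).rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO taxonlabelset VALUES (?, ?, ?)",
    [
        (1, "M1", 2264),  # placeholder; its matrix belongs to study 501
        (2, "M2", 502),   # already correct
        (3, "Untitled Block of Taxa", 2264),  # no matrix uses it
    ],
)
conn.executemany(
    "INSERT INTO matrix VALUES (?, ?, ?)",
    [(100, 501, 1), (101, 502, 2)],
)
conn.commit()

changed = fix_placeholder(conn, 2264)
rows = conn.execute(
    "SELECT taxonlabelset_id, study_id FROM taxonlabelset ORDER BY 1"
).fetchall()
print(changed, rows)  # 1 [(1, 501), (2, 502), (3, 2264)]
```

The `ORDER BY ... LIMIT 1` makes the choice explicit when several matrices share one TLS; PostgreSQL's `UPDATE ... FROM` would instead use an unpredictable one of the matching rows. Note that the TLS no matrix points at keeps its placeholder id, which is the same shape as the tree-block TLSs discussed further down the thread.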
From: William P. <wil...@ya...> - 2010-03-19 19:58:13
|
On Mar 19, 2010, at 3:48 PM, William Piel wrote: > Unless Vladimir has other placeholder study ids used in the migration scripts, this fixes it. Oops... just noticed that S2377 has taxon labels that don't map. There must be some other placeholder ids. Vladimir? bp |
From: Vladimir G. <vla...@du...> - 2010-03-19 20:00:32
|
On Mar 19, 2010, at 3:25 PM, William Piel wrote:
>
> On Mar 19, 2010, at 12:07 PM, William Piel wrote:
>
>>
>> On Mar 19, 2010, at 11:44 AM, Hilmar Lapp wrote:
>>
>>> On Mar 18, 2010, at 8:06 PM, Vladimir Gapeyev wrote:
>>>
>>>> (1) Unexpected different results on the Taxa tab -- a feature or
>>>> a bug? E.g., find a single study, e.g. 10051.
>>>> -- Click the study (which goes to Citation tab), then to Taxa
>>>> tab ==> "Nothing to display"
>>>> -- Go to Matrices tab; click on "View Taxa" in the table ==>
>>>> it goes back to the Taxa tab, showing lots of stuff
>>>> -- Go back to Citation tab; then Taxa tab ==> "Nothing found
>>>> to display"
>>>> -- Go to Trees tab; click on "View Taxa" in the table ==> it
>>>> goes back to the Taxa tab, showing lots of stuff.
>>>
>>> This is for Bill to judge in terms of what kind of problem it may
>>> signal, and hence how severe it is.
>>
>> This may be severe -- let's at least understand why it is
>> happening. It is affecting a lot of studies.
>
> I think I've figured out what's going on here. I took one taxon
> label that maps nicely and compared it with another taxon label that
> does not map nicely, and then I ran queries on tables related to
> these taxon labels. The main difference is that in the good case the
> taxonlabelset table has the correct study_id, while in the bad case
> the taxonlabelset has "2264" as study_id instead of "10053", which
> is what it should be.
>
> Last weekend I had solved this problem using an update query that
> assumed that study_id 10215 was the only "placeholder" for migrated
> records. Turns out that there seems to be another one: 2264.
> Vladimir, can you confirm that study_id 2264 is another placeholder?
> And if so, can you let me know if there are any other placeholder
> study_ids that I'm not aware of.
>
> If this is the problem as I think it is, the solution is the
> following update:
>
> UPDATE taxonlabelset SET study_id = mx.study_id
> FROM matrix mx JOIN taxonlabelset tls USING (taxonlabelset_id)
> WHERE tls.study_id = 2264
> AND mx.study_id <> 2264
> AND taxonlabelset.taxonlabelset_id = tls.taxonlabelset_id

I followed a similar path and was reaching a similar conclusion. Here is a query that shows all TaxonLabelSets whose tls.study_id is not the same as the study reachable from the TLS via a matrix:

select s.study_id, m.matrix_id, tls.*
from study s, taxonlabelset tls, matrix m
where s.study_id = m.study_id
and tls.taxonlabelset_id = m.taxonlabelset_id
and s.study_id <> tls.study_id
order by s.study_id

The only study IDs showing up as tls.study_id in the result are 22 and 2264. Study 2264 was picked up by the migration scripts during my 1st migration -- not something I could have known ahead of time, since the scripts were creating a fresh dummy study in my testing database! By the time of the 2nd migration batch I realized what was going on and created study 10215.

I see Bill has already run a fix on the 2264 TLSs. Should we do it for 22-TLSs as well?

> The other difference is that the matrix associated with the bad one
> has version 0, while the other is version 1. Is there any rhyme or
> reason for the version fields that appear in lots of tables?

My guess is that version fields are an artifact of Hibernate. Apparently, each time a data object is updated, the version is incremented. |
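Vladimir's diagnostic can be tried out on toy data too. This sketch rewrites it with explicit JOINs (the same logic as the comma-join form) and runs it on a hypothetical in-memory SQLite database in which 22 and 2264 stand in for the placeholders and studies 501 to 503 are made up. Collecting the distinct `tls.study_id` values from the mismatches is one way to surface candidate placeholder ids; it is a sketch, not a query taken from the TreeBASE repository.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy data: studies 501-503 are invented; 22 and 2264 play
# the role of the migration placeholders.
conn.executescript("""
CREATE TABLE study (study_id INTEGER PRIMARY KEY);
CREATE TABLE taxonlabelset (taxonlabelset_id INTEGER PRIMARY KEY, study_id INTEGER);
CREATE TABLE matrix (matrix_id INTEGER PRIMARY KEY, study_id INTEGER, taxonlabelset_id INTEGER);
INSERT INTO study VALUES (22), (501), (502), (503), (2264);
INSERT INTO taxonlabelset VALUES (1, 22), (2, 2264), (3, 503);
INSERT INTO matrix VALUES (10, 501, 1), (11, 502, 2), (12, 503, 3);
""")

# TLSs whose own study_id disagrees with the study reached via a matrix.
MISMATCH_SQL = """
SELECT s.study_id, m.matrix_id, tls.taxonlabelset_id, tls.study_id
FROM study s
JOIN matrix m          ON m.study_id = s.study_id
JOIN taxonlabelset tls ON tls.taxonlabelset_id = m.taxonlabelset_id
WHERE s.study_id <> tls.study_id
ORDER BY s.study_id
"""

mismatches = conn.execute(MISMATCH_SQL).fetchall()
# Distinct TLS-side study ids = candidate placeholder studies.
suspects = sorted({row[3] for row in mismatches})
print(mismatches)  # [(501, 10, 1, 22), (502, 11, 2, 2264)]
print(suspects)    # [22, 2264]
```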
From: Jon A. <jon...@ne...> - 2010-03-19 20:03:22
|
Bill,

I think this seems like a reasonable operating procedure:

1) When Youjun has finished with his changes, email the list
2) Vladimir will check out the code from Subversion and rebuild the war file
3) After Vladimir notifies me, I will change the application to point towards production and deploy the new war file
4) Once that has been done, I will email the list
5) Once the powers that be give me their blessings, I will change DNS (unless you'd rather handle the editing of the A record, Bill?)

As there are several folks involved, there may be somewhat of a delay between each step, but I think that's OK.

Does this sound reasonable to everyone?

-Jon

On Mar 19, 2010, at 3:40 PM, William Piel wrote:
>
> I'm optimistic that this last sql update statement will solve the show-stopper bug, and that can be done today.
>
> Youjun tells me that with the new pages, CSS, etc, that he's working on, he predicts that he will be done by tomorrow at noon.
>
> So... Jon -- is it possible to get a rebuild on Saturday afternoon (this time with treebase.nescent.org pointing to the production database)? That would imply that after a last read-through of the web pages, I put in for a DNS change Saturday evening or Sunday. I think I have credentials to change the PURL records too -- else Hilmar can take care of that.
>
> bp
>
> ------------------------------------------------------------------------------
> Download Intel® Parallel Studio Eval
> Try the new software tools for yourself. Speed compiling, find bugs
> proactively, and fine-tune applications for parallel performance.
> See why Intel Parallel Studio got high marks during beta.
> http://p.sf.net/sfu/intel-sw-dev
> _______________________________________________
> Treebase-devel mailing list
> Tre...@li...
> https://lists.sourceforge.net/lists/listinfo/treebase-devel

-------------------------------------------------------
Jon Auman
Systems Administrator
National Evolutionary Synthesis Center
Duke University
http://www.nescent.org
jon...@ne...
------------------------------------------------------ |
From: Hilmar L. <hl...@ne...> - 2010-03-19 20:11:25
|
Sounds good. Looks like I'll be online here in Chicago most of the day, so I'll make the call. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== |
From: Jon A. <jon...@ne...> - 2010-03-19 20:06:13
|
After all these discrepancies have been worked out, Vladimir will send me the SQL to make the changes on production. Vladimir should be the gatekeeper for all changes made to the production database and application. He commits these changes to Subversion and notifies me that it's time to update. -Jon |
From: Hilmar L. <hl...@ne...> - 2010-03-19 20:10:29
|
On Mar 19, 2010, at 3:06 PM, Jon Auman wrote: > He commits these changes to subversion Great. I was going to point that out, hoping that it is stating the obvious, and am glad to hear that indeed it is. -hilmar |
From: William P. <wil...@ya...> - 2010-03-19 20:44:28
|
On Mar 19, 2010, at 4:00 PM, Vladimir Gapeyev wrote:
> I see Bill has already run a fix on the 2264 TLSs. Should we do it for 22-TLSs as well?

I just did that. Unfortunately, it doesn't quite solve the problem. If I search on S2377 and then click on the Taxa tab, I get nothing listed. If I search on S2377 and then click on Matrices and then click on Taxa, I get a long list. If I search on S2377 and then click on Trees and then click on Taxa, I get a long list.

With this query:

select taxonlabelset_id, title, study_id
from taxonlabelset_taxonlabel join taxonlabelset using (taxonlabelset_id)
where taxonlabel_id = 72150;

I get:

 taxonlabelset_id |         title          | study_id
------------------+------------------------+----------
             3105 | M3562                  |     2377
             8626 | Untitled Block of Taxa |       22
(2 rows)

I see that not all taxonlabelsets got changed from id 22 to whatever they should be -- the taxonlabelsets created by trees blocks did not get changed. This is for a misbehaving taxonlabel. Suppose I compare this with a behaving one from an earlier migration:

select taxonlabelset_id, title, study_id
from taxonlabelset_taxonlabel join taxonlabelset using (taxonlabelset_id)
where taxonlabel_id = 95894;

 taxonlabelset_id |         title          | study_id
------------------+------------------------+----------
             3841 | M713                   |       22
             7509 | Untitled Block of Taxa |       22
            10400 | TaxonLabelSet10400     |      654
(3 rows)

... there's an extra row, with the two older ones still mapping to the dud study_id 22. This new row looks like it was created by TreeBASE2, probably for the explicit purpose of having both trees and matrices map to the same taxonlabelset. Both matrix and treeblock tables have records with taxonlabelset_id = 10400, which has the correct study_id of 654. That seems to be missing for some of our more recently migrated data.

This looks complicated to fix, but it may only affect a few studies. If so, perhaps we could just delete and re-enter these. I guess I'll look to see how many are affected in this way.

bp |
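Bill's comparison suggests a rough way to count affected studies: flag any study that has both matrices and trees but no TLS carrying its own study_id that is shared by a matrix and a tree block. The sketch below encodes that hypothesis as a query and runs it on toy SQLite data modeled on his two examples. The TLS ids and study ids come from his psql output and the `phylotree`/`treeblock` columns follow Vladimir's chain query; the matrix, treeblock and phylotree row ids are hypothetical. The criterion itself is an untested assumption, not how TreeBASE decides what the Taxa tab shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy data modeled on Bill's two taxon labels; matrix/treeblock/phylotree
# row ids are hypothetical.
conn.executescript("""
CREATE TABLE taxonlabelset (taxonlabelset_id INTEGER PRIMARY KEY, study_id INTEGER);
CREATE TABLE matrix (matrix_id INTEGER PRIMARY KEY, study_id INTEGER, taxonlabelset_id INTEGER);
CREATE TABLE treeblock (treeblock_id INTEGER PRIMARY KEY, taxonlabelset_id INTEGER);
CREATE TABLE phylotree (phylotree_id INTEGER PRIMARY KEY, study_id INTEGER, treeblock_id INTEGER);
-- Study 654: TLS 10400 is shared by its matrix and its tree block.
INSERT INTO taxonlabelset VALUES (10400, 654);
INSERT INTO matrix VALUES (1, 654, 10400);
INSERT INTO treeblock VALUES (20, 10400);
INSERT INTO phylotree VALUES (30, 654, 20);
-- Study 2377: matrix TLS 3105 was fixed, tree-block TLS 8626 still says 22.
INSERT INTO taxonlabelset VALUES (3105, 2377), (8626, 22);
INSERT INTO matrix VALUES (2, 2377, 3105);
INSERT INTO treeblock VALUES (21, 8626);
INSERT INTO phylotree VALUES (31, 2377, 21);
""")

# Studies with matrices and trees but no own-study TLS that is used by
# a matrix and a tree block at the same time (the hypothesis above).
SUSPECT_SQL = """
SELECT DISTINCT m.study_id
FROM matrix m
WHERE EXISTS (SELECT 1 FROM phylotree pt WHERE pt.study_id = m.study_id)
  AND NOT EXISTS (
    SELECT 1
    FROM taxonlabelset tls
    JOIN matrix m2    ON m2.taxonlabelset_id = tls.taxonlabelset_id
    JOIN treeblock tb ON tb.taxonlabelset_id = tls.taxonlabelset_id
    WHERE tls.study_id = m.study_id
  )
ORDER BY m.study_id
"""

suspect_studies = conn.execute(SUSPECT_SQL).fetchall()
print(suspect_studies)  # [(2377,)]
```

Restricting to studies that have a phylotree keeps matrix-only studies, which never need a shared TLS, out of the count.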
From: William P. <wil...@ya...> - 2010-03-19 21:21:44
|
1. I noticed that somehow we generated a lot of duplicates here:

http://purl.org/phylo/treebase/phylows/study/find?query=tb.identifier.study=66+or+tb.identifier.study=67+or+tb.identifier.study=68+or+tb.identifier.study=69+or+tb.identifier.study=70+or+tb.identifier.study=71+or+tb.identifier.study=72+or+tb.identifier.study=73&recordSchema=study

These should all be deleted because they are copies of study_id 84 (which is the only one to have the legacy id number properly written). Not a show-stopper.

2. After deleting the duplicates in (1), we have a total of 2417 studies in TreeBASE2. That includes two new ones that were never in TreeBASE1, so 2415 of them were migrated. TreeBASE1, however, has 2427 studies, so 2427 - 2415 = 12 studies are missing. Not a show-stopper. After release, we will identify these and re-enter them manually.

bp |
From: Vladimir G. <vla...@du...> - 2010-03-19 21:46:21
|
On Mar 19, 2010, at 4:44 PM, William Piel wrote:
> I see that not all taxonlabelsets got changed from id 22 to whatever
> they should be -- the taxonlabelsets created by trees blocks did not
> get changed.
>

Here are the counts of the dummy studies that still show up in TaxonLabelSet:

select study_id, count(*)
from taxonlabelset
where study_id in (22, 2264, 10215)
group by study_id

 study_id | count
----------+-------
    10215 |   285
     2264 |   690
       22 |  8615

So, there are quite a few. Unlike the matrix-linked ones, they do not seem to be reachable from legitimate studies via the Study -> Phylotree -> Treeblock -> Taxonlabelset chain, i.e.

select s.study_id, pt.phylotree_id, tb.treeblock_id, tls.*
from study s, phylotree pt, treeblock tb, taxonlabelset tls
where s.study_id = pt.study_id
and pt.treeblock_id = tb.treeblock_id
and tb.taxonlabelset_id = tls.taxonlabelset_id
and s.study_id <> tls.study_id
order by s.study_id

returns zero results.

I can continue poking at this, but I admit I don't quite grasp what is the end result we are looking for... Please keep me updated on how I could be useful with data cleaning. If you think a phone call could be useful, please call -- I am here till about 6 - 6:10 pm.

Scripts that I collect for Jon to apply to production are in SVN in treebase-core/db/cleaning/pre-release_hotfixes

--Vladimir |
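Both of Vladimir's queries can be reproduced on a toy database to see what an empty chain result means. In this hypothetical SQLite sketch (22, 2264 and 10215 are the placeholder ids from the thread; study 700 and every row id are invented), the placeholder TLSs are not referenced by any tree block, so the count query still finds them while the chain query returns nothing. That is one data shape consistent with his result; if it holds, a fix modeled on the matrix UPDATE would have no legitimate study to copy a study_id from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy data; only the placeholder ids come from the thread.
conn.executescript("""
CREATE TABLE study (study_id INTEGER PRIMARY KEY);
CREATE TABLE taxonlabelset (taxonlabelset_id INTEGER PRIMARY KEY, study_id INTEGER);
CREATE TABLE treeblock (treeblock_id INTEGER PRIMARY KEY, taxonlabelset_id INTEGER);
CREATE TABLE phylotree (phylotree_id INTEGER PRIMARY KEY, study_id INTEGER, treeblock_id INTEGER);
INSERT INTO study VALUES (700);
INSERT INTO taxonlabelset VALUES (1, 22), (2, 22), (3, 2264), (4, 10215), (5, 700);
INSERT INTO treeblock VALUES (20, 5);
INSERT INTO phylotree VALUES (10, 700, 20);
""")

# First query: how many TLSs still carry each placeholder id.
COUNT_SQL = """
SELECT study_id, count(*) FROM taxonlabelset
WHERE study_id IN (22, 2264, 10215)
GROUP BY study_id ORDER BY study_id
"""

# Second query: Study -> Phylotree -> Treeblock -> Taxonlabelset, keeping
# only chains where the TLS disagrees with the study.
CHAIN_SQL = """
SELECT s.study_id, pt.phylotree_id, tb.treeblock_id, tls.taxonlabelset_id
FROM study s
JOIN phylotree pt      ON pt.study_id = s.study_id
JOIN treeblock tb      ON tb.treeblock_id = pt.treeblock_id
JOIN taxonlabelset tls ON tls.taxonlabelset_id = tb.taxonlabelset_id
WHERE s.study_id <> tls.study_id
ORDER BY s.study_id
"""

counts = conn.execute(COUNT_SQL).fetchall()
chain = conn.execute(CHAIN_SQL).fetchall()
print(counts)  # [(22, 2), (2264, 1), (10215, 1)]
print(chain)   # []
```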
From: Hilmar L. <hl...@ne...> - 2010-03-19 21:49:56
|
On Mar 19, 2010, at 4:46 PM, Vladimir Gapeyev wrote: > I don't quite grasp what is the end result we are looking for... The end result we are looking for is a series of SQL scripts that, once applied, provide for a state of the database that we are comfortable releasing to the public. -hilmar |
From: Hilmar L. <hl...@ne...> - 2010-03-19 21:52:53
|
On Mar 19, 2010, at 4:21 PM, William Piel wrote: > 2. After deleting the duplicates in (1), we have a total of 2417 > studies in TreeBASE2. That includes two new ones that were never in > TreeBASE1. TreeBASE1, however, has 2427 studies -- so that means we > have 12 missing studies. Not a show-stopper. After release, we will > identify these and re-enter them manually. At the risk of stating the obvious, can you add that to the bug tracker, priority 8, so it doesn't get lost? -hilmar |
From: Vladimir G. <vla...@du...> - 2010-03-19 22:00:57
|
On Mar 19, 2010, at 5:52 PM, Hilmar Lapp wrote: > > On Mar 19, 2010, at 4:21 PM, William Piel wrote: > >> 2. After deleting the duplicates in (1), we have a total of 2417 >> studies in TreeBASE2. That includes two new ones that were never in >> TreeBASE1. TreeBASE1, however, has 2427 studies -- so that means we >> have 12 missing studies. Not a show-stopper. After release, we will >> identify these and re-enter them manually. > > At the risk of stating the obvious, can you add that to the bug > tracker, priority 8, so it doesn't get lost? Bill, I'll take care of that. --VG |
From: Hilmar L. <hl...@ne...> - 2010-03-20 18:54:12
|
It looks to me like we are ready to flip the switch. Jon - this is your signal, unless Bill overrides it before 3.30pm. Let me know when you switch so I can adjust the PURL. -hilmar |
From: William P. <met...@gm...> - 2010-03-20 19:07:46
|
I'm still away from my computer - I defer to your judgements. Feel free to go ahead. Bill |