From: Vladimir G. <vla...@du...> - 2010-01-15 15:28:29
|
Thanks! Then Jon and I will try to find and restore from a backup between Nov 30 and Dec 8-10, when I may have accidentally ran the tests while setting up my environment. --Vladimir On Jan 14, 2010, at 11:14 PM, William Piel wrote: > > On Jan 14, 2010, at 4:44 PM, Hilmar Lapp wrote: > >> Vladimir - I think you can use the first backup that there is. Or one >> before the first tests were run. > > Maybe the one before the first tests were run would be best. The > reasoning being that I have made some ad-hoc changes to the data > (but very few -- it not a big deal if those edits were lost). But > more importantly, we did make some bigger changes to the data: > > - One change we made (some verbiage below about it) occurred around > Nov 30 of 2009. We had discovered tons of duplicate taxonlabel > records. I wrote a script to prune these, which drops the number of > records in taxonlabel from 232,766 to 131,357. (actually, I'm not > positive that this script was run on the data -- we could still have > the duplicate records -- I think Rutger said he'd run it, but I'm > not sure that it happened). > > - Another change we made was to take all migrated studies (i.e. > almost all studies currently in TreeBASE2) in and give them a phony > "owner" (I think the username "tb1"). These were missing an owner > because they were migrated in instead of created by a submitter. The > absence of an owner id was causing (I think) a unit test failure. > Hence I did a massive update, giving tb1's ID to all submissions > with a user_id of NULL. > > I guess this is just to say that we have, at times, been fixing data > problems. So the safest backup to resort to is the most recent one > that does not include the test stuff. > > bp > > > > > Begin forwarded message: > >> I slapped together a script that I think works. I've tested it on a >> synopsis of TB2 that only includes tables that use taxonlabel_id as >> a FK. The following zip file contains the perl script and the >> "before" and "after" versions of the TB2 synopsis. You can find it >> here: >> >> http://www.treebase.org/~piel/taxlabels_fix.zip >> >> You can test it by creating a local database and executing the >> originalsynops.sql file to populate it, then running the >> fixtaxonlabels.pl script on it. It should remove all duplicate >> taxonlabels. The number of records in the taxonlabel table drop >> from 232,766 to 131,357. >> >> You'll need to run it on the treebasedev database because I cannot >> make a backup myself (pg_dump will not work with the username I'm >> using because the auto-sequences are owned by rvos, which the >> treebase_app user cannot access). The fixtaxonlabels.pl script was >> not written using transactions, so you'll definitely want to have a >> pg_dump backup handy. >> >> I think the steps are the following: >> >> 1. create an index for the taxonlabel field: >> >> CREATE INDEX taxonlabel_i ON taxonlabel USING btree (taxonlabel); >> >> 2. make a backup (e.g. with pg_dump) >> >> 3. uncomment lines 26-28 and 142 in the fixtaxonlabels.pl script >> >> 4. run the fixtaxonlabels.pl script, probably with nohup because it >> takes about 6 hours on my macbook. >> >> I think this should work -- at least I have not noticed any glaring >> problems. Course, I can't actually test it on real data, only the >> synopsis. After running this on treebasedev, we should check that >> the database has not been corrupted -- any problems and we just >> drop the database and reload from the pg_dump backup. >> >> I'm not fully clear why we have a sub_taxonlabel table. At any >> rate, I ended up deleting all records in sub_taxonlabel that match >> duplicate records in taxonlabel that require deleting. >> >> take care, >> >> Bill > > ------------------------------------------------------------------------------ > Throughout its 18-year history, RSA Conference consistently attracts > the > world's best and brightest in the field, creating opportunities for > Conference > attendees to learn about information security's most important > issues through > interactions with peers, luminaries and emerging and established > companies. > http://p.sf.net/sfu/rsaconf-dev2dev_______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel |