[Treebase-guts] SF.net SVN: treebase:[511] trunk/treebase-core/db

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Revision: 511
          http://treebase.svn.sourceforge.net/treebase/?rev=511&view=rev
Author:   vgapeyev
Date:     2010-02-16 21:15:56 +0000 (Tue, 16 Feb 2010)

Log Message:
-----------
Documenting the schema patching setup. 

Modified Paths:
--------------
    trunk/treebase-core/db/README.txt
    trunk/treebase-core/db/TODO.txt
    trunk/treebase-core/db/versionhistory.sql

Modified: trunk/treebase-core/db/README.txt
===================================================================

--- trunk/treebase-core/db/README.txt	2010-02-16 14:37:32 UTC (rev 510)
+++ trunk/treebase-core/db/README.txt	2010-02-16 21:15:56 UTC (rev 511)
@@ -1,60 +1,90 @@
-[This will contain instructions on how to do and commit patches. 
-For now, it's just a message from my email 2010-02-03  --VG]
+This directory keeps track of schema patches applied to the DB as the application evolves, 
+as well as occasional snapshots of the schema and pre-loaded dictionary-like data. 
 
-"A setup/discipline for schema changes"
+Outline
+======= 
 
-This brings up something that was on my back burner for a while:  
-setting up a workflow and a file/directory structure to maintain  
-schema patches. That would be something similar to migration scripts  
-in Ruby or Python (but with scripts written in SQL and no automation).
+The setup is inspired by database migrations in Ruby and Python, but is not nearly as comprehensive. 
 
-The idea is that we store in SVN the initial snapshot of the DB schema  
-and sequential patches that we have been developing w.r.t  
-treebasedev.  These patches will be used to bring up-to-date  
-treebasestage, and then treebaseprod.   After that we can erase the  
-patches, refresh the snapshot, and continue for another cycle.   
-Official TB2 code releases are to be declared at the start of a cycle,  
-when there are no patches yet.  The snapshot is what should be used to  
-create a fresh TB2 instance by someone who wants to run their own. 
+The 'patches' directory stores SQL scripts (nnnn_descr_label.sql) that move the schema 
+from one version to the next.  Occasionally, we put into the 'snaphots' subdirectory 
+nnnn_SCHEMA...sql and nnnn_DATA...sql that capture the state of the DB after the 
+patches/nnnn.sql patch has been applied.
 
-Patches are to be sequentially numbered (by hand).  To keep track  
-which DB instance has already been brought up to which patch version,  
-I suggest creating a special table with a number column.  Each patch  
-will have an UPDATE setting the number in the table to the patch's  
-number.
+Patches are sequentially numbered (by hand). If a snapshot nnnn is present, it reflects 
+the state of the DB after patch nnnn was applied. 
 
-If this sounds ok, I hope to set this up by end of tomorrow.
+The schema (staring with the very first 0000 snapshot) contains the 'versionhistory' table, 
+which is used to keep track which patches have already been applied to a particular DB instance. 
+This is particularly useful in multi-instance installations (e.g., when there are 
+development, staging, production, etc., instances).  [See versionhistory.sql for the table's definition.]
 
 
-Some more details: 
+Task-specific instructions
+==========================
 
-* Patch naming: 0001_short_name.sql
-* Each patch starts by incrementing the number in the version table.
-* Each patch will be executed as a transaction, but there is no need to insert transaction commands into the patch.  
-* Patches and snapshots must not include any ownership or permissions commands.
-* Version table schema: 
-CREATE TABLE versionhistory
-(
-  id serial NOT NULL,
-  patchnumber integer,
-  patchlabel character varying(63) NOT NULL,
-  patchdescription character varying(1023),
-  applied timestamp (0) without time zone DEFAULT now(),
-  CONSTRAINT versionhistory_pkey PRIMARY KEY (id),
-  CONSTRAINT versionhistory_unique_patchnumber UNIQUE (patchnumber)
-)
-COMMENT ON TABLE versionhistory IS 'VersionHistory table is NOT a part of the TreeBase application. It is only used within the development and deployment process to keep track of schema patches applied to a DB instance. ';
+Creating a patch 
+----------------
 
+A patch is generally committed into SVN together with Java code for the application.  
+It is expected that, for a given SVN version, the application code expects to work
+w.r.t. the database containing all the patches from this version.  
 
-To get a current schema snapshot
-================================
+Use the next sequential number and a short descriptive label to name a patch. 
+To avoid numbering conflicts, developers must coordinate their patching activity. 
 
-pg_dump -h treebase-dev.nescent.org -U treebase_app --format=p --no-owner --no-privileges --schema-only  treebasestage > schema_snapshot_treebasestage_2010-02-01.sql
+Each patch must start by an INSERT of a row describing the patch into the versionhistory table.  
+(See existing patches for examples.)
 
+Each patch will be executed as a transaction, so there is no need to insert 
+transaction commands into the patch.  
 
-To restore the schema into a new DB: 
-====================================
+Patches must not include any ownership or permissions commands: ownership and 
+permissions are installation-specific, and therefore should be handled by the 
+instance maintainer.
 
+When an error is discovered in a recent patch, it may be ok to fix it and commit 
+a new version under the same patch number, provided it is possible to adjust all DB 
+instances to conform with the new patch version.  This can be doable when the patch
+was only applied to the development DB instance.  When the patch with an error was already 
+applied to production databases, it could be more prudent to develop a new, error-fixing patch. 
+
+
+Applying a patch
+----------------
+
+Apply each patch within a transaction, e.g.,  
+psql -d yourdb -U yourusername
+yourdb=> begin transaction;
+yourdb=> \i nnnn_your_patch.sql
+yourdb=> commit;  OR rollback;
+
+After that, adjust ownership and permissions on newly created objects, if any, 
+as required by the DB instance.  
+
+Use your judgment on what to do in case the patch application generated errors 
+and you had to roll back.  Most likely, someone has to work on improving the patch. 
+
+
+Creating a current snapshot
+---------------------------
+
+Schema snapshots must not include any ownership or permissions commands, since these 
+are installation-specific.  This command,  
+
+pg_dump -h your.host.url -U your_username --format=p --no-owner --no-privileges --schema-only  yourdb >  nnnn_your_label.sql
+
+creates a reasonably lean dump of the schema.  However, if the DB contained any objects 
+not created by prior snapshots and patches, they should be removed by hand. 
+
+Data snapshot is trickier.  The best bet right now is probably by hand-modifying the previous data snapshot. 
+
+
+Creating a fresh DB
+------------------
+
+Use the most recent snapshot: 
+
 psql -d yourdb -U yourusername -f nnn_SCHEMA_xxx.sql
 psql -d yourdb -U yourusername -f nnn_DATA_xxx.sql
 
@@ -64,4 +94,35 @@
 yourdb=> begin transaction;
 yourdb=> \i nnn_SCHEMA_xxx.sql
 yourdb=> \i nnn_DATA_xxx.sql
-yourdb=> end transaction 
+yourdb=> commit;  
+
+
+Recommended development - production - staging workflow 
+=======================================================
+
+Local instances:  It is recommended that each developer has his own DB instance to use for writing 
+and initially testing new versions of the application and schema patches. This ensures 
+at least some basic compatibility between SQL and Java within a given SVN commit. 
+These instances may contain only minimal data, as useful for the developer. 
+
+Development instance:  The dev DB instance is frequently re-built to track 
+the most recent SVN version of the application and DB patches. It is used to verify correctness 
+of new versions and to communicate with non-programmer project participants. 
+This instance should contain sizable and representative amount of data, 
+but not necessarily as much or as good quality as the production instance. 
+
+Staging instance:  The stage instance is a testbed for rolling out a new version on production.  
+When development on dev reaches  release, a staging instance is created by cloning the 
+current production instance.  This ensures that the possible new data quirks accumulated 
+since the previous release are present.  Then patches spanning from the previous to the current 
+release are applied, new version of the application is installed, and the whole system is tested
+and verified as necessary.  If all works well, then the release roll-out may proceed on production. 
+Otherwise, back to development.    
+
+Production instance: This is the instance with the authoritative data that is constantly being 
+accessed by external uses and can tolerate only brief periods of downtime.  Make a backup before
+rolling out a new release and only roll out releases that were successfully verified on staging. 
+After a new release roll-out, it makes sense to use the prod instance to create a new 
+schema-and-dictionaries snapshot. 
+
+  
\ No newline at end of file

Modified: trunk/treebase-core/db/TODO.txt
===================================================================
--- trunk/treebase-core/db/TODO.txt	2010-02-16 14:37:32 UTC (rev 510)
+++ trunk/treebase-core/db/TODO.txt	2010-02-16 21:15:56 UTC (rev 511)
@@ -1,43 +1,29 @@
 Pending framework issues 
 ========================
 
-* Prior to taking the 1st schema snapshot (from treestage or treedev?): 
-  - Create table versionhistory
-
-* Develop a script for getting a current snapshot of the initial (dictionary) data, 
+* Would like a script for getting a current snapshot of the initial (dictionary) data, 
   to be used alongside with the schema snapshot command. 
+  [A problem: simply designating some tables as "dictionaries" and others as "facts" would not work -- 
+   at least some tables (Phylochar, DiscreteCharState) contains both "dictionary" and "fact" records.]
+   
+* Automation scripts for tasks, especially patch application, so that the sysadmin does not forget to wrap a transaction. 
+  
+* A better way for handling ownership and permissions in particular DB instances -- there is 
+  too much manual work at the moment.
+  
+* Less-intrusive permissions/ownership management.  Currently, ownerships and permissions are handled by pg_owner, pg_admin functions that the 
+  sysadmin creates in DB after the initial installation.  Consequently, they get into the schema snapshops and have to be removed by hand.
 
 
 Pending schema patches: 
 ======================
 
-[To be put in as patches after the storage structure for them is in place in this directory -- VG 2010-02-04]:
+* Maybe some indices around matrices (in addition to the indices around trees that are already there). 
 
-* Give explicit names to all constraints
 
-* Drop geospot_id_sequence -- there is no corresponding table
+Pending data cleanups (on staging):
+===================================
 
-* Correct PK name in stepmatrixelement and create stepmatrixelement_id_sequence
-
-* Create indexes, as suggested by Bill 2010-02-03 (and maybe later)
-        Table				Field
-        --------			-------
-        taxon				name
-        taxonvariant		fullname
-        taxonlabel			taxonlabel
-        citation			title
-        citation			abstract
-        person				lastname
-        phylotreenode    	taxonlabel_id, phylotree_id 
-    See: 
-      -  Bill's message of 2010-02-03:    
-      -  https://sourceforge.net/tracker/?func=detail&atid=1126676&aid=2945391&group_id=248804, 
-      -  Also, compare to the end-of-Jan schema snapshot of treebasedev   
-
-
-Pending data cleanups (mostly, on staging):
-===========================================
-
 * Remove junk, particularly that created by tests
 
 * Known junk: Submission 22 and its related records

Modified: trunk/treebase-core/db/versionhistory.sql
===================================================================
--- trunk/treebase-core/db/versionhistory.sql	2010-02-16 14:37:32 UTC (rev 510)
+++ trunk/treebase-core/db/versionhistory.sql	2010-02-16 21:15:56 UTC (rev 511)
@@ -1,3 +1,6 @@
+-- Table for tracking which schema patches have been applied to the current DB instance. 
+-- This table is already in the snapshot 0000_SCHEMA_before_patches_start.sql 
+
 CREATE TABLE versionhistory
 (
   id serial NOT NULL,


This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.