|
From: <vga...@us...> - 2010-02-16 21:16:05
|
Revision: 511
http://treebase.svn.sourceforge.net/treebase/?rev=511&view=rev
Author: vgapeyev
Date: 2010-02-16 21:15:56 +0000 (Tue, 16 Feb 2010)
Log Message:
-----------
Documenting the schema patching setup.
Modified Paths:
--------------
trunk/treebase-core/db/README.txt
trunk/treebase-core/db/TODO.txt
trunk/treebase-core/db/versionhistory.sql
Modified: trunk/treebase-core/db/README.txt
===================================================================
--- trunk/treebase-core/db/README.txt 2010-02-16 14:37:32 UTC (rev 510)
+++ trunk/treebase-core/db/README.txt 2010-02-16 21:15:56 UTC (rev 511)
@@ -1,60 +1,90 @@
-[This will contain instructions on how to do and commit patches.
-For now, it's just a message from my email 2010-02-03 --VG]
+This directory keeps track of schema patches applied to the DB as the application evolves,
+as well as occasional snapshots of the schema and pre-loaded dictionary-like data.
-"A setup/discipline for schema changes"
+Outline
+=======
-This brings up something that was on my back burner for a while:
-setting up a workflow and a file/directory structure to maintain
-schema patches. That would be something similar to migration scripts
-in Ruby or Python (but with scripts written in SQL and no automation).
+The setup is inspired by database migrations in Ruby and Python, but is not nearly as comprehensive.
-The idea is that we store in SVN the initial snapshot of the DB schema
-and sequential patches that we have been developing w.r.t
-treebasedev. These patches will be used to bring up-to-date
-treebasestage, and then treebaseprod. After that we can erase the
-patches, refresh the snapshot, and continue for another cycle.
-Official TB2 code releases are to be declared at the start of a cycle,
-when there are no patches yet. The snapshot is what should be used to
-create a fresh TB2 instance by someone who wants to run their own.
+The 'patches' directory stores SQL scripts (nnnn_descr_label.sql) that move the schema
+from one version to the next. Occasionally, we put into the 'snaphots' subdirectory
+nnnn_SCHEMA...sql and nnnn_DATA...sql that capture the state of the DB after the
+patches/nnnn.sql patch has been applied.
-Patches are to be sequentially numbered (by hand). To keep track
-which DB instance has already been brought up to which patch version,
-I suggest creating a special table with a number column. Each patch
-will have an UPDATE setting the number in the table to the patch's
-number.
+Patches are sequentially numbered (by hand). If a snapshot nnnn is present, it reflects
+the state of the DB after patch nnnn was applied.
-If this sounds ok, I hope to set this up by end of tomorrow.
+The schema (staring with the very first 0000 snapshot) contains the 'versionhistory' table,
+which is used to keep track which patches have already been applied to a particular DB instance.
+This is particularly useful in multi-instance installations (e.g., when there are
+development, staging, production, etc., instances). [See versionhistory.sql for the table's definition.]
-Some more details:
+Task-specific instructions
+==========================
-* Patch naming: 0001_short_name.sql
-* Each patch starts by incrementing the number in the version table.
-* Each patch will be executed as a transaction, but there is no need to insert transaction commands into the patch.
-* Patches and snapshots must not include any ownership or permissions commands.
-* Version table schema:
-CREATE TABLE versionhistory
-(
- id serial NOT NULL,
- patchnumber integer,
- patchlabel character varying(63) NOT NULL,
- patchdescription character varying(1023),
- applied timestamp (0) without time zone DEFAULT now(),
- CONSTRAINT versionhistory_pkey PRIMARY KEY (id),
- CONSTRAINT versionhistory_unique_patchnumber UNIQUE (patchnumber)
-)
-COMMENT ON TABLE versionhistory IS 'VersionHistory table is NOT a part of the TreeBase application. It is only used within the development and deployment process to keep track of schema patches applied to a DB instance. ';
+Creating a patch
+----------------
+A patch is generally committed into SVN together with Java code for the application.
+It is expected that, for a given SVN version, the application code expects to work
+w.r.t. the database containing all the patches from this version.
-To get a current schema snapshot
-================================
+Use the next sequential number and a short descriptive label to name a patch.
+To avoid numbering conflicts, developers must coordinate their patching activity.
-pg_dump -h treebase-dev.nescent.org -U treebase_app --format=p --no-owner --no-privileges --schema-only treebasestage > schema_snapshot_treebasestage_2010-02-01.sql
+Each patch must start by an INSERT of a row describing the patch into the versionhistory table.
+(See existing patches for examples.)
+Each patch will be executed as a transaction, so there is no need to insert
+transaction commands into the patch.
-To restore the schema into a new DB:
-====================================
+Patches must not include any ownership or permissions commands: ownership and
+permissions are installation-specific, and therefore should be handled by the
+instance maintainer.
+When an error is discovered in a recent patch, it may be ok to fix it and commit
+a new version under the same patch number, provided it is possible to adjust all DB
+instances to conform with the new patch version. This can be doable when the patch
+was only applied to the development DB instance. When the patch with an error was already
+applied to production databases, it could be more prudent to develop a new, error-fixing patch.
+
+
+Applying a patch
+----------------
+
+Apply each patch within a transaction, e.g.,
+psql -d yourdb -U yourusername
+yourdb=> begin transaction;
+yourdb=> \i nnnn_your_patch.sql
+yourdb=> commit; OR rollback;
+
+After that, adjust ownership and permissions on newly created objects, if any,
+as required by the DB instance.
+
+Use your judgment on what to do in case the patch application generated errors
+and you had to roll back. Most likely, someone has to work on improving the patch.
+
+
+Creating a current snapshot
+---------------------------
+
+Schema snapshots must not include any ownership or permissions commands, since these
+are installation-specific. This command,
+
+pg_dump -h your.host.url -U your_username --format=p --no-owner --no-privileges --schema-only yourdb > nnnn_your_label.sql
+
+creates a reasonably lean dump of the schema. However, if the DB contained any objects
+not created by prior snapshots and patches, they should be removed by hand.
+
+Data snapshot is trickier. The best bet right now is probably by hand-modifying the previous data snapshot.
+
+
+Creating a fresh DB
+------------------
+
+Use the most recent snapshot:
+
psql -d yourdb -U yourusername -f nnn_SCHEMA_xxx.sql
psql -d yourdb -U yourusername -f nnn_DATA_xxx.sql
@@ -64,4 +94,35 @@
yourdb=> begin transaction;
yourdb=> \i nnn_SCHEMA_xxx.sql
yourdb=> \i nnn_DATA_xxx.sql
-yourdb=> end transaction
+yourdb=> commit;
+
+
+Recommended development - production - staging workflow
+=======================================================
+
+Local instances: It is recommended that each developer has his own DB instance to use for writing
+and initially testing new versions of the application and schema patches. This ensures
+at least some basic compatibility between SQL and Java within a given SVN commit.
+These instances may contain only minimal data, as useful for the developer.
+
+Development instance: The dev DB instance is frequently re-built to track
+the most recent SVN version of the application and DB patches. It is used to verify correctness
+of new versions and to communicate with non-programmer project participants.
+This instance should contain sizable and representative amount of data,
+but not necessarily as much or as good quality as the production instance.
+
+Staging instance: The stage instance is a testbed for rolling out a new version on production.
+When development on dev reaches release, a staging instance is created by cloning the
+current production instance. This ensures that the possible new data quirks accumulated
+since the previous release are present. Then patches spanning from the previous to the current
+release are applied, new version of the application is installed, and the whole system is tested
+and verified as necessary. If all works well, then the release roll-out may proceed on production.
+Otherwise, back to development.
+
+Production instance: This is the instance with the authoritative data that is constantly being
+accessed by external uses and can tolerate only brief periods of downtime. Make a backup before
+rolling out a new release and only roll out releases that were successfully verified on staging.
+After a new release roll-out, it makes sense to use the prod instance to create a new
+schema-and-dictionaries snapshot.
+
+
\ No newline at end of file
Modified: trunk/treebase-core/db/TODO.txt
===================================================================
--- trunk/treebase-core/db/TODO.txt 2010-02-16 14:37:32 UTC (rev 510)
+++ trunk/treebase-core/db/TODO.txt 2010-02-16 21:15:56 UTC (rev 511)
@@ -1,43 +1,29 @@
Pending framework issues
========================
-* Prior to taking the 1st schema snapshot (from treestage or treedev?):
- - Create table versionhistory
-
-* Develop a script for getting a current snapshot of the initial (dictionary) data,
+* Would like a script for getting a current snapshot of the initial (dictionary) data,
to be used alongside with the schema snapshot command.
+ [A problem: simply designating some tables as "dictionaries" and others as "facts" would not work --
+ at least some tables (Phylochar, DiscreteCharState) contains both "dictionary" and "fact" records.]
+
+* Automation scripts for tasks, especially patch application, so that the sysadmin does not forget to wrap a transaction.
+
+* A better way for handling ownership and permissions in particular DB instances -- there is
+ too much manual work at the moment.
+
+* Less-intrusive permissions/ownership management. Currently, ownerships and permissions are handled by pg_owner, pg_admin functions that the
+ sysadmin creates in DB after the initial installation. Consequently, they get into the schema snapshops and have to be removed by hand.
Pending schema patches:
======================
-[To be put in as patches after the storage structure for them is in place in this directory -- VG 2010-02-04]:
+* Maybe some indices around matrices (in addition to the indices around trees that are already there).
-* Give explicit names to all constraints
-* Drop geospot_id_sequence -- there is no corresponding table
+Pending data cleanups (on staging):
+===================================
-* Correct PK name in stepmatrixelement and create stepmatrixelement_id_sequence
-
-* Create indexes, as suggested by Bill 2010-02-03 (and maybe later)
- Table Field
- -------- -------
- taxon name
- taxonvariant fullname
- taxonlabel taxonlabel
- citation title
- citation abstract
- person lastname
- phylotreenode taxonlabel_id, phylotree_id
- See:
- - Bill's message of 2010-02-03:
- - https://sourceforge.net/tracker/?func=detail&atid=1126676&aid=2945391&group_id=248804,
- - Also, compare to the end-of-Jan schema snapshot of treebasedev
-
-
-Pending data cleanups (mostly, on staging):
-===========================================
-
* Remove junk, particularly that created by tests
* Known junk: Submission 22 and its related records
Modified: trunk/treebase-core/db/versionhistory.sql
===================================================================
--- trunk/treebase-core/db/versionhistory.sql 2010-02-16 14:37:32 UTC (rev 510)
+++ trunk/treebase-core/db/versionhistory.sql 2010-02-16 21:15:56 UTC (rev 511)
@@ -1,3 +1,6 @@
+-- Table for tracking which schema patches have been applied to the current DB instance.
+-- This table is already in the snapshot 0000_SCHEMA_before_patches_start.sql
+
CREATE TABLE versionhistory
(
id serial NOT NULL,
This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
|