[Treebase-devel] DB2 => PG migration status

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi,

we are nearly at the point of being able to make gzip-compressed
archives of insert statements for all tables available for download
from SDSC (i.e. data dumps). Mark and I have been working on some
scripts that fetch all the data for all tables, reformat it to inserts
and pipe it into gzip. I ran some tests on a local pg install and the
data imports correctly.

However, I haven't been able to test this for all tables because I
couldn't run the dumper to completion on gigahertz.sdsc.edu - because
of disk quota overruns. Hence, one action item is to ask Mark Miller
if we can temporarily get some more space to dump these archives (if I
recall correctly he promised that would be possible).

Assuming we can serve it up at SDSC the next step is to get it to
NESCent, and this may entail a download of dozens of gigabytes. Not an
impossibility, but maybe we want to serve up checksum files (md5) as
well so we can check whether everything arrived correctly.

The second thing that needs to happen is to port the schema from db2's
version of SQL to postgresql's. Attached is a script that does that:
the lines prefixed with --'s are the commands to create a db2
instance, the uncommented lines next to the equivalent blocks do the
same for pg. I've had to make some choices:

* pg likes lower case, so all table names and column names have become
lower case (whereas they are all upper on DB2)
* USER (or user) is a reserved word for pg, so it always has to be
quoted "user" (same for "type", "version", "name", "input", "value",
"nchar" and "password")
* DOUBLE is called DOUBLE PRECISION on pg
* CLOBs are mapped onto TEXT
* the big integers that are used for identifiers are created as follows on db2:

"BIGINT GENERATED BY DEFAULT AS IDENTITY (START WITH 1, INCREMENT BY
1, CACHE 20) NOT NULL,"

...but in the pg version they are simply BIGINT NOT NULL (I'm sure we
can improve on this.). Other than these issues pg quite happily
accepted the same SQL for creating indices, primary and foreign keys
as needed for db2 instances.

The third thing that needs to happen is to test the treebase hibernate
code against pg. Given the case changes (table and column names have
gone from uppercase to lowercase), maybe some of the mappings (java
annotations) need to be changed. The treebase code is pretty
well-behaved (no raw sql) so beyond those annotations there's probably
not that much work to do inside the java code.

Rutger

-- 
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com