From: Rutger V. <rut...@gm...> - 2009-06-12 11:04:51
|
Hi, we are nearly at the point of being able to make gzip-compressed archives of insert statements for all tables available for download from SDSC (i.e. data dumps). Mark and I have been working on some scripts that fetch all the data for all tables, reformat it to inserts and pipe it into gzip. I ran some tests on a local pg install and the data imports correctly. However, I haven't been able to test this for all tables because I couldn't run the dumper to completion on gigahertz.sdsc.edu - because of disk quota overruns. Hence, one action item is to ask Mark Miller if we can temporarily get some more space to dump these archives (if I recall correctly he promised that would be possible). Assuming we can serve it up at SDSC the next step is to get it to NESCent, and this may entail a download of dozens of gigabytes. Not an impossibility, but maybe we want to serve up checksum files (md5) as well so we can check whether everything arrived correctly. The second thing that needs to happen is to port the schema from db2's version of SQL to postgresql's. Attached is a script that does that: the lines prefixed with --'s are the commands to create a db2 instance, the uncommented lines next to the equivalent blocks do the same for pg. I've had to make some choices: * pg likes lower case, so all table names and column names have become lower case (whereas they are all upper on DB2) * USER (or user) is a reserved word for pg, so it always has to be quoted "user" (same for "type", "version", "name", "input", "value", "nchar" and "password") * DOUBLE is called DOUBLE PRECISION on pg * CLOBs are mapped onto TEXT * the big integers that are used for identifiers are created as follows on db2: "BIGINT GENERATED BY DEFAULT AS IDENTITY (START WITH 1, INCREMENT BY 1, CACHE 20) NOT NULL," ...but in the pg version they are simply BIGINT NOT NULL (I'm sure we can improve on this.). Other than these issues pg quite happily accepted the same SQL for creating indices, primary and foreign keys as needed for db2 instances. The third thing that needs to happen is to test the treebase hibernate code against pg. Given the case changes (table and column names have gone from uppercase to lowercase), maybe some of the mappings (java annotations) need to be changed. The treebase code is pretty well-behaved (no raw sql) so beyond those annotations there's probably not that much work to do inside the java code. Rutger -- Dr. Rutger A. Vos Department of zoology University of British Columbia http://www.nexml.org http://rutgervos.blogspot.com |