From: Rutger V. <rut...@gm...> - 2009-06-12 11:04:51
Attachments:
DB2_CREATE_TBASE2_JRUAN.sql
Hi, we are nearly at the point of being able to make gzip-compressed archives of insert statements for all tables available for download from SDSC (i.e. data dumps). Mark and I have been working on some scripts that fetch all the data for all tables, reformat it to inserts and pipe it into gzip. I ran some tests on a local pg install and the data imports correctly. However, I haven't been able to test this for all tables because I couldn't run the dumper to completion on gigahertz.sdsc.edu because of disk quota overruns. Hence, one action item is to ask Mark Miller if we can temporarily get some more space to dump these archives (if I recall correctly he promised that would be possible).

Assuming we can serve it up at SDSC, the next step is to get it to NESCent, and this may entail a download of dozens of gigabytes. Not an impossibility, but maybe we want to serve up checksum files (md5) as well so we can check whether everything arrived correctly.

The second thing that needs to happen is to port the schema from db2's version of SQL to postgresql's. Attached is a script that does that: the lines prefixed with -- are the commands to create a db2 instance; the uncommented lines next to each block do the same for pg (a sketch of one such paired block follows below). I've had to make some choices:

* pg likes lower case, so all table names and column names have become lower case (whereas they are all upper case on DB2)
* USER (or user) is a reserved word for pg, so it always has to be quoted as "user" (same for "type", "version", "name", "input", "value", "nchar" and "password")
* DOUBLE is called DOUBLE PRECISION on pg
* CLOBs are mapped onto TEXT
* the big integers that are used for identifiers are created as follows on db2:

      BIGINT GENERATED BY DEFAULT AS IDENTITY (START WITH 1, INCREMENT BY 1, CACHE 20) NOT NULL,

  ...but in the pg version they are simply BIGINT NOT NULL (I'm sure we can improve on this)

Other than these issues, pg quite happily accepted the same SQL for creating indices and primary and foreign keys as used for the db2 instances.

The third thing that needs to happen is to test the treebase hibernate code against pg. Given the case changes (table and column names have gone from upper case to lower case), maybe some of the mappings (java annotations) need to be changed. The treebase code is pretty well-behaved (no raw sql), so beyond those annotations there's probably not that much work to do inside the java code.

Rutger

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
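A sketch of what one such paired block might look like, assuming an invented example table (the table and column names here are hypothetical; the attached script covers the actual TreeBASE schema):

    -- DB2 version, commented out as in the attached script:
    -- CREATE TABLE EXAMPLE (
    --     EXAMPLE_ID BIGINT GENERATED BY DEFAULT AS IDENTITY
    --         (START WITH 1, INCREMENT BY 1, CACHE 20) NOT NULL,
    --     NAME VARCHAR(255),
    --     SCORE DOUBLE,
    --     DESCRIPTION CLOB,
    --     PRIMARY KEY (EXAMPLE_ID)
    -- );

    -- pg version: names lower-cased, keywords quoted, DOUBLE becomes
    -- DOUBLE PRECISION, CLOB becomes TEXT, identity reduced to BIGINT
    CREATE TABLE example (
        example_id BIGINT NOT NULL,
        "name" VARCHAR(255),
        score DOUBLE PRECISION,
        description TEXT,
        PRIMARY KEY (example_id)
    );

One likely improvement on the plain BIGINT NOT NULL would be BIGSERIAL, or an explicit CREATE SEQUENCE plus a DEFAULT nextval(...) on the column, either of which would restore the auto-increment behaviour of the DB2 identity column.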
From: Mark D. <mj...@ge...> - 2009-06-12 13:30:54
Rutger Vos wrote:
> Hence, one action item is to ask Mark Miller if we can temporarily
> get some more space to dump these archives (if I recall correctly he
> promised that would be possible).

I requested this on 8 June, and sent a followup yesterday. If I don't hear back by midday Monday I will get Mark Miller involved.

> Assuming we can serve it up at SDSC the next step is to get it to
> NESCent, and this may entail a download of dozens of gigabytes.

Unlikely. The whole thing is only 27GB uncompressed, and it seems to compress at better than 90%, so the compressed archives should come to under 3GB.
From: Val T. <va...@ci...> - 2009-06-12 15:09:24
Rutger writes:
> annotations) need to be changed. The treebase code is pretty
> well-behaved (no raw sql)

Are you sure about this? I was under the impression that Jin had added some optimizations that use SQL directly.

Val
From: Rutger V. <rut...@gm...> - 2009-06-12 20:11:58
I've never come across those optimizations, so I'm not sure. Raw HQL, yes, but SQL, haven't seen it. I might be missing something though; Mark, do you know what Val is talking about?

On Fri, Jun 12, 2009 at 8:09 AM, Val Tannen <va...@ci...> wrote:
> Rutger writes:
>> annotations) need to be changed. The treebase code is pretty
>> well-behaved (no raw sql)
>
> Are you sure about this? I was under the impression that Jin
> had added some optimizations that use SQL directly.
> Val

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
From: William P. <wil...@ya...> - 2009-06-12 20:33:50
On Jun 12, 2009, at 4:11 PM, Rutger Vos wrote:
> I've never come across those optimizations, so I'm not sure. Raw HQL,
> yes, but SQL, haven't seen it. I might be missing something though;
> Mark, do you know what Val is talking about?

I seem to remember that the inserting of character records (after parsing a nexus file) was a big choke point, and that Jin had to "bypass" Hibernate to increase speed. I don't know whether this means "raw HQL" or passing SQL via JDBC...

bp
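If it is SQL passed via JDBC, the win presumably comes from batching: one multi-row insert (or one prepared statement executed as a JDBC batch) per chunk of character records, instead of a Hibernate-managed insert and flush per row. A minimal sketch of the kind of statement such a bypass might generate, using an invented table and columns:

    -- one round trip for many character-state rows,
    -- rather than one ORM-managed INSERT per row
    INSERT INTO matrixelement (matrix_id, row_id, col_id, symbol) VALUES
        (1, 1, 1, 'A'),
        (1, 1, 2, 'C'),
        (1, 1, 3, 'G');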
From: Mark D. <mj...@ge...> - 2009-06-15 15:39:17
Rutger Vos wrote:
> Mark, do you know what Val is talking about?

Check out org.cipres.treebase.dao.jdbc.