From: Rutger V. <rut...@gm...> - 2009-05-27 17:07:21
|
Hi,

Mark-Jason and I just met to discuss how to transfer data from DB2 (sdsc) to pg (nescent). We considered the following options:

1. DB2 uses an export file format called IXF, which pg doesn't support. There are different versions of IXF, and googling for tools that might convert/import IXF only turns up things that look dodgy. An unattractive possibility is that we write our own IXF parser.

2. Perhaps Hibernate has a facility to dump a database's contents to a format that can be imported (say, insert statements). You'd think that Hibernate would have this, but we haven't found it.

3. Write our own exporter. Mark-Jason did some preliminary work on a Java-based "dump table to delimited file" exporter. Since we can run DBD::DB2 on gigahertz.sdsc.edu we might do something along those lines in Perl. We note Hilmar's remarks about the dangers of this: we need to keep track of character encodings and escape sequences.

I am now investigating option 2. We may have to open door 3, but we'd like to hear your suggestions/remarks.

Rutger

--
Dr. Rutger A. Vos
Department of Zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com |
From: Mark D. <mj...@ge...> - 2009-05-27 17:28:25
|
I'll re-send the earlier messages that I sent to the list on this subject. |
From: William P. <wil...@ya...> - 2009-05-27 17:31:36
|
What do you think of this tool: http://www.sobolsoft.com/postgresqldb2/

Too dodgy? But for only $30, perhaps it's worth giving it a shot.

Bill

_______________________________________________
Treebase-devel mailing list
Tre...@li...
https://lists.sourceforge.net/lists/listinfo/treebase-devel |
From: Jon A. <jon...@du...> - 2009-05-27 17:50:11
|
The sobolsoft software looks worth a try to me. It follows the spirit of Mark's suggestion, that is, to transfer the data across the wire without creating intermediate files. However, it is Windows-only, and my gut feeling is that it will blow up on the very large tables. Then again, maybe not.

I found another solution that could work, called RazorSQL (http://www.razorsql.com/features/db2_export.html). Though it has a GUI, it also has a command line interface, so we could easily pipe its output into a psql command to import directly into a PostgreSQL database (http://www.razorsql.com/docs/command_line.html). Something like:

java -Xms512M -Xmx2048M -jar razorsql.jar -export "[Treebase_DB2_db]" "select * from table1" sql | psql -Utreebase -hdarwin.nescent.org -d treebase-dev -f -

I can either give someone access to a database on our PostgreSQL server, or I can take a stab at it from this end if it is possible to get read-only access to the DB2 database.

-Jon

-------------------------------------------------------
Jon Auman
Systems Administrator
National Evolutionary Synthesis Center
Duke University
http://www.nescent.org
jon...@ne...
------------------------------------------------------ |
From: Hilmar L. <hl...@du...> - 2009-05-27 18:17:28
|
Feel free to override me on this, but I wouldn't spend time or effort on an approach that involves sucking the data over the wire at the same time it is being loaded. Given the size of the database, this will have poor or no repeatability, and a high likelihood of crashing or data corruption.

Keep in mind that optimistically we are trying to transfer about 50GB; at a throughput of 1Mbit/s this will take 4 days and 18 hours, and I'm pretty sure that neither JDBC nor DBD::Pg sustains 1Mbit/s throughput, let alone the network between here and SDSC. If the network drops for only one second during that time, the load is toast and you can start over.

Conversely, there are a lot of possibilities and tools that deal with speeding up, nicely chunking, and resuming interrupted downloads of plain old files. And for repeatability we need file(s) here anyway.

All else failing, the dump can be chunked using unix command line tools and shipped on 5 DVDs (costs a few dollars), or on a single hard disk. It's hard to imagine why those options would not work: they're low tech, we already know the tools, and we know how to do it. I'm not sure why I would not start with that. Of course, that assumes we have a dump file to begin with.

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
=========================================================== |
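For reference, Hilmar's back-of-the-envelope figure can be checked with a few lines of arithmetic (a sketch assuming 50 GB in decimal units and a sustained 1 Mbit/s link; the exact day/hour figure shifts a bit depending on whether GB means 10^9 or 2^30 bytes):

```python
# Estimate transfer time for the full database over a slow link.
DB_SIZE_BYTES = 50 * 10**9  # optimistic estimate: ~50 GB (decimal)
LINK_BPS = 1 * 10**6        # assumed sustained throughput: 1 Mbit/s

total_bits = DB_SIZE_BYTES * 8
seconds = total_bits / LINK_BPS          # 400,000 s

days = int(seconds // 86400)
hours = (seconds % 86400) / 3600
print(f"{seconds:.0f} s ~= {days} days, {hours:.1f} hours")
# prints: 400000 s ~= 4 days, 15.1 hours
```

With binary gigabytes (50 * 2^30 bytes) the same arithmetic gives just under 5 days, so the "4 days and 18 hours" quoted above is squarely in the right ballpark.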
From: William P. <wil...@ya...> - 2009-05-27 17:37:45
|
Don't know if you saw this document ("DB2 UDB To PostgreSQL Conversion Guide"): http://wiki.postgresql.org/images/d/d1/DB2UDB-to-PG.pdf bp |
From: Mark D. <mj...@ge...> - 2009-05-27 17:55:11
|
On Wed, 2009-05-27 at 13:37 -0400, William Piel wrote:
> Don't know if you saw this document ("DB2 UDB To PostgreSQL Conversion Guide"):

I hadn't. Thanks. |
From: Hilmar L. <hl...@du...> - 2009-05-27 18:06:06
|
That's a useful document, but from what I can see it doesn't address data migration. It's going to be a nice reference for the schema and code migration, though.

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
=========================================================== |
From: Hilmar L. <hl...@du...> - 2009-05-27 18:26:47
|
Hi Mark - the mailing list is archived :-) You can easily refer to a previous email or thread by its URL (though I agree that can get a bit messy through SF.net's interface).

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
=========================================================== |
From: Mark D. <mj...@ge...> - 2009-05-27 19:10:57
|
On Wed, 2009-05-27 at 14:25 -0400, Hilmar Lapp wrote:
> You can easily refer to a previous email or thread by its URL.

Yes, but they cannot easily be replied to. Since people were repeating the same points that I had already made twice, I hoped to re-inject them into the discussion so that we would not rehash absolutely everything. |
From: Hilmar L. <hl...@du...> - 2009-05-27 18:29:07
|
I would be surprised if Hibernate supports database dumps really well. (It's an ORM framework.)

What does the IXF format look like? Is it text-based? Is it column-oriented and delimited? Can you post a small sample? (PostgreSQL supports delimited files for import in various formats.)

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
=========================================================== |
From: Mark D. <mj...@ge...> - 2009-05-27 19:06:14
Attachments:
DUMP.ixf
|
On Wed, 2009-05-27 at 14:28 -0400, Hilmar Lapp wrote:
> What does the IXF format look like? Is it text-based?

It's a binary format. I have attached an IXF dump of the USERROLE table, whose contents are as follows:

AUTHORITY        | USERROLE_ID | VERSION
-----------------+-------------+--------
User             |           2 |       1
Associate Editor |           3 |       1
TEST             |           4 |       0
Admin            |           1 |       1
TEST             |          24 |       0
TEST             |          25 |       0

> (PostgreSQL supports delimited files for import in various formats)

DB2 will dump a delimited file or a CSV file, but it truncates the data to 250 characters. |
From: Mark D. <mj...@ge...> - 2009-05-28 21:10:39
|
Rutger, Val and I just had a meeting in which we decided on a plan for how to do this.

Either Rutger or I will write a program that dumps out the database tables as a series of SQL 'insert' commands. We will be responsible for making sure that the data is properly escaped. The files will be UTF-8 encoded, so that the Unicode data that is currently in the database will be properly represented.

The same program can also emit some SQL commands that set the sequence numbers, grants, or whatever.

SDSC has assured us that they can provide enough disk space at SDSC to do this.

Files can be copied from SDSC to NESCent with any of several methods, including rsync.

Once at NESCent, someone (who?) will be responsible for executing these large SQL batch files to import the data into the Pg database. If done properly, the Unicode data will be transferred faithfully.

If this sounds like a bad idea, or if there are unanswered questions, now is the time to speak up. |
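The escaping and UTF-8 steps of the planned exporter might look roughly like this (a sketch only: the helper names are hypothetical, and the real program would read its rows from DB2 rather than a hard-coded list):

```python
# Sketch: turn rows into SQL INSERT statements with proper escaping,
# writing the output as UTF-8 so Unicode data survives the transfer.

def sql_literal(value):
    """Render a Python value as a SQL literal, doubling embedded quotes."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Standard SQL string escaping: '' represents a single quote.
    return "'" + str(value).replace("'", "''") + "'"

def insert_statement(table, columns, row):
    cols = ", ".join(columns)
    vals = ", ".join(sql_literal(v) for v in row)
    return f"INSERT INTO {table} ({cols}) VALUES ({vals});"

# Illustrative rows modeled on the USERROLE table shown earlier:
rows = [("User", 2, 1), ("Associate Editor", 3, 1), ("Admin", 1, 1)]
with open("userrole.sql", "w", encoding="utf-8") as out:
    for row in rows:
        out.write(insert_statement(
            "USERROLE", ("AUTHORITY", "USERROLE_ID", "VERSION"), row) + "\n")
```

The resulting file can then be fed to psql on the NESCent side; quote-doubling handles values like O'Brien without breaking the statement.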
From: Hilmar L. <hl...@du...> - 2009-05-28 21:32:19
|
This sounds great!

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
=========================================================== |
From: Jon A. <jon...@du...> - 2009-05-29 15:12:32
|
On May 28, 2009, at 5:10 PM, Mark Dominus wrote:
> Files can be copied from SDSC to NESCent with any of several methods,
> including rsync.

rsync will work fine for me if we need incremental updates; otherwise scp is faster.

> Once at NESCent, someone (who?) will be responsible for executing these
> large SQL batch files to import the data into the Pg database.

Unless I hear otherwise from Hilmar, I'm assuming that person will be me. I'll be setting up the rsync transfers and the SQL imports. If the SQL inserts contain only UTF-8 characters, then there should be no problem with the import into a UTF-8 PostgreSQL database. If there are non-UTF-8 characters in the SQL file, they can be stripped out with iconv or converted with a shell program called "recode".

> If this sounds like a bad idea, or if there are unanswered questions,
> now is the time to speak up.

I also agree that this is the best option to try first.

-Jon

-------------------------------------------------------
Jon Auman
Systems Administrator
National Evolutionary Synthesis Center
Duke University
http://www.nescent.org
jon...@ne...
------------------------------------------------------ |
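The two cleanups Jon mentions can be illustrated in a few lines (the shell tools are iconv and recode; this Python sketch just mirrors their behavior, and the assumption that stray bytes are Latin-1 is hypothetical):

```python
# Sketch: two ways to handle bytes that are not valid UTF-8.
raw = b"Se\xf1or"  # 0xF1 is Latin-1 'n-tilde', not valid UTF-8 by itself

# Option 1: drop undecodable bytes, like `iconv -c -f utf-8 -t utf-8`
stripped = raw.decode("utf-8", errors="ignore")    # -> 'Seor'

# Option 2: assume the legacy bytes are Latin-1 and re-encode as UTF-8,
# roughly what `recode latin1..utf8` would do
converted = raw.decode("latin-1").encode("utf-8")  # -> b'Se\xc3\xb1or'

print(stripped, converted)
```

Option 1 loses data silently, which is worth keeping in mind given Bill's point below about wanting to find and fix malformed characters later.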
From: William P. <wil...@ya...> - 2009-05-29 16:20:19
|
On May 29, 2009, at 11:12 AM, Jon Auman wrote:
> If there are non-UTF8 characters in the SQL file, they can be stripped out
> with iconv or converted with a shell program called "recode"

Given the history of the legacy TreeBASE data, I believe that the vast majority of diacriticals will be properly formed in UTF-8, but there will be some malformed ones, (1) dating from when we were entering data through a Mac application (Apple8 characters) and (2) as a result of people submitting data via web browsers that don't comply with our meta tags regarding character encodings. I think it's fine to leave these malformed ones in (rather than auto-stripping them out) because we will want to fix them by hand later on, and they help alert us to where things need fixing.

bp |
From: Mark D. <mj...@ge...> - 2009-05-29 16:26:48
|
On Fri, 2009-05-29 at 11:12 -0400, Jon Auman wrote:
> rsync will work fine for me if we need incremental updates, otherwise
> scp is faster.

I like rsync because it's easier to restart if the transmission is interrupted partway through. With scp you get to start over again, or keep track separately of how far you got. But I guess you already know that. We can probably do the initial transfer by scp'ing compressed files, and then use rsync for any later updates.

> Unless I hear otherwise from Hilmar, I'm assuming that person will be me.

Excellent, thanks.

> If the SQL inserts all contain UTF8 characters, then there should be
> no problem with the import into a UTF8 postgresql database. If there
> are non-UTF8 characters in the SQL file,

I'm not sure what that would mean. What is a "non-UTF8 character"?

--
Mark Jason Dominus                      mj...@ge...
Penn Genome Frontiers Institute         +1 215 573 5387 |
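To make the question concrete: what Jon presumably means is byte sequences in the file that are not valid UTF-8 encodings of any character, which a strict decoder will reject (a sketch, not from the thread; the example word is arbitrary):

```python
# A valid UTF-8 byte sequence decodes cleanly; stray high bytes do not.
good = "r\u00e9sum\u00e9".encode("utf-8")  # b'r\xc3\xa9sum\xc3\xa9'
bad = b"r\xe9sum\xe9"                      # same word in Latin-1 bytes

assert good.decode("utf-8") == "r\u00e9sum\u00e9"  # decodes fine

try:
    bad.decode("utf-8")  # 0xE9 followed by 's' is not valid UTF-8
except UnicodeDecodeError as e:
    print("not UTF-8:", e.reason)
```

So "non-UTF-8 characters" is loose phrasing for data whose bytes came from some other encoding (Latin-1, MacRoman, ...) and were stored without conversion.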