From: Hilmar L. <hl...@du...> - 2009-04-06 14:34:47

Hi all,

To start us off on planning and coordinating the next steps for the TreeBASE migration, I'll repost part of an email I've sent to most of the people on this list previously, but which hasn't had any serious follow-up discussion yet. As we agreed, this would be the time and the place to have that discussion.

Broadly, the next thing to work on would be moving the source code from the SDSC repository to SourceForge and having all developers work off of that, so that we can then branch off for the migration work.

As for the actual planning, in principle here is the list of things I see on our plate to sort out:

1. Status update
1a. middleware and UI source code, outstanding bugs
1b. database schema definition
1c. data migration from TB1 and testing of result
1d. unit testing of TB2 code
1e. user & usability testing of TB2 UI
2. Moving source code to Sf.net
2a. licensing cleanup issue
2b. import of code base into sf.net svn
2c. switch development repositories, make SDSC repository read-only
2d. content pages (project homepage, documentation)
3. Procedures
3a. regular activity updates
3b. schema changes
3c. DAO and middleware API changes

and subsequently, in broad strokes:
3. Hardware purchase & OS, software setup
4. Schema migration
5. Data migration script
6. Software migration
7. Testing
8. Flipping the switch

Any and all thoughts and feedback appreciated.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================
From: Rutger V. <rut...@gm...> - 2009-04-06 21:48:17

Does anyone have any experience with moving a project to sourceforge while retaining its revision history? I believe this can be done, but I'm not sure and I don't know how to do it.

On Mon, Apr 6, 2009 at 7:34 AM, Hilmar Lapp <hl...@du...> wrote:
> [...]

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
From: Hilmar L. <hl...@du...> - 2009-04-06 21:57:52

On Apr 6, 2009, at 5:48 PM, Rutger Vos wrote:

> I believe this can be done

It can be done; there's a tool to export and one to import the svn repository. It requires admin help, though, i.e., we'd be dependent on SourceForge support staff and their prioritization of time to assist us with doing this.

> but I'm not sure and I don't know how to do it.

I would actually advise against it. It would make public any user or account names, host names, and passwords that were ever mistakenly committed to the repository. I'd advise starting with a clean slate that we have convinced ourselves is free of cruft (at least as far as entire files are concerned), free of information that would make some system vulnerable to a security breach, and free of bogus, obsolete, or inapplicable license information.

You can (and in fact should) still archive a complete dump of the current repository at the time of switching to Sf.net, in the event that you want to go back later and find out who originated a piece of code, or to retrieve a file that used to be there but was deleted later.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================
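For reference, archiving such a complete repository dump could look roughly like the sketch below. The repository path and file names are hypothetical placeholders, not the actual SDSC setup.

  # run on the machine hosting the current Subversion repository
  svnadmin dump /svnroot/treebase > treebase-sdsc-final.svndump
  gzip treebase-sdsc-final.svndump

  # if the history is ever needed again, it can be restored into a scratch repository:
  #   svnadmin create /tmp/treebase-archive
  #   gunzip -c treebase-sdsc-final.svndump.gz | svnadmin load /tmp/treebase-archive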
From: Rutger V. <rut...@gm...> - 2009-04-06 22:05:18

> I would actually advise against it. It would make public any user or
> account names, host names, and passwords that were ever mistakenly
> committed to the repository.

Good point.

> You can (and in fact should) still archive a complete dump of the current
> repository at the time of switching to Sf.net, in the event that you want
> to go back later and find out who originated a piece of code, or to
> retrieve a file that used to be there but was deleted later.

Ah, yes. Let's do that.

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
From: Rutger V. <rut...@gm...> - 2009-04-21 20:53:12

This is Hilmar's earlier thread on the same topic Val just brought up: migration to NESCent. Let's work off of this one; it's more detailed.

On Mon, Apr 6, 2009 at 7:34 AM, Hilmar Lapp <hl...@du...> wrote:
> [...]

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
From: Rutger V. <rut...@gm...> - 2009-04-21 20:56:43

Hilmar,

>> and subsequently, in broad strokes:
>> 3. Hardware purchase & OS, software setup

I'm curious about this step: what does it involve in practical terms to get to the point where I can ssh into rv...@tr... (or some such)?

Rutger

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
From: Hilmar L. <hl...@du...> - 2009-04-21 21:02:59

On Apr 21, 2009, at 4:56 PM, Rutger Vos wrote:

> Hilmar,
>
>>> and subsequently, in broad strokes:
>>> 3. Hardware purchase & OS, software setup
>
> I'm curious about this step: what does it involve in practical terms
> to get to the point where I can ssh into
> rv...@tr... (or some such)?

Purchase & delivery of the hardware, virtualization environment to be set up, virtual slices to be created, OS installed and imaged, accounts to be created.

Jon would have more details. We'll also be doing testing of the host slices using our own development sites, and we'll be looking this week and next at whether we can fast-track some of the hardware purchases so we can start testing earlier.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================
From: Hilmar L. <hl...@du...> - 2009-04-21 20:58:18

On Mon, Apr 6, 2009 at 7:34 AM, Hilmar Lapp <hl...@du...> wrote:
> [...]
> 3. Hardware purchase & OS, software setup

Just as an update that most (though possibly not all) of you will already know: the outside-service agreement with NESCent is in place, and we have invoiced. As soon as we have the funds we will purchase the hardware.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================
From: Val T. <va...@ci...> - 2009-04-21 21:17:02

On Apr 21, 2009, at 4:58 PM, Hilmar Lapp wrote:

> Just as an update that most (though possibly not all) of you will
> already know: the outside-service agreement with NESCent is in
> place, and we have invoiced. As soon as we have the funds we will
> purchase the hardware.

Does this mean we wait on any transfer of data until the hardware arrives and is set up? I must tell you that we may have been lucky that nothing has gone too wrong with SDSC's installation so far, and the faster we become independent of it the better...

Val
From: Hilmar L. <hl...@du...> - 2009-04-21 21:48:11

We can work on the software and data migration tasks in parallel to the hardware, absolutely.

-hilmar

On Apr 21, 2009, at 5:03 PM, Val Tannen wrote:
> [...]
> Does this mean we wait on any transfer of data until the hardware
> arrives and is set up?

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================
From: Mark D. <mj...@ge...> - 2009-04-22 20:57:52

Hilmar Lapp wrote:

> 1b. database schema definition
> 4. Schema migration

Hibernate will generate the database schema definition and create the tables if we ask it to; we have done this a couple of times at SDSC. Or did I misunderstand this point?

> 1c. data migration from TB1 and testing of result
> 5. Data migration script

Right now the TB1 data is nearly all installed in the new TB2 database at SDSC. Since the SDSC and NESCent schemas will be identical, or nearly so, the data migration from SDSC to NESCent should be straightforward. I understand there is probably some way to dump the TB2-format data as it currently exists at SDSC, transfer the dump files to NESCent, and bulk-load them into the database on the NESCent side. I would not like to perform the TB1->TB2 migration a second time, if at all possible.

> 1d. unit testing of TB2 code

The current codebase is severely lacking in unit tests. What tests there are are often extremely slow and are more properly system tests than unit tests.
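For context, the Hibernate behavior referred to here is usually switched on through the hbm2ddl setting. A minimal sketch only; the dialect value and the choice of "create" are illustrative, not the actual TB2 configuration:

  # hibernate.properties (sketch)
  hibernate.dialect=org.hibernate.dialect.PostgreSQLDialect
  # "create" regenerates the tables from the mappings at startup;
  # "update" and "validate" are the less destructive options
  hibernate.hbm2ddl.auto=create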
From: Hilmar L. <hl...@du...> - 2009-04-23 15:34:51

On Apr 22, 2009, at 4:45 PM, Mark Dominus wrote:

> [...]
> Hibernate will generate the database schema definition and create the
> tables if we ask it to

I know, and in fact that's one possible way to migrate the schema from DB2 to PostgreSQL. It's not a good way to manage schema versions, though. I.e., there would be none, and accordingly, you then can't have software that uses the database directly rather than through Hibernate and still relies on anything, and you can't make any direct modifications to the schema.

For software projects such as TreeBASE it is best practice to manage versioning of the schema as you manage versioning of the software, and to change a schema by writing and applying a migration script.

> [...]
> I understand there is probably some way to dump the TB2-format data as
> it currently exists at SDSC, transfer the dump files to NESCent, and
> bulk-load them into the database on the NESCent side.

In theory yes, but in practice each RDBMS has its own dump format. Ideally we can get DB2 to dump the data as SQL standard-compliant INSERT statements, but I don't know DB2 well enough yet to know whether it does that, and aside from that there's more than the data itself, such as the sequence(s), grants, etc., that may not dump in a format that's readily ingestible by Pg.

Also, the deliverable that we want from this is a fully scripted process that takes the dumps from DB2, does its thing, and at the end the data is fully imported into the NESCent PostgreSQL instance. The reason is that we want this to be repeatable. I.e., we will test first, fix, rerun, etc., until we can switch over in a well-coordinated fashion that involves only minimal downtime.

> [...] I would not like to perform the TB1->TB2 migration a second
> time, if at all possible.

That would be a bad idea indeed once data have been added to TB2.

>> 1d. unit testing of TB2 code
>
> The current codebase is severely lacking in unit tests. What tests
> there are are often extremely slow and are more properly system tests
> than unit tests.

I think it's going to be *very* important to have unit tests, and semantic data integrity tests (i.e., tests for integrity that go beyond the constraints that the database enforces), so that we can have confidence in the migration result. Otherwise we'll be sort of betting.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================
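As an illustration, a fully scripted, repeatable process along the lines described could look roughly like the sketch below. Every host name, file name, database name, and the two helper scripts are hypothetical placeholders, not an existing deliverable:

  #!/bin/sh
  set -e

  # 1. fetch the latest dumps produced on the DB2 side
  scp db2-host:/dumps/tb2/*.del dumps/

  # 2. start from a clean database instantiated from the versioned DDL
  dropdb treebase_test || true
  createdb treebase_test
  psql -d treebase_test -f sql/treebase-pg-schema.sql

  # 3. convert and bulk-load the data (hypothetical converter script)
  ./load_db2_dumps.sh dumps/ treebase_test

  # 4. run the semantic data integrity tests against the result
  ./run_integrity_tests.sh treebase_test

The point of keeping it as one script is exactly the test-fix-rerun cycle described above: the whole thing can be thrown away and re-executed until the switch-over is routine.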
From: Mark D. <mjd...@ge...> - 2009-04-23 16:38:33

Hilmar Lapp wrote:

> For software projects such as TreeBASE it is best practice to manage
> versioning of the schema as you manage versioning of the software, and
> to change a schema by writing and applying a migration script.

Okay, that makes sense to me. Thanks.

> The deliverable that we want from this is a fully scripted process
> that takes the dumps from DB2, does its thing, and at the end the data
> is fully imported into the NESCent PostgreSQL instance. The reason is
> that we want this to be repeatable. I.e., we will test first, fix,
> rerun, etc., until we can switch over in a well-coordinated fashion that
> involves only minimal downtime.

Very good. What do you think I should do first?

>> [...] I would not like to perform the TB1->TB2 migration a second
>> time, if at all possible.
>
> That would be a bad idea indeed once data have been added to TB2.

I think we were hoping to move from SDSC to NESCent before TB2 opened for business. That is, we have an installation at SDSC now which is suitable for beta-testing, but I think we planned to discard all data uploaded by beta-testers before moving the data to NESCent.
From: Mark D. <mjd...@ge...> - 2009-04-23 19:45:44

Hilmar Lapp wrote:

>> I understand there is probably some way to dump the TB2-format data as
>> it currently exists at SDSC, transfer the dump files to NESCent, and
>> bulk-load them into the database on the NESCent side.
>
> In theory yes, but in practice each RDBMS has its own dump format.
> Ideally we can get DB2 to dump the data as SQL standard-compliant
> INSERT statements, but I don't know DB2 well enough yet to know whether
> it does that, and aside from that there's more than the data itself,
> such as the sequence(s), grants, etc., that may not dump in a format
> that's readily ingestible by Pg.

It appears that DB2 will dump the data in only one useful format, called IXF. Do you know if Pg can import that? If not, we may have to do something.

Also, although most of the database is pretty small, there is one table with around 3e8 rows, and it may not be practical to export this to a file or to transport the file.
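For reference, the kind of per-table export being described here is normally done with DB2's EXPORT utility from the command line processor. A sketch, with the database, schema, table, and file names as placeholders:

  db2 connect to treebase
  db2 "EXPORT TO phylotree.ixf OF IXF SELECT * FROM tb2.phylotree"
  db2 terminate

IXF is a DB2-specific binary format, which is why whether Pg can consume it (directly or via a converter) is the open question raised above.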
From: Mark D. <mjd...@ge...> - 2009-04-23 16:40:21

Hilmar Lapp wrote:

>>> 1d. unit testing of TB2 code
>>
>> The current codebase is severely lacking in unit tests. What tests
>> there are are often extremely slow and are more properly system tests
>> than unit tests.
>
> I think it's going to be *very* important to have unit tests,

I agree completely. I was just warning you what to expect.

> and semantic data integrity tests (i.e., tests for integrity that go beyond
> the constraints that the database enforces) so that we can have
> confidence in the migration result.

This we do have.
From: Hilmar L. <hl...@du...> - 2009-04-23 18:04:28

On Apr 23, 2009, at 12:39 PM, Mark Dominus wrote:

>> and semantic data integrity tests (i.e., tests for integrity that
>> go beyond the constraints that the database enforces) so that we can
>> have confidence in the migration result.
>
> This we do have.

The semantic data integrity tests or the confidence? :) I'm assuming you meant the former, as we haven't migrated to Pg yet. Or were you saying you have confidence in the TB1->TB2 migration?

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================
From: Mark D. <mjd...@ge...> - 2009-04-23 18:44:36

Hilmar Lapp wrote:

> On Apr 23, 2009, at 12:39 PM, Mark Dominus wrote:
>> This we do have.
>
> The semantic data integrity tests or the confidence? :) I'm assuming
> you meant the former,

Yes.
From: Rutger V. <rut...@gm...> - 2009-04-23 20:58:21

> It's not a good way to manage schema versions, though. I.e., there
> would be none, and accordingly, you then can't have software that uses
> the database directly rather than through Hibernate and still relies
> on anything, and you can't make any direct modifications to the schema.

So how do we make the jump to a situation where we have a versioned schema? The way I imagined it would be to let Hibernate generate a pg-compatible schema file, run it on a fresh pg instance, and stick the file in the repository. Now we've made the jump.

Then, what happens if we need to alter one table (e.g. we need to make the "abstract" field of "article" longer)? Do we run an "alter table" command and paste the command at the bottom of the schema file? Or is this a point where we change the original "create" command for the "article" table and rerun the whole script (which might involve dropping and reloading the whole database)?
From: Hilmar L. <hl...@du...> - 2009-04-23 21:14:04

On Apr 23, 2009, at 4:58 PM, Rutger Vos wrote:

> So how do we make the jump to a situation where we have a versioned
> schema? The way I imagined it would be to let Hibernate generate a
> pg-compatible schema file, run it on a fresh pg instance, and stick
> the file in the repository.

That's a possibility. It would require that the webapp code actually runs w/o error to that point. Another possibility is to take the schema dump from DB2 and convert from there. The schema dump should be easy to obtain, so that's where I would start. If the result looks like a lot of work, I would then try the Hibernate route.

> Now we've made the jump. Then, what happens if we need to alter
> one table (e.g. we need to make the "abstract" field of "article"
> longer)? Do we run an "alter table" command and paste the command at
> the bottom of the schema file? Or is this a point where we change
> the original "create" command for the "article" table and rerun the
> whole script (which might involve dropping and reloading the whole
> database)?

Yes and no. Yes, you do change the original CREATE command (so that your master script for instantiating the database from scratch stays up to date). To migrate existing database instances, you write a script that applies all necessary changes from the previous release to the next, without dropping or losing data. There are examples of what this looks like in BioSQL:

http://tinyurl.com/cxwz6f
http://tinyurl.com/csx2mj

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================
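To make the pattern concrete, a minimal sketch of what one such migration script could contain, using Rutger's hypothetical example of widening article.abstract. The column type and the schema_version bookkeeping table are illustrative assumptions, not the actual TB2 schema:

  -- migrations/tb2.1-to-tb2.2.sql (hypothetical file name)
  BEGIN;

  -- widen the abstract column in place, without dropping or reloading data
  ALTER TABLE article ALTER COLUMN abstract TYPE text;

  -- record the new schema version so instances can be checked before upgrading
  UPDATE schema_version SET version = '2.2';

  COMMIT;

The master CREATE script in the repository is updated in the same commit, so a fresh instance and a migrated instance end up structurally identical.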
From: Mark D. <mjd...@ge...> - 2009-04-24 18:22:01

Hilmar Lapp wrote:

>> I understand there is probably some way to dump the TB2-format data as
>> it currently exists at SDSC, transfer the dump files to NESCent, and
>> bulk-load them into the database on the NESCent side.
>
> In theory yes, but in practice each RDBMS has its own dump format.
> Ideally we can get DB2 to dump the data as SQL standard-compliant
> INSERT statements, but I don't know DB2 well enough yet to know whether
> it does that, and aside from that there's more than the data itself,
> such as the sequence(s), grants, etc., that may not dump in a format
> that's readily ingestible by Pg.

It dumps the sequences, grants, foreign key constraints, and so forth, as SQL; see trunk/schema.sql.

But for dumping the data, it seems as though we can get any format we want, as long as it is IXF.

So it then occurred to me that it would not be hard to write a program that would scan all the records in a table and write out a series of SQL INSERT statements.

But rather than do that, it seems to me that it might make more sense to skip the text representation and just write a program that would run at NESCent, scan the tables over the network, and execute the appropriate INSERT statements directly, without ever serializing the data in between.

The drawback of this comes if we need to import the SDSC data a second time for some reason. It would all have to be transferred over the network a second time. A dump file need only be transferred once, and then could be stored at NESCent and loaded as many times as needed. The benefit would be that there would be no need to worry about escape code conventions or strange characters or anything like that, and there would be no need to ship around a bunch of big files.
From: Rutger V. <rut...@gm...> - 2009-04-24 20:01:39

It would be very useful if we did have a dump format (I imagined something simple like csv or some other delimited format). Some databases offer this for downloads (e.g. NCBI taxonomy, ITIS, "Mammal Species of the World") and it's a very useful feature. If we want TreeBASE to be more than a place where trees go to die, this would be one way to facilitate meta-analyses and such.

DB2::Admin on CPAN (http://search.cpan.org/dist/DB2-Admin/) has a facility to dump DB2 tables as delimited files, so we could write a cron job script to do just that and make the output available as an archive.

On Fri, Apr 24, 2009 at 11:21 AM, Mark Dominus <mjd...@ge...> wrote:
> [...]

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
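A sketch of what a delimited dump and reload could look like with DB2's stock EXPORT utility and PostgreSQL's COPY, rather than DB2::Admin. The table names, the pipe delimiter, and the exact COPY options are placeholders; in particular the delimiter has to be one that never occurs in the data:

  -- on the DB2 side: one delimited file per table
  db2 "EXPORT TO study.del OF DEL MODIFIED BY COLDEL| SELECT * FROM tb2.study"

  -- on the PostgreSQL side, from psql:
  \copy study FROM 'study.del' WITH DELIMITER '|' CSV

The same per-table delimited files could double as the public download archive suggested above, generated by a nightly cron job.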
From: Jon A. <jon...@du...> - 2009-04-29 21:36:06

Ok, so I'll pick up this thread instead of the one earlier today that hijacked this conversation. I was swamped last week with other 'stuff', so I did not jump into the conversation.

My experience with postgresql-to-postgresql dumps has been spotty. Postgresql can dump to plain text, tar, or binary format. Sometimes the sequences try to get imported before the tables and the whole import fails in text mode; some databases import just fine in text mode, though. That's why I thought the csv file seemed like a good place to try. It's also the method documented on postgresql's wiki:

http://wiki.postgresql.org/wiki/Image:DB2UDB-to-PG.pdf

Of course, you would have to be careful to pick a delimiter that does not occur in the data values.

I'm not opposed to the web service slurping the data in over the wire, but it seems like more work and more difficult to troubleshoot. Of course, I'm not much of a programmer, so I'm looking at things from a sys admin point of view. That's also a good point about encoding.

-Jon
-------------------------------------------------------
Jon Auman
Systems Administrator
National Evolutionary Synthesis Center
Duke University
http://www.nescent.org
jon...@ne...
------------------------------------------------------

On Apr 24, 2009, at 2:21 PM, Mark Dominus wrote:
> [...]
From: Mark D. <mj...@ge...> - 2009-04-30 16:59:07

Hilmar Lapp wrote:

> On Apr 21, 2009, at 4:56 PM, Rutger Vos wrote:
>>
>> I'm curious about this step: what does it involve in practical terms
>> to get to the point where I can ssh into
>> rv...@tr... (or some such)?
>
> Purchase & delivery of the hardware, virtualization environment to be
> set up, virtual slices to be created, OS installed and imaged,
> accounts to be created.
>
> Jon would have more details. We'll also be doing testing of the host
> slices using our own development sites, and we'll be looking this week
> and next at whether we can fast-track some of the hardware purchases
> so we can start testing earlier.

What's the status of this? Do we have an ETA? If not, do we have an ETA for the ETA?
From: Hilmar L. <hl...@du...> - 2009-05-01 14:03:03

There'll be more on this next week. I'm tied up during the day (as I have been the whole week), more on this tonight and tomorrow.

-hilmar

On Apr 30, 2009, at 12:59 PM, Mark Dominus wrote:
> [...]
>
> What's the status of this? Do we have an ETA? If not, do we have an
> ETA for the ETA?

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================
From: Mark D. <mj...@ge...> - 2009-05-27 17:37:16

On Fri, 2009-05-01 at 09:40 -0400, Hilmar Lapp wrote:
> There'll be more on this next week. I'm tied up during the day (as I
> have been the whole week), more on this tonight and tomorrow.

What is the status of this? Do we have an ETA? If not, do we have an ETA for the ETA?

--
Mark Jason Dominus                          mj...@ge...
Penn Genome Frontiers Institute             +1 215 573 5387