From: Chetna D. W. <ch...@ug...> - 2003-05-14 17:47:01
Hey all,

I am working with Michael to load GenBank stuff in GUS 3.0. Right now our database is out of tablespace. My question is: does running plugin/GBParser repeatedly resume where it left off, or will it try to load everything from scratch?

Situation here: due to limited tablespace we could successfully load only 18079 rows into the database (dots.ExternalNASequence and Dots.NAEntry). I am adding more tablespace and will then re-run GBParser on the same GenBank file. At the least I expect a primary key failure error for the first 18079 rows, after which GBParser should be able to load the remaining ones.

It would be great if someone could give some insight.

Thanks,
Chetna
From: Jonathan C. <cra...@pc...> - 2003-05-14 18:00:16
Hi Chetna-

Chetna D. Warade wrote:
> I am working with Michael to load GenBank stuff in gus 3.0.
> Right now our database is out of tablespace. My question is: Does repetitive
> use of plugin/GBParser resume where it left off or will it try to load
> everything from scratch.

This is something that has to be coded on a per-plugin basis: unless the authors of the GBParser plugin have explicitly given it the ability to restart cleanly, it won't. Or rather, most plugins will probably run a second time without complaint, but will likely create duplicate rows in the database.

Whether you get duplicates also depends on how the plugin handles commits (also a plugin-specific issue). Most of the plugins that load a large amount of data commit periodically (e.g. every 1000 or 10000 entries or rows), so that if a crash occurs at 5500 entries, for example, you would end up with 5000 in the database, assuming a commit frequency of 1000. It also depends on whether the plugin checks for the presence of entries/rows before loading duplicates (a facility that may provide support for, but is not equivalent to, the ability to restart a plugin on the same input files).

> Situation here:
> Due to limited tablespace we could successfully load 18079 rows in the
> database (dots.ExternalNASequence and Dots.NAEntry). I am adding more
> tablespace and then re-run the GBParser on the same GenBank file. At the
> least I expect the primary key failure error for first 18079 rows and
> then GBParser should be able to load the remaining ones.

In general you are unlikely to get primary key errors, since the primary key values are autogenerated, so the second time the plugin is run it will generate a whole new set of IDs (assuming that it has not been written to handle restarts and/or check whether entries being inserted are already in the database). Again, however, this is plugin-specific. If a table has additional "unique" constraints, for example, and the plugin fails to check whether inserted rows are already in the database, then constraint violations can occur when the plugin is re-run.

Anyway, the bottom line is that it depends almost entirely on how GBParser has been implemented, so your questions are best answered either by the people who wrote the plugin or by looking at the Perl code directly.

Jonathan
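The general points in the reply above (periodic commits, duplicates on a naive rerun, and "unique" constraint violations) can be illustrated with a small, self-contained sketch. It uses Python's sqlite3 module with an in-memory database purely as a stand-in; it is not GUS, Oracle, or GBParser code, and the `entry` table, the `ACC...` accessions, and the function names are all hypothetical. It models a plugin that does not check for existing rows, which may not match how any particular plugin behaves.

```python
import sqlite3


class SimulatedCrash(Exception):
    """Stands in for a failure such as running out of tablespace."""


def make_db(unique_accession=False):
    """Create an in-memory table with an autogenerated primary key."""
    conn = sqlite3.connect(":memory:")
    constraint = " UNIQUE" if unique_accession else ""
    conn.execute(
        "CREATE TABLE entry ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        f"accession TEXT{constraint})"
    )
    conn.commit()
    return conn


def load(conn, accessions, commit_every, crash_after=None):
    """Insert every accession, committing once per `commit_every` rows."""
    for i, acc in enumerate(accessions, start=1):
        conn.execute("INSERT INTO entry (accession) VALUES (?)", (acc,))
        if i % commit_every == 0:
            conn.commit()
        if crash_after is not None and i == crash_after:
            conn.rollback()  # rows since the last commit are lost
            raise SimulatedCrash(f"failed after {i} entries")
    conn.commit()


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]


accessions = [f"ACC{i:05d}" for i in range(1, 10001)]

# Case 1: no unique constraint. A crash at 5500 keeps the 5000 committed
# rows; a naive rerun then adds all 10000 again under fresh IDs.
conn = make_db()
try:
    load(conn, accessions, commit_every=1000, crash_after=5500)
except SimulatedCrash:
    pass
after_crash = count_rows(conn)
load(conn, accessions, commit_every=1000)
after_rerun = count_rows(conn)

# Case 2: a UNIQUE constraint on accession makes the naive rerun fail
# on the first row that is already present.
strict = make_db(unique_accession=True)
try:
    load(strict, accessions, commit_every=1000, crash_after=5500)
except SimulatedCrash:
    pass
try:
    load(strict, accessions, commit_every=1000)
    rerun_error = None
except sqlite3.IntegrityError as exc:
    strict.rollback()
    rerun_error = exc
```

In this sketch `after_crash` is 5000 and `after_rerun` is 15000, with no primary key error because every rerun row gets a new autogenerated `id`; in the second case the rerun stops with an integrity error instead of creating duplicates.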
From: Deborah F. P. <pi...@pc...> - 2003-05-14 18:12:02
Hi Chetna,

In the case of this particular plugin, you won't get duplicates created in the db by running it again. However, there is an option, --start, that lets you specify where you left off in a particular file and continue from there, saving you running time. You should have a log file with output from the plugin that looks something like this:

    STATUS: N=17 ACCS=AL671875 TOTAL_OBJECTS=123 TIME=Thu Feb 6 14:07:19 EST 2003
    INSERTED: AL671875; N=17

Use the value of N for the last completed entry for --start.

Debbie
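For a long log, finding the last completed entry can be scripted. The following is a minimal Python sketch, not part of GUS. It assumes that each log message sits on its own line in the format of the sample above, and that an `INSERTED:` line marks a completed entry. The function name is hypothetical.

```python
import re

# Matches lines such as "INSERTED: AL671875; N=17" (format taken from the
# sample log above; one message per line is an assumption).
INSERTED_RE = re.compile(r"^INSERTED:\s*(?P<acc>\S+);\s*N=(?P<n>\d+)$")


def last_completed_entry(log_lines):
    """Return (N, accession) from the last INSERTED line, or None."""
    last = None
    for line in log_lines:
        match = INSERTED_RE.match(line.strip())
        if match:
            last = (int(match.group("n")), match.group("acc"))
    return last


sample_log = [
    "STATUS: N=17 ACCS=AL671875 TOTAL_OBJECTS=123 "
    "TIME=Thu Feb 6 14:07:19 EST 2003",
    "INSERTED: AL671875; N=17",
]
result = last_completed_entry(sample_log)
```

For a real log, the lines can be read with `open(path)` and passed to the function directly; the N in the returned pair is the value to use for --start as described above.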