From: Jonathan S. <js...@pc...> - 2004-07-02 04:30:29
|
Sucheta: I am in the process of using sqlldr to load data in to a new TESS table that has lots of data that we want to compress using Oracle's built-in compression ability. The table has all of the normal GUS overhead columns and will be queried using GUS objects and/or SQL as usual. If the overhead columns are constant for the load, they can be set in the header of the SQL loader file. (See example below; note the limited date resolution.) Once in the db, these rows are as good as any other rows. I 'need' to use sqlldr because it is the simplest way to make sure the data gets compressed as it's loaded. I will probably be doing more of this in the future. Will let you know how it goes. Since your data is not in originally in GenBank format, there is certainly much merit in writing a loader specifically for your data format. We certainly do not convert everything to GB format to put it in GUS. As you note, this is especially true when there is extra data that is not well handled by GenBank format. In this scenario, the best thing about using the object layer (as opposed to sqlldr) is that they handle most of the details of getting primary keys, rollbacks, submits over multiple tables, GUS overhead rows, etc. You get none of this with the bulk loader, though you could (re)create some of it I guess. We use the objects to load the PlasmoDB data read from XML files and I think the performance is perfectly acceptable. On the other hand, if you have to convert your data to GenBank format to submit it to GenBank anyway, then it wouldn't hurt to use gbparser and try to track down and fix (or at least report) any errors you encounter. GBparser should be kept in good working order! In any case, it is a good idea to compare the contents of the db after a run to make sure everything that you think should have gone in did in fact go in. Jonathan Schug ----- LOAD DATA INFILE * APPEND INTO TABLE TESS.PredictedBindingSite FIELDS TERMINATED BY ',' ( predicted_binding_site_id, model_id, na_sequence_id, score, begin, end, is_reversed, modification_date constant '19-JUN-04', user_read constant 1, user_write constant 1, group_read constant 1, group_write constant 1, other_read constant 1, other_write constant 0, row_user_id constant 7, row_group_id constant 0, row_project_id constant 0, row_alg_invocation_id constant 247696 ) BEGINDATA 1,5017,100928782,6.277017,41202114,41202122,0 ... ------------------------------------------------------------------------ --- Jonathan Schug Center for Bioinformatics js...@pc... Computational Biology and Informatics Lab (215) 573-3113 voice University of Pennsylvania, (215) 573-3111 fax 1413 Blockley Hall, Philadelphia, PA 19014-6021 |