From: Sucheta T. <su...@vb...> - 2004-07-02 12:36:14
|
Jonathan, Steve, Thank you for your comments and suggestion. What I am planning to do is convert few of the sequences into genbank format and see how it goes through GBParser and since our data will be uniform(unlike genbank generic format), I assume it will be possible to fix GBParser. I shall let you by today evening how things are. Thanks Sucheta > Sucheta: > > I am in the process of using sqlldr to load data in to a new TESS table > that has lots of data that we want to compress using Oracle's built-in > compression ability. The table has all of the normal GUS overhead > columns and will be queried using GUS objects and/or SQL as usual. If > the overhead columns are constant for the load, they can be set in the > header of the SQL loader file. (See example below; note the limited > date resolution.) Once in the db, these rows are as good as any other > rows. I 'need' to use sqlldr because it is the simplest way to make > sure the data gets compressed as it's loaded. I will probably be doing > more of this in the future. Will let you know how it goes. > > Since your data is not in originally in GenBank format, there is > certainly much merit in writing a loader specifically for your data > format. We certainly do not convert everything to GB format to put it > in GUS. As you note, this is especially true when there is extra data > that is not well handled by GenBank format. In this scenario, the best > thing about using the object layer (as opposed to sqlldr) is that they > handle most of the details of getting primary keys, rollbacks, submits > over multiple tables, GUS overhead rows, etc. You get none of this > with the bulk loader, though you could (re)create some of it I guess. > We use the objects to load the PlasmoDB data read from XML files and I > think the performance is perfectly acceptable. > > On the other hand, if you have to convert your data to GenBank format > to submit it to GenBank anyway, then it wouldn't hurt to use gbparser > and try to track down and fix (or at least report) any errors you > encounter. GBparser should be kept in good working order! In any > case, it is a good idea to compare the contents of the db after a run > to make sure everything that you think should have gone in did in fact > go in. > > Jonathan Schug > > ----- > > LOAD DATA > INFILE * > APPE > ND > INTO TABLE TESS.PredictedBindingSite > FIELDS TERMINATED BY ',' > ( predicted_binding_site_id, > model_id, > na_sequence_id, > score, > begin, > end, > is_reversed, > modification_date constant '19-JUN-04', > user_read constant 1, > user_write constant 1, > group_read constant 1, > group_write constant 1, > other_read constant 1, > other_write constant 0, > row_user_id constant 7, > row_group_id constant 0, > row_project_id constant 0, > row_alg_invocation_id constant 247696 > ) > BEGINDATA > 1,5017,100928782,6.277017,41202114,41202122,0 > ... > > ------------------------------------------------------------------------ > --- > Jonathan Schug Center for Bioinformatics > js...@pc... Computational Biology and Informatics Lab > (215) 573-3113 voice University of Pennsylvania, > (215) 573-3111 fax 1413 Blockley Hall, Philadelphia, PA > 19014-6021 > > -- Sucheta Tripathy Virginia Bioinformatics Institute Phase-I Washington street. Virginia Tech. Blacksburg,VA 24061-0447 phone:(540)231-8138 Fax: (540) 231-2606 |