From: Vladimir G. <vla...@du...> - 2010-01-27 18:46:25
|
Here is what happened to user_id and person_id in an empty database after I created a handful new users, starting with hibernate_sequence = 1000 (that's ONE thousand) user_id username person_id hibernate_sequence 10000 "vgapeyev"; 10010 1001 10001 "vgapeyev2"; 10011 1001 10002 "vgapeyev3"; 10012 1001 10020 "vgapeyev4"; 10030 1003 That is, Hibernate uses values from hibernate_sequence to generate PK ids in some cleverly mysterious ways. Consequently, the current declarations of PK columns like user_id bigint NOT NULL DEFAULT nextval('hibernate_sequence'::regclass); may deceive someone that an INSERT of a row without an ID explicitly specified would be ok, since the DB appears to know how to do the right thing. So, I'd propose to drop the default declarations, to avoid the deception: user_id bigint NOT NULL; --Vladimir |
From: Hilmar L. <hl...@ne...> - 2010-01-27 19:42:31
|
So does this mean that Hibernate assumes that no-one else is touching the database while it is running. Does this then also mean that we cannot touch the database (such as for inserting data) through any other means than the running instance of Hibernate while such an Hibernate is running? I.e., a Java program that also uses Hibernate but runs as a separate process won't do either. Is this a situation that we can live with, or does this need to be remedied? My gut feeling is the latter, but I'd like to hear people's thoughts. -hilmar On Jan 27, 2010, at 1:46 PM, Vladimir Gapeyev wrote: > > Here is what happened to user_id and person_id in an empty database > after I created a handful new users, starting with > hibernate_sequence = 1000 (that's ONE thousand) > > user_id username person_id hibernate_sequence > 10000 "vgapeyev"; 10010 1001 > 10001 "vgapeyev2"; 10011 1001 > 10002 "vgapeyev3"; 10012 1001 > 10020 "vgapeyev4"; 10030 1003 > > That is, Hibernate uses values from hibernate_sequence to generate > PK ids in some cleverly mysterious ways. > > Consequently, the current declarations of PK columns like > user_id bigint NOT NULL DEFAULT > nextval('hibernate_sequence'::regclass); > > may deceive someone that an INSERT of a row without an ID explicitly > specified would be ok, since the DB appears to know how to do the > right thing. > > So, I'd propose to drop the default declarations, to avoid the > deception: > > user_id bigint NOT NULL; > > --Vladimir > > ------------------------------------------------------------------------------ > The Planet: dedicated and managed hosting, cloud storage, colocation > Stay online with enterprise data centers and the best network in the > business > Choose flexible plans and management services without long-term > contracts > Personal 24x7 support from experience hosting pros just a phone call > away. > http://p.sf.net/sfu/theplanet-com_______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== |
From: Vladimir G. <vla...@du...> - 2010-01-27 20:09:47
|
On Jan 27, 2010, at 2:42 PM, Hilmar Lapp wrote: > So does this mean that Hibernate assumes that no-one else is > touching the database while it is running. > Does this then also mean that we cannot touch the database (such as > for inserting data) through any other means than the running > instance of Hibernate while such an Hibernate is running? I.e., a > Java program that also uses Hibernate but runs as a separate process > won't do either. It's more like that the other program must respect the same convention for using hibernate_sequence that Hibernate in TB2 web front end uses. (I hope that a program on top of the TB2 DAO layer would respect it.) It is possible that the Hibernate convention is designed to allow for concurrent agents -- since it relies on transactional increments of hibernate_sequence. > Is this a situation that we can live with, or does this need to be > remedied? To have hope of meeting Feb 5, we have to live with it, I think. (Unless it also messes up data importing.) --VG |
From: William P. <wil...@ya...> - 2010-01-27 20:04:45
|
On Jan 27, 2010, at 2:42 PM, Hilmar Lapp wrote: > So does this mean that Hibernate assumes that no-one else is touching the database while it is running. > > Does this then also mean that we cannot touch the database (such as for inserting data) through any other means than the running instance of Hibernate while such an Hibernate is running? I.e., a Java program that also uses Hibernate but runs as a separate process won't do either. Is this a situation that we can live with, or does this need to be remedied? > > My gut feeling is the latter, but I'd like to hear people's thoughts. My thoughts exactly. I wrote the below text before you beat me to this issue. I'd really like to keep alternative interfaces open, if possible. e.g., I had imagined that I (or others) would create special interfaces, e.g. for doing housekeeping work without having to burden our (already bloated) java codebase. For example, I happen to know that there are tons of incorrect mappings between taxon_labels and taxa (which are data resources scoped to the entire database) and I was assuming that I could fix/edit these later -- or periodically update with new versions of NCBI, uBIO, etc. -- without having to add more and more miscellaneous features on the java side. bp > So -- is it the case that records cannot be created by an alternative route? e.g. using perl DBI/DBD. In prior discussion about what kind of sequencing system to use, I had the impression that neither of the two choices precluded our ability to access the data via a route other than java/hibernate. This seems really disappointing to me, because I think it would be really valuable to be able to have alternative interfaces. > > Am I right in thinking that hibernate's sequencing method will make it very difficult to use alternative connections like DBI/DBD? Also, if we keep the "nextval" declaration as-is, does that preserve our ability to use DBI/DBD (even if it is deceiving)? If the hibernate-generated ids are indeed an order of magnitude greater than the hibernate_sequence, then a new record created with nextval will collide with existing ids created by hibernate. > > bp |
From: Vladimir G. <vla...@du...> - 2010-01-27 20:19:08
|
On Jan 27, 2010, at 3:04 PM, William Piel wrote: >> Am I right in thinking that hibernate's sequencing method will make >> it very difficult to use alternative connections like DBI/DBD? >> Also, if we keep the "nextval" declaration as-is, does that >> preserve our ability to use DBI/DBD (even if it is deceiving)? If >> the hibernate-generated ids are indeed an order of magnitude >> greater than the hibernate_sequence, then a new record created with >> nextval will collide with existing ids created by hibernate. That's right, any inserts that bypass the TB2 DAO layer will be likely to cause conflicts with the PK generation algorithm used by Hibernate. This is what we started with, when Youjun ran into the unit tests (that use DAO) conflicting with PKs already in DB (generated via postgres nextval()) . --VG |
From: youjun g. <you...@ya...> - 2010-01-27 20:54:18
|
Every table in TB was created with a piece of script incuding a line like this: phylotreenode_id bigint NOT NULL DEFAULT nextval('hibernate_sequence'::regclass), My guess is this is not a hibernate issue, instead it's a business of postgresql. No matter through what application a insert to the table will always cause a query to the sequence. Think of this, if we change the name of "hibernate_sequence" to something else will it work the same way? will other application know the exist of hibernate? Youjun On Wed, Jan 27, 2010 at 2:42 PM, Hilmar Lapp <hl...@ne...> wrote: > So does this mean that Hibernate assumes that no-one else is touching the > database while it is running. > > Does this then also mean that we cannot touch the database (such as for > inserting data) through any other means than the running instance of > Hibernate while such an Hibernate is running? I.e., a Java program that also > uses Hibernate but runs as a separate process won't do either. Is this a > situation that we can live with, or does this need to be remedied? > > My gut feeling is the latter, but I'd like to hear people's thoughts. > > -hilmar > > > On Jan 27, 2010, at 1:46 PM, Vladimir Gapeyev wrote: > > > Here is what happened to user_id and person_id in an empty database after I > created a handful new users, starting with hibernate_sequence = 1000 (that's > ONE thousand) > > user_id username person_id hibernate_sequence > 10000 "vgapeyev"; 10010 1001 > 10001 "vgapeyev2"; 10011 1001 > 10002 "vgapeyev3"; 10012 1001 > 10020 "vgapeyev4"; 10030 1003 > > That is, Hibernate uses values from hibernate_sequence to generate PK ids > in some cleverly mysterious ways. > > Consequently, the current declarations of PK columns like > user_id bigint NOT NULL DEFAULT nextval('hibernate_sequence'::regclass); > > may deceive someone that an INSERT of a row without an ID explicitly > specified would be ok, since the DB appears to know how to do the right > thing. > > So, I'd propose to drop the default declarations, to avoid the deception: > > user_id bigint NOT NULL; > > --Vladimir > > > ------------------------------------------------------------------------------ > The Planet: dedicated and managed hosting, cloud storage, colocation > Stay online with enterprise data centers and the best network in the > business > Choose flexible plans and management services without long-term contracts > Personal 24x7 support from experience hosting pros just a phone call away. > > http://p.sf.net/sfu/theplanet-com_______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel > > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : > =========================================================== > > > > > > ------------------------------------------------------------------------------ > The Planet: dedicated and managed hosting, cloud storage, colocation > Stay online with enterprise data centers and the best network in the > business > Choose flexible plans and management services without long-term contracts > Personal 24x7 support from experience hosting pros just a phone call away. > http://p.sf.net/sfu/theplanet-com > _______________________________________________ > Treebase-devel mailing list > Tre...@li... > https://lists.sourceforge.net/lists/listinfo/treebase-devel > > |
From: Vladimir G. <vla...@du...> - 2010-01-27 21:48:32
|
On Jan 27, 2010, at 3:53 PM, youjun guo wrote: > Every table in TB was created with a piece of script incuding a line > like this: > phylotreenode_id bigint NOT NULL DEFAULT > nextval('hibernate_sequence'::regclass), Just to make sure we do not speak past each other... Column declarations like the above are what one can see in the treebasedev instance since last Friday, when I announced the "switch to hibernate_sequence". The declarations are exactly the effect of that switch. Till Friday, the columns were declared as phylotreenode_id bigint NOT NULL DEFAULT nextval('phylotreenode_id_sequence'::regclass), For reference, the treebasestage instance still has the old declarations. I would guess individual-sequence declarations were in effect since at least mid-Nov 2009, when Rutger added them to treebase-core/src/main/ resources/TBASE2_POSTGRES_CREATION.sql in SVN. The original version of this schema (in SVN since mid-June-2009) did not have any auto-increment declarations on PKs, as far as I can see. I would not yet guess one way or another how the columns were declared in the actual database at creation time, or when the first import from TB1 was done. --VG |
From: William P. <wil...@ya...> - 2010-01-28 14:03:38
|
On Jan 27, 2010, at 4:48 PM, Vladimir Gapeyev wrote: > The original version of this schema (in SVN since mid-June-2009) did not have any auto-increment declarations on PKs, as far as I can see. Is this the DB2 schema, or the Pg schema? Rutger and Mark J. D. converted the DB2 schema into the Pg schema -- and maybe that's what you're looking at. All existing migrated records (i.e., where all tables have id numbers that start at one and increase by one -- no big jumps and no order-of-magnitude-bigger) were done on the DB2 schema. So it's possible that: (1) the DB2 schema had a different sequencing system; (2) when the Pg schema was first designed, we forgot to include auto-increment annotations; (3) the auto-increment was later added by Rutger, but that (4) something wasn't working right with them, and hence we switched to database-wide hibernate_sequence. If it is the case that the original DB2 way of doing this allows us to bypass the DAO layer, can we revert to that in our Pg deployment? (Rutger -- can you give Vladimir access to the SDSC DB2 schema for him to see how this was done?) bp |
From: Vladimir G. <vla...@du...> - 2010-01-28 17:45:23
|
On Jan 28, 2010, at 8:52 AM, William Piel wrote: > On Jan 27, 2010, at 4:48 PM, Vladimir Gapeyev wrote: >> The original version of this schema (in SVN since mid-June-2009) >> did not have any auto-increment declarations on PKs, as far as I >> can see. > Is this the DB2 schema, or the Pg schema? Rutger and Mark J. D. > converted the DB2 schema into the Pg schema -- and maybe that's what > you're looking at. > (Rutger -- can you give Vladimir access to the SDSC DB2 schema for > him to see how this was done?) I attach the history of the schema after this message. (It might shed light on some points below.) > All existing migrated records (i.e., where all tables have id > numbers that start at one and increase by one -- no big jumps and no > order-of-magnitude-bigger) were done on the DB2 schema. Bill, do you mean the data migration was done on the DB2 instance (it's impossible to have a DB schema in a postgres instance!), and then the result was ported to Postgres (via an SQL dump and restore?). If so, was the web app connected to the DB2 instance ever used to enter new data after the migration? Inspecting IDs for that data may shed some light on how Hibernate sequencing behaved back then. Based on Youjun's assessment that TB2 DAO layer uses default Hibernate settings, my guess to explain our troubles would be this: Hibernate uses different default conventions on DB2 vs PostgreSQL. While the web app was developed, against DB2, Hibernate relied on DB2 auto- increments for PK's. When the database was suddenly switched from DB2 to Postgres, Hibernate created hibernate_sequence, as per its default for Postgres. If this guess is right, the logic of the DAO layer was built in the context "PKs are incremented one-by-one, by the DBMS". Ditto for the migration scripts. > So it's possible that: (1) the DB2 schema had a different sequencing > system; (2) when the Pg schema was first designed, we forgot to > include auto-increment annotations; (3) the auto-increment was later > added by Rutger, but that (4) something wasn't working right with > them, and hence we switched to database-wide hibernate_sequence. > > If it is the case that the original DB2 way of doing this allows us > to bypass the DAO layer, can we revert to that in our Pg deployment? I agree that all this argues in favor of ditching the hibernate_sequence switch and finding out the most appropriate way to instruct Hibernate to rely on DBMS id generation. I do not know what it should be exactly. If I recall right, Youjun's proposal in this direction called for annotations on each DAO class to instruct it which sequence to use. This does not sound right to me, since this instruction is already in the schema, and I'd trust Hibernate implementers to have designed how to take advantage of that. --Vladimir ======================================================== Here is the history of the schema, as I can reconstruct it from SVN: - The June 2009 version of TBASE2_POSTGRES_CREATION.sql contains extensive commented-out SQL, for objects in the JRUAN schema. I'd guess that's the DB2 code. In it, PKs are defined as ALGORITHM_ID BIGINT GENERATED BY DEFAULT AS IDENTITY (START WITH 1, INCREMENT BY 1, CACHE 20) NOT NULL, - The actual postgres code in the June 2009 version defines PKs as algorithm_id bigint NOT NULL, There are no sequences defined in this code. - The next, Nov 18, version added table-specific sequences: CREATE SEQUENCE algorithm_id_sequence; ALTER TABLE algorithm ALTER COLUMN algorithm_id SET DEFAULT nextval('algorithm_id_sequence'); - For the next, Nov 30 version, the SVN comment says "Recalculated sequence restart values". It also dropped all commented-out DB2 code, which confuses my diff tool, so I am not sure what exactly happened. All RESTART statements, like -- alter sequence analysis_id_sequence restart with 4887; are commented out in both November versions. The restart numbers are different in each version. - Dec 9, and final, version defines a few indexes. None of these versions defines hibernate_sequence. |
From: Rutger V. <rut...@gm...> - 2010-01-29 16:24:59
|
> Here is the history of the schema, as I can reconstruct it from SVN: > - The June 2009 version of TBASE2_POSTGRES_CREATION.sql contains extensive > commented-out SQL, for objects in the JRUAN schema. I'd guess that's the > DB2 code. In it, PKs are defined as > ALGORITHM_ID BIGINT GENERATED BY DEFAULT AS IDENTITY (START WITH 1, > INCREMENT BY 1, CACHE 20) NOT NULL, That's right, the commented out code were the creation statements for DB2, exported by Aqua data studio, which I translated by hand into what seemed like the correct statements for PostgreSQL > - The actual postgres code in the June 2009 version defines PKs as > algorithm_id bigint NOT NULL, > There are no sequences defined in this code. Right, which was *definitely* wrong. DB2 was instructed to make IDs automatically ("GENERATED BY DEFAULT AS IDENTITY"), so by that same logic PostgreSQL needed to do something functionally similar. > - The next, Nov 18, version added table-specific sequences: > CREATE SEQUENCE algorithm_id_sequence; > ALTER TABLE algorithm ALTER COLUMN algorithm_id SET DEFAULT > nextval('algorithm_id_sequence'); My understanding was that this was the closest way to mimic DB2's behaviour. > - For the next, Nov 30 version, the SVN comment says "Recalculated sequence > restart values". It also dropped all commented-out DB2 code, which confuses > my diff tool, so I am not sure what exactly happened. All RESTART > statements, like > -- alter sequence analysis_id_sequence restart with 4887; > are commented out in both November versions. The restart numbers are > different in each version. Yup, two things happened: I removed the old DB2 code, and I was trying to get the sequences to work such that they would start auto-incrementing beyond the highest value of the then-assigned IDs in each table. > - Dec 9, and final, version defines a few indexes. > None of these versions defines hibernate_sequence. That is correct, I have never defined a hibernate_sequence in the PostgreSQL creation script. -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com |
From: William P. <wil...@ya...> - 2010-01-28 19:20:44
|
On Jan 28, 2010, at 12:41 PM, Vladimir Gapeyev wrote: > Bill, do you mean the data migration was done on the DB2 instance ... and then the result was ported to Postgres (via an SQL dump and restore?). Yes. (and as a corollary, a scary thing is that the migration scripts have never been tested on the Pg instance) > If so, was the web app connected to the DB2 instance ever used to enter new data after the migration? Yes, in that we went through a beta-testing phase intermingled with data migration. My recollection is that new records from beta-testing increased ids in normal fashion (meaning i+1 from the prior one -- not an order of magnitude greater). For example, go here: http://8ball.sdsc.edu:6666/treebase-web/home.html go to submission tab and log-in as "Pieltest" with password "Pieltest". You'll see that the submission_ids are in the 2,300 - 2,400 range. You can try creating several dummy submissions one after the other -- and see if they increment by one. bp |
From: Vladimir G. <vla...@du...> - 2010-01-28 20:39:31
|
On Jan 28, 2010, at 2:20 PM, William Piel wrote: > >> Bill, do you mean the data migration was done on the DB2 >> instance ... and then the result was ported to Postgres (via an SQL >> dump and restore?). > > Yes. (and as a corollary, a scary thing is that the migration > scripts have never been tested on the Pg instance) > >> If so, was the web app connected to the DB2 instance ever used to >> enter new data after the migration? > > Yes, in that we went through a beta-testing phase intermingled with > data migration. My recollection is that new records from beta- > testing increased ids in normal fashion (meaning i+1 from the prior > one -- not an order of magnitude greater). I played with the SDSC UI as you suggested, and it indeed creates consecutive ids. All this speaks in favor of the theory that TB2 Hibernate got confused by the DB2-->Postgres switch, which was done in the hope that Hibernate default behavior would remain the same. It appears we must now figure out how to explicitly instruct TB2 DAO and Hibernate to behave in the way that best matches their behavior back in the DB2 days. If I am charged with this, I'll need to build some basic hands-on knowledge of Hibernate by doing at least one tutorial, go through some docs, and build a small project that mimics the TB2 setup, to understand what needs to be done. This will take a few days. Does anyone feel confident in their understanding of TB2 DAO and Hibernate to lead us along a faster path? --Vladimir |
From: Hilmar L. <hl...@ne...> - 2010-01-28 23:11:19
|
On Jan 28, 2010, at 3:39 PM, Vladimir Gapeyev wrote: > Does anyone feel confident in their understanding of TB2 DAO and > Hibernate to lead us along a faster path? Vladimir - can you ask Xianhua about this? I believe we are using it for PLHDB to entrust sequence generation to the database. The trick is that in this case Hibernate has to first create the record in the database, and then look it up again right away to see which PK was assigned. There is a hibernate config flag to instruct it to do that. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org : =========================================================== |
From: Rutger V. <rut...@gm...> - 2010-01-29 16:28:10
|
> All this speaks in favor of the theory that TB2 Hibernate got confused > by the DB2-->Postgres switch, which was done in the hope that > Hibernate default behavior would remain the same. It appears we > must now figure out how to explicitly instruct TB2 DAO and Hibernate > to behave in the way that best matches their behavior back in the DB2 > days. Right, and that behaviour would be that the database itself autoincrements IDs without intervention from hibernate, as the DB2 creation commands indicate. > Does anyone feel confident in their understanding of TB2 DAO and > Hibernate to lead us along a faster path? I certainly can't: you can see my flailing and floundering in the svn history of the creation script. -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com |
From: Vladimir G. <vla...@du...> - 2010-01-29 19:07:52
|
I am still figuring out, in consultation with Xianhua here, how to instruct Hibernate to do what we want, but it is clear we need back a DB instance with individual sequences. The most expedient way would be to just re-create it from a backup. So, does anyone object to dropping the current treebasedev instance (that's the one connected to 6666 as well as treebase-dev.nescent.org front-ends)? Just in case, we will preserve a recent backup. (Would one from yesterday or the day before suffice?) If that's ok, I'll ask Jon to empty treebasedev, and re-create from the same Dec 8 backup he used for treebasestage. Depending on whether Jon has extra space, this might cause a few hours of downtime. --Vladimir On Jan 29, 2010, at 11:28 AM, Rutger Vos wrote: >> All this speaks in favor of the theory that TB2 Hibernate got >> confused >> by the DB2-->Postgres switch, which was done in the hope that >> Hibernate default behavior would remain the same. It appears we >> must now figure out how to explicitly instruct TB2 DAO and Hibernate >> to behave in the way that best matches their behavior back in the DB2 >> days. > > Right, and that behaviour would be that the database itself > autoincrements IDs without intervention from hibernate, as the DB2 > creation commands indicate. |
From: William P. <wil...@ya...> - 2010-01-29 19:39:58
|
I'm okay with this. bp On Jan 29, 2010, at 2:07 PM, Vladimir Gapeyev wrote: > I am still figuring out, in consultation with Xianhua here, how to > instruct Hibernate to do what we want, but it is clear we need back a > DB instance with individual sequences. The most expedient way would > be to just re-create it from a backup. > > So, does anyone object to dropping the current treebasedev instance > (that's the one connected to 6666 as well as treebase-dev.nescent.org > front-ends)? Just in case, we will preserve a recent backup. (Would > one from yesterday or the day before suffice?) > > If that's ok, I'll ask Jon to empty treebasedev, and re-create from > the same Dec 8 backup he used for treebasestage. Depending on whether > Jon has extra space, this might cause a few hours of downtime. > > --Vladimir |