From: Ann W. H. <aha...@ib...> - 2010-01-05 21:37:18
Gary Franklin wrote:

> Our company's testing department created a stress test that creates a
> large number of tables, populates them, performs queries and then
> deletes the tables, and does this over and over many times.

When you say "deletes the tables," do you mean that it deletes the
records, or that it drops the tables - or both? Is there any chance that
some other connection has an open transaction while the test is running?

Could you run gstat -a -r on the database after it gets bloated? That
will show where the space is being lost, I think. Also run gstat -h
while the test is running for a definitive answer on whether there are
old transactions around during the test, not just after it has
completed.

The numbers are very disturbing - and it's slightly interesting that the
file sizes are exactly the same for the database after it's been backed
up and restored and after the first run. I'd expect some variation
unless this is a single-user test - or at least a single-writer test.

> I would expect the size of the database to grow the first time I run
> the test (from 802,816 bytes to 3,600,384 bytes), but why does the
> database continue to grow each and every run after that? Shouldn't the
> unused pages be used instead of the database continually getting
> bigger?

Yes, of course unused pages should be reused. It may grow a bit because
garbage collection happens later in one run than another, or because
record placement changes from run to run.

> Is there any way to run the garbage collection after a specific number
> of transactions have taken place, say 10,000, rather than being
> triggered off the sweep interval?

Garbage collection runs continuously - at least, that's the theory. When
a transaction reads a record, it checks for old versions that are no
longer visible to running transactions. If there are any, it removes
them or puts them on a list of records to be removed.
If it finds a deleted record and there are no transactions that started
before the delete was committed, it does the same. There are patterns of
usage that defeat that scheme - for example, if you always use indexed
access through an index that always increments, delete only the oldest
records, and read only the newest, the old records are never revisited,
so they never go away. Sweep reads all of every table, getting rid of
that sort of detritus.

The sweep interval is based on the difference between the oldest
non-committed transaction and the current transaction, which generally
works well - you don't want the overhead of a transaction reading every
record in the database unless you need it, and the ongoing garbage
collection should be good enough.

> Is there a way to automatically reduce the size of the database
> without running gbak?

No. The database is internally linked by page number, so it has to be
completely rewritten to release pages.

Best regards,

Ann
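[The gstat checks Ann suggests might be scripted along these lines. This
is only a sketch: the database path and credentials are placeholders,
and it assumes a standard Firebird installation where the command-line
tools read ISC_USER/ISC_PASSWORD from the environment.]

```shell
# Placeholders - substitute your own database path and credentials.
DB=/data/stress_test.fdb
export ISC_USER=SYSDBA
export ISC_PASSWORD=masterkey

# After the database has bloated: full data-page and record-version
# analysis.  Look for tables reporting many back versions per record
# or a large number of poorly filled pages - that is where the lost
# space is going.
gstat -a -r "$DB" > gstat_full.txt

# While the test is running: header page only.  Compare the "Oldest
# transaction" / "Oldest active" counters with "Next transaction"; a
# large and growing gap means some transaction is staying open and
# holding garbage collection back.
gstat -h "$DB"
```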
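[Since the only way to shrink the file is to rewrite it completely, the
usual remedy is a gbak backup/restore cycle. A sketch, with placeholder
paths and credentials; the -b (backup) and -c (create) switches are the
standard gbak spellings:]

```shell
# Placeholders - substitute your own paths and credentials.
DB=/data/stress_test.fdb
export ISC_USER=SYSDBA
export ISC_PASSWORD=masterkey

# 1. Back up the bloated database to a portable backup file.
gbak -b "$DB" /data/stress_test.fbk

# 2. Restore into a new file.  Only live records are written, so the
#    new file contains no free or half-empty pages.
gbak -c /data/stress_test.fbk /data/stress_test_compact.fdb

# 3. Swap the compact copy into place once no connections are attached.
mv /data/stress_test_compact.fdb "$DB"
```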