Hi, Vlad.
>> Reading through this thread, it seems to me that the basic design idea of
>> having a database / system wide configuration parameter solves, or can
>> solve
>> the problem of OS disk fragmentation.
>
> Perhaps
>
>> However, it looks to me like this is the wrong problem to solve.
>
> Why ? Anyway - if we can solve it by the cheap way - why not ?
Read the performance numbers below, and I'm sure it will become a lot
clearer.
>> As far as I
>> can tell, the whole point of doing things like allocating large blocks is
>> to
>> help performance of the database system - this is a system as a whole,
>> NOT
>> just the way the O/S allocates pages off a disk / disks.
>>
>> In order to do this in a more useful manner, what I propose is as
>> follows:
>>
>> 1) A parameter PER TABLE, which specifies the growth factor (KB/MB/No of
>> pages/Whatever, I don't really care how). You can access this with an
>> ALTER
>> TABLE <whatever syntax you like>, to change the setting. Default would be
>> current behaviour of single page at a time, if not otherwise specified.
>
> We want to avoid additional parameters as much as possible (we are
> Firebird,
> not ORACLE and want to stay 'self-tune' ;).
I absolutely agree. If you read my post a bit more carefully, you will
notice that the default behaviour (ie: You, as a SYSDBA, have NOT used the
"ALTER TABLE MY_BIG_TABLE SET AUTOEXPANDPAGECOUNT 1234" <or whatever other
syntax you like>), is exactly the way it is today. If you want to call that
'self-tuning', then that's fine.
On the topic of ORACLE, while I agree 100% that there is a HUGE amount of
stuff in ORACLE that has no value to us, either because it is there for
historical purposes, or simply does not apply due to architectural
differences, it does not mean that there is absolutely nothing to learn from
ORACLE, or any other system. The requirement is to take things on their own
merit, and use what is useful, and learn from other ideas, regardless.
Yes, self-tuning is a neat goal, and if you really want to do this, then
that's great. However, I have no doubt that if I had submitted an initial
proposal that stated that the requirement was to have an automatic tuning
system for this, the post would have been regarded as too much work. Let's
get something reasonably simple working, and later on you can do the
'correct' self-tuning, if you like, by expanding the statement to allow
something like "ALTER TABLE MY_BIG_TABLE SET AUTOEXPANDPAGECOUNT
AUTOCALCULATE", or whatever. By all means. Heck, change this to default
behaviour, if you like, later on. :)
>> 2) A database specific configuration parameter, TO SUPPRESS use of the
>> growth factors.
>
> This is more appropriate (as for me)
>
>> 3) On a database restore, you DO NOT USE these values. (this solves the
>> 'I
>> have a readonly database I want to distribute, and I want it to be small'
>> problem).
>
> This is wrong way. If parameters present it must work.
> My personal opinion of course
Remember, I just proposed this as a way to solve the already debated issue
of 'I have a readonly database, and I want it small' problem. I certainly
won't be using it, and I don't think that many people who actually do have
tables with more than 50 million rows will either. People with small tables
won't even know about the feature in the first place, and therefore their
potential use of the parameter is irrelevant anyway.
>
>> All they have to do is backup the system, and restore, then
>> distribute the restored version as the 'readonly' system, with no extra
>> junk. After a restore, the tables will already be grouped together, with
>> a
>> single table's data in consecutive pages anyway (ok, excluding a page
>> here
>> and there, for housekeeping).
>>
>> The reasons that I think that this is the way to go are:
>>
>> a) I feel that it is rather pointless dealing with the O/S disk
>> fragmentation issues, and ignore the table fragmentation WITHIN the
>> allocated O/S file/files.
>
> Why do you think we'll ignore table fragmentation ? Because i still not
> post all RFC's i have to post ? :)
Nope... I have no idea what you think about - all I can use is what is
public info :)
A quick search of ib-architect, firebird-devel, and firebird-doc for the
phrase 'table fragmentation' produced 2 hits in total - my original post,
and your response. :)
>> An extremely common operation is database backup,
>> for example. What does it do? Reads each TABLE, sequentially. It is very
>> inexpensive to buy RAID systems that are capable of reading / writing
>> 500-800MB/S, sustained, for sequential reads/writes, but you only get
>> this
>> performance level if BOTH the O/S file fragmentation AND the table
>> fragmentation within the O/S file/files is minimised. For large tables,
>> this
>> is critical.
>
> Agree
Great.
>> b) It allows you to base your decision on real expected growth factors of
>> each table.
>
> We are not ORACLE do you remember it ? :)
Yes, I do. But please remember that there is a tremendous amount of focus on
multithreading, tls efficiencies, and about 100 other performance related
things in firebird, which are not really required if you want to write an
application to handle 'Shopping Lists'. :)
Firebird is growing up, bit by bit, and the more it grows up, the more
'Enterprise' type features that will be required to sustain largeish
systems. By all means, make the defaults do what they do today, but give
people who actually have non-trivial sized systems the ability to use
Firebird too.
>> c) It should increase backup performance SIGNIFICANTLY.
> It is overstated i'd said
OK. I have a REAL system. I pulled all the clients off the system. Ran a
garbage collection. Twice. After the first time, gstat told me that the
transactions were right next to each other. Oldest to next. So I ran it
again, just to make 100% sure there was no cruft lying about. NO GARBAGE.
Did a backup. 39 Minutes, 13 seconds.
Did a restore.
Did a backup of the just restored database. 11 minutes, 2 seconds.
Also note that there is just about ZERO O/S file fragmentation for these
tests. At the time, the tests were done on A RAID10 volume, consisting of 22
Physical disks, with ONLY 2 files on the entire RAID volume - The original
database, and the restored one. (Also note that the restored database of
course only existed after the original was 'as full as it was going to get',
for this test).
Significant is an understatement, not an overstatement.
>> d) If you really want to muck about with the data, restore for a test
>> system, whatever, and you do not want to incur the space overhead, all
>> you
>> have to do for the test system / whatever is to switch on the option to
>> suppress the growth factors.
>>
>> Other things to bear in mind:
>>
>> Obviously, your performance will suffer if you have lots of back
>> versions/garbage all over the place etc, but this is true, even if we
>> don't
>> use an intelligent growth scheme.
>
> Agree
>
>> (Perhaps this should be looked at as well,
>> so that there is initially an attempt to store old backversions on the
>> page
>> directly after the one with the primary record on it, or something like
>> that?)
>
> I don't know such attempts.
Nor do I. That's why I brought it up. So people can chew it over, and see
what they think. This fits squarely into the 'auto-tune' concept, if you ask
me. :) It's a hell of a lot more efficient to read the next consecutive
file block to get your back version than it is to read something 100 miles
away - most O/S do a certain amount of read-ahead anyway, and the block is
probably already knocking about in the O/S cache, due to this.
>
>> To keep the scheme clean, ultimately it makes sense to take pages which
>> have
>> become empty by deleting records from a table, and NOT make them
>> generally
>> accessible. Reserve them for the exclusive use of the table they
>> originally
>> came from, so that we don't end up refragmenting stuff that was already
>> defragmented by initial placement, on mass deletion. (Yes, I am quite
>> happy
>> to waste 1GB of table space after deleting 5 million records, because:
>> a) On backup, to read 1GB of empty table space will only take 2
>> seconds,
>> with decent hardware (because it is sequentially allocated, of course).
>> b) I also know that pretty soon, due to the nature of the app, that
>> within 6 hours, the 5 million records will be back, and the space will no
>> longer be 'wasted'.
>> )
>
> Without way to manage it we must not do it. One user know it will reuse
> space,
> second know it will not reuse space and third know nothing about it.
Ummm... not sure quite what you mean here. If you are suggesting that we
need a way to see whether or not a table has a magic automatic growth factor
attached to it, I agree with you.
The way to manage it is above. If you have already issued a
"ALTER TABLE MY_BIG_TABLE SET AUTOEXPANDPAGECOUNT 1234" <or whatever other
syntax you like> command and change your mind, then just issue a
"ALTER TABLE MY_BIG_TABLE SET AUTOEXPANDPAGECOUNT 1", and you're back to
where you were before.
I can't see the problem.
It's clean, simple to understand, simple to document and if you don't know
about the feature, It won't hurt you. Default is to do what we've been doing
all along.
Kind regards
Larry
|