From: Vlad H. <hv...@us...> - 2007-01-29 14:11:20
> >> Doing extra/larger writes to preallocate space on them is a waste of cycles.
> >
> > My point is that preallocating space does not write to the file, only to the
> > system catalog. On Windows this is true.
>
> But for the non-windows file systems this is not necessarily true.
Now I see it, thanks for the explanations.
> >> For file systems, it will depend on the
> >> allocation policy of file system used ...
> >
> > Preallocating at least one page is a must for the "disk full" goal. So the
> > question is whether we will get a user option to define how many pages can be
> > preallocated, right?
>
> Not exactly. Users may want to disable the "file fragmentation
> avoidance" feature in cases where it does not benefit them because of
> their choice of file system. However, they may still want protection
> from "disk full" errors. That's why I'm not sure coupling the two goals
> is a good idea.
Without preallocating at least one page, I doubt we can avoid "disk full" errors.
> I guess users could specify a value of one page to protect them from the
> "disk full" condition only, but that is not obvious.
That is why I propose always allocating a new page physically before faking it.
It worked fine on Windows; I even got a small performance increase at
database restore.
> >> Also, just in case you're not aware of it, many unix-like file systems
> >> have "sparse" file space allocation. That is, files can have unallocated
> >> "holes" in them. For example, create an empty file, seek to offset
> >> 10,000,000, write one byte and then close the file. After this, the
> >> file's "size" may appear to be 10,000,001 bytes. However, how much file
> >> space was allocated depends upon the file system used. It might be one
> >> fs block (say 8KB), one fs extent (say 150MB), ~10MB, or something else.
> >> So, on "sparse" file systems, you'll probably need to write at least
> >> one byte to each fs block you want to preallocate.
> >
> > From what I know about Windows sparse files, they are not recommended for
> > database use.
>
> Sorry, many of the unix-like file systems in use today use sparse space
> allocation (e.g., ext3, reiserfs, UFS, etc.). I think you'll need to
> deal with this issue on these file systems.
We already agreed that POSIX has no standard and cheap way to physically
extend a file. Sad, but we must live with it.
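
A minimal sketch of the sparse-file behaviour described in the quoted text above, written in Python for brevity (`os.stat` and `seek` map onto the POSIX calls of the same names). It assumes a POSIX system where `st_blocks` is reported in 512-byte units; the helper names and the 4096-byte block size are illustrative assumptions, not anything taken from the database code. How many bytes end up allocated depends on the file system, so the sketch reports the number rather than predicting it.

```python
import os

def apparent_and_allocated(path):
    """Return (apparent size, bytes reported as allocated) for a file."""
    st = os.stat(path)
    # On POSIX systems st_blocks counts 512-byte units.
    return st.st_size, st.st_blocks * 512

def make_sparse(path, offset=10_000_000):
    """Create a file, seek to `offset`, and write a single byte there."""
    with open(path, "wb") as f:
        f.seek(offset)
        f.write(b"\0")

def touch_every_block(path, length, block_size=4096):
    """Write one byte into each block of the first `length` bytes.

    block_size is an assumed value; a real implementation would ask the
    file system (for example via os.statvfs) instead of hard-coding it.
    Note that this overwrites one byte per block, so it is only suitable
    for space that holds no data yet.
    """
    with open(path, "r+b") as f:
        for pos in range(0, length, block_size):
            f.seek(pos)
            f.write(b"\0")
        f.flush()
        os.fsync(f.fileno())
```

Calling `make_sparse()` and then `apparent_and_allocated()` shows the gap between the two sizes on a file system with sparse allocation; calling `touch_every_block()` afterwards is the "write at least one byte to each fs block" approach, and it is exactly the kind of extra write traffic this thread is worried about.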
> You could try posix_fallocate() to preallocate space. If that isn't
> supported on your target platform, there is another trick I've seen used
> to allocate space at the end of a file efficiently. You call the
> sendfile() system call. This call transfers the data from one file
> descriptor to another and does the entire transfer in the kernel! If you
> want to try this, I can explain how this might be done.
Thank you, but most of the time is spent not in (relatively cheap) memory transfers
but in filling the disk with data. I doubt it will work much faster than plain write() :)
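
For comparison, a hedged sketch of the posix_fallocate() route with a plain-write fallback, again in Python (`os.posix_fallocate` wraps the C call and exists only on platforms that provide it). The function name `extend_file` and the 1 MB chunk size are assumptions made for this example. Whether the fallocate path is actually cheaper than writing zeros depends on the file system and has to be measured.

```python
import errno
import os

def extend_file(fd, new_size, chunk=1 << 20):
    """Grow the open file `fd` so that `new_size` bytes are allocated.

    Returns the method used: "noop", "posix_fallocate", or "write".
    Only the range past the current end of file is touched, so
    existing data is never overwritten.
    """
    old_size = os.fstat(fd).st_size
    if new_size <= old_size:
        return "noop"
    length = new_size - old_size

    if hasattr(os, "posix_fallocate"):
        try:
            os.posix_fallocate(fd, old_size, length)
            return "posix_fallocate"
        except OSError as exc:
            # EINVAL / EOPNOTSUPP: not supported here, fall back to writes.
            # Anything else (for example ENOSPC) is a real failure.
            if exc.errno not in (errno.EINVAL, errno.EOPNOTSUPP):
                raise

    # Fallback: append zeros in chunks and force them to disk.
    os.lseek(fd, old_size, os.SEEK_SET)
    zeros = bytes(chunk)
    remaining = length
    while remaining > 0:
        written = os.write(fd, zeros[:min(chunk, remaining)])
        remaining -= written
    os.fsync(fd)
    return "write"
```

In both paths an ENOSPC error surfaces at extension time rather than later, when a dirty page is flushed.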
> >> An alternative implementation that I've seen is to keep an "emergency
> >> free block list" of preallocated (or previously freed) blocks.
> >
> > What I propose could also be called that.
>
> They are not the same.
Now I see it ;)
> >> This space is reserved for writes during "disk full" conditions only.
> >
> > When ?
>
> In my proposed implementation, you only preallocate the "emergency page
> list" space at database creation time OR after you restart after a "disk
> full" error, when we hope there is free space available. You only use
> these emergency page(s) in the event of an ENOSPC (no space left on
> device) error on a file write. The page(s) are reserved exclusively for
> this purpose.
Where will these pages be stored? In the database file? Then this is very close
to my initial proposal, except that the pages will never be used in normal operation
and we need to implement a new page type and a way to manage this list.
In a separate file? Then we have no guarantee that we will get to use this space
if a concurrent process is actively eating disk space.
How many pages will we need to preallocate? At least the page cache size.
What if we change the page cache size after the initial preallocation?
As you see, this is not as easy as it seems.
> Thus, periodic preallocation is not required. There is virtually no
> additional runtime cost for this implementation. And since you say you
> only need one emergency page to handle the "disk full" error, then that
> is all you need to allocate for the "emergency page list" -- one page.
No. One page is enough if we allocate it before faking the new page. If we
leave the page allocation mechanism as is, then we'll need page_cache_size
pages in reserve in the worst case.
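
The one-page versus page_cache_size argument can be illustrated with a toy model. This is a sketch under loud assumptions: it takes "faking" to mean creating the new page in the page cache without touching the disk, and the `Disk` class and both function names are hypothetical, not part of any real code base.

```python
import errno

class Disk:
    """Toy disk that can hold a fixed number of additional pages."""

    def __init__(self, free_pages):
        self.free_pages = free_pages

    def allocate_page(self):
        """Physically allocate one page or fail with ENOSPC."""
        if self.free_pages == 0:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.free_pages -= 1

def reserve_needed_lazy(disk, page_cache_size):
    """New pages are faked in the cache first; disk space is claimed at flush.

    Worst case: the whole cache holds new pages when the flush starts.
    Every page that cannot be allocated must come from the reserve.
    """
    unbacked = 0
    for _ in range(page_cache_size):
        try:
            disk.allocate_page()
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise
            unbacked += 1
    return unbacked

def reserve_needed_eager(disk, page_cache_size):
    """Each page gets disk space before it is faked in the cache.

    The first ENOSPC stops further allocation, so no faked page is ever
    left without space; only the request in flight needs a reserve page.
    """
    for _ in range(page_cache_size):
        try:
            disk.allocate_page()
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise
            return 1
    return 0
```

With a full disk and a hypothetical cache of 2048 pages, `reserve_needed_lazy(Disk(0), 2048)` returns 2048 while `reserve_needed_eager(Disk(0), 2048)` returns 1, which is the difference claimed above.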
> >> This has the
> >> advantage of having "near zero runtime cost" because you only burn extra
> >> cpu cycles for preallocation at db creation time and after a "disk full"
> >> condition occurs.
>
> > My way also has "near zero runtime cost" ;)
>
> :) Actually, your proposal does incur additional runtime costs
> because it periodically preallocates space and then uses that space. So,
> you must periodically replenish your supply of preallocated pages. This
> isn't required in the alternate implementation that I described.
In my proposal the cost of preallocation is low and it is incurred rarely, because
it is done in big chunks. On Windows only, of course ;(
> > BTW, how can we extend the file after a "disk full" condition occurs?
>
> There are various ways you might do this. One way, assuming the db would
> be restarted after the disk full error, is to check at db start time (or
> the first db file write) to see if the emergency page list is full. If
> it is not, fill it and then proceed.
>
> If you don't want to require a restart, I suppose you could do something
> like run in "read only" mode until some free space becomes available,
> then refill the emergency page list, and then again allow db writes.
It looks like one more problem in your proposal ;)
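
The refill step from the quoted proposal (check the emergency page list at start, fill it if possible, otherwise stay read-only) might look roughly like this. It is only a sketch: `refill_reserve`, its parameters and the mode strings are hypothetical, and `try_allocate` stands in for whatever call physically allocates one page.

```python
import errno

def refill_reserve(reserve_pages, wanted_pages, try_allocate):
    """Top up the emergency page list and decide the operating mode.

    try_allocate() must allocate one page physically or raise
    OSError(ENOSPC). Returns (mode, reserve_pages), where mode is
    "read-write" once the reserve is full and "read-only" otherwise.
    """
    while reserve_pages < wanted_pages:
        try:
            try_allocate()
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise
            # Still no free space: keep refusing writes and retry later.
            return "read-only", reserve_pages
        reserve_pages += 1
    return "read-write", reserve_pages
```

The function would have to be called again whenever free space might have appeared, which is the extra machinery being questioned here.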
Regards,
Vlad