[Linux-NTFS-Dev] delayed allocation for ntfs?


Hi,

What do you think about using delayed allocation for NTFS? Here is my idea, 
let me know what you think...

The idea below is based on the simplest thing I could think of, i.e. there 
is no form of ENOSPC handling in prepare/commit_write.

Ok so here goes:

ntfs_prepare_write()
====================
if (page is uptodate)
         return success.
// For non-uptodate pages:
if (there is an allocation for the destination buffers) {
         map any partially overlapping buffers;
         read them synchronously from the backing store;
} else {
         // There is no allocation, i.e. hole or data extension
         zero any partially overlapping buffers;
}
return success;
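
The per-buffer decision above can be modelled in user space roughly as 
follows. This is only a sketch: enum buf_action and prepare_buffer() are 
made-up names for illustration, not kernel interfaces, and it assumes that 
buffers lying completely outside or completely inside the written range 
need no preparation at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-buffer action; not a kernel type. */
enum buf_action { BUF_SKIP, BUF_READ, BUF_ZERO };

/*
 * Decide what prepare_write has to do with one buffer of a page.
 * [buf_start, buf_end) is the buffer's byte range within the page,
 * [from, to) is the byte range that is about to be written.
 */
static enum buf_action prepare_buffer(bool page_uptodate, bool allocated,
		size_t buf_start, size_t buf_end, size_t from, size_t to)
{
	assert(buf_start < buf_end && from <= to);

	/* An uptodate page needs no preparation. */
	if (page_uptodate)
		return BUF_SKIP;
	/* Buffer is not touched by the write (assumption: leave it alone). */
	if (buf_end <= from || buf_start >= to)
		return BUF_SKIP;
	/* Buffer is fully overwritten, so its old contents do not matter. */
	if (from <= buf_start && to >= buf_end)
		return BUF_SKIP;
	/* Partial overlap: read if backed by an allocation, otherwise zero. */
	return allocated ? BUF_READ : BUF_ZERO;
}
```

For example, a 10 byte write into the middle of a 512 byte buffer gives 
BUF_READ when the buffer is allocated and BUF_ZERO when it lies in a hole.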

ntfs_commit_write()
===================
mark buffers uptodate, but leave them unmapped;
update i_size if necessary;
set_page_dirty();
return success;

ntfs_writepage() -- analogous to the current ntfs_readpage()
================
- For non-resident, uncompressed attributes, map all *uptodate* buffers, 
allocating if necessary, and finally write them (i.e. only write uptodate 
buffers; if the page is uptodate, write all buffers). (If the attribute is 
mst protected, we need to get fully exclusive access to the page, 
pre_write_mst protect the data, then have our async io completion handler 
post_write_mst deprotect the data again before unlocking the page.)

- For non-resident, compressed attributes, compress data chunk, allocate 
space if necessary, finally write out to backing store.

- For resident attributes, may need to convert to non-resident if size has 
grown too big, then go to the non-resident, uncompressed case above. If 
size is still small enough, just copy data to the mft record and mark that 
dirty for later write out (we could force a synchronous write if desired).

Conclusions
===========
The above proposal has the problem of permitting overallocation: nothing 
stops the user from dirtying more data than there is free space for. We 
could just have a stupid check like "if (NVolENOSPC(vol)) return -ENOSPC;" 
if we want, so that once we notice we are actually out of space all further 
writes will be stopped. And we would do a simple NVolSetENOSPC(vol) when we 
notice we are out of space... The cluster deallocator would do an 
NVolClearENOSPC(vol).
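
A user-space sketch of how such a per-volume flag could behave. The 
NVolENOSPC/NVolSetENOSPC/NVolClearENOSPC names are the ones proposed above; 
everything else (struct vol_model, the flag bit, the model_* helpers and 
the free cluster counter) is an assumption made up for illustration and not 
the driver's actual data structures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ntfs volume; only the flag matters here. */
struct vol_model {
	unsigned long flags;
	long free_clusters;
};

#define VOL_ENOSPC_BIT 0x1UL	/* assumed bit position */

static bool NVolENOSPC(const struct vol_model *vol)
{
	return (vol->flags & VOL_ENOSPC_BIT) != 0;
}

static void NVolSetENOSPC(struct vol_model *vol)
{
	vol->flags |= VOL_ENOSPC_BIT;
}

static void NVolClearENOSPC(struct vol_model *vol)
{
	vol->flags &= ~VOL_ENOSPC_BIT;
}

/* write(2) time: no allocation, just the cheap flag check. */
static int model_prepare_write(const struct vol_model *vol)
{
	if (NVolENOSPC(vol))
		return -ENOSPC;
	return 0;
}

/* Writeback time: this is where we actually notice we are out of space. */
static int model_alloc_clusters(struct vol_model *vol, long count)
{
	assert(count >= 0);
	if (count > vol->free_clusters) {
		NVolSetENOSPC(vol);
		return -ENOSPC;
	}
	vol->free_clusters -= count;
	return 0;
}

/* The cluster deallocator clears the flag again. */
static void model_free_clusters(struct vol_model *vol, long count)
{
	assert(count >= 0);
	vol->free_clusters += count;
	NVolClearENOSPC(vol);
}
```

Note that in this model data dirtied before the flag was set can still fail 
to be written out. The flag only stops further writes; it does not prevent 
the overallocation that has already happened.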

We could of course do more complicated accounting to make sure we will 
never overallocate but that would make everything a lot more complicated...

Basically my proposal makes ntfs_writepage() the "workhorse" and keeps 
prepare/commit_write() as simple as possible. At the same time this speeds 
up writes a lot as we are not slowed down by allocations at write(2) time 
and allocate on vm writeback/sync instead.

Does this make sense or have I missed something important? Any 
better/alternative ideas are welcome!

What do you think about the necessity for free space accounting? Can we do 
just none? Should we do a simple ENOSPC per volume flag? Or do we really 
have to do full accounting to ensure we never overallocate?

The advantage of delayed allocation is that it allows easier handling of 
compressed files - there we cannot know how much space we will need as the 
page cache data size is not equal to the size of the data written out to 
disk. We need to compress the data before we know. And we don't want to 
compress every time prepare/commit_write are called, otherwise byte by byte 
writes would _really_ suck performance-wise.

But even for delayed allocation as above, if we decide to do full 
accounting of free space to prevent overallocation, we have a problem with 
compressed files. For extension of files or filling in of holes, we could 
just assume the data would not compress at all and charge that much in the 
accounting. But this becomes more complicated on overwrite, as the new data 
is likely to compress to a different size than the existing data, so we may 
need to allocate more space just when overwriting, and we have no way of 
telling the difference until we have compressed the data. This is why I 
suggest not doing accounting at all. Perhaps the ENOSPC flag would be useful 
as a sanity check though, so an application can't trash the machine by just 
writing to a full partition and us not being able to write out the dirty 
data...
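
To make the accounting problem concrete, here is a small sketch of the 
"assume nothing compresses" charge and of why overwrites are awkward. Both 
helpers (worst_case_clusters() and overwrite_delta()) are hypothetical 
names invented for this illustration, and the cluster size is just a 
parameter.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clusters to charge for 'bytes' of new data (file extension or hole
 * fill) under the assumption that the data does not compress at all.
 */
static uint64_t worst_case_clusters(uint64_t bytes, uint32_t cluster_size)
{
	assert(cluster_size > 0);
	return (bytes + cluster_size - 1) / cluster_size;	/* round up */
}

/*
 * Change in allocated clusters caused by an overwrite. The second
 * argument is the size of the new data *after* compression, so this
 * can only be computed at writeback time, not at write(2) time.
 */
static int64_t overwrite_delta(uint64_t old_clusters,
		uint64_t new_compressed_bytes, uint32_t cluster_size)
{
	return (int64_t)worst_case_clusters(new_compressed_bytes,
			cluster_size) - (int64_t)old_clusters;
}
```

A positive result from overwrite_delta() means the overwrite needs extra 
space even though the file did not grow, which is exactly the case that 
accounting at write(2) time cannot predict.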

/me finishes in hope of stimulating some discussion or at least getting 
people's opinions...

Best regards,

         Anton

-- 
   "I've not lost my mind. It's backed up on tape somewhere." - Unknown
-- 
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
