Re: [Linux-NTFS-Dev] Repost: NTFS Writing - Analysis


At 13:55 04/07/01, Richard Russon wrote:
>I've taken a careful look at all the actions we need to perform to write
>to an NTFS volume.  I've tried to break everything down into atomic
>pieces.
>
>The simplest example is touch_file().  We have to update the $MFT entry
>and the Index Entry. We also need to touch the $MFT, to show that it has 
>been altered.

Does NT really update the times of the MFT record for $MFT every single 
time the MFT is written to?!? It seems crazy... You could say this is 
putting some very heavy wear on the disk sectors containing the first mft 
record...

>So, touch_file() looks like:
>
>     touch_file()
>         update_file()            # update file
>         finalise()               # update $MFT
>
>     update_file()
>         Update $MFT entry        # update file's entry
>         Update Index entry
>
>     finalise()
>         update_file ($MFT)       # update timestamp
>         Copy $MFT to $MFTMirr
>
>The finalise will be needed after every update.

Apart from the nomenclature, which I don't like (but that's a detail...), I 
agree in general. However, I disagree about finalise(). IMO 
update_file($MFT) and "Copy $MFT to $MFTMirr" should be one and the same 
thing: whenever we write to $MFT anywhere between offset 0 and 4 * MFT 
record size, we have to write the same data to $MFTMirr as well. The best 
solution IMO is to put this into the low-level write routines themselves; 
in fact, the lower we go, the more transparent the $MFTMirr updates become. 
After all, the update of $MFTMirr should be in the same transaction as the 
update of $MFT. Since $MFTMirr is a byte-wise copy of the first four MFT 
records of $MFT, rather than a functional copy, it is not a problem to do 
the copy deep inside the write routines, maybe in (the yet to be written) 
ntfs_write_page() or something similar.
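To illustrate doing the mirroring inside the lowest-level write, here is a minimal C sketch. It assumes 1024-byte MFT records and uses in-memory arrays in place of the on-disk $MFT and $MFTMirr; mft_write() is a hypothetical name, not an existing driver function. Any part of a write that falls below 4 * MFT record size is copied to the mirror, so no caller ever has to remember $MFTMirr.

```c
#include <stddef.h>
#include <string.h>

/* Assumptions for this sketch: 1024-byte MFT records, and in-memory
 * buffers standing in for the on-disk $MFT and $MFTMirr data. */
#define MFT_RECORD_SIZE 1024
#define MFTMIRR_RECORDS 4
#define MFTMIRR_SIZE (MFT_RECORD_SIZE * MFTMIRR_RECORDS)
#define MFT_SIZE (MFT_RECORD_SIZE * 16)

static unsigned char mft[MFT_SIZE];
static unsigned char mftmirr[MFTMIRR_SIZE];

/* Hypothetical low-level write: copies len bytes to $MFT at offset off
 * and, for the part of the range below MFTMIRR_SIZE, the same bytes to
 * $MFTMirr. Returns 0 on success, -1 if the range lies outside $MFT. */
static int mft_write(size_t off, const void *buf, size_t len)
{
	if (off > MFT_SIZE || len > MFT_SIZE - off)
		return -1;
	memcpy(mft + off, buf, len);
	if (off < MFTMIRR_SIZE) {
		size_t n = len;

		if (n > MFTMIRR_SIZE - off)	/* clip to the mirrored area */
			n = MFTMIRR_SIZE - off;
		memcpy(mftmirr + off, buf, n);
	}
	return 0;
}
```

A write that straddles the end of the fourth record updates $MFT in full and $MFTMirr only up to its end, which is exactly the byte-wise copy described above.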

>Indeed if we copy NT exactly, then after every file READ, we need to 
>update the access times of the file, i.e. the $MFT and its mirror will 
>need updating.

Are you sure about this? AFAIK, NT will only update access times after a 
certain time delta has passed since the last update (admittedly, I do not 
know how large the delta is), and even then only if access time updates 
are enabled in the registry (which they are by default, but setting the 
right registry key will deactivate them). We will of course have a 
(re)mount option to disable access time updates and hence beat NT, since 
ours will be per mounted volume rather than for the whole driver. (-;

>The actual touch is broken into update & finalise to prevent infinite
>loops when touching the $MFT.

Infinite loops are something we definitely have to watch for carefully 
during the driver design/implementation, as NTFS is full of such traps 
because all of its metadata is stored in files...
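A minimal sketch of why the update/finalise split terminates, using hypothetical in-memory structures rather than real inodes: finalise() touches $MFT through update_file(), which never calls finalise() again, so even touching $MFT itself stops after two record updates. Had finalise() called touch_file() on $MFT instead, it would recurse forever.

```c
/* Hypothetical stand-in for a file's MFT record. */
struct file {
	const char *name;
	long mtime;
	int updates;	/* how many times this record was written */
};

static struct file mft_file = { "$MFT", 0, 0 };

/* Writes the file's MFT entry (and, in a real driver, its index entry).
 * It never calls finalise(), so it cannot start a loop. */
static void update_file(struct file *f, long now)
{
	f->mtime = now;
	f->updates++;
}

/* Touches $MFT via update_file(), NOT touch_file(): calling
 * touch_file(&mft_file) here would call finalise() again, forever. */
static void finalise(long now)
{
	update_file(&mft_file, now);
	/* copying $MFT to $MFTMirr would happen here */
}

static void touch_file(struct file *f, long now)
{
	update_file(f, now);
	finalise(now);
}
```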

>Another simple case is append_file().
>
>     append_file()
>         write_data()             # write data to disk
>         touch_file()             # update timestamp
>
>     write_data()
>         Write data               # put data on disk
>         Update $Bitmap           # set bits
>         touch_file($Bitmap)      # update timestamp

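Update $Bitmap AFTER writing the data?!? I think I am hearing voices and 
they are all screaming "RACE condition"... If the clusters are only marked 
as in use once the data is on disk, a concurrent allocation can grab the 
very same clusters in between.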

My suggestion would be:

         append_file()
         {
                 if (!write_attribute($DATA,...))
                         fail();
                 touch_file();
                 touch_file($MFT);
         }

         write_attribute()
         {
                 ... sees that it has to extend ...
                 if (!resize_attribute())
                         fail();
                 if (!inline: write_data_into_attribute &
                                 update attribute record &
                                 lock_inode_and_mft_record while doing this) {
                         resize_attribute(back to old size);
                         fail();
                 }
         }

         resize_attribute()
         {
                 inline: calculate nr_clusters needed;
                 if (!allocate_clusters())
                         fail();
                 lock_both_inode_and_mft_record_for_write(separate?);
                 if (!do_page_cache_magic_to_add_data_to_pages()) {
                         unlock_inode_and_mft_record();
                         deallocate_clusters();
                         fail();
                 }
                 inline: update_attribute record (run list, etc)
                 unlock_both_inode_and_mft_record(separate?);
         }

         allocate_clusters()
         {
                 lock_bitmap();
                 inline: allocate_clusters_and_create_run_list_of_them;
                 unlock_bitmap();
                 touch_file(bitmap);
                 return run_list;
         }

Note: I haven't written the above in an optimized way; it's just 
conceptual. Otherwise, for example, allocate_clusters() would not 
necessarily do the locking this way...
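As a runnable counterpart to allocate_clusters() above, here is a sketch with a small in-memory bitmap guarded by a pthread mutex. NR_CLUSTERS, MAX_RUNS, struct run_list and the first-fit search are assumptions made for illustration, and the touch_file(bitmap) step is left out. The bits are set under the lock before any data would be written, which is the ordering that avoids the race, and deallocate_clusters() provides the rollback resize_attribute() needs on failure.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions: a tiny in-memory $Bitmap and a fixed-size run list;
 * real NTFS run lists are variable length. */
#define NR_CLUSTERS 64
#define MAX_RUNS 8

struct run { int64_t lcn; int64_t len; };	/* start cluster, length */
struct run_list { struct run runs[MAX_RUNS]; int nr; };

static uint8_t bitmap[NR_CLUSTERS / 8];
static pthread_mutex_t bitmap_lock = PTHREAD_MUTEX_INITIALIZER;

static int test_bit(int64_t c) { return (bitmap[c / 8] >> (c % 8)) & 1; }
static void set_bit(int64_t c) { bitmap[c / 8] |= 1u << (c % 8); }
static void clear_bit(int64_t c) { bitmap[c / 8] &= ~(1u << (c % 8)); }

/* Clears every cluster in rl. Caller must hold bitmap_lock. */
static void free_runs_locked(const struct run_list *rl)
{
	for (int i = 0; i < rl->nr; i++)
		for (int64_t c = rl->runs[i].lcn;
		     c < rl->runs[i].lcn + rl->runs[i].len; c++)
			clear_bit(c);
}

/* Hypothetical first-fit allocator: marks count clusters as used and
 * describes them in rl. Returns 0 on success, or -1 with nothing
 * allocated if the clusters (or run list slots) run out. */
static int allocate_clusters(int64_t count, struct run_list *rl)
{
	rl->nr = 0;
	pthread_mutex_lock(&bitmap_lock);
	for (int64_t c = 0; c < NR_CLUSTERS && count > 0; c++) {
		if (test_bit(c))
			continue;
		struct run *last = rl->nr ? &rl->runs[rl->nr - 1] : NULL;

		if (last && last->lcn + last->len == c)
			last->len++;			/* extend current run */
		else if (rl->nr < MAX_RUNS)
			rl->runs[rl->nr++] = (struct run){ c, 1 };
		else
			break;				/* run list is full */
		set_bit(c);
		count--;
	}
	if (count > 0) {				/* roll back */
		free_runs_locked(rl);
		rl->nr = 0;
	}
	pthread_mutex_unlock(&bitmap_lock);
	return count > 0 ? -1 : 0;
}

static void deallocate_clusters(const struct run_list *rl)
{
	pthread_mutex_lock(&bitmap_lock);
	free_runs_locked(rl);
	pthread_mutex_unlock(&bitmap_lock);
}
```

First fit keeps the sketch short; a real allocator would rather look for long contiguous stretches to limit fragmentation.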

>The code we have tries to do everything itself.  If we can break it down
>into packets of work, two things happen.

>We can mimic, possibly even USE the log file,

Yes.

>and second we could queue and coalesce similar requests.

No. This already happens at two levels, and AFAICS we do not need to do it 
and in fact can't do it usefully: 1) at the block device level, in 
ll_rw_block() and friends, consecutive accesses are already 
batched/coalesced; 2) the page cache buffers multiple reads/writes and only 
invokes NTFS when we really need to do a write or read (basically the 
stuff in the address space operations). In the meantime we are never 
invoked and we don't care... The VFS rules. (-;
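For point 1), here is a toy model of the kind of merging the block layer already does for us: requests are sorted by start sector and a request that begins exactly where the previous one ends is folded into it. It is a simplified illustration under those assumptions, not the actual ll_rw_block() or elevator code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified model of a block I/O request. */
struct req { uint64_t sector; uint64_t nr_sectors; };

static int cmp_req(const void *a, const void *b)
{
	const struct req *x = a, *y = b;

	return (x->sector > y->sector) - (x->sector < y->sector);
}

/* Sorts r[0..n) by start sector and merges each request that starts
 * exactly where the previous one ends. Returns the new request count. */
static size_t coalesce(struct req *r, size_t n)
{
	size_t out = 0;

	if (n == 0)
		return 0;
	qsort(r, n, sizeof(*r), cmp_req);
	for (size_t i = 1; i < n; i++) {
		if (r[i].sector == r[out].sector + r[out].nr_sectors)
			r[out].nr_sectors += r[i].nr_sectors;	/* adjacent */
		else
			r[++out] = r[i];			/* gap */
	}
	return out + 1;
}
```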

>Any thoughts / comments?

As above. Sorry it took so long to reply but I have been very busy in the 
lab...

Anton

-- 
   "Nothing succeeds like success." - Alexandre Dumas
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/
