Hi Anton,
> Anton wrote...
> > Flatcap wrote...
> > The simplest example is touch_file(). We have to update the $MFT entry
> > and the Index Entry. We also need to touch the $MFT, to show that it has
> > been altered.
>
> Does NT really update the times of the MFT record for $MFT every single
> time the MFT is written to?!? It seems crazy... You could say this is
> putting some very heavy wear on the disk sectors containing the first mft
> record...
>
> > Indeed if we copy NT exactly, then after every file READ, we need to
> > update the access times of the file, i.e. the $MFT and its mirror will
> > need updating.
>
> Are you sure about this? AFAIK, NT will only update access times after a
> time delta has passed since the last update.
Well, the MFT times seem to reflect the time of the last update (and I suppose
I ought to prefix every sentence with "I think that"). The time delta sounds
like NT's fs cache. My ill-informed guess is that many of the updates are
getting swallowed by the cache before they ever reach the disk.
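
For what it's worth, the time-delta behaviour Anton describes could be sketched roughly as below (Python, just for brevity). This is only an illustration: Inode, note_access and the one-hour threshold are hypothetical names and values picked for the example, not anything taken from NT or from the driver.

```python
from dataclasses import dataclass

# Hypothetical threshold; the value NT really uses is not established here.
ATIME_DELTA_SECONDS = 3600


@dataclass
class Inode:
    atime: float         # last access time as recorded on disk (seconds)
    dirty: bool = False  # True once the record needs writing back


def note_access(inode, now, delta=ATIME_DELTA_SECONDS):
    """Record a read; only update atime if the stored value is stale enough.

    Returns True if the inode was dirtied (an $MFT write would follow).
    """
    if now - inode.atime >= delta:
        inode.atime = now
        inode.dirty = True
        return True
    return False
```

With something like this, a burst of reads inside the delta costs no metadata writes at all; only the first read after the delta has passed dirties the record.
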
> > So, touch_file() looks like:
> >
> > touch_file()
> > update_file() # update file
> > finalise() # update $MFT
> >
> > update_file()
> > Update $MFT entry # update file's entry
> > Update Index entry
> >
> > finalise()
> > update_file ($MFT) # update timestamp
> > Copy $MFT to $MFTMirr
> >
> > The finalise will be needed after every update.
>
> I agree in general. However, I disagree about finalise(). IMO
> update_file($MFT) and "Copy $MFT to $MFTMirr" should be one and the same
> thing. - When we are writing to $MFT anywhere in offset 0 to 4 * MFT record
> size, we have to write the data to $MFTMirr as well.
Yeah, fair enough for the practical details. I just wanted to break down
the actions into separate disk writes.
> The best solution is
> IMO to put this into the low level write routines themselves, in fact the
> lower we go the better for transparency of $MFTMirr updates. After all the
> update of $MFTMirr should be in the same transaction as the update of $MFT.
> - Since the $MFTMirr is a byte-wise copy of $MFT's first four mft records,
> rather than a functional copy, it is not a problem to have the copy done
> real deep inside the write routines. - Maybe in (the yet to be written)
> ntfs_write_page() or something similar.
Agreed. Admittedly the low levels would just be dealing with clusters
and would know little of MFT and Mirror, but this is an important exception.
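
A toy sketch of what "mirror inside the low-level write" might look like, in Python for brevity. Everything here is an assumption made for illustration: the Volume class, write_mft, and the 1024-byte record size (the real size comes from the boot sector), and two in-memory buffers stand in for the on-disk $MFT and $MFTMirr.

```python
MFT_RECORD_SIZE = 1024  # assumed for the sketch; really read from the boot sector
MIRRORED_RECORDS = 4    # $MFTMirr is a byte-wise copy of the first four records


class Volume:
    """Toy volume: two bytearrays stand in for the $MFT and $MFTMirr data."""

    def __init__(self, mft_records=16):
        self.mft = bytearray(mft_records * MFT_RECORD_SIZE)
        self.mftmirr = bytearray(MIRRORED_RECORDS * MFT_RECORD_SIZE)

    def write_mft(self, offset, data):
        """Write into $MFT and copy any overlap with the mirrored region."""
        if offset < 0 or offset + len(data) > len(self.mft):
            raise ValueError("write outside $MFT")
        self.mft[offset:offset + len(data)] = data

        mirror_end = MIRRORED_RECORDS * MFT_RECORD_SIZE
        if offset < mirror_end:
            # Only the part that falls inside the first four records is mirrored.
            n = min(len(data), mirror_end - offset)
            self.mftmirr[offset:offset + n] = data[:n]
```

Callers never mention $MFTMirr; the overlap check is the "important exception" living in the write path. A real driver would do both writes as one transaction, which this sketch does not attempt.
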
> > The actual touch is broken into update & finalise to prevent infinite
> > loops when touching the $MFT.
>
> Infinite loops are something we definitely have to watch for carefully
> during the driver design/implementation as NTFS is full of dangers
> involving them due to having all the metadata being files...
I started writing a test prog, just to get some of the layout right.
It's a crummy program, but I found it easier to hack code than design :-)
(flatcap/text.c)
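
To show why the update/finalise split terminates, here is a minimal sketch in Python of the touch_file() pseudocode quoted earlier. It is not the contents of the test program; the File and Volume classes and the log list are hypothetical scaffolding, and the "mirror" step is just a log entry.

```python
class File:
    def __init__(self, name):
        self.name = name
        self.mtime = 0


class Volume:
    def __init__(self):
        self.mft = File("$MFT")
        self.log = []  # records each simulated disk write, in order

    def update_file(self, f, now):
        """Update the file's $MFT entry and index entry (simulated)."""
        f.mtime = now
        self.log.append(("update", f.name))

    def finalise(self, now):
        """Timestamp $MFT and copy it to $MFTMirr.

        Calls update_file(), never touch_file(), so it cannot recurse.
        """
        self.update_file(self.mft, now)
        self.log.append(("mirror", "$MFT"))

    def touch_file(self, f, now):
        self.update_file(f, now)
        self.finalise(now)
```

Touching $MFT itself updates it twice (once as the file, once in finalise) but stops there. Had finalise() called touch_file($MFT) instead, each touch would trigger another touch and never return.
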
> > Another simple case is append_file().
> >
> > append_file()
> > write_data() # write data to disk
> > touch_file() # update timestamp
> >
> > write_data()
> > Write data # put data on disk
> > Update $Bitmap # set bits
> > touch_file($Bitmap) # update timestamp
>
> Update $Bitmap AFTER writing the data?!? I think I am hearing voices and
> they are all screaming "RACE condition"...
Damn, you're right :-) I was thinking about the order *I'd* want things written.
I thought: Number one - commit the data to disk. That way if things go wrong
something could possibly be recovered. Then the bitmap. But until the bits are
set, another writer could be handed the same clusters, so it does sound a bit
dangerous, now you mention it.
> My suggestion would be:
> append_file()...
> write_attribute()...
> resize_attribute()...
> allocate_clusters()...
That looks good; I suppose my example was too simple. I need to rework it
and think about attributes.
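
As a first stab at the reordering, here is a rough Python sketch that claims clusters in the bitmap before any data is written. It only illustrates the ordering; Volume, allocate_clusters and append_data are hypothetical stand-ins, not Anton's elided functions, and there is no locking or attribute handling.

```python
class Volume:
    def __init__(self, clusters=8, cluster_size=4):
        self.cluster_size = cluster_size
        self.bitmap = [False] * clusters  # stand-in for $Bitmap
        self.disk = [b""] * clusters      # stand-in for the data clusters
        self.order = []                   # order in which steps happened

    def allocate_clusters(self, count):
        """Set bits in the bitmap first, so nobody else can claim them."""
        free = [i for i, used in enumerate(self.bitmap) if not used]
        if len(free) < count:
            raise OSError("volume full")
        chosen = free[:count]
        for i in chosen:
            self.bitmap[i] = True
        self.order.append("bitmap")
        return chosen

    def append_data(self, runlist, data):
        """Allocate, then write, then extend the (toy) runlist."""
        size = self.cluster_size
        count = -(-len(data) // size)  # ceiling division
        new = self.allocate_clusters(count)
        for n, lcn in enumerate(new):
            self.disk[lcn] = data[n * size:(n + 1) * size]
        self.order.append("data")
        runlist.extend(new)
        return new
```

If the write fails after allocation, the worst case is a few clusters marked used but holding nothing, which a check tool can reclaim. With the original order, a second writer could be given clusters that already hold fresh data.
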
> > The code we have tries to do everything itself. If we can break it down
> > into packets of work, two things happen.
> > We can mimic, possibly even USE the log file,
>
> Yes.
>
> > and second we could queue and coalesce similar requests.
>
> No. This already happens at two levels and we really do not need to do it
> and in fact can't do it usefully AFAICS. 1) At the level of the block
> devices in ll_rw_block and friends consecutive accesses are already
> batched/coalesced. 2) The page cache buffers multiple reads/writes and only
> invokes NTFS when we either need to really do a write or read (basically
> the stuff in address space operations). In the mean time we are never
> invoked and we don't care... The VFS rules. (-;
OK. I didn't want to rewrite the VFS, though I had a sort-of plan.
If we were queueing our requests, then when we encountered a problem
(bad sector, etc.) we could try to work around it (or just roll back).
I need to start acquainting myself with reiserfs and ext3.
If journalling filesystems aren't doing their own queueing, then obviously
there's no reason for us to (and forget the LogFile).
FlatCap (Rich)
nt...@fl...