[SSI] CFS, mount options and NFS

Before deciding on the mount options one might have for CFS and
what they might mean, I'd like to understand what is available for
NFS.  My understanding (which I hope someone will correct if it is
wrong) goes like this:

Physical filesystems (like ext2, ext3, XFS, ...) can be mounted
-o sync (which becomes MS_SYNCHRONOUS in the kernel) or not.
If they are not -o sync, then all operations are asynchronous, which
means not only writes but also creates, unlinks, renames, chmods, etc.
With -o sync, all page writes are committed, which covers
directory pages and data pages but does not include inodes.  This
means, for example, that a file create could be lost on a reboot.
I'm not sure what prompts an inode to get written out.
Maybe not all physical filesystems work this way?  JFS?
These are not the semantics I'm used to.  I think the Unix OSs
always do meta-data operations (like create, unlink, rename, etc.)
synchronously (either to disk or to a journal).  Data is written
async unless the file was opened O_SYNC.
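
For comparison, here is a minimal sketch of how an application can ask
for synchronous behaviour itself instead of relying on mount options.
It assumes a POSIX system where O_SYNC is available and where fsync()
on a directory descriptor is allowed (as it is on Linux).  The name
write_durably is made up for the example.

```python
import os


def write_durably(path, data):
    """Write data with O_SYNC, then fsync the parent directory.

    Hypothetical helper: it asks for the data and the new directory
    entry to be pushed to stable storage, regardless of whether the
    filesystem was mounted -o sync.
    """
    # With O_SYNC, write() should not return until the data is committed.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o644)
    try:
        written = os.write(fd, data)
    finally:
        os.close(fd)

    # fsync the parent directory so the create itself is not lost.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return written
```

Whether this is enough on a given physical filesystem is exactly the
question raised above; the sketch only shows what the application can
request.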

NFS can export a filesystem in NFSEXP_ASYNC mode or not (default
is not).  If it is not ASYNC, then at the end of each operation
all data and meta-data are committed to the physical filesystem
(either to disk or the journal), whether or not the physical
filesystem was mounted -o sync.  If the filesystem is
exported NFSEXP_ASYNC, then it is left up to the physical filesystem,
which will write data and directories synchronously if mounted with
-o sync but will never guarantee anything about inodes.

On the NFS client, the NFS mount can be -o sync or not.  The only
difference is that with -o sync, you don't employ the biods to
asynchronously write pages of data to the server.  The NFS mount
can also be -o noac (no attribute cache).  This implies -o sync
and also means the inode attributes are never trusted at the client
and must be re-obtained on almost all operations.  The NFS
mount can be either soft or hard.  If it is soft, the application will
see errors if a requested operation exceeds the retry count.
Subsequent operations done after the server returns might work, 
however.  If it is hard, the system just retries forever or until
the user interrupts it (if -o intr is set).
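
The soft/hard difference can be sketched as a toy retry loop.  This is
only a model of the behaviour described above, not the kernel's code:
FlakyServer, nfs_call, SoftMountError and max_retries are all made-up
names, and there is no timeout or backoff between retries.

```python
class Timeout(Exception):
    """Raised by the toy server when a request gets no reply."""


class SoftMountError(Exception):
    """Stand-in for the error a soft mount hands to the application."""


class FlakyServer:
    """Toy server that is unreachable for the first `failures` requests."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def send(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise Timeout()
        return "ok"


def nfs_call(send, soft, max_retries=3):
    """Retry send() until it answers.

    soft=True:  give up with an error once max_retries retries have failed.
    soft=False: hard mount, keep retrying until the server answers.
    """
    retries = 0
    while True:
        try:
            return send()
        except Timeout:
            if soft and retries >= max_retries:
                raise SoftMountError("server not responding")
            retries += 1
```

With a server that misses five requests, nfs_call(server.send,
soft=True, max_retries=2) raises after three attempts, while
soft=False keeps going and gets the reply on the sixth.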

So what do the combinations mean?  (Below, "pfs" is the physical
filesystem and "share" is the NFS export.)

a. pfs not sync; share sync; nfs client not sync, hard
        - default
        - everything is pretty synchronous, except that the biods can
          do writes from the client
b. pfs not sync; share async; nfs client not sync, hard
        - everything is pretty async
        - client operations are sync to the server except for data,
           which can go through the biods
        - everything at the server is async, so nothing is committed until
           long after the client node has been told it is done
        - very fast mode; failover is probably a real problem
c. pfs -o sync; share sync; nfs client not sync, hard
        - same as a); pfs -o sync doesn't matter given a sync share
d. pfs -o sync; share async; nfs client not sync, hard
        - meta-data operations have the directory written synchronously
           but not the inode
        - data is async from the client but sync at the server
        - hard to do failover because some inode operations that were
           reported as done may never have been written to disk
e. any of the above with nfs client -o sync mount
        - just affects whether writes can go via the biods or not
        - with -o sync, the client can see out-of-space errors
            synchronously; with the biods, the failure has no way to be
            reported
f. any of the above with nfs client -o noac mount
        - same as e) but on each operation the client inode must be refreshed
          from the server
g. any of the above with nfs client -o soft
        - errors are reported after some number of retries to the server
        - if you are not -o sync, the errors may go to the biod and be
            reported on the next write or close
        - an error doesn't mean the operation didn't happen; maybe the
            response was lost
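
Cases a) through d) can be summarised as a small lookup.  This just
encodes my understanding as stated above (so it is wrong wherever that
is wrong); server_durability and its argument names are made up for
the example.

```python
def server_durability(pfs_sync, export_async):
    """What the server has committed when it replies to the client.

    pfs_sync:     physical filesystem mounted -o sync
    export_async: filesystem exported NFSEXP_ASYNC
    Returns a dict of booleans for data, directories and inodes.
    """
    if not export_async:
        # Cases a) and c): a sync export commits everything at the end
        # of each operation, whatever the pfs mount option is.
        return {"data": True, "directories": True, "inodes": True}
    # Cases b) and d): an async export leaves it to the pfs, which
    # never guarantees anything about inodes.
    return {"data": pfs_sync, "directories": pfs_sync, "inodes": False}


# The four server-side combinations from the list above.
CASES = {
    "a": server_durability(pfs_sync=False, export_async=False),
    "b": server_durability(pfs_sync=False, export_async=True),
    "c": server_durability(pfs_sync=True, export_async=False),
    "d": server_durability(pfs_sync=True, export_async=True),
}
```

The client-side options (e, f, g) are left out because they change how
and when the client talks to the server, not what the server commits.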