[SSI] CFS, mount options and NFS
From: Bruce J. W. <bj...@br...> - 2002-04-30 22:15:08
Before deciding on the mount options one might have for CFS and what they might mean, I'd like to understand what is available for NFS. My understanding (which I hope someone will correct if it is wrong) goes like this:

Physical filesystems (like ext2, ext3, XFS, ...) can be mounted -o sync (which becomes MS_SYNCHRONOUS in the kernel) or not. If they are not -o sync, then all operations are asynchronous, which means not only writes but also creates, unlinks, renames, chmods, etc. With -o sync, all page writes are committed, which covers directory pages and data pages but does not include inodes. This means, for example, that a file create could be lost on a reboot. I'm not sure what prompts an inode to get written out. Maybe not all physical filesystems work this way? JFS?

These are not the semantics I'm used to. I think the Unix OSs always do meta-data operations (like create, unlink, rename, etc.) synchronously (either to disk or to a journal). Data is written async unless the file was opened O_SYNC.

NFS can export a filesystem in NFSEXP_ASYNC mode or not (default is not). If it is not ASYNC, then at the end of each operation all data and meta-data is committed to the physical filesystem (either to disk or the journal), whether or not the physical filesystem was mounted -o sync. If the filesystem is exported NFSEXP_ASYNC, then it is left up to the physical filesystem, which will do data and directories synchronously if mounted with -o sync but will never guarantee anything about inodes.

On the NFS client, the NFS mount can be -o sync or not. The only difference is that with -o sync, you don't employ the biods to asynchronously write pages of data to the server. The NFS mount can also be -o noac (no attribute cache). This implies -o sync and also means the inode attributes are never trusted at the client and must be re-obtained on almost all operations.

The NFS mount can be either soft or hard.
If it is soft, the application will see errors if a requested operation exceeds the retry count. Subsequent operations done after the server returns might work, however. If it is hard, the system just retries forever or until the user interrupts it (if -o intr is set).

So what do the combinations mean:

a. pfs not sync; share sync; nfs client not sync, hard
   - default
   - everything pretty synchronous except biods can do writes from the client

b. pfs not sync; share async; nfs client not sync, hard
   - everything pretty async;
   - client operations are sync to server except for data, which can go thru the biods
   - everything at the server is async, so nothing is committed until long after the client node has been told it is done.
   - very fast mode; probably real problems to do failover;

c. pfs -o sync; share sync; nfs client not sync, hard
   - same as a; pfs -o sync doesn't matter given share of sync

d. pfs -o sync; share async; nfs client not sync, hard
   - meta data operations have the directory written synchronously but not the inode;
   - data is async from the client but sync at the server
   - hard to do failover because some inode operations that said they were done may never have been written to disk;

e. any of the above with nfs client -o sync mount
   - just affects whether writes can go via the biods or not;
   - with -o sync, the client can see out-of-space errors synchronously; with biods, the failure has no way to be reported.

f. any of the above with nfs client -o noac mount
   - same as e) but on each operation the client inode must be refreshed from the server

g. any of the above with nfs client -o soft
   - errors are reported after some number of retries to the server;
   - if you are not -o sync, the errors may go to the biod and be reported on the next write or close;
   - an error doesn't mean the operation didn't happen; maybe the response was lost.
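
To make cases a through f easier to compare, here is a small Python sketch that just encodes the understanding described in this message. It is not derived from the kernel or nfsd sources, and the names (ServerGuarantee, server_guarantee, client_writes_via_biod, CASES) are made up for illustration. If the understanding above is wrong, this table is wrong in the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerGuarantee:
    """What is committed before the server replies, per this message's understanding."""
    data: bool
    directory: bool
    inode: bool


def server_guarantee(pfs_sync: bool, export_async: bool) -> ServerGuarantee:
    # Share not async: everything is committed at the end of each operation,
    # whether or not the physical filesystem was mounted -o sync.
    if not export_async:
        return ServerGuarantee(data=True, directory=True, inode=True)
    # Share async: left up to the physical filesystem. With -o sync, data and
    # directory pages are committed, but nothing is guaranteed about inodes.
    return ServerGuarantee(data=pfs_sync, directory=pfs_sync, inode=False)


def client_writes_via_biod(client_sync: bool, noac: bool) -> bool:
    # -o noac implies -o sync; only a non-sync client mount uses the biods.
    return not (client_sync or noac)


# (pfs_sync, export_async) for cases a-d as listed above.
CASES = {
    "a": (False, False),
    "b": (False, True),
    "c": (True, False),
    "d": (True, True),
}

if __name__ == "__main__":
    for label, (pfs_sync, export_async) in CASES.items():
        print(label, server_guarantee(pfs_sync, export_async))
```

Cases a and c come out identical, which is the "pfs -o sync doesn't matter given share of sync" observation, and case d is the only one where directories are committed but inodes are not.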
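
Going back to the point that a file create could be lost on a reboot when nothing is mounted -o sync: an application that does not want to depend on the mount options can ask for the synchronous behavior itself. The following is a minimal sketch, assuming a POSIX system where a directory can be opened read-only and passed to fsync (true on Linux). The helper name durable_create is hypothetical, and what actually reaches the disk still depends on the filesystem and, over NFS, on how the share is exported.

```python
import os


def durable_create(path: str, data: bytes) -> None:
    """Create `path` holding `data`, then flush the file and its directory entry."""
    # O_SYNC asks that each write() return only after the data has been written out.
    # O_EXCL makes the call fail if the file already exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL | os.O_SYNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Flush the file's data and inode.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Flush the parent directory so the new directory entry is written out too.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling durable_create("/some/dir/newfile", b"payload") either raises OSError or returns after both fsync calls have completed. It says nothing about a server that acknowledged the operation without committing it, which is exactly the async-share problem in cases b and d.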