From: Paul R. <pr...@ib...> - 2011-08-29 14:11:20
On Monday 29 August 2011 at 15:27 Adriano dos Santos Fernandes wrote:
> > The default for ext3/4 is data=ordered and, from the kernel docs:
> I read divergent things about this. Some says that ext3 default changed
> to writeback, others says it depends from a kernel configure option.

I haven't updated my kernel documentation since May 2010 but it seems
consistent with the info in these links:

http://www.mjmwired.net/kernel/Documentation/filesystems/ext3.txt
http://www.mjmwired.net/kernel/Documentation/filesystems/ext4.txt

Both say that the kernel default is data=ordered. However, distros can
change this, so it is important to double check. The simplest way to do
that is with:

cat /proc/mounts

> > "All data are forced directly out to the main file system prior to its
> > metadata being committed to the journal."
> >
> > So presumably as long as data=ordered then a barrier flush will always
> > imply that all data is written to disc.
>
> Does that means that FW=ON (i.e., O_SYNC mode) doesn't guarantee that a
> commit reported as succeeded may really succeed if a fast power loss
> happens and the hard disk has a non-battery based cache and barriers are
> disabled?

That is how I understand it. Each level seems to play smoke and mirrors.
If just one level does asynchronous writes then all timings will be wrong
and there is a risk to data integrity.

The levels are:

Application - we can set FW=ON or OFF. If ON then we are saying write
everything to disc immediately. If FW=OFF then we see a massive
performance gain on small test runs (especially if super* is used.)

Filesystem - ext3 (and others) are mounted async by default (at least for
openSUSE). I've done disc I/O tests with the partition mounted async that
show anomalies for disc IOPS. The only way to remove the anomalies was to
mount with sync. I had previously mounted with barrier=1 but that was
insufficient. (Of course with FW=ON.)

Disc drive - Modern consumer drives are shipped with write cache = on.
In theory the capacitors store sufficient energy to flush the cache to
disc in the event of a power failure. Either way, if write caching is on
then test results will be skewed. If the cache is not saturated then tests
will appear to be quick (but the data is not actually written to disc). If
the cache is saturated then results for test B will be distorted by the
delayed writes from test A.

> And considering that O_SYNC and barrier are on, does it implies that any
> page write will make the metadata flush, or something else must be done?

Hmmm. I think I've answered that in the section above. One thing is for
sure - you know that the writes are synchronous when the performance drops
massively :-)

Paul
-- 
Paul Reeves
http://www.ibphoenix.com
Specialists in Firebird support
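
The /proc/mounts check suggested above can also be scripted. What follows is only a minimal Python sketch, assuming the usual whitespace-separated /proc/mounts layout (device, mount point, filesystem type, options). The function names mount_options and summarize are made up for illustration. An option that is not listed may simply be at its kernel or distro default, so a missing data= entry does not by itself say which journalling mode is in use.

```python
from pathlib import Path


def mount_options(mounts_text, fs_types=("ext3", "ext4")):
    """Return {mount_point: set_of_options} for the given filesystem types."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in fs_types:
            continue
        result[fields[1]] = set(fields[3].split(","))
    return result


def summarize(options):
    """Pick out the settings discussed in this thread from one option set."""
    data_mode = next(
        (o.split("=", 1)[1] for o in options if o.startswith("data=")), None
    )
    return {
        "data": data_mode,           # None: not listed, so likely the default
        "sync": "sync" in options,   # True only if mounted with -o sync
        "barrier": sorted(
            o for o in options if o.startswith("barrier") or o == "nobarrier"
        ),
    }


if __name__ == "__main__":
    mounts = Path("/proc/mounts")
    if mounts.exists():  # Linux only
        for mount_point, opts in mount_options(mounts.read_text()).items():
            print(mount_point, summarize(opts))
```

This only reports the filesystem level. It says nothing about the drive's own write cache, which has to be checked separately.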