From: Marin B. <li...@ol...> - 2018-05-20 21:11:15
> On Sun, 20 May 2018 at 22:09, Marin Bernard <li...@ol...> wrote:
>
> > The chunkserver acknowledges the write while the data are still
> > pending commit to disk. If the server dies meanwhile, the data are
> > lost.
>
> Even if the client asked for O_DIRECT or fsync explicitly?
> If yes, this would break the POSIX compatibility, I think.

No, not necessarily. It depends on what you mean by 'the client'. The POSIX interface is implemented by mfsmount, not the chunkserver. The client is just another process on the same machine. mfsmount may comply with POSIX and let chunkservers deal with the data in their own way in the background. The client process would know nothing about it, as it only speaks to mfsmount.

From a POSIX perspective, the chunkserver is like a physical disk. POSIX does not specify how physical disks should work internally. If you O_DIRECT a file stored on a consumer hard drive and fill it with data, chances are that you'll experience some kind of data loss if you unplug the box in the middle of a write: the content of the drive's write cache, which was acknowledged but not committed, would have vanished. POSIX can't do anything about it.

Since MooseFS extends over both the kernel and device layers, it has the opportunity to do better than POSIX and break the layering by passing useful information from the mount processes down to the chunkservers. I suppose this is why fsync operations are cascaded from mfsmount to the chunkservers. I do not know if this is the same with O_DIRECT, though.

> > However, if goal is >= 2 (as it should always be), at least one
> > more copy of the data must already be present on another
> > chunkserver before the acknowledgment is sent.
>
> Not really. If goal >= 2, the ack is sent if another chunkserver has
> committed to the cache, so you are acknowledging when all goal copies
> are written to the cache, not to the disk.

Absolutely.

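To make the point about acknowledgment concrete, here is a toy model in Python. It is only a sketch of the behaviour discussed above, not MooseFS code: the class ToyChunkserver and the function replicated_write are hypothetical names invented for the illustration, and "goal" is simply the number of servers in the list.

```python
from dataclasses import dataclass, field


@dataclass
class ToyChunkserver:
    """Hypothetical stand-in for a chunkserver: a volatile cache plus a disk."""
    cache: list = field(default_factory=list)
    disk: list = field(default_factory=list)

    def write(self, block: bytes) -> bool:
        # The block only reaches the volatile cache; returning True is the ack.
        self.cache.append(block)
        return True

    def fsync(self) -> None:
        # Commit everything pending in the cache to stable storage.
        self.disk.extend(self.cache)
        self.cache.clear()

    def crash(self) -> None:
        # Power loss: whatever was not committed is gone.
        self.cache.clear()


def replicated_write(servers, block: bytes) -> bool:
    # Acknowledge once every copy (goal = len(servers)) is cached, not committed.
    return all(s.write(block) for s in servers)


# Scenario 1: goal = 2, both servers lose power right after the ack.
servers = [ToyChunkserver(), ToyChunkserver()]
acked = replicated_write(servers, b"data")
for s in servers:
    s.crash()
lost = all(b"data" not in s.disk for s in servers)

# Scenario 2: same write, but an fsync is cascaded to every server first.
safe_servers = [ToyChunkserver(), ToyChunkserver()]
replicated_write(safe_servers, b"data")
for s in safe_servers:
    s.fsync()
    s.crash()
kept = all(b"data" in s.disk for s in safe_servers)

print(acked, lost, kept)  # True True True
```

In the first scenario the write was acknowledged and still lost on every copy; a higher goal only protects against servers failing independently. In the second, the cascaded fsync is what makes the data survive.
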
> This would be OK in normal conditions (like any writes made by any
> software), but if the client is asking for O_DIRECT, the
> acknowledgment *must* be sent *after* the data is stored on disk.

This is what you would get with the ``mfscachemode=DIRECT`` mount option, which bypasses the cache completely, at least on the client side. Yet I don't know whether mfsmount is able to enforce O_DIRECT on a per-file basis or whether the setting must apply to the whole mountpoint.
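
Whatever the mount options are, a client process can still ask for durability file by file with fsync, which is part of POSIX (O_DIRECT itself is a Linux open(2) flag, not a POSIX one). A minimal Python sketch follows; durable_write is a hypothetical helper name, and the example writes to a temporary directory rather than a MooseFS mountpoint. Whether the fsync actually reaches the chunkservers' disks depends on mfsmount cascading it, as I supposed above.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> int:
    """Write data and return only after fsync() has reported success."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        # os.write() may write fewer bytes than requested, so loop until done.
        while written < len(data):
            written += os.write(fd, data[written:])
        # Block until the file's data has been handed to stable storage,
        # as far as the underlying filesystem is able to guarantee it.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)


# Usage: the target here is a temporary file, standing in for a file on
# a mounted filesystem.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example.bin")
    size = durable_write(target, b"x" * 4096)
    with open(target, "rb") as fh:
        readback = fh.read()
```

The application only treats the write as done once fsync has returned, instead of trusting the acknowledgment of the write call itself.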