From: Davies L. <dav...@gm...> - 2012-03-06 13:52:44
2012/3/6 Michał Borychowski <mic...@ge...>

> Hi!
>
> I'm curious whether you made some tests with your solution? What was the
> performance gain? Was it like 20-30% or rather 2-3%?

I have not run a formal benchmark for it, but we noticed that the patch
reduces the "time of data write operation" on the chunk server from a
maximum of 8 seconds (0.8 seconds on average) to a maximum of 100 ms
(50 ms on average). We write web logs to MFS via scribed in near real
time, and before the patch we saw a huge number of "writeworker:
connection with (C0A8014B:9422) was timed out" messages in the syslog of
mfsmount. After the patch, the number of timeout messages dropped from
about 120k to about 100.

Reducing the number of fsync() calls is necessary to improve write
performance, especially for small block writes.

When fsync() is delayed until fclose(), data loss can only occur on a
power failure. So we can touch a flag file when the chunk server starts
up and remove it on shutdown. After a power failure the flag file will
not have been removed, which indicates that some chunks may not have
been fully synced; the chunk server can then scan all chunks whose mtime
is later than the mtime of the flag file.

> Actually here we are quite skeptical about this. In one of your emails
> you suggest to do fsync more rarely (e.g. every 30 seconds). But when CS
> is closed cleanly, the OS will do all fsyncs before closing the files;
> when CS is not closed cleanly, how should we know which files to test?
> CS upon startup doesn't do 'stat' on every file (it would take too
> long). So it won't know which files to check.
>
> We could create some extra file (e.g. named '.dirty') where we could
> save the id of a file upon opening it (and do fsync on the '.dirty'
> file). Upon file closing we delete its id from the '.dirty' file. When
> CS closes cleanly, the '.dirty' file should be empty. If not, upon
> starting, CS reads the '.dirty' file and scans all the chunks listed
> in it.
>
> You also gave some suggestions to use these options:
> 1. FLUSH_ON_WRITE - option easy to implement, but not that secure
> 2. FLUSH_DELAY - as above
> 3. CHECKSUM_INITIAL - this would mean reading all chunks on all the
> disks upon startup, which is just impossible (in some environments it
> would take more than 24 hrs).
>
> And still we are afraid there may be a scenario in which a malfunction
> of CS without fsyncs causes a chunk to "return" to a proper form (in
> the sense of CRC) from before the save. That would mean there could be
> several "proper" copies of the same chunk, but with different content -
> we cannot allow this to happen.
>
> Possibly it would be necessary to inform the master server that CS has
> some 'unfsynced' chunks. So this gets still more complicated. That's
> why we are curious whether the performance gain is substantial enough
> to start doing this fine tuning.
>
> Kind regards
> Michał Borychowski
> MooseFS Support Manager

--
- Davies
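
A minimal sketch of the flag-file recovery idea described above, written
in C against plain POSIX calls. It is only an illustration under the
assumptions stated in the thread, not the actual MooseFS chunk server
code; FLAG_FILE, cs_startup(), cs_shutdown() and
rescan_chunks_newer_than() are hypothetical names.

/* Sketch only: a flag file whose presence marks an unclean shutdown and
 * whose mtime marks the last clean startup. All names are hypothetical. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/stat.h>

#define FLAG_FILE "/var/lib/mfs/.cs_running"   /* hypothetical path */

/* Hypothetical recovery hook: re-verify the CRCs of every chunk whose
 * mtime is newer than 'since' (i.e. written after the last clean start). */
static void rescan_chunks_newer_than(time_t since) {
    (void)since;   /* a real chunk server would walk its chunk dirs here */
}

/* Startup: a leftover flag file means the previous run ended uncleanly
 * (e.g. power failure), so only chunks modified after its mtime may hold
 * unsynced data and need re-checking. Then (re)create the flag file. */
int cs_startup(void) {
    struct stat st;
    if (stat(FLAG_FILE, &st) == 0)
        rescan_chunks_newer_than(st.st_mtime);
    FILE *f = fopen(FLAG_FILE, "w");   /* "touch" the flag */
    if (f == NULL)
        return -1;
    fclose(f);
    return 0;
}

/* Clean shutdown: remove the flag so the next startup skips the rescan. */
void cs_shutdown(void) {
    unlink(FLAG_FILE);
}

The point of the scheme, as described above, is that the flag file's
mtime records the last clean startup, so after a crash only chunks
written since then need their CRCs re-verified instead of scanning every
chunk on every disk.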