From: Chris P. <ch...@ec...> - 2012-03-01 04:46:44
|
I have had similar ideas I currently have a patch to disable fsync on every block close, however this will probably lead to data corruption if there is a site-wide power outage. My thoughts are as follows: * Create config variable FLUSH_ON_WRITE (0/1) to disable or enable flush on write * Create config variable FLUSH_DELAY (seconds) to prevent flushing immediately - rather the pointers to the written chunks would be stored, and looped through (in a separate thread?), to flush any which are older than the delay. This would ensure that the chunks have a maximum time during which they may potentially be invalid on disk. If the FLUSH_DELAY is 0, then behaviour is as current * Create config variable CHECKSUM_INITIAL (0/1) If set to 1 would force a checksum of *all* blocks on a chunkserver at startup, to find potentially bad chunks before they are used. Is this necessary, though? Are checksums read on every block read? I may start making the above patches, if I get time to do so. Chris On 2012/03/01 6:09 AM, Davies Liu wrote: > Hi Michal! > > I have found the reason for bad write performance, the chunk server > will write the crc block > into disk EVERY second, then fsync(), which will take about 28ms. > > Can I reduce frequency of fsync() to every minute, then check the > chunk modified in recent > 30 minutes after booting? > > Davies > > 2012/2/23 Michał Borychowski <mic...@ge... > <mailto:mic...@ge...>> > > Hi Davies! > > Here is our analysis of this situation. Different files are > written simultaneously on the same CS - that's why pwrites are > written do different files. Block size of 64kB is not that small. > Writes in OS are sent through write cache so all saves being > multiplication of 4096B should work equally fast. > > Our tests: > > dd on Linux 64k : 640k > $ dd if=/dev/zero of=/tmp/test bs=64k count=10000 > 10000+0 records in > 10000+0 records out > 655360000 bytes (655 MB) copied, 22.1836 s, 29.5 MB/s > > $ dd if=/dev/zero of=/tmp/test bs=640k count=1000 > 1000+0 records in > 1000+0 records out > 655360000 bytes (655 MB) copied, 23.1311 s, 28.3 MB/s > > dd on Mac OS X 64k : 640k > $ dd if=/dev/zero of=/tmp/test bs=64k count=10000 > 10000+0 records in > 10000+0 records out > 655360000 bytes transferred in 14.874652 secs (44058846 bytes/sec) > > $ dd if=/dev/zero of=/tmp/test bs=640k count=1000 > 1000+0 records in > 1000+0 records out > 655360000 bytes transferred in 14.578427 secs (44954096 bytes/sec) > > So the times are similar. Saves going to different files also > should not be a problem as a kernel scheduler takes care of this. > > If you have some specific idea how to improve the saves please > share it with us. > > > Kind regards > Michał > > -----Original Message----- > From: Davies Liu [mailto:dav...@gm... > <mailto:dav...@gm...>] > Sent: Wednesday, February 22, 2012 8:24 AM > To: moo...@li... > <mailto:moo...@li...> > Subject: [Moosefs-users] Bad write performance of mfschunkserver > > Hi,devs: > > Today, We found that some mfschunkserver were not responsive, > caused many timeout in mfsmount, then all the write operation were > blocked. > > After some digging, we found that there were some small but > continuous write bandwidth, strace show that many small pwrite() > between several files: > > [pid 7087] 12:28:28 pwrite(19, "baggins3 60.210.18.235 > sE7NtNQU7"..., 25995, 55684725 <unfinished ...> [pid 7078] > 12:28:28 pwrite(17, "2012/02/22 12:28:28:root: WARNIN"..., 69, > 21768909 <unfinished ...> [pid 7080] 12:28:28 pwrite(20, > "gardner4 183.7.50.169 mr5vi+Z4H3"..., 47663, 34550257 <unfinished > ...> [pid 7079] 12:28:28 pwrite(19, "\" \"Mozilla/5.0 (Windows NT > 6.1) "..., 40377, 55710720 <unfinished ...> [pid 7086] 12:28:28 > pwrite(23, "MATP; InfoPath.2; .NET4.0C; 360S"..., 65536, 6427648 > <unfinished ...> [pid 7082] 12:28:28 pwrite(23, "; GTB7.2; SLCC2; > .NET CLR 2.0.50"..., 65536, 6493184 <unfinished ...> [pid 7083] > 12:28:28 pwrite(20, > "\255BYU\355\237\347\226s\261\307N{A\355\203S\306\244\255\322[\322\rJ\32[z3\31\311\327"..., > 4096, 1024 <unfinished ...> > [pid 7078] 12:28:28 pwrite(23, "ovie/subject/4724373/reviews?sta"..., > 65536, 6558720 <unfinished ...> > [pid 7080] 12:28:28 pwrite(19, > "[\"[\4\5\266v\324\366\245n\t\315\202\227\\\343=\336-\r > k)\316\354\335\353\373\340\331;"..., 4096, 1024 <unfinished ...> > [pid 7079] 12:28:28 pwrite(23, "ta-Python/2.0.15\" > 0.016\n211.147."..., 65536, 6624256 <unfinished ...> [pid 7081] > 12:28:28 pwrite(23, "4034093?apikey=0eb695f25995d7eb2"..., > 65536, 6689792 <unfinished ...> > [pid 7084] 12:28:28 pwrite(23, " y8G23n95BKY:43534427:wind8vssc4"..., > 65536, 6755328) = 65536 <0.000108> > [pid 7078] 12:28:28 pwrite(23, "TkVvKuXfug:3248233:5Yo9vFoOIuo > \""..., 65536, 6820864 <unfinished ...> [pid 7086] 12:28:28 > pwrite(23, ":s|1563396:s|1040897:s|1395290:s"..., > 65536, 6886400 <unfinished ...> > [pid 7085] 12:28:28 pwrite(23, "dows%3B%20U%3B%20Windows%20NT%20"..., > 65536, 6951936 <unfinished ...> > [pid 7087] 12:28:28 pwrite(23, "/533.17.9 (KHTML, like Gecko) > Ve"..., 65536, 7017472 <unfinished ...> [pid 7079] 12:28:28 > pwrite(23, " r1m+tFW1T5M:: > \"22/Feb/2012:00:0"..., 65536, 7083008 <unfinished ...> [pid > 7086] 12:28:28 pwrite(19, "baggins5 61.174.60.117 i6MSCBvE1"..., > 25159, 55751097 <unfinished ...> [pid 7084] 12:28:28 pwrite(20, > "gardner1 182.118.7.64 TjxzPKdqNU"..., 10208, 34597920 <unfinished > ...> [pid 7080] 12:28:28 pwrite(23, > "d7eb2c23c1d70cc187c1&alt=json HT"..., 65536, 7148544 <unfinished > ...> [pid 7083] 12:28:28 pwrite(23, > "5_Google&type=n&channel=-3&user_"..., > 65536, 7214080 <unfinished ...> > [pid 7085] 12:28:28 pwrite(19, "12-02-22 12:28:27 1861 \"GET > /ser"..., 23179, 55776256 <unfinished ...> [pid 7082] 12:28:28 > pwrite(23, "\"http://douban.fm/swf/53035/fmpl"..., 65536, 7279616 > <unfinished ...> [pid 7078] 12:28:28 pwrite(20, > "opic/27639291/add_comment HTTP/1"..., 18576, 34608128 <unfinished > ...> [pid 7087] 12:28:28 pwrite(19, > "[\"[\4\5\266v\324\366\245n\t\315\202\227\\\343=\336-\r > k)\316\354\335\353\373\340\331;"..., 4096, 1024 <unfinished ...> > [pid 7079] 12:28:28 pwrite(23, "ww.douban.com > <http://ww.douban.com>%2Fgroup%2Ftopic%2F"..., > 65536, 7345152 <unfinished ...> > [pid 7081] 12:28:28 pwrite(20, > "\255BYU\355\237\347\226s\261\307N{A\355\203S\306\244\255\322[\322\rJ\32[z3\31\311\327"..., > 4096, 1024 <unfinished ...> > [pid 7086] 12:28:28 pwrite(23, "patible; MSIE 7.0; Windows NT > 6."..., 65536, 7410688 <unfinished ...> [pid 7084] 12:28:28 > pwrite(23, "fari/535.7 360EE\" > 0.006\n211.147."..., 65536, 7476224 <unfinished ...> [pid 7080] > 12:28:28 pwrite(23, "1:OUIVR8CIG5c \"22/Feb/2012:00:03"..., 65536, > 7541760 <unfinished ...> [pid 7085] 12:28:28 pwrite(23, "fm \"GET > /j/mine/playlist?type=s&"..., 65536, 7607296 <unfinished ...> [pid > 7083] 12:28:28 pwrite(23, "pe=n&channel=18&user_id=39266798"..., > 65536, 7672832 <unfinished ...> > [pid 7082] 12:28:28 pwrite(23, " 0.023\n125.34.190.128 :: > \"22/Feb"..., 65536, 7738368 <unfinished ...> [pid 7078] 12:28:28 > pwrite(23, "00 5859 \"http://www.douban.com/p"..., 65536, 7803904 > <unfinished ...> [pid 7079] 12:28:28 pwrite(23, "03:08 +0800\" > www.douban.com <http://www.douban.com> \"GET"..., 65536, 7869440 > <unfinished ...> [pid 7086] 12:28:28 pwrite(23, "type=all > HTTP/1.1\" 200 1492 \"-\" > "..., 65536, 7934976 <unfinished ...> > [pid 7084] 12:28:28 pwrite(23, "Hiapk&user_id=57982902&expire=13"..., > 65536, 8000512 <unfinished ...> > [pid 7080] 12:28:28 pwrite(23, "0.011\n116.253.89.216 > rxASuWZf1wg"..., 65536, 8066048 <unfinished ...> [pid 7085] > 12:28:28 pwrite(23, "9 +0800\" www.douban.com > <http://www.douban.com> \"GET /ph"..., 65536, 8131584) = 65536 > <0.000062> [pid 7083] 12:28:28 pwrite(23, " +0800\" > www.douban.com <http://www.douban.com> \"GET /eve"..., 65536, > 8197120 <unfinished ...> [pid 7082] 12:28:28 pwrite(23, " +0800\" > www.douban.com <http://www.douban.com> \"POST /se"..., 65536, > 8262656) = 65536 <0.000103> [pid 7087] 12:28:28 pwrite(23, "0 > 12971 \"http://www.douban.com/g"..., 65536, 8328192 <unfinished > ...> [pid 7081] 12:28:28 pwrite(23, ".0 (compatible; MSIE 7.0; > Window"..., 65536, 8393728) = 65536 <0.000065> > > In order to get better performance, the chunk server should merge > the continuous sequential write operations into larger ones. > > -- > - Davies > > ------------------------------------------------------------------------------ > Virtualization & Cloud Management Using Capacity Planning Cloud > computing makes use of virtualization - but cloud computing also > focuses on allowing computing to be delivered as a service. > http://www.accelacomm.com/jaw/sfnl/114/51521223/ > _______________________________________________ > moosefs-users mailing list > moo...@li... > <mailto:moo...@li...> > https://lists.sourceforge.net/lists/listinfo/moosefs-users > > > > > -- > - Davies > > > ------------------------------------------------------------------------------ > Virtualization& Cloud Management Using Capacity Planning > Cloud computing makes use of virtualization - but cloud computing > also focuses on allowing computing to be delivered as a service. > http://www.accelacomm.com/jaw/sfnl/114/51521223/ > > > _______________________________________________ > moosefs-users mailing list > moo...@li... > https://lists.sourceforge.net/lists/listinfo/moosefs-users |