From: Michał B. <mic...@ge...> - 2012-03-01 12:17:28
Hi Davies and Chris!

Your remarks are very interesting. Please hold off on your patches for now - we'll take these ideas into account in one of the future versions.

Regards
Michal

From: Davies Liu [mailto:dav...@gm...]
Sent: Thursday, March 01, 2012 9:52 AM
To: Chris Picton
Cc: Michał Borychowski; moo...@li...
Subject: Re: [Moosefs-users] Bad write performance of mfschunkserver

I could not figure out how to make the fsync frequency configurable, so I moved fsync() to just before close():

--- mfs-1.6.26/mfschunkserver/hddspacemgr.c 2012-02-08 16:15:03.000000000 +0800
+++ mfs-1.6.26-r1/mfschunkserver/hddspacemgr.c 2012-03-01 16:17:23.000000000 +0800
@@ -1887,28 +1887,28 @@
 			errno = errmem;
 			return status;
 		}
-		ts = get_usectime();
-#ifdef F_FULLFSYNC
-		if (fcntl(c->fd,F_FULLFSYNC)<0) {
-			int errmem = errno;
-			mfs_arg_errlog_silent(LOG_WARNING,"hdd_io_end: file:%s - fsync (via fcntl) error",c->filename);
-			errno = errmem;
-			return ERROR_IO;
-		}
-#else
-		if (fsync(c->fd)<0) {
-			int errmem = errno;
-			mfs_arg_errlog_silent(LOG_WARNING,"hdd_io_end: file:%s - fsync (direct call) error",c->filename);
-			errno = errmem;
-			return ERROR_IO;
-		}
-#endif
-		te = get_usectime();
-		hdd_stats_datafsync(c->owner,te-ts);
 	}
 	c->crcrefcount--;
 	if (c->crcrefcount==0) {
 		if (OPENSTEPS==0) {
+			ts = get_usectime();
+#ifdef F_FULLFSYNC
+			if (fcntl(c->fd,F_FULLFSYNC)<0) {
+				int errmem = errno;
+				mfs_arg_errlog_silent(LOG_WARNING,"hdd_io_end: file:%s - fsync (via fcntl) error",c->filename);
+				errno = errmem;
+				return ERROR_IO;
+			}
+#else
+			if (fsync(c->fd)<0) {
+				int errmem = errno;
+				mfs_arg_errlog_silent(LOG_WARNING,"hdd_io_end: file:%s - fsync (direct call) error",c->filename);
+				errno = errmem;
+				return ERROR_IO;
+			}
+#endif
+			te = get_usectime();
+			hdd_stats_datafsync(c->owner,te-ts);
 			if (close(c->fd)<0) {
 				int errmem = errno;
 				c->fd = -1;

On Thu, Mar 1, 2012 at 12:21 PM, Chris Picton <ch...@ec...> wrote:

I have had similar ideas.

I currently have a patch to disable fsync on every block close; however, this will probably lead to data corruption if there is a site-wide power outage.

My thoughts are as follows:

· Create config variable FLUSH_ON_WRITE (0/1) to disable or enable flush on write.
· Create config variable FLUSH_DELAY (seconds) to prevent flushing immediately. Instead, pointers to the written chunks would be stored and looped through (in a separate thread?) to flush any that are older than the delay. This would ensure that chunks have a maximum time during which they may potentially be invalid on disk. If FLUSH_DELAY is 0, the behaviour is the same as it is currently.
· Create config variable CHECKSUM_INITIAL (0/1). If set to 1, it would force a checksum of *all* blocks on a chunkserver at startup, to find potentially bad chunks before they are used. Is this necessary, though? Are checksums read on every block read?

I may start making the above patches if I get time to do so.

Chris

On 2012/03/01 6:09 AM, Davies Liu wrote:

Hi Michal!

I have found the reason for the bad write performance: the chunk server writes the CRC block to disk EVERY second and then calls fsync(), which takes about 28 ms.

Can I reduce the frequency of fsync() to once a minute, and then, after booting, check the chunks modified in the last 30 minutes?

Davies

2012/2/23 Michał Borychowski <mic...@ge...>

Hi Davies!

Here is our analysis of this situation.

Different files are written simultaneously on the same CS - that's why the pwrites go to different files. A block size of 64 kB is not that small. Writes in the OS go through the write cache, so all writes whose size is a multiple of 4096 B should be equally fast.

Our tests:

dd on Linux 64k : 640k

$ dd if=/dev/zero of=/tmp/test bs=64k count=10000
10000+0 records in
10000+0 records out
655360000 bytes (655 MB) copied, 22.1836 s, 29.5 MB/s

$ dd if=/dev/zero of=/tmp/test bs=640k count=1000
1000+0 records in
1000+0 records out
655360000 bytes (655 MB) copied, 23.1311 s, 28.3 MB/s

dd on Mac OS X 64k : 640k

$ dd if=/dev/zero of=/tmp/test bs=64k count=10000
10000+0 records in
10000+0 records out
655360000 bytes transferred in 14.874652 secs (44058846 bytes/sec)

$ dd if=/dev/zero of=/tmp/test bs=640k count=1000
1000+0 records in
1000+0 records out
655360000 bytes transferred in 14.578427 secs (44954096 bytes/sec)

So the times are similar. Writes going to different files should not be a problem either, as the kernel scheduler takes care of this.

If you have a specific idea for how to improve the writes, please share it with us.

Kind regards
Michał

-----Original Message-----
From: Davies Liu [mailto:dav...@gm...]
Sent: Wednesday, February 22, 2012 8:24 AM
To: moo...@li...
Subject: [Moosefs-users] Bad write performance of mfschunkserver

Hi, devs:

Today we found that some mfschunkserver instances were unresponsive, which caused many timeouts in mfsmount, and then all write operations were blocked.

After some digging, we found a small but continuous stream of write bandwidth; strace showed many small pwrite() calls spread across several files:

[pid 7087] 12:28:28 pwrite(19, "baggins3 60.210.18.235 sE7NtNQU7"..., 25995, 55684725 <unfinished ...>
[pid 7078] 12:28:28 pwrite(17, "2012/02/22 12:28:28:root: WARNIN"..., 69, 21768909 <unfinished ...>
[pid 7080] 12:28:28 pwrite(20, "gardner4 183.7.50.169 mr5vi+Z4H3"..., 47663, 34550257 <unfinished ...>
[pid 7079] 12:28:28 pwrite(19, "\" \"Mozilla/5.0 (Windows NT 6.1) "..., 40377, 55710720 <unfinished ...>
[pid 7086] 12:28:28 pwrite(23, "MATP; InfoPath.2; .NET4.0C; 360S"..., 65536, 6427648 <unfinished ...>
[pid 7082] 12:28:28 pwrite(23, "; GTB7.2; SLCC2; .NET CLR 2.0.50"..., 65536, 6493184 <unfinished ...>
[pid 7083] 12:28:28 pwrite(20, "\255BYU\355\237\347\226s\261\307N{A\355\203S\306\244\255\322[\322\rJ\32[z3\31\311\327"..., 4096, 1024 <unfinished ...>
[pid 7078] 12:28:28 pwrite(23, "ovie/subject/4724373/reviews?sta"..., 65536, 6558720 <unfinished ...>
[pid 7080] 12:28:28 pwrite(19, "[\"[\4\5\266v\324\366\245n\t\315\202\227\\\343=\336-\r k)\316\354\335\353\373\340\331;"..., 4096, 1024 <unfinished ...>
[pid 7079] 12:28:28 pwrite(23, "ta-Python/2.0.15\" 0.016\n211.147."..., 65536, 6624256 <unfinished ...>
[pid 7081] 12:28:28 pwrite(23, "4034093?apikey=0eb695f25995d7eb2"..., 65536, 6689792 <unfinished ...>
[pid 7084] 12:28:28 pwrite(23, " y8G23n95BKY:43534427:wind8vssc4"..., 65536, 6755328) = 65536 <0.000108>
[pid 7078] 12:28:28 pwrite(23, "TkVvKuXfug:3248233:5Yo9vFoOIuo \""..., 65536, 6820864 <unfinished ...>
[pid 7086] 12:28:28 pwrite(23, ":s|1563396:s|1040897:s|1395290:s"..., 65536, 6886400 <unfinished ...>
[pid 7085] 12:28:28 pwrite(23, "dows%3B%20U%3B%20Windows%20NT%20"..., 65536, 6951936 <unfinished ...>
[pid 7087] 12:28:28 pwrite(23, "/533.17.9 (KHTML, like Gecko) Ve"..., 65536, 7017472 <unfinished ...>
[pid 7079] 12:28:28 pwrite(23, " r1m+tFW1T5M:: \"22/Feb/2012:00:0"..., 65536, 7083008 <unfinished ...>
[pid 7086] 12:28:28 pwrite(19, "baggins5 61.174.60.117 i6MSCBvE1"..., 25159, 55751097 <unfinished ...>
[pid 7084] 12:28:28 pwrite(20, "gardner1 182.118.7.64 TjxzPKdqNU"..., 10208, 34597920 <unfinished ...>
[pid 7080] 12:28:28 pwrite(23, "d7eb2c23c1d70cc187c1&alt=json HT"..., 65536, 7148544 <unfinished ...>
[pid 7083] 12:28:28 pwrite(23, "5_Google&type=n&channel=-3&user_"..., 65536, 7214080 <unfinished ...>
[pid 7085] 12:28:28 pwrite(19, "12-02-22 12:28:27 1861 \"GET /ser"..., 23179, 55776256 <unfinished ...>
[pid 7082] 12:28:28 pwrite(23, "\"http://douban.fm/swf/53035/fmpl"..., 65536, 7279616 <unfinished ...>
[pid 7078] 12:28:28 pwrite(20, "opic/27639291/add_comment HTTP/1"..., 18576, 34608128 <unfinished ...>
[pid 7087] 12:28:28 pwrite(19, "[\"[\4\5\266v\324\366\245n\t\315\202\227\\\343=\336-\r k)\316\354\335\353\373\340\331;"..., 4096, 1024 <unfinished ...>
[pid 7079] 12:28:28 pwrite(23, "ww.douban.com%2Fgroup%2Ftopic%2F"..., 65536, 7345152 <unfinished ...>
[pid 7081] 12:28:28 pwrite(20, "\255BYU\355\237\347\226s\261\307N{A\355\203S\306\244\255\322[\322\rJ\32[z3\31\311\327"..., 4096, 1024 <unfinished ...>
[pid 7086] 12:28:28 pwrite(23, "patible; MSIE 7.0; Windows NT 6."..., 65536, 7410688 <unfinished ...>
[pid 7084] 12:28:28 pwrite(23, "fari/535.7 360EE\" 0.006\n211.147."..., 65536, 7476224 <unfinished ...>
[pid 7080] 12:28:28 pwrite(23, "1:OUIVR8CIG5c \"22/Feb/2012:00:03"..., 65536, 7541760 <unfinished ...>
[pid 7085] 12:28:28 pwrite(23, "fm \"GET /j/mine/playlist?type=s&"..., 65536, 7607296 <unfinished ...>
[pid 7083] 12:28:28 pwrite(23, "pe=n&channel=18&user_id=39266798"..., 65536, 7672832 <unfinished ...>
[pid 7082] 12:28:28 pwrite(23, " 0.023\n125.34.190.128 :: \"22/Feb"..., 65536, 7738368 <unfinished ...>
[pid 7078] 12:28:28 pwrite(23, "00 5859 \"http://www.douban.com/p"..., 65536, 7803904 <unfinished ...>
[pid 7079] 12:28:28 pwrite(23, "03:08 +0800\" www.douban.com \"GET"..., 65536, 7869440 <unfinished ...>
[pid 7086] 12:28:28 pwrite(23, "type=all HTTP/1.1\" 200 1492 \"-\" "..., 65536, 7934976 <unfinished ...>
[pid 7084] 12:28:28 pwrite(23, "Hiapk&user_id=57982902&expire=13"..., 65536, 8000512 <unfinished ...>
[pid 7080] 12:28:28 pwrite(23, "0.011\n116.253.89.216 rxASuWZf1wg"..., 65536, 8066048 <unfinished ...>
[pid 7085] 12:28:28 pwrite(23, "9 +0800\" www.douban.com \"GET /ph"..., 65536, 8131584) = 65536 <0.000062>
[pid 7083] 12:28:28 pwrite(23, " +0800\" www.douban.com \"GET /eve"..., 65536, 8197120 <unfinished ...>
[pid 7082] 12:28:28 pwrite(23, " +0800\" www.douban.com \"POST /se"..., 65536, 8262656) = 65536 <0.000103>
[pid 7087] 12:28:28 pwrite(23, "0 12971 \"http://www.douban.com/g"..., 65536, 8328192 <unfinished ...>
[pid 7081] 12:28:28 pwrite(23, ".0 (compatible; MSIE 7.0; Window"..., 65536, 8393728) = 65536 <0.000065>

To get better performance, the chunk server should merge contiguous sequential write operations into larger ones.

--
- Davies

------------------------------------------------------------------------------
Virtualization & Cloud Management Using Capacity Planning
Cloud computing makes use of virtualization - but cloud computing
also focuses on allowing computing to be delivered as a service.
http://www.accelacomm.com/jaw/sfnl/114/51521223/
_______________________________________________
moosefs-users mailing list
moo...@li...
https://lists.sourceforge.net/lists/listinfo/moosefs-users
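
The merging suggested in the original message (combining contiguous sequential writes into larger ones) could look roughly like the sketch below. It is only an illustration and not MooseFS code: the names coalescer, coalescer_write, coalescer_flush, record_sink and the 256 KiB COALESCE_CAP are all hypothetical, and the sink here only records what it receives, where a real one would call pwrite(). In the strace output above, the writes to fd 23 arrive at offsets 6427648, 6493184, 6558720, ... (each 65536 bytes after the previous one), which is the pattern such a buffer would merge.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical staging size: four 64 KiB blocks. */
#define COALESCE_CAP (256u * 1024u)

/* Hypothetical sink type; a real sink would wrap pwrite(fd, buf, len, off). */
typedef int (*flush_fn)(void *ctx, const void *buf, size_t len, off_t off);

struct coalescer {
    off_t start;    /* file offset of buf[0] */
    size_t used;    /* bytes currently staged */
    flush_fn flush; /* called with each merged run */
    void *ctx;      /* opaque argument passed to flush */
    unsigned char buf[COALESCE_CAP];
};

static void coalescer_init(struct coalescer *c, flush_fn flush, void *ctx)
{
    assert(c != NULL && flush != NULL);
    c->start = 0;
    c->used = 0;
    c->flush = flush;
    c->ctx = ctx;
}

/* Hand the staged run to the sink as one write; returns the sink's result. */
static int coalescer_flush(struct coalescer *c)
{
    if (c->used == 0)
        return 0;
    int rc = c->flush(c->ctx, c->buf, c->used, c->start);
    c->used = 0;
    return rc;
}

/* Stage a write; flush first if it is not contiguous or would not fit. */
static int coalescer_write(struct coalescer *c, const void *data,
                           size_t len, off_t off)
{
    int contiguous = c->used > 0 && off == c->start + (off_t)c->used;

    if (c->used > 0 && (!contiguous || c->used + len > COALESCE_CAP)) {
        int rc = coalescer_flush(c);
        if (rc != 0)
            return rc;
    }
    if (len > COALESCE_CAP) /* too large to stage: pass straight through */
        return c->flush(c->ctx, data, len, off);
    if (c->used == 0)
        c->start = off;
    memcpy(c->buf + c->used, data, len);
    c->used += len;
    return 0;
}

/* Stand-in sink for experiments: records calls instead of touching a file. */
struct record_sink {
    int calls;
    size_t last_len;
    off_t last_off;
};

static int record_flush(void *ctx, const void *buf, size_t len, off_t off)
{
    struct record_sink *s = ctx;
    (void)buf;
    s->calls++;
    s->last_len = len;
    s->last_off = off;
    return 0;
}
```

With this sketch, three consecutive 64 KiB writes reach the sink as a single 192 KiB write once a non-contiguous write (or an explicit coalescer_flush) arrives. Note that staging data in user space delays the moment it reaches the kernel, so any fsync() or CRC update would have to happen after the flush; and since the later messages in this thread point at fsync() frequency rather than write size as the bottleneck, merging alone may not help much.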