From: Davies L. <dav...@gm...> - 2012-02-22 07:24:00
Hi, devs:

Today we found that some mfschunkserver processes had become unresponsive, which caused many timeouts in mfsmount and then blocked all write operations.

After some digging, we found a small but continuous stream of write bandwidth. strace showed many small pwrite() calls interleaved across several files:

[pid 7087] 12:28:28 pwrite(19, "baggins3 60.210.18.235 sE7NtNQU7"..., 25995, 55684725 <unfinished ...>
[pid 7078] 12:28:28 pwrite(17, "2012/02/22 12:28:28:root: WARNIN"..., 69, 21768909 <unfinished ...>
[pid 7080] 12:28:28 pwrite(20, "gardner4 183.7.50.169 mr5vi+Z4H3"..., 47663, 34550257 <unfinished ...>
[pid 7079] 12:28:28 pwrite(19, "\" \"Mozilla/5.0 (Windows NT 6.1) "..., 40377, 55710720 <unfinished ...>
[pid 7086] 12:28:28 pwrite(23, "MATP; InfoPath.2; .NET4.0C; 360S"..., 65536, 6427648 <unfinished ...>
[pid 7082] 12:28:28 pwrite(23, "; GTB7.2; SLCC2; .NET CLR 2.0.50"..., 65536, 6493184 <unfinished ...>
[pid 7083] 12:28:28 pwrite(20, "\255BYU\355\237\347\226s\261\307N{A\355\203S\306\244\255\322[\322\rJ\32[z3\31\311\327"..., 4096, 1024 <unfinished ...>
[pid 7078] 12:28:28 pwrite(23, "ovie/subject/4724373/reviews?sta"..., 65536, 6558720 <unfinished ...>
[pid 7080] 12:28:28 pwrite(19, "[\"[\4\5\266v\324\366\245n\t\315\202\227\\\343=\336-\r k)\316\354\335\353\373\340\331;"..., 4096, 1024 <unfinished ...>
[pid 7079] 12:28:28 pwrite(23, "ta-Python/2.0.15\" 0.016\n211.147."..., 65536, 6624256 <unfinished ...>
[pid 7081] 12:28:28 pwrite(23, "4034093?apikey=0eb695f25995d7eb2"..., 65536, 6689792 <unfinished ...>
[pid 7084] 12:28:28 pwrite(23, " y8G23n95BKY:43534427:wind8vssc4"..., 65536, 6755328) = 65536 <0.000108>
[pid 7078] 12:28:28 pwrite(23, "TkVvKuXfug:3248233:5Yo9vFoOIuo \""..., 65536, 6820864 <unfinished ...>
[pid 7086] 12:28:28 pwrite(23, ":s|1563396:s|1040897:s|1395290:s"..., 65536, 6886400 <unfinished ...>
[pid 7085] 12:28:28 pwrite(23, "dows%3B%20U%3B%20Windows%20NT%20"..., 65536, 6951936 <unfinished ...>
[pid 7087] 12:28:28 pwrite(23, "/533.17.9 (KHTML, like Gecko) Ve"..., 65536, 7017472 <unfinished ...>
[pid 7079] 12:28:28 pwrite(23, " r1m+tFW1T5M:: \"22/Feb/2012:00:0"..., 65536, 7083008 <unfinished ...>
[pid 7086] 12:28:28 pwrite(19, "baggins5 61.174.60.117 i6MSCBvE1"..., 25159, 55751097 <unfinished ...>
[pid 7084] 12:28:28 pwrite(20, "gardner1 182.118.7.64 TjxzPKdqNU"..., 10208, 34597920 <unfinished ...>
[pid 7080] 12:28:28 pwrite(23, "d7eb2c23c1d70cc187c1&alt=json HT"..., 65536, 7148544 <unfinished ...>
[pid 7083] 12:28:28 pwrite(23, "5_Google&type=n&channel=-3&user_"..., 65536, 7214080 <unfinished ...>
[pid 7085] 12:28:28 pwrite(19, "12-02-22 12:28:27 1861 \"GET /ser"..., 23179, 55776256 <unfinished ...>
[pid 7082] 12:28:28 pwrite(23, "\"http://douban.fm/swf/53035/fmpl"..., 65536, 7279616 <unfinished ...>
[pid 7078] 12:28:28 pwrite(20, "opic/27639291/add_comment HTTP/1"..., 18576, 34608128 <unfinished ...>
[pid 7087] 12:28:28 pwrite(19, "[\"[\4\5\266v\324\366\245n\t\315\202\227\\\343=\336-\r k)\316\354\335\353\373\340\331;"..., 4096, 1024 <unfinished ...>
[pid 7079] 12:28:28 pwrite(23, "ww.douban.com%2Fgroup%2Ftopic%2F"..., 65536, 7345152 <unfinished ...>
[pid 7081] 12:28:28 pwrite(20, "\255BYU\355\237\347\226s\261\307N{A\355\203S\306\244\255\322[\322\rJ\32[z3\31\311\327"..., 4096, 1024 <unfinished ...>
[pid 7086] 12:28:28 pwrite(23, "patible; MSIE 7.0; Windows NT 6."..., 65536, 7410688 <unfinished ...>
[pid 7084] 12:28:28 pwrite(23, "fari/535.7 360EE\" 0.006\n211.147."..., 65536, 7476224 <unfinished ...>
[pid 7080] 12:28:28 pwrite(23, "1:OUIVR8CIG5c \"22/Feb/2012:00:03"..., 65536, 7541760 <unfinished ...>
[pid 7085] 12:28:28 pwrite(23, "fm \"GET /j/mine/playlist?type=s&"..., 65536, 7607296 <unfinished ...>
[pid 7083] 12:28:28 pwrite(23, "pe=n&channel=18&user_id=39266798"..., 65536, 7672832 <unfinished ...>
[pid 7082] 12:28:28 pwrite(23, " 0.023\n125.34.190.128 :: \"22/Feb"..., 65536, 7738368 <unfinished ...>
[pid 7078] 12:28:28 pwrite(23, "00 5859 \"http://www.douban.com/p"..., 65536, 7803904 <unfinished ...>
[pid 7079] 12:28:28 pwrite(23, "03:08 +0800\" www.douban.com \"GET"..., 65536, 7869440 <unfinished ...>
[pid 7086] 12:28:28 pwrite(23, "type=all HTTP/1.1\" 200 1492 \"-\" "..., 65536, 7934976 <unfinished ...>
[pid 7084] 12:28:28 pwrite(23, "Hiapk&user_id=57982902&expire=13"..., 65536, 8000512 <unfinished ...>
[pid 7080] 12:28:28 pwrite(23, "0.011\n116.253.89.216 rxASuWZf1wg"..., 65536, 8066048 <unfinished ...>
[pid 7085] 12:28:28 pwrite(23, "9 +0800\" www.douban.com \"GET /ph"..., 65536, 8131584) = 65536 <0.000062>
[pid 7083] 12:28:28 pwrite(23, " +0800\" www.douban.com \"GET /eve"..., 65536, 8197120 <unfinished ...>
[pid 7082] 12:28:28 pwrite(23, " +0800\" www.douban.com \"POST /se"..., 65536, 8262656) = 65536 <0.000103>
[pid 7087] 12:28:28 pwrite(23, "0 12971 \"http://www.douban.com/g"..., 65536, 8328192 <unfinished ...>
[pid 7081] 12:28:28 pwrite(23, ".0 (compatible; MSIE 7.0; Window"..., 65536, 8393728) = 65536 <0.000065>

To get better performance, the chunk server should merge contiguous sequential write operations into larger ones.

--
- Davies
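To illustrate the merging idea: in the trace, the writes to fd 23 are each 65536 bytes at back-to-back offsets (6427648, 6493184, 6558720, ...), interleaved with writes to other descriptors. The sketch below is not MooseFS code. It is a minimal Python illustration, assuming pending writes are queued as hypothetical `(fd, offset, data)` records, of how such runs could be coalesced per descriptor before they are issued.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape for a queued write: (fd, offset, data).
Write = Tuple[int, int, bytes]


def coalesce_writes(writes: Iterable[Write]) -> List[Write]:
    """Merge writes that continue the most recent run on the same fd."""
    merged: List[Tuple[int, int, bytearray]] = []
    open_run: Dict[int, int] = {}  # fd -> index in `merged` of its latest run

    for fd, offset, data in writes:
        idx = open_run.get(fd)
        if idx is not None:
            _, run_offset, run_buf = merged[idx]
            # Extend only when the new write starts exactly where the run ends.
            if offset == run_offset + len(run_buf):
                run_buf.extend(data)
                continue
        # Otherwise start a new run; per-fd ordering is preserved.
        open_run[fd] = len(merged)
        merged.append((fd, offset, bytearray(data)))

    return [(fd, off, bytes(buf)) for fd, off, buf in merged]


if __name__ == "__main__":
    queued = [
        (23, 6427648, b"a" * 65536),
        (19, 1024, b"h" * 4096),
        (23, 6493184, b"b" * 65536),
    ]
    for fd, off, buf in coalesce_writes(queued):
        print(fd, off, len(buf))
```

Each merged record could then be issued as a single call such as `os.pwrite(fd, data, offset)`. Only exact continuation of the latest run on a descriptor is merged, so overlapping or out-of-order writes to the same file keep their original order.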