From: Papp T. <to...@ma...> - 2011-05-27 08:19:51
Hi!

Our mini cluster sometimes hits an error. It is still running Ubuntu Natty with MooseFS recompiled from the PPA.

May 27 08:15:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, port: 9422): usedspace: 7739308400640 (7207.79 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.71%
May 27 08:15:00 backup1 mfsmaster[12869]: total: usedspace: 7739308400640 (7207.79 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.71%
May 27 08:15:02 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/B1/chunk_0000000000005EB1_00000001.mfs
May 27 08:15:12 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/B2/chunk_0000000000005EB2_00000001.mfs
May 27 08:15:23 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/B3/chunk_0000000000005EB3_00000001.mfs
May 27 08:15:28 backup1 mfsmount[12917]: master: tcp recv error: ETIMEDOUT (Operation timed out) (1)
May 27 08:15:29 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 27 08:15:32 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 27 08:15:33 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/B4/chunk_0000000000005EB4_00000001.mfs
May 27 08:15:35 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 27 08:15:44 backup1 mfsmount[12917]: last message repeated 3 times
May 27 08:15:44 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/B5/chunk_0000000000005EB5_00000001.mfs
May 27 08:15:47 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 27 08:15:55 backup1 mfsmount[12917]: last message repeated 2 times
May 27 08:15:55 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/B6/chunk_0000000000005EB6_00000001.mfs
May 27 08:15:56 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 27 08:15:59 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 27 08:16:00 backup1 mfschunkserver[2556]: connecting ...
May 27 08:16:00 backup1 mfschunkserver[2556]: connected to Master
May 27 08:16:02 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 27 08:17:05 backup1 mfsmount[12917]: last message repeated 21 times
May 27 08:17:15 backup1 mfsmount[12917]: last message repeated 3 times
May 27 08:17:15 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 27 08:17:15 backup1 mfsmaster[12869]: connection with CS(192.168.3.21) has been closed by peer
May 27 08:17:15 backup1 mfsmaster[12869]: chunkserver disconnected - ip: 192.168.3.21, port: 9422, usedspace: 7739308400640 (7207.79 GiB), totalspace: 10944744390656 (10193.09 GiB)
May 27 08:17:16 backup1 mfsmaster[12869]: connection with ML(192.168.3.13) has been closed by peer
May 27 08:17:16 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 27 08:17:17 backup1 mfsmaster[12869]: last message repeated 7 times
May 27 08:17:17 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 27 08:17:17 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 27 08:17:19 backup1 mfsmaster[12869]: last message repeated 28 times
May 27 08:17:19 backup1 mfsmount[12917]: registered to master
May 27 08:18:00 backup1 mfsmaster[12869]: chunkservers status:
May 27 08:18:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00%
May 27 08:18:16 backup1 mfsmaster[12869]: chunkserver disconnected - ip: 192.168.3.21, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
May 27 08:19:00 backup1 mfsmaster[12869]: chunkservers status:
May 27 08:19:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00%
May 27 08:19:14 backup1 mfsmount[12917]: file: 7340122, index: 0 - fs_writechunk returns status 8
May 27 08:20:00 backup1 mfsmount[12917]: last message repeated 17 times
May 27 08:20:00 backup1 mfsmaster[12869]: chunkservers status:
May 27 08:20:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00%
May 27 08:20:05 backup1 mfsmount[12917]: file: 7340122, index: 0 - fs_writechunk returns status 8
May 27 08:21:00 backup1 mfsmount[12917]: last message repeated 7 times
May 27 08:21:00 backup1 mfsmaster[12869]: chunkservers status:
May 27 08:21:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00%
May 27 08:21:02 backup1 mfsmount[12917]: file: 7340122, index: 0 - fs_writechunk returns status 8
May 27 08:21:29 backup1 mfsmount[12917]: last message repeated 3 times
May 27 08:21:29 backup1 mfsmount[12917]: error writing file number 7340122: EIO (Input/output error)
May 27 08:21:30 backup1 mfsmount[12917]: file: 7340122, index: 0, chunk: 4816320, version: 1 - there are no valid copies
May 27 08:21:30 backup1 mfsmount[12917]: file: 7340122, index: 0 - can't connect to proper chunkserver (try counter: 1)
May 27 08:22:00 backup1 mfsmaster[12869]: chunkservers status:
May 27 08:22:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00%
May 27 08:22:30 backup1 mfsmount[12917]: file: 7340122, index: 0, chunk: 4816320, version: 1 - there are no valid copies
May 27 08:22:30 backup1 mfsmount[12917]: file: 7340122, index: 0 - can't connect to proper chunkserver (try counter: 8)
May 27 08:23:00 backup1 mfsmaster[12869]: chunkservers status:
May 27 08:23:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00%
May 27 08:23:30 backup1 mfsmount[12917]: file: 7340122, index: 0, chunk: 4816320, version: 1 - there are no valid copies
May 27 08:23:30 backup1 mfsmount[12917]: file: 7340122, index: 0 - can't connect to proper chunkserver (try counter: 15)
May 27 08:24:00 backup1 mfsmaster[12869]: chunkservers status:
May 27 08:24:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00%
May 27 08:24:30 backup1 mfsmount[12917]: file: 7340122, index: 0, chunk: 4816320, version: 1 - there are no valid copies
May 27 08:24:30 backup1 mfsmount[12917]: file: 7340122, index: 0 - can't connect to proper chunkserver (try counter: 22)
May 27 08:25:00 backup1 mfsmaster[12869]: chunkservers status:
May 27 08:25:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00%
May 27 08:25:30 backup1 mfsmount[12917]: file: 7340122, index: 0, chunk: 4816320, version: 1 - there are no valid copies
May 27 08:25:30 backup1 mfsmount[12917]: file: 7340122, index: 0 - can't connect to proper chunkserver (try counter: 29)
May 27 08:26:00 backup1 mfsmaster[12869]: chunkservers status:
May 27 08:26:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00%
May 27 08:26:10 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/B7/chunk_0000000000005EB7_00000001.mfs
May 27 08:26:10 backup1 mfschunkserver[2556]: connection reset by Master
May 27 08:26:20 backup1 mfschunkserver[2556]: connecting ...
May 27 08:26:20 backup1 mfschunkserver[2556]: connected to Master
May 27 08:26:21 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/B8/chunk_0000000000005EB8_00000001.mfs
May 27 08:26:21 backup1 mfsmaster[12869]: chunkserver register begin (packet version: 5) - ip: 192.168.3.21, port: 9422
May 27 08:26:28 backup1 mfsmaster[12869]: chunkserver register end (packet version: 5) - ip: 192.168.3.21, port: 9422, usedspace: 7739308400640 (7207.79 GiB), totalspace: 10944744390656 (10193.09 GiB)
May 27 08:26:31 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/B9/chunk_0000000000005EB9_00000001.mfs
May 27 08:26:41 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/BA/chunk_0000000000005EBA_00000001.mfs
May 27 08:26:42 backup1 mfsmount[12917]: file: 7445943, index: 0, chunk: 4816323, version: 1 - writeworker: connection with (C0A80315:9422) was timed out (unfinished writes: 2; try counter: 1)
May 27 08:26:51 backup1 mfsmount[12917]: last message repeated 2 times
May 27 08:26:51 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/BB/chunk_0000000000005EBB_00000001.mfs
May 27 08:27:00 backup1 mfsmaster[12869]: chunkservers status:
May 27 08:27:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, port: 9422): usedspace: 7739309965312 (7207.79 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.71%
May 27 08:27:00 backup1 mfsmaster[12869]: total: usedspace: 7739309965312 (7207.79 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.71%
May 27 08:27:01 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/BC/chunk_0000000000005EBC_00000001.mfs
May 27 08:27:11 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/BD/chunk_0000000000005EBD_00000001.mfs
May 27 08:27:21 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/BE/chunk_0000000000005EBE_00000001.mfs
May 27 08:27:31 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/BF/chunk_0000000000005EBF_00000001.mfs
May 27 08:27:41 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/C0/chunk_0000000000005EC0_00000001.mfs
May 27 08:27:51 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/C1/chunk_0000000000005EC1_00000001.mfs
May 27 08:28:00 backup1 mfsmaster[12869]: chunkservers status:
May 27 08:28:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, port: 9422): usedspace: 7739310948352 (7207.79 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.71%
May 27 08:28:00 backup1 mfsmaster[12869]: total: usedspace: 7739310948352 (7207.79 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.71%
May 27 08:28:01 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/C3/chunk_0000000000005EC3_00000001.mfs
May 27 08:28:11 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/C4/chunk_0000000000005EC4_00000001.mfs
May 27 08:28:22 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/C5/chunk_0000000000005EC5_00000001.mfs
May 27 08:28:32 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/C7/chunk_0000000000005EC7_00000001.mfs
May 27 08:28:42 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/C8/chunk_0000000000005EC8_00000001.mfs
May 27 08:28:52 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/C9/chunk_0000000000005EC9_00000001.mfs
May 27 08:29:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, port: 9422): usedspace: 7739313045504 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.71%
May 27 08:29:00 backup1 mfsmaster[12869]: total: usedspace: 7739313045504 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.71%
May 27 08:29:02 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/CA/chunk_0000000000005ECA_00000001.mfs
May 27 08:29:12 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/CB/chunk_0000000000005ECB_00000001.mfs
May 27 08:29:22 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/CC/chunk_0000000000005ECC_00000001.mfs
May 27 08:29:26 backup1 mfsmount[12917]: master: tcp recv error: ETIMEDOUT (Operation timed out) (1)
May 27 08:29:27 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 27 08:29:30 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 27 08:29:32 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/CD/chunk_0000000000005ECD_00000001.mfs
May 27 08:29:32 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 27 08:29:32 backup1 mfsmaster[12869]: last message repeated 2 times
May 27 08:29:32 backup1 mfsmount[12917]: registered to master
May 27 08:29:42 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/CE/chunk_0000000000005ECE_00000001.mfs
May 27 08:29:52 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/D0/chunk_0000000000005ED0_00000001.mfs
May 27 08:30:02 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/D1/chunk_0000000000005ED1_00000001.mfs
May 27 08:30:05 backup1 mfsmount[12917]: file: 122301, index: 0, chunk: 4816324, version: 1 - writeworker: connection with (C0A80315:9422) was timed out (unfinished writes: 1; try counter: 1)
May 27 08:30:12 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/D2/chunk_0000000000005ED2_00000001.mfs
May 27 08:30:22 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/D3/chunk_0000000000005ED3_00000001.mfs
May 27 08:30:33 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/D4/chunk_0000000000005ED4_00000001.mfs
May 27 08:30:43 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/D5/chunk_0000000000005ED5_00000001.mfs
May 27 08:30:53 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/D6/chunk_0000000000005ED6_00000001.mfs
May 27 08:31:03 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/D7/chunk_0000000000005ED7_00000001.mfs
May 27 08:31:13 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/D8/chunk_0000000000005ED8_00000001.mfs
May 27 08:31:23 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/D9/chunk_0000000000005ED9_00000001.mfs
May 27 08:31:33 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/DA/chunk_0000000000005EDA_00000001.mfs
May 27 08:31:43 backup1 mfsmount[12917]: master: tcp recv error: ETIMEDOUT (Operation timed out) (1)
May 27 08:31:43 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/DB/chunk_0000000000005EDB_00000001.mfs
May 27 08:31:44 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 27 08:31:53 backup1 mfsmount[12917]: last message repeated 3 times
May 27 08:31:53 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/DD/chunk_0000000000005EDD_00000001.mfs
May 27 08:31:56 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 27 08:32:03 backup1 mfsmount[12917]: last message repeated 2 times
May 27 08:32:03 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/DE/chunk_0000000000005EDE_00000001.mfs
May 27 08:32:05 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 27 08:32:14 backup1 mfsmount[12917]: last message repeated 2 times
May 27 08:32:14 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/DF/chunk_0000000000005EDF_00000001.mfs
May 27 08:32:14 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 27 08:32:24 backup1 mfsmount[12917]: last message repeated 3 times
May 27 08:32:24 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/E0/chunk_0000000000005EE0_00000001.mfs
May 27 08:32:25 backup1 mfschunkserver[2556]: connecting ...
May 27 08:32:25 backup1 mfschunkserver[2556]: connected to Master
May 27 08:32:26 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 27 08:33:29 backup1 mfsmount[12917]: last message repeated 21 times
May 27 08:34:25 backup1 mfsmount[12917]: last message repeated 18 times
May 27 08:34:25 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 27 08:34:25 backup1 mfsmaster[12869]: connection with CS(192.168.3.21) has been closed by peer
May 27 08:34:25 backup1 mfsmaster[12869]: chunkserver disconnected - ip: 192.168.3.21, port: 9422, usedspace: 7739311943680 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB)
May 27 08:34:26 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 27 08:34:33 backup1 mfsmount[12917]: last message repeated 2 times
May 27 08:34:33 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/E1/chunk_0000000000005EE1_00000001.mfs
May 27 08:34:34 backup1 mfsmaster[12869]: connection with ML(192.168.3.13) has been closed by peer
May 27 08:34:34 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 27 08:34:34 backup1 mfsmaster[12869]: chunkserver register begin (packet version: 5) - ip: 192.168.3.21, port: 9422
May 27 08:34:34 backup1 mfsmaster[12869]: connection with ML(192.168.3.13) has been closed by peer
May 27 08:34:34 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 27 08:34:34 backup1 mfsmaster[12869]: connection with CS(192.168.3.21) has been closed by peer
May 27 08:34:34 backup1 mfsmaster[12869]: chunkserver disconnected - ip: 192.168.3.21, port: 9422, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
May 27 08:34:35 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 27 08:34:35 backup1 mfsmaster[12869]: last message repeated 2 times
May 27 08:34:35 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 27 08:34:35 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 27 08:34:37 backup1 mfsmaster[12869]: last message repeated 52 times
May 27 08:34:37 backup1 mfsmount[12917]: registered to master
May 27 08:34:39 backup1 mfsmount[12917]: file: 7445943, index: 0 - fs_writechunk returns status 8
May 27 08:34:40 backup1 mfsmount[12917]: last message repeated 2 times
May 27 08:34:40 backup1 mfschunkserver[2556]: connecting ...
May 27 08:34:40 backup1 mfschunkserver[2556]: connected to Master
May 27 08:34:40 backup1 mfsmount[12917]: file: 7445943, index: 0 - fs_writechunk returns status 8
May 27 08:34:41 backup1 mfsmount[12917]: file: 7445943, index: 0 - fs_writechunk returns status 8
May 27 08:34:41 backup1 mfsmaster[12869]: chunkserver register begin (packet version: 5) - ip: 192.168.3.21, port: 9422
May 27 08:34:42 backup1 mfsmaster[12869]: chunkserver register end (packet version: 5) - ip: 192.168.3.21, port: 9422, usedspace: 7739311943680 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB)
May 27 08:34:43 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/E2/chunk_0000000000005EE2_00000001.mfs
May 27 08:34:53 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/E3/chunk_0000000000005EE3_00000001.mfs
May 27 08:35:00 backup1 mfsmaster[12869]: chunkservers status:
May 27 08:35:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, port: 9422): usedspace: 7739314479104 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.71%
May 27 08:35:00 backup1 mfsmaster[12869]: total: usedspace: 7739314479104 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.71%
May 27 08:35:03 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/E4/chunk_0000000000005EE4_00000001.mfs
May 27 08:35:14 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/E5/chunk_0000000000005EE5_00000001.mfs
May 27 08:35:24 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/E6/chunk_0000000000005EE6_00000001.mfs
May 27 08:35:34 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/E7/chunk_0000000000005EE7_00000001.mfs
May 27 08:35:44 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/E8/chunk_0000000000005EE8_00000001.mfs
May 27 08:35:54 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/E9/chunk_0000000000005EE9_00000001.mfs
May 27 08:36:00 backup1 mfsmaster[12869]: chunkservers status:
May 27 08:36:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, port: 9422): usedspace: 7739314479104 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.71%
May 27 08:36:00 backup1 mfsmaster[12869]: total: usedspace: 7739314479104 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.71%
May 27 08:36:04 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/EA/chunk_0000000000005EEA_00000001.mfs
May 27 08:36:14 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/EB/chunk_0000000000005EEB_00000001.mfs
May 27 08:36:24 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/EC/chunk_0000000000005EEC_00000001.mfs
May 27 08:36:34 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/ED/chunk_0000000000005EED_00000001.mfs
May 27 08:36:45 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/EE/chunk_0000000000005EEE_00000001.mfs
May 27 08:36:55 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/EF/chunk_0000000000005EEF_00000001.mfs
May 27 08:37:00 backup1 mfsmaster[12869]: chunkservers status:
May 27 08:37:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, port: 9422): usedspace: 7739314479104 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.71%
May 27 08:37:00 backup1 mfsmaster[12869]: total: usedspace: 7739314479104 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.71%

Can you help me figure out what to do about it?
Is this a FUSE, MooseFS, or kernel problem, or something else?

Thank you,
tamas
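
For anyone triaging a log like the one above, a small script can make the pattern easier to see. The following is only a minimal sketch and not part of MooseFS: the function names `decode_hex_ipv4` and `summarize` are made up for illustration, and it assumes the classic syslog prefix (`Mon DD HH:MM:SS host proc[pid]: message`) seen in the excerpt. It counts messages per daemon, collects the timestamps of timeout messages, and decodes the hex peer address that mfsmount prints (`C0A80315`), which appears to be the IPv4 address written as eight hex digits.

```python
import ipaddress
import re
from collections import Counter

# Classic syslog prefix, e.g. "May 27 08:15:28 backup1 mfsmount[12917]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<proc>\w+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def decode_hex_ipv4(hex_addr: str) -> str:
    """Convert an 8-hex-digit address such as 'C0A80315' to dotted form."""
    return str(ipaddress.IPv4Address(int(hex_addr, 16)))


def summarize(lines):
    """Return (messages per daemon, timestamps of timeout messages).

    Lines that do not match the syslog prefix are skipped. A
    "last message repeated N times" line counts as one message.
    """
    per_proc = Counter()
    timeouts = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        per_proc[m["proc"]] += 1
        if "ETIMEDOUT" in m["msg"] or "timed out" in m["msg"]:
            timeouts.append(m["ts"])
    return per_proc, timeouts


# A few lines copied from the log above.
sample = [
    "May 27 08:15:23 backup1 mfschunkserver[2556]: testing chunk: "
    "/mnt/mfschunk1/B3/chunk_0000000000005EB3_00000001.mfs",
    "May 27 08:15:28 backup1 mfsmount[12917]: master: tcp recv error: "
    "ETIMEDOUT (Operation timed out) (1)",
    "May 27 08:17:15 backup1 mfsmaster[12869]: connection with "
    "CS(192.168.3.21) has been closed by peer",
    "May 27 08:26:42 backup1 mfsmount[12917]: file: 7445943, index: 0, "
    "chunk: 4816323, version: 1 - writeworker: connection with "
    "(C0A80315:9422) was timed out (unfinished writes: 2; try counter: 1)",
]

per_proc, timeouts = summarize(sample)
print(dict(per_proc))
print(timeouts)
print(decode_hex_ipv4("C0A80315"))
```

Run against the full syslog file (for example `summarize(open(path))`, where `path` is wherever your syslog lives), this gives a quick view of how the mfsmount timeouts line up with the minutes in which the master reports no chunkservers. The decoded address matches the chunkserver IP shown in the master's status lines, so the write timeouts are against the same host.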
From: Papp T. <to...@ma...> - 2011-05-28 13:13:10
On 05/27/2011 10:19 AM, Papp Tamas wrote:
> hi!
>
> Sometimes there is an error on our mini cluster.
> Still Ubuntu Natty with recompiled moosefs from ppa.
>
> [...]
> May 27 08:32:25 backup1 mfschunkserver[2556]: connected to Master > May 27 08:32:26 backup1 mfsmount[12917]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > May 27 08:33:29 backup1 mfsmount[12917]: last message repeated 21 times > May 27 08:34:25 backup1 mfsmount[12917]: last message repeated 18 times > May 27 08:34:25 backup1 mfsmaster[12869]: connection with > client(ip:192.168.3.21) has been closed by peer > May 27 08:34:25 backup1 mfsmaster[12869]: connection with > CS(192.168.3.21) has been closed by peer > May 27 08:34:25 backup1 mfsmaster[12869]: chunkserver disconnected - ip: > 192.168.3.21, port: 9422, usedspace: 7739311943680 (7207.80 GiB), > totalspace: 10944744390656 (10193.09 GiB) > May 27 08:34:26 backup1 mfsmount[12917]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > May 27 08:34:33 backup1 mfsmount[12917]: last message repeated 2 times > May 27 08:34:33 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E1/chunk_0000000000005EE1_00000001.mfs > May 27 08:34:34 backup1 mfsmaster[12869]: connection with > ML(192.168.3.13) has been closed by peer > May 27 08:34:34 backup1 mfsmaster[12869]: connection with > client(ip:192.168.3.21) has been closed by peer > May 27 08:34:34 backup1 mfsmaster[12869]: chunkserver register begin > (packet version: 5) - ip: 192.168.3.21, port: 9422 > May 27 08:34:34 backup1 mfsmaster[12869]: connection with > ML(192.168.3.13) has been closed by peer > May 27 08:34:34 backup1 mfsmaster[12869]: connection with > client(ip:192.168.3.21) has been closed by peer > May 27 08:34:34 backup1 mfsmaster[12869]: connection with > CS(192.168.3.21) has been closed by peer > May 27 08:34:34 backup1 mfsmaster[12869]: chunkserver disconnected - ip: > 192.168.3.21, port: 9422, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB) > May 27 08:34:35 backup1 mfsmaster[12869]: connection with > client(ip:192.168.3.21) has been closed by peer > May 27 08:34:35 backup1 mfsmaster[12869]: 
last message repeated 2 times > May 27 08:34:35 backup1 mfsmount[12917]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > May 27 08:34:35 backup1 mfsmaster[12869]: connection with > client(ip:192.168.3.21) has been closed by peer > May 27 08:34:37 backup1 mfsmaster[12869]: last message repeated 52 times > May 27 08:34:37 backup1 mfsmount[12917]: registered to master > May 27 08:34:39 backup1 mfsmount[12917]: file: 7445943, index: 0 - > fs_writechunk returns status 8 > May 27 08:34:40 backup1 mfsmount[12917]: last message repeated 2 times > May 27 08:34:40 backup1 mfschunkserver[2556]: connecting ... > May 27 08:34:40 backup1 mfschunkserver[2556]: connected to Master > May 27 08:34:40 backup1 mfsmount[12917]: file: 7445943, index: 0 - > fs_writechunk returns status 8 > May 27 08:34:41 backup1 mfsmount[12917]: file: 7445943, index: 0 - > fs_writechunk returns status 8 > May 27 08:34:41 backup1 mfsmaster[12869]: chunkserver register begin > (packet version: 5) - ip: 192.168.3.21, port: 9422 > May 27 08:34:42 backup1 mfsmaster[12869]: chunkserver register end > (packet version: 5) - ip: 192.168.3.21, port: 9422, usedspace: > 7739311943680 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB) > May 27 08:34:43 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E2/chunk_0000000000005EE2_00000001.mfs > May 27 08:34:53 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E3/chunk_0000000000005EE3_00000001.mfs > May 27 08:35:00 backup1 mfsmaster[12869]: chunkservers status: > May 27 08:35:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, > port: 9422): usedspace: 7739314479104 (7207.80 GiB), totalspace: > 10944744390656 (10193.09 GiB), usage: 70.71% > May 27 08:35:00 backup1 mfsmaster[12869]: total: usedspace: > 7739314479104 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB), > usage: 70.71% > May 27 08:35:03 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E4/chunk_0000000000005EE4_00000001.mfs > 
May 27 08:35:14 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E5/chunk_0000000000005EE5_00000001.mfs > May 27 08:35:24 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E6/chunk_0000000000005EE6_00000001.mfs > May 27 08:35:34 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E7/chunk_0000000000005EE7_00000001.mfs > May 27 08:35:44 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E8/chunk_0000000000005EE8_00000001.mfs > May 27 08:35:54 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E9/chunk_0000000000005EE9_00000001.mfs > May 27 08:36:00 backup1 mfsmaster[12869]: chunkservers status: > May 27 08:36:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, > port: 9422): usedspace: 7739314479104 (7207.80 GiB), totalspace: > 10944744390656 (10193.09 GiB), usage: 70.71% > May 27 08:36:00 backup1 mfsmaster[12869]: total: usedspace: > 7739314479104 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB), > usage: 70.71% > May 27 08:36:04 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/EA/chunk_0000000000005EEA_00000001.mfs > May 27 08:36:14 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/EB/chunk_0000000000005EEB_00000001.mfs > May 27 08:36:24 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/EC/chunk_0000000000005EEC_00000001.mfs > May 27 08:36:34 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/ED/chunk_0000000000005EED_00000001.mfs > May 27 08:36:45 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/EE/chunk_0000000000005EEE_00000001.mfs > May 27 08:36:55 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/EF/chunk_0000000000005EEF_00000001.mfs > May 27 08:37:00 backup1 mfsmaster[12869]: chunkservers status: > May 27 08:37:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, > port: 9422): usedspace: 7739314479104 (7207.80 GiB), totalspace: > 10944744390656 (10193.09 GiB), usage: 70.71% > May 27 08:37:00 backup1 
mfsmaster[12869]: total: usedspace:
> 7739314479104 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB),
> usage: 70.71%
>
>
> Can you help me what to do with it?
> Is this fuse, moosefs, kernel problem or something else?

Hi!

I've just noticed another error.

$ dirvish --vault cluster/Projects
dirvish cluster/Projects:default fatal error: filesystem full
dirvish cluster/Projects:default fatal error (12) -- filesystem full
cluster/Projects:default post-server failed (1)

log:

May 28 14:58:04 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/70/chunk_0000000000008470_00000001.mfs
May 28 14:58:14 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/71/chunk_0000000000008471_00000001.mfs
May 28 14:58:24 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/72/chunk_0000000000008472_00000001.mfs
May 28 14:58:34 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/73/chunk_0000000000008473_00000001.mfs
May 28 14:58:44 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/74/chunk_0000000000008474_00000001.mfs
May 28 14:58:55 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/75/chunk_0000000000008475_00000001.mfs
May 28 14:59:00 backup1 mfsmaster[12869]: chunkservers status:
May 28 14:59:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, port: 9422): usedspace: 7734218166272 (7203.05 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.67%
May 28 14:59:00 backup1 mfsmaster[12869]: total: usedspace: 7734218166272 (7203.05 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.67%
May 28 14:59:05 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/76/chunk_0000000000008476_00000001.mfs
May 28 14:59:15 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/77/chunk_0000000000008477_00000001.mfs
May 28 14:59:24 backup1 mfsmount[12917]: master: tcp recv error: ETIMEDOUT (Operation timed out) (1)
May 28 14:59:25 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed
out)) May 28 14:59:25 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/78/chunk_0000000000008478_00000001.mfs May 28 14:59:28 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 14:59:35 backup1 mfsmount[12917]: last message repeated 2 times May 28 14:59:35 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/79/chunk_0000000000008479_00000001.mfs May 28 14:59:37 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 14:59:45 backup1 mfsmount[12917]: last message repeated 2 times May 28 14:59:45 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/7A/chunk_000000000000847A_00000001.mfs May 28 14:59:46 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 14:59:55 backup1 mfsmount[12917]: last message repeated 3 times May 28 14:59:55 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/7B/chunk_000000000000847B_00000001.mfs May 28 14:59:58 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 15:00:06 backup1 mfsmount[12917]: last message repeated 2 times May 28 15:00:06 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/7C/chunk_000000000000847C_00000001.mfs May 28 15:00:07 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 15:00:15 backup1 mfsmount[12917]: last message repeated 2 times May 28 15:00:15 backup1 mfschunkserver[2556]: connecting ... 
May 28 15:00:15 backup1 mfschunkserver[2556]: connected to Master May 28 15:00:16 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 15:01:19 backup1 mfsmount[12917]: last message repeated 21 times May 28 15:02:19 backup1 mfsmount[12917]: last message repeated 19 times May 28 15:02:19 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer May 28 15:02:19 backup1 mfsmaster[12869]: connection with CS(192.168.3.21) has been closed by peer May 28 15:02:19 backup1 mfsmaster[12869]: chunkserver disconnected - ip: 192.168.3.21, port: 9422, usedspace: 7734399709184 (7203.22 GiB), totalspace: 10944744390656 (10193.09 GiB) May 28 15:02:19 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 15:02:22 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 15:02:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.20) has been closed by peer May 28 15:02:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.13) has been closed by peer May 28 15:02:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer May 28 15:02:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.13) has been closed by peer May 28 15:02:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer May 28 15:02:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.20) has been closed by peer May 28 15:02:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer May 28 15:02:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.13) has been closed by peer May 28 15:02:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer May 28 15:02:24 backup1 mfsmaster[12869]: last message repeated 55 times May 28 15:02:24 backup1 mfsmount[12917]: registered to master May 28 15:02:25 
backup1 mfsmount[12917]: file: 122301, index: 0, chunk: 4816943, version: 1 - there are no valid copies May 28 15:02:25 backup1 mfsmount[12917]: file: 122301, index: 0 - can't connect to proper chunkserver (try counter: 1) May 28 15:02:26 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:02:26 backup1 mfsmount[12917]: last message repeated 2 times May 28 15:02:26 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8 May 28 15:02:27 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:02:32 backup1 mfsmount[12917]: last message repeated 3 times May 28 15:02:32 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8 May 28 15:02:33 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:02:38 backup1 mfsmount[12917]: last message repeated 2 times May 28 15:02:38 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8 May 28 15:02:41 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:02:44 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:02:44 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8 May 28 15:02:48 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:02:51 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8 May 28 15:02:52 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:02:56 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:02:58 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8 May 28 15:03:01 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:03:05 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns 
status 8 May 28 15:03:06 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:03:11 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:03:13 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8 May 28 15:03:17 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:03:20 backup1 mfsmaster[12869]: chunkserver disconnected - ip: 192.168.3.21, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB) May 28 15:03:22 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8 May 28 15:03:23 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:03:25 backup1 mfsmount[12917]: file: 122301, index: 0, chunk: 4816943, version: 1 - there are no valid copies May 28 15:03:25 backup1 mfsmount[12917]: file: 122301, index: 0 - can't connect to proper chunkserver (try counter: 8) May 28 15:03:29 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:03:30 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8 May 28 15:03:36 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:03:39 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8 May 28 15:03:43 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:03:48 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8 May 28 15:03:50 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:03:57 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8 May 28 15:03:57 backup1 mfsmount[12917]: error writing file number 107222: EIO (Input/output error) May 28 15:03:58 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:04:00 backup1 
mfsmaster[12869]: chunkservers status: May 28 15:04:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00% May 28 15:04:06 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:04:25 backup1 mfsmount[12917]: last message repeated 2 times May 28 15:04:25 backup1 mfsmount[12917]: file: 122301, index: 0, chunk: 4816943, version: 1 - there are no valid copies May 28 15:04:25 backup1 mfsmount[12917]: file: 122301, index: 0 - can't connect to proper chunkserver (try counter: 15) May 28 15:04:32 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:04:41 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12 May 28 15:04:41 backup1 mfsmount[12917]: error writing file number 4691391: ENOSPC (No space left on device) May 28 15:04:42 backup1 mfsmount[12917]: file: 107221, index: 0 - fs_writechunk returns status 8 May 28 15:05:00 backup1 mfsmount[12917]: last message repeated 10 times May 28 15:05:00 backup1 mfsmaster[12869]: chunkservers status: May 28 15:05:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00% May 28 15:05:00 backup1 mfsmount[12917]: file: 107221, index: 0 - fs_writechunk returns status 8 May 28 15:05:25 backup1 mfsmount[12917]: last message repeated 5 times May 28 15:05:25 backup1 mfsmount[12917]: file: 122301, index: 0, chunk: 4816943, version: 1 - there are no valid copies May 28 15:05:25 backup1 mfsmount[12917]: file: 122301, index: 0 - can't connect to proper chunkserver (try counter: 22) May 28 15:05:27 backup1 mfsmount[12917]: file: 107221, index: 0 - fs_writechunk returns status 8 May 28 15:06:00 backup1 mfsmount[12917]: last message repeated 4 times May 28 15:06:00 backup1 mfsmaster[12869]: chunkservers status: May 28 15:06:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00% May 28 15:06:01 backup1 
mfsmount[12917]: file: 107221, index: 0 - fs_writechunk returns status 8
May 28 15:06:25 backup1 mfsmount[12917]: last message repeated 3 times
May 28 15:06:25 backup1 mfsmount[12917]: file: 122301, index: 0, chunk: 4816943, version: 1 - there are no valid copies
May 28 15:06:25 backup1 mfsmount[12917]: file: 122301, index: 0 - can't connect to proper chunkserver (try counter: 29)
May 28 15:06:25 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/7D/chunk_000000000000847D_00000001.mfs
May 28 15:06:25 backup1 mfschunkserver[2556]: connection reset by Master
May 28 15:06:32 backup1 mfsmount[12917]: file: 107221, index: 0 - fs_writechunk returns status 8
May 28 15:06:35 backup1 mfschunkserver[2556]: connecting ...
May 28 15:06:35 backup1 mfschunkserver[2556]: connected to Master
May 28 15:06:36 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/7E/chunk_000000000000847E_00000001.mfs
May 28 15:06:36 backup1 mfsmaster[12869]: chunkserver register begin (packet version: 5) - ip: 192.168.3.21, port: 9422
May 28 15:06:38 backup1 mfsmaster[12869]: chunkserver register end (packet version: 5) - ip: 192.168.3.21, port: 9422, usedspace: 7734393311232 (7203.22 GiB), totalspace: 10944744390656 (10193.09 GiB)
May 28 15:06:46 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/7F/chunk_000000000000847F_00000001.mfs
May 28 15:06:56 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/80/chunk_0000000000008480_00000001.mfs
May 28 15:07:00 backup1 mfsmaster[12869]: chunkservers status:
May 28 15:07:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, port: 9422): usedspace: 7734393442304 (7203.22 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.67%
May 28 15:07:00 backup1 mfsmaster[12869]: total: usedspace: 7734393442304 (7203.22 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.67%

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        19G  1.4G   17G   8% /
none            3.9G  180K  3.9G   1% /dev
none            4.0G     0  4.0G   0% /dev/shm
none            4.0G  716K  4.0G   1% /var/run
none            4.0G     0  4.0G   0% /var/lock
/dev/sda6        10T  7.1T  3.0T  71% /mnt/mfschunk1
/dev/sda3        19G   12G  6.0G  66% /var
/dev/sda4       4.6G  138M  4.3G   4% /tmp
mfsmaster:9421   10T  7.1T  3.0T  71% /data/backup

Filesystem         Inodes    IUsed      IFree IUse% Mounted on
/dev/sda2         1222992    64010    1158982    6% /
none              1021194      762    1020432    1% /dev
none              1023079        1    1023078    1% /dev/shm
none              1023079       58    1023021    1% /var/run
none              1023079        1    1023078    1% /var/lock
/dev/sda6      2138062720  4778154 2133284566    1% /mnt/mfschunk1
/dev/sda3         1222992     3174    1219818    1% /var
/dev/sda4          305216       13     305203    1% /tmp
mfsmaster:9421 1031528383 30522363 1001006020    3% /data/backup

So it's definitely not full. I don't understand what's going on. :/

Does anybody have any idea?

Thank you,
tamas
|
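One way to cross-check the "filesystem full" error against the log above is to look for the moments when the master's own "total:" status line reports zero total space, and to list the write errors that mfsmount logged. What follows is only a minimal Python sketch under assumptions: the function name scan and the two regular expressions are hypothetical helpers written against the syslog line format shown in this post, they are not part of MooseFS, and the sample lines are copied from the log above.

```python
import re

# Assumed format (taken from the log in this post), e.g.:
# "May 28 15:04:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00%"
TOTAL_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+mfsmaster\[\d+\]:\s+"
    r"total: usedspace: (?P<used>\d+) .*?totalspace: (?P<total>\d+) "
)

# Assumed format, e.g.:
# "May 28 15:04:41 backup1 mfsmount[12917]: error writing file number 4691391: ENOSPC (No space left on device)"
ERR_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+mfsmount\[\d+\]:\s+"
    r"error writing file number (?P<inode>\d+): (?P<errno>\w+)"
)


def scan(lines):
    """Return (timestamps where the master reported zero total space,
    list of (timestamp, file number, errno name) write errors)."""
    zero_space = []
    write_errors = []
    for line in lines:
        m = TOTAL_RE.match(line)
        if m:
            if int(m.group("total")) == 0:
                zero_space.append(m.group("ts"))
            continue
        m = ERR_RE.match(line)
        if m:
            write_errors.append(
                (m.group("ts"), int(m.group("inode")), m.group("errno"))
            )
    return zero_space, write_errors


# Three lines copied from the log in this post.
SAMPLE = [
    "May 28 14:59:00 backup1 mfsmaster[12869]: total: usedspace: 7734218166272 (7203.05 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.67%",
    "May 28 15:04:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00%",
    "May 28 15:04:41 backup1 mfsmount[12917]: error writing file number 4691391: ENOSPC (No space left on device)",
]

if __name__ == "__main__":
    zero_space, write_errors = scan(SAMPLE)
    for ts in zero_space:
        print("master reported zero total space at", ts)
    for ts, inode, errno_name in write_errors:
        print("write error at", ts, "file", inode, errno_name)
```

To run it over a real log, the lines of the syslog file could be passed to scan instead of SAMPLE. If the zero-space reports sit close to the ENOSPC errors, that would suggest the master had temporarily lost its only chunkserver rather than the disk actually being full; this is a reading of the log, not a confirmed diagnosis.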
From: Papp T. <to...@ma...> - 2011-05-28 17:49:31
|
On 05/28/2011 03:12 PM, Papp Tamas wrote:
> Does somebody have any idea?

More info on this:

The mfsmaster stopped responding completely; strace shows nothing and the process is in state 'D':

12869 ? D< 199:42 /usr/sbin/mfsmaster

I am currently removing files from the trash. After the rm job finished, the master node came back.

log:

May 28 19:30:00 backup1 mfsmaster[12869]: chunkservers status:
May 28 19:30:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, port: 9422): usedspace: 7726711619584 (7196.06 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.60%
May 28 19:30:00 backup1 mfsmaster[12869]: total: usedspace: 7726711619584 (7196.06 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.60%
May 28 19:30:08 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/10/chunk_0000000000008A10_00000001.mfs
May 28 19:30:10 backup1 mfsmount[3621]: master: tcp recv error: ETIMEDOUT (Operation timed out) (1)
May 28 19:30:10 backup1 mfsmount[12917]: master: tcp recv error: ETIMEDOUT (Operation timed out) (1)
May 28 19:30:11 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:11 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:14 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:14 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:17 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:17 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:18 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/11/chunk_0000000000008A11_00000001.mfs
May 28 19:30:20 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:20 backup1 mfsmount[12917]: master: register error (read header:
ETIMEDOUT (Operation timed out)) May 28 19:30:23 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:23 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:26 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:26 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:28 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/12/chunk_0000000000008A12_00000001.mfs May 28 19:30:29 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:29 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:32 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:32 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:35 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:35 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:38 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:38 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:39 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/13/chunk_0000000000008A13_00000001.mfs May 28 19:30:41 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:41 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:44 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:44 backup1 mfsmount[12917]: master: register error (read header: 
ETIMEDOUT (Operation timed out)) May 28 19:30:47 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:47 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:49 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/14/chunk_0000000000008A14_00000001.mfs May 28 19:30:50 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:50 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:53 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:53 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:56 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:56 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:30:59 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/15/chunk_0000000000008A15_00000001.mfs May 28 19:30:59 backup1 mfsmount[3621]: master: register error (read 
header: ETIMEDOUT (Operation timed out)) May 28 19:30:59 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:02 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:02 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:05 backup1 mfschunkserver[2556]: connecting ... May 28 19:31:05 backup1 mfschunkserver[2556]: connected to Master May 28 19:31:05 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:05 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:08 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:08 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:11 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:11 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:14 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:14 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:17 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:17 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:20 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:20 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:23 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:23 backup1 mfsmount[12917]: master: register error 
(read header: ETIMEDOUT (Operation timed out)) May 28 19:31:26 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:26 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:29 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:29 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:32 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:32 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:35 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:35 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:38 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:38 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:41 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:41 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:44 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:44 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:47 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:47 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:50 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:50 backup1 mfsmount[12917]: master: register error (read 
header: ETIMEDOUT (Operation timed out)) May 28 19:31:53 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:53 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:56 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:56 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:59 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:31:59 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:02 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:02 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:05 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:05 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:08 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:08 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:11 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:11 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:14 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:14 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:15 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/16/chunk_0000000000008A16_00000001.mfs May 28 19:32:17 backup1 mfsmount[3621]: master: register error (read 
header: ETIMEDOUT (Operation timed out)) May 28 19:32:17 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:20 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:20 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:23 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:23 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:25 backup1 mfschunkserver[2556]: connecting ... May 28 19:32:25 backup1 mfschunkserver[2556]: connected to Master May 28 19:32:26 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/17/chunk_0000000000008A17_00000001.mfs May 28 19:32:26 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:26 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:29 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:29 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:32 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:32 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:35 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:35 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:36 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/18/chunk_0000000000008A18_00000001.mfs May 28 19:32:38 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:38 backup1 mfsmount[12917]: master: 
register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:41 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:41 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:44 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:46 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/19/chunk_0000000000008A19_00000001.mfs May 28 19:32:47 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:53 backup1 mfsmount[3621]: last message repeated 2 times May 28 19:32:53 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:32:56 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/1A/chunk_0000000000008A1A_00000001.mfs May 28 19:32:59 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:33:00 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:33:05 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:33:06 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:33:06 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/1B/chunk_0000000000008A1B_00000001.mfs May 28 19:33:11 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:33:12 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:33:16 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/1C/chunk_0000000000008A1C_00000001.mfs May 28 19:33:17 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:33:18 backup1 mfsmount[12917]: master: register error (read header: 
ETIMEDOUT (Operation timed out)) May 28 19:33:18 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer May 28 19:33:18 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer May 28 19:33:18 backup1 mfsmaster[12869]: connection with CS(192.168.3.21) has been closed by peer May 28 19:33:18 backup1 mfsmaster[12869]: chunkserver disconnected - ip: 192.168.3.21, port: 9422, usedspace: 7726711480320 (7196.06 GiB), totalspace: 10944744390656 (10193.09 GiB) May 28 19:33:20 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out)) May 28 19:33:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.20) has been closed by peer May 28 19:33:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.13) has been closed by peer May 28 19:33:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer May 28 19:33:23 backup1 mfsmaster[12869]: chunkserver register begin (packet version: 5) - ip: 192.168.3.21, port: 9422 May 28 19:33:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.13) has been closed by peer May 28 19:33:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer May 28 19:33:23 backup1 mfsmaster[12869]: chunk-server already connected !!! 
May 28 19:33:23 backup1 mfsmaster[12869]: connection with CS(192.168.3.21) has been closed by peer May 28 19:33:23 backup1 mfsmaster[12869]: connection with CS(192.168.3.21) has been closed by peer May 28 19:33:23 backup1 mfsmaster[12869]: chunkserver disconnected - ip: 192.168.3.21, port: 9422, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB) May 28 19:33:23 backup1 mfsmaster[12869]: chunkserver disconnected - ip: 192.168.3.21, port: 9422, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB) May 28 19:33:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.20) has been closed by peer May 28 19:33:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer May 28 19:33:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.13) has been closed by peer May 28 19:33:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer May 28 19:33:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.20) has been closed by peer May 28 19:33:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer May 28 19:33:23 backup1 mfsmaster[12869]: last message repeated 98 times May 28 19:33:23 backup1 mfsmount[12917]: registered to master May 28 19:33:23 backup1 mfsmount[3621]: registered to master May 28 19:33:23 backup1 mfsmount[12917]: file: 122301, index: 0, chunk: 4817595, version: 1 - there are no valid copies May 28 19:33:23 backup1 mfsmount[12917]: file: 122301, index: 0 - can't connect to proper chunkserver (try counter: 1) May 28 19:33:24 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer May 28 19:33:25 backup1 mfschunkserver[2556]: connecting ... 
May 28 19:33:25 backup1 mfschunkserver[2556]: connected to Master
May 28 19:33:25 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 19:33:25 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 19:33:26 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/1D/chunk_0000000000008A1D_00000001.mfs
May 28 19:33:26 backup1 mfsmaster[12869]: chunkserver register begin (packet version: 5) - ip: 192.168.3.21, port: 9422
May 28 19:33:28 backup1 mfsmaster[12869]: chunkserver register end (packet version: 5) - ip: 192.168.3.21, port: 9422, usedspace: 7726711480320 (7196.06 GiB), totalspace: 10944744390656 (10193.09 GiB)
May 28 19:33:28 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 19:33:28 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 19:33:36 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/1E/chunk_0000000000008A1E_00000001.mfs

Once again, mfsmaster cannot be stopped. After /etc/init.mfs-master stop I ran it again, but it is stuck, and the master is again in state 'D' (uninterruptible sleep).

tamas
|
From: Michal B. <mic...@ge...> - 2011-05-30 07:22:56
|
Hi!

This looks like a problem we already know about: the master can get stuck when a group of chunkservers disconnects. Their reconnection causes a large data flow, which leads to network timeouts, which in turn cause further chunkserver disconnections. We have some ideas for improving this behaviour.

Kind regards
Michał Borychowski
MooseFS Support Manager
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Gemius S.A.
ul. Wołoska 7, 02-672 Warszawa
Budynek MARS, klatka D
Tel.: +4822 874-41-00
Fax : +4822 874-41-01

-----Original Message-----
From: Papp Tamas [mailto:to...@ma...]
Sent: Saturday, May 28, 2011 3:13 PM
To: moo...@li...
Subject: Re: [Moosefs-users] timeout

On 05/27/2011 10:19 AM, Papp Tamas wrote:
> hi!
>
> Sometimes there is an error on our mini cluster.
> Still Ubuntu Natty with recompiled moosefs from ppa.
>
> May 27 08:15:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21,
> port: 9422): usedspace: 7739308400640 (7207.79 GiB), totalspace:
> 10944744390656 (10193.09 GiB), usage: 70.71%
> May 27 08:15:00 backup1 mfsmaster[12869]: total: usedspace:
> 7739308400640 (7207.79 GiB), totalspace: 10944744390656 (10193.09 GiB),
> usage: 70.71%
> May 27 08:15:02 backup1 mfschunkserver[2556]: testing chunk:
> /mnt/mfschunk1/B1/chunk_0000000000005EB1_00000001.mfs
> May 27 08:15:12 backup1 mfschunkserver[2556]: testing chunk:
> /mnt/mfschunk1/B2/chunk_0000000000005EB2_00000001.mfs
> May 27 08:15:23 backup1 mfschunkserver[2556]: testing chunk:
> /mnt/mfschunk1/B3/chunk_0000000000005EB3_00000001.mfs
> May 27 08:15:28 backup1 mfsmount[12917]: master: tcp recv error:
> ETIMEDOUT (Operation timed out) (1)
> May 27 08:15:29 backup1 mfsmount[12917]: master: register error (read
> header: ETIMEDOUT (Operation timed out))
> May 27 08:15:32 backup1 mfsmount[12917]: master: register error (read
> header: ETIMEDOUT (Operation timed out))
> May 27 08:15:33 backup1 mfschunkserver[2556]: testing chunk:
> /mnt/mfschunk1/B4/chunk_0000000000005EB4_00000001.mfs
> May 27 08:15:35 backup1
mfsmount[12917]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > May 27 08:15:44 backup1 mfsmount[12917]: last message repeated 3 times > May 27 08:15:44 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/B5/chunk_0000000000005EB5_00000001.mfs > May 27 08:15:47 backup1 mfsmount[12917]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > May 27 08:15:55 backup1 mfsmount[12917]: last message repeated 2 times > May 27 08:15:55 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/B6/chunk_0000000000005EB6_00000001.mfs > May 27 08:15:56 backup1 mfsmount[12917]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > May 27 08:15:59 backup1 mfsmount[12917]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > May 27 08:16:00 backup1 mfschunkserver[2556]: connecting ... > May 27 08:16:00 backup1 mfschunkserver[2556]: connected to Master > May 27 08:16:02 backup1 mfsmount[12917]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > May 27 08:17:05 backup1 mfsmount[12917]: last message repeated 21 times > May 27 08:17:15 backup1 mfsmount[12917]: last message repeated 3 times > May 27 08:17:15 backup1 mfsmaster[12869]: connection with > client(ip:192.168.3.21) has been closed by peer > May 27 08:17:15 backup1 mfsmaster[12869]: connection with > CS(192.168.3.21) has been closed by peer > May 27 08:17:15 backup1 mfsmaster[12869]: chunkserver disconnected - ip: > 192.168.3.21, port: 9422, usedspace: 7739308400640 (7207.79 GiB), > totalspace: 10944744390656 (10193.09 GiB) > May 27 08:17:16 backup1 mfsmaster[12869]: connection with > ML(192.168.3.13) has been closed by peer > May 27 08:17:16 backup1 mfsmaster[12869]: connection with > client(ip:192.168.3.21) has been closed by peer > May 27 08:17:17 backup1 mfsmaster[12869]: last message repeated 7 times > May 27 08:17:17 backup1 mfsmount[12917]: master: register error (read > header: ETIMEDOUT 
(Operation timed out)) > May 27 08:17:17 backup1 mfsmaster[12869]: connection with > client(ip:192.168.3.21) has been closed by peer > May 27 08:17:19 backup1 mfsmaster[12869]: last message repeated 28 times > May 27 08:17:19 backup1 mfsmount[12917]: registered to master > May 27 08:18:00 backup1 mfsmaster[12869]: chunkservers status: > May 27 08:18:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 > GiB), totalspace: 0 (0.00 GiB), usage: 0.00% > May 27 08:18:16 backup1 mfsmaster[12869]: chunkserver disconnected - ip: > 192.168.3.21, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB) > May 27 08:19:00 backup1 mfsmaster[12869]: chunkservers status: > May 27 08:19:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 > GiB), totalspace: 0 (0.00 GiB), usage: 0.00% > May 27 08:19:14 backup1 mfsmount[12917]: file: 7340122, index: 0 - > fs_writechunk returns status 8 > May 27 08:20:00 backup1 mfsmount[12917]: last message repeated 17 times > May 27 08:20:00 backup1 mfsmaster[12869]: chunkservers status: > May 27 08:20:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 > GiB), totalspace: 0 (0.00 GiB), usage: 0.00% > May 27 08:20:05 backup1 mfsmount[12917]: file: 7340122, index: 0 - > fs_writechunk returns status 8 > May 27 08:21:00 backup1 mfsmount[12917]: last message repeated 7 times > May 27 08:21:00 backup1 mfsmaster[12869]: chunkservers status: > May 27 08:21:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 > GiB), totalspace: 0 (0.00 GiB), usage: 0.00% > May 27 08:21:02 backup1 mfsmount[12917]: file: 7340122, index: 0 - > fs_writechunk returns status 8 > May 27 08:21:29 backup1 mfsmount[12917]: last message repeated 3 times > May 27 08:21:29 backup1 mfsmount[12917]: error writing file number > 7340122: EIO (Input/output error) > May 27 08:21:30 backup1 mfsmount[12917]: file: 7340122, index: 0, chunk: > 4816320, version: 1 - there are no valid copies > May 27 08:21:30 backup1 mfsmount[12917]: file: 7340122, index: 0 - can't > connect to 
proper chunkserver (try counter: 1) > May 27 08:22:00 backup1 mfsmaster[12869]: chunkservers status: > May 27 08:22:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 > GiB), totalspace: 0 (0.00 GiB), usage: 0.00% > May 27 08:22:30 backup1 mfsmount[12917]: file: 7340122, index: 0, chunk: > 4816320, version: 1 - there are no valid copies > May 27 08:22:30 backup1 mfsmount[12917]: file: 7340122, index: 0 - can't > connect to proper chunkserver (try counter: 8) > May 27 08:23:00 backup1 mfsmaster[12869]: chunkservers status: > May 27 08:23:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 > GiB), totalspace: 0 (0.00 GiB), usage: 0.00% > May 27 08:23:30 backup1 mfsmount[12917]: file: 7340122, index: 0, chunk: > 4816320, version: 1 - there are no valid copies > May 27 08:23:30 backup1 mfsmount[12917]: file: 7340122, index: 0 - can't > connect to proper chunkserver (try counter: 15) > May 27 08:24:00 backup1 mfsmaster[12869]: chunkservers status: > May 27 08:24:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 > GiB), totalspace: 0 (0.00 GiB), usage: 0.00% > May 27 08:24:30 backup1 mfsmount[12917]: file: 7340122, index: 0, chunk: > 4816320, version: 1 - there are no valid copies > May 27 08:24:30 backup1 mfsmount[12917]: file: 7340122, index: 0 - can't > connect to proper chunkserver (try counter: 22) > May 27 08:25:00 backup1 mfsmaster[12869]: chunkservers status: > May 27 08:25:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 > GiB), totalspace: 0 (0.00 GiB), usage: 0.00% > May 27 08:25:30 backup1 mfsmount[12917]: file: 7340122, index: 0, chunk: > 4816320, version: 1 - there are no valid copies > May 27 08:25:30 backup1 mfsmount[12917]: file: 7340122, index: 0 - can't > connect to proper chunkserver (try counter: 29) > May 27 08:26:00 backup1 mfsmaster[12869]: chunkservers status: > May 27 08:26:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 > GiB), totalspace: 0 (0.00 GiB), usage: 0.00% > May 27 08:26:10 backup1 mfschunkserver[2556]: 
testing chunk: > /mnt/mfschunk1/B7/chunk_0000000000005EB7_00000001.mfs > May 27 08:26:10 backup1 mfschunkserver[2556]: connection reset by Master > May 27 08:26:20 backup1 mfschunkserver[2556]: connecting ... > May 27 08:26:20 backup1 mfschunkserver[2556]: connected to Master > May 27 08:26:21 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/B8/chunk_0000000000005EB8_00000001.mfs > May 27 08:26:21 backup1 mfsmaster[12869]: chunkserver register begin > (packet version: 5) - ip: 192.168.3.21, port: 9422 > May 27 08:26:28 backup1 mfsmaster[12869]: chunkserver register end > (packet version: 5) - ip: 192.168.3.21, port: 9422, usedspace: > 7739308400640 (7207.79 GiB), totalspace: 10944744390656 (10193.09 GiB) > May 27 08:26:31 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/B9/chunk_0000000000005EB9_00000001.mfs > May 27 08:26:41 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/BA/chunk_0000000000005EBA_00000001.mfs > May 27 08:26:42 backup1 mfsmount[12917]: file: 7445943, index: 0, chunk: > 4816323, version: 1 - writeworker: connection with (C0A80315:9422) was > timed out (unfinished writes: 2; try counter: 1) > May 27 08:26:51 backup1 mfsmount[12917]: last message repeated 2 times > May 27 08:26:51 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/BB/chunk_0000000000005EBB_00000001.mfs > May 27 08:27:00 backup1 mfsmaster[12869]: chunkservers status: > May 27 08:27:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, > port: 9422): usedspace: 7739309965312 (7207.79 GiB), totalspace: > 10944744390656 (10193.09 GiB), usage: 70.71% > May 27 08:27:00 backup1 mfsmaster[12869]: total: usedspace: > 7739309965312 (7207.79 GiB), totalspace: 10944744390656 (10193.09 GiB), > usage: 70.71% > May 27 08:27:01 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/BC/chunk_0000000000005EBC_00000001.mfs > May 27 08:27:11 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/BD/chunk_0000000000005EBD_00000001.mfs > 
May 27 08:27:21 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/BE/chunk_0000000000005EBE_00000001.mfs > May 27 08:27:31 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/BF/chunk_0000000000005EBF_00000001.mfs > May 27 08:27:41 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/C0/chunk_0000000000005EC0_00000001.mfs > May 27 08:27:51 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/C1/chunk_0000000000005EC1_00000001.mfs > May 27 08:28:00 backup1 mfsmaster[12869]: chunkservers status: > May 27 08:28:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, > port: 9422): usedspace: 7739310948352 (7207.79 GiB), totalspace: > 10944744390656 (10193.09 GiB), usage: 70.71% > May 27 08:28:00 backup1 mfsmaster[12869]: total: usedspace: > 7739310948352 (7207.79 GiB), totalspace: 10944744390656 (10193.09 GiB), > usage: 70.71% > May 27 08:28:01 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/C3/chunk_0000000000005EC3_00000001.mfs > May 27 08:28:11 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/C4/chunk_0000000000005EC4_00000001.mfs > May 27 08:28:22 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/C5/chunk_0000000000005EC5_00000001.mfs > May 27 08:28:32 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/C7/chunk_0000000000005EC7_00000001.mfs > May 27 08:28:42 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/C8/chunk_0000000000005EC8_00000001.mfs > May 27 08:28:52 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/C9/chunk_0000000000005EC9_00000001.mfs > May 27 08:29:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, > port: 9422): usedspace: 7739313045504 (7207.80 GiB), totalspace: > 10944744390656 (10193.09 GiB), usage: 70.71% > May 27 08:29:00 backup1 mfsmaster[12869]: total: usedspace: > 7739313045504 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB), > usage: 70.71% > May 27 08:29:02 backup1 mfschunkserver[2556]: testing chunk: > 
/mnt/mfschunk1/CA/chunk_0000000000005ECA_00000001.mfs > May 27 08:29:12 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/CB/chunk_0000000000005ECB_00000001.mfs > May 27 08:29:22 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/CC/chunk_0000000000005ECC_00000001.mfs > May 27 08:29:26 backup1 mfsmount[12917]: master: tcp recv error: > ETIMEDOUT (Operation timed out) (1) > May 27 08:29:27 backup1 mfsmount[12917]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > May 27 08:29:30 backup1 mfsmount[12917]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > May 27 08:29:32 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/CD/chunk_0000000000005ECD_00000001.mfs > May 27 08:29:32 backup1 mfsmaster[12869]: connection with > client(ip:192.168.3.21) has been closed by peer > May 27 08:29:32 backup1 mfsmaster[12869]: last message repeated 2 times > May 27 08:29:32 backup1 mfsmount[12917]: registered to master > May 27 08:29:42 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/CE/chunk_0000000000005ECE_00000001.mfs > May 27 08:29:52 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/D0/chunk_0000000000005ED0_00000001.mfs > May 27 08:30:02 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/D1/chunk_0000000000005ED1_00000001.mfs > May 27 08:30:05 backup1 mfsmount[12917]: file: 122301, index: 0, chunk: > 4816324, version: 1 - writeworker: connection with (C0A80315:9422) was > timed out (unfinished writes: 1; try counter: 1) > May 27 08:30:12 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/D2/chunk_0000000000005ED2_00000001.mfs > May 27 08:30:22 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/D3/chunk_0000000000005ED3_00000001.mfs > May 27 08:30:33 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/D4/chunk_0000000000005ED4_00000001.mfs > May 27 08:30:43 backup1 mfschunkserver[2556]: testing chunk: > 
/mnt/mfschunk1/D5/chunk_0000000000005ED5_00000001.mfs > May 27 08:30:53 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/D6/chunk_0000000000005ED6_00000001.mfs > May 27 08:31:03 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/D7/chunk_0000000000005ED7_00000001.mfs > May 27 08:31:13 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/D8/chunk_0000000000005ED8_00000001.mfs > May 27 08:31:23 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/D9/chunk_0000000000005ED9_00000001.mfs > May 27 08:31:33 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/DA/chunk_0000000000005EDA_00000001.mfs > May 27 08:31:43 backup1 mfsmount[12917]: master: tcp recv error: > ETIMEDOUT (Operation timed out) (1) > May 27 08:31:43 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/DB/chunk_0000000000005EDB_00000001.mfs > May 27 08:31:44 backup1 mfsmount[12917]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > May 27 08:31:53 backup1 mfsmount[12917]: last message repeated 3 times > May 27 08:31:53 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/DD/chunk_0000000000005EDD_00000001.mfs > May 27 08:31:56 backup1 mfsmount[12917]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > May 27 08:32:03 backup1 mfsmount[12917]: last message repeated 2 times > May 27 08:32:03 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/DE/chunk_0000000000005EDE_00000001.mfs > May 27 08:32:05 backup1 mfsmount[12917]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > May 27 08:32:14 backup1 mfsmount[12917]: last message repeated 2 times > May 27 08:32:14 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/DF/chunk_0000000000005EDF_00000001.mfs > May 27 08:32:14 backup1 mfsmount[12917]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > May 27 08:32:24 backup1 mfsmount[12917]: last message repeated 3 times > May 27 08:32:24 
backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E0/chunk_0000000000005EE0_00000001.mfs > May 27 08:32:25 backup1 mfschunkserver[2556]: connecting ... > May 27 08:32:25 backup1 mfschunkserver[2556]: connected to Master > May 27 08:32:26 backup1 mfsmount[12917]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > May 27 08:33:29 backup1 mfsmount[12917]: last message repeated 21 times > May 27 08:34:25 backup1 mfsmount[12917]: last message repeated 18 times > May 27 08:34:25 backup1 mfsmaster[12869]: connection with > client(ip:192.168.3.21) has been closed by peer > May 27 08:34:25 backup1 mfsmaster[12869]: connection with > CS(192.168.3.21) has been closed by peer > May 27 08:34:25 backup1 mfsmaster[12869]: chunkserver disconnected - ip: > 192.168.3.21, port: 9422, usedspace: 7739311943680 (7207.80 GiB), > totalspace: 10944744390656 (10193.09 GiB) > May 27 08:34:26 backup1 mfsmount[12917]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > May 27 08:34:33 backup1 mfsmount[12917]: last message repeated 2 times > May 27 08:34:33 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E1/chunk_0000000000005EE1_00000001.mfs > May 27 08:34:34 backup1 mfsmaster[12869]: connection with > ML(192.168.3.13) has been closed by peer > May 27 08:34:34 backup1 mfsmaster[12869]: connection with > client(ip:192.168.3.21) has been closed by peer > May 27 08:34:34 backup1 mfsmaster[12869]: chunkserver register begin > (packet version: 5) - ip: 192.168.3.21, port: 9422 > May 27 08:34:34 backup1 mfsmaster[12869]: connection with > ML(192.168.3.13) has been closed by peer > May 27 08:34:34 backup1 mfsmaster[12869]: connection with > client(ip:192.168.3.21) has been closed by peer > May 27 08:34:34 backup1 mfsmaster[12869]: connection with > CS(192.168.3.21) has been closed by peer > May 27 08:34:34 backup1 mfsmaster[12869]: chunkserver disconnected - ip: > 192.168.3.21, port: 9422, usedspace: 0 (0.00 GiB), totalspace: 0 
(0.00 GiB) > May 27 08:34:35 backup1 mfsmaster[12869]: connection with > client(ip:192.168.3.21) has been closed by peer > May 27 08:34:35 backup1 mfsmaster[12869]: last message repeated 2 times > May 27 08:34:35 backup1 mfsmount[12917]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > May 27 08:34:35 backup1 mfsmaster[12869]: connection with > client(ip:192.168.3.21) has been closed by peer > May 27 08:34:37 backup1 mfsmaster[12869]: last message repeated 52 times > May 27 08:34:37 backup1 mfsmount[12917]: registered to master > May 27 08:34:39 backup1 mfsmount[12917]: file: 7445943, index: 0 - > fs_writechunk returns status 8 > May 27 08:34:40 backup1 mfsmount[12917]: last message repeated 2 times > May 27 08:34:40 backup1 mfschunkserver[2556]: connecting ... > May 27 08:34:40 backup1 mfschunkserver[2556]: connected to Master > May 27 08:34:40 backup1 mfsmount[12917]: file: 7445943, index: 0 - > fs_writechunk returns status 8 > May 27 08:34:41 backup1 mfsmount[12917]: file: 7445943, index: 0 - > fs_writechunk returns status 8 > May 27 08:34:41 backup1 mfsmaster[12869]: chunkserver register begin > (packet version: 5) - ip: 192.168.3.21, port: 9422 > May 27 08:34:42 backup1 mfsmaster[12869]: chunkserver register end > (packet version: 5) - ip: 192.168.3.21, port: 9422, usedspace: > 7739311943680 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB) > May 27 08:34:43 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E2/chunk_0000000000005EE2_00000001.mfs > May 27 08:34:53 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E3/chunk_0000000000005EE3_00000001.mfs > May 27 08:35:00 backup1 mfsmaster[12869]: chunkservers status: > May 27 08:35:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, > port: 9422): usedspace: 7739314479104 (7207.80 GiB), totalspace: > 10944744390656 (10193.09 GiB), usage: 70.71% > May 27 08:35:00 backup1 mfsmaster[12869]: total: usedspace: > 7739314479104 (7207.80 GiB), totalspace: 
10944744390656 (10193.09 GiB), > usage: 70.71% > May 27 08:35:03 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E4/chunk_0000000000005EE4_00000001.mfs > May 27 08:35:14 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E5/chunk_0000000000005EE5_00000001.mfs > May 27 08:35:24 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E6/chunk_0000000000005EE6_00000001.mfs > May 27 08:35:34 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E7/chunk_0000000000005EE7_00000001.mfs > May 27 08:35:44 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E8/chunk_0000000000005EE8_00000001.mfs > May 27 08:35:54 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/E9/chunk_0000000000005EE9_00000001.mfs > May 27 08:36:00 backup1 mfsmaster[12869]: chunkservers status: > May 27 08:36:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, > port: 9422): usedspace: 7739314479104 (7207.80 GiB), totalspace: > 10944744390656 (10193.09 GiB), usage: 70.71% > May 27 08:36:00 backup1 mfsmaster[12869]: total: usedspace: > 7739314479104 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB), > usage: 70.71% > May 27 08:36:04 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/EA/chunk_0000000000005EEA_00000001.mfs > May 27 08:36:14 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/EB/chunk_0000000000005EEB_00000001.mfs > May 27 08:36:24 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/EC/chunk_0000000000005EEC_00000001.mfs > May 27 08:36:34 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/ED/chunk_0000000000005EED_00000001.mfs > May 27 08:36:45 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/EE/chunk_0000000000005EEE_00000001.mfs > May 27 08:36:55 backup1 mfschunkserver[2556]: testing chunk: > /mnt/mfschunk1/EF/chunk_0000000000005EEF_00000001.mfs > May 27 08:37:00 backup1 mfsmaster[12869]: chunkservers status: > May 27 08:37:00 backup1 mfsmaster[12869]: server 1 
(ip: 192.168.3.21,
> port: 9422): usedspace: 7739314479104 (7207.80 GiB), totalspace:
> 10944744390656 (10193.09 GiB), usage: 70.71%
> May 27 08:37:00 backup1 mfsmaster[12869]: total: usedspace:
> 7739314479104 (7207.80 GiB), totalspace: 10944744390656 (10193.09 GiB),
> usage: 70.71%
>
>
> Can you help me what to do with it?
> Is this fuse, moosefs, kernel problem or something else?

hi!

I've just noticed another error.

$ dirvish --vault cluster/Projects
dirvish cluster/Projects:default fatal error: filesystem full
dirvish cluster/Projects:default fatal error (12) -- filesystem full
cluster/Projects:default post-server failed (1)

log:

May 28 14:58:04 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/70/chunk_0000000000008470_00000001.mfs
May 28 14:58:14 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/71/chunk_0000000000008471_00000001.mfs
May 28 14:58:24 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/72/chunk_0000000000008472_00000001.mfs
May 28 14:58:34 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/73/chunk_0000000000008473_00000001.mfs
May 28 14:58:44 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/74/chunk_0000000000008474_00000001.mfs
May 28 14:58:55 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/75/chunk_0000000000008475_00000001.mfs
May 28 14:59:00 backup1 mfsmaster[12869]: chunkservers status:
May 28 14:59:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, port: 9422): usedspace: 7734218166272 (7203.05 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.67%
May 28 14:59:00 backup1 mfsmaster[12869]: total: usedspace: 7734218166272 (7203.05 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.67%
May 28 14:59:05 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/76/chunk_0000000000008476_00000001.mfs
May 28 14:59:15 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/77/chunk_0000000000008477_00000001.mfs
May 28 14:59:24 backup1 mfsmount[12917]: master: tcp recv error: ETIMEDOUT (Operation timed out) (1)
May 28 14:59:25 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 14:59:25 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/78/chunk_0000000000008478_00000001.mfs
May 28 14:59:28 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 14:59:35 backup1 mfsmount[12917]: last message repeated 2 times
May 28 14:59:35 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/79/chunk_0000000000008479_00000001.mfs
May 28 14:59:37 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 14:59:45 backup1 mfsmount[12917]: last message repeated 2 times
May 28 14:59:45 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/7A/chunk_000000000000847A_00000001.mfs
May 28 14:59:46 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 14:59:55 backup1 mfsmount[12917]: last message repeated 3 times
May 28 14:59:55 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/7B/chunk_000000000000847B_00000001.mfs
May 28 14:59:58 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 15:00:06 backup1 mfsmount[12917]: last message repeated 2 times
May 28 15:00:06 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/7C/chunk_000000000000847C_00000001.mfs
May 28 15:00:07 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 15:00:15 backup1 mfsmount[12917]: last message repeated 2 times
May 28 15:00:15 backup1 mfschunkserver[2556]: connecting ...
May 28 15:00:15 backup1 mfschunkserver[2556]: connected to Master
May 28 15:00:16 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 15:01:19 backup1 mfsmount[12917]: last message repeated 21 times
May 28 15:02:19 backup1 mfsmount[12917]: last message repeated 19 times
May 28 15:02:19 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 15:02:19 backup1 mfsmaster[12869]: connection with CS(192.168.3.21) has been closed by peer
May 28 15:02:19 backup1 mfsmaster[12869]: chunkserver disconnected - ip: 192.168.3.21, port: 9422, usedspace: 7734399709184 (7203.22 GiB), totalspace: 10944744390656 (10193.09 GiB)
May 28 15:02:19 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 15:02:22 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 15:02:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.20) has been closed by peer
May 28 15:02:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.13) has been closed by peer
May 28 15:02:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 15:02:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.13) has been closed by peer
May 28 15:02:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 15:02:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.20) has been closed by peer
May 28 15:02:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 15:02:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.13) has been closed by peer
May 28 15:02:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 15:02:24 backup1 mfsmaster[12869]: last message repeated 55 times
May 28 15:02:24 backup1 mfsmount[12917]: registered to master
May 28 15:02:25 backup1 mfsmount[12917]: file: 122301, index: 0, chunk: 4816943, version: 1 - there are no valid copies
May 28 15:02:25 backup1 mfsmount[12917]: file: 122301, index: 0 - can't connect to proper chunkserver (try counter: 1)
May 28 15:02:26 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:02:26 backup1 mfsmount[12917]: last message repeated 2 times
May 28 15:02:26 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8
May 28 15:02:27 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:02:32 backup1 mfsmount[12917]: last message repeated 3 times
May 28 15:02:32 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8
May 28 15:02:33 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:02:38 backup1 mfsmount[12917]: last message repeated 2 times
May 28 15:02:38 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8
May 28 15:02:41 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:02:44 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:02:44 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8
May 28 15:02:48 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:02:51 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8
May 28 15:02:52 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:02:56 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:02:58 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8
May 28 15:03:01 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:03:05 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8
May 28 15:03:06 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:03:11 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:03:13 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8
May 28 15:03:17 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:03:20 backup1 mfsmaster[12869]: chunkserver disconnected - ip: 192.168.3.21, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
May 28 15:03:22 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8
May 28 15:03:23 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:03:25 backup1 mfsmount[12917]: file: 122301, index: 0, chunk: 4816943, version: 1 - there are no valid copies
May 28 15:03:25 backup1 mfsmount[12917]: file: 122301, index: 0 - can't connect to proper chunkserver (try counter: 8)
May 28 15:03:29 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:03:30 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8
May 28 15:03:36 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:03:39 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8
May 28 15:03:43 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:03:48 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8
May 28 15:03:50 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:03:57 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8
May 28 15:03:57 backup1 mfsmount[12917]: error writing file number 107222: EIO (Input/output error)
May 28 15:03:58 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:04:00 backup1 mfsmaster[12869]: chunkservers status:
May 28 15:04:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00%
May 28 15:04:06 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:04:25 backup1 mfsmount[12917]: last message repeated 2 times
May 28 15:04:25 backup1 mfsmount[12917]: file: 122301, index: 0, chunk: 4816943, version: 1 - there are no valid copies
May 28 15:04:25 backup1 mfsmount[12917]: file: 122301, index: 0 - can't connect to proper chunkserver (try counter: 15)
May 28 15:04:32 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:04:41 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12
May 28 15:04:41 backup1 mfsmount[12917]: error writing file number 4691391: ENOSPC (No space left on device)
May 28 15:04:42 backup1 mfsmount[12917]: file: 107221, index: 0 - fs_writechunk returns status 8
May 28 15:05:00 backup1 mfsmount[12917]: last message repeated 10 times
May 28 15:05:00 backup1 mfsmaster[12869]: chunkservers status:
May 28 15:05:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00%
May 28 15:05:00 backup1 mfsmount[12917]: file: 107221, index: 0 - fs_writechunk returns status 8
May 28 15:05:25 backup1 mfsmount[12917]: last message repeated 5 times
May 28 15:05:25 backup1 mfsmount[12917]: file: 122301, index: 0, chunk: 4816943, version: 1 - there are no valid copies
May 28 15:05:25 backup1 mfsmount[12917]: file: 122301, index: 0 - can't connect to proper chunkserver (try counter: 22)
May 28 15:05:27 backup1 mfsmount[12917]: file: 107221, index: 0 - fs_writechunk returns status 8
May 28 15:06:00 backup1 mfsmount[12917]: last message repeated 4 times
May 28 15:06:00 backup1 mfsmaster[12869]: chunkservers status:
May 28 15:06:00 backup1 mfsmaster[12869]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00%
May 28 15:06:01 backup1 mfsmount[12917]: file: 107221, index: 0 - fs_writechunk returns status 8
May 28 15:06:25 backup1 mfsmount[12917]: last message repeated 3 times
May 28 15:06:25 backup1 mfsmount[12917]: file: 122301, index: 0, chunk: 4816943, version: 1 - there are no valid copies
May 28 15:06:25 backup1 mfsmount[12917]: file: 122301, index: 0 - can't connect to proper chunkserver (try counter: 29)
May 28 15:06:25 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/7D/chunk_000000000000847D_00000001.mfs
May 28 15:06:25 backup1 mfschunkserver[2556]: connection reset by Master
May 28 15:06:32 backup1 mfsmount[12917]: file: 107221, index: 0 - fs_writechunk returns status 8
May 28 15:06:35 backup1 mfschunkserver[2556]: connecting ...
May 28 15:06:35 backup1 mfschunkserver[2556]: connected to Master
May 28 15:06:36 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/7E/chunk_000000000000847E_00000001.mfs
May 28 15:06:36 backup1 mfsmaster[12869]: chunkserver register begin (packet version: 5) - ip: 192.168.3.21, port: 9422
May 28 15:06:38 backup1 mfsmaster[12869]: chunkserver register end (packet version: 5) - ip: 192.168.3.21, port: 9422, usedspace: 7734393311232 (7203.22 GiB), totalspace: 10944744390656 (10193.09 GiB)
May 28 15:06:46 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/7F/chunk_000000000000847F_00000001.mfs
May 28 15:06:56 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/80/chunk_0000000000008480_00000001.mfs
May 28 15:07:00 backup1 mfsmaster[12869]: chunkservers status:
May 28 15:07:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, port: 9422): usedspace: 7734393442304 (7203.22 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.67%
May 28 15:07:00 backup1 mfsmaster[12869]: total: usedspace: 7734393442304 (7203.22 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.67%

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        19G  1.4G   17G   8% /
none            3.9G  180K  3.9G   1% /dev
none            4.0G     0  4.0G   0% /dev/shm
none            4.0G  716K  4.0G   1% /var/run
none            4.0G     0  4.0G   0% /var/lock
/dev/sda6        10T  7.1T  3.0T  71% /mnt/mfschunk1
/dev/sda3        19G   12G  6.0G  66% /var
/dev/sda4       4.6G  138M  4.3G   4% /tmp
mfsmaster:9421   10T  7.1T  3.0T  71% /data/backup

Filesystem         Inodes    IUsed      IFree IUse% Mounted on
/dev/sda2         1222992    64010    1158982    6% /
none              1021194      762    1020432    1% /dev
none              1023079        1    1023078    1% /dev/shm
none              1023079       58    1023021    1% /var/run
none              1023079        1    1023078    1% /var/lock
/dev/sda6      2138062720  4778154 2133284566    1% /mnt/mfschunk1
/dev/sda3         1222992     3174    1219818    1% /var
/dev/sda4          305216       13     305203    1% /tmp
mfsmaster:9421 1031528383 30522363 1001006020    3% /data/backup

So it's definitely not full. I don't understand what's going on. :/

Does somebody have any idea?

Thank you,
tamas

_______________________________________________
moosefs-users mailing list
moo...@li...
https://lists.sourceforge.net/lists/listinfo/moosefs-users
|
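Editorial sketch: the log above is easier to reason about once the repeated messages are tallied. The Python below is a minimal, hedged example that counts `register error ... ETIMEDOUT` lines per minute and `fs_writechunk returns status N` lines per status code. The function name `summarize` and the regular expressions are illustrative assumptions, not part of MooseFS. The sketch also assumes that a syslog "last message repeated N times" line refers to the previous message from the same daemon.

```python
import re
from collections import Counter

# Line shape copied from the logs in this thread:
#   "May 28 14:59:25 backup1 mfsmount[12917]: <message>"
LINE_RE = re.compile(
    r"^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+"
    r"(?P<daemon>mfs\w+)\[\d+\]:\s+(?P<msg>.*)$"
)
REPEAT_RE = re.compile(r"last message repeated (\d+) times")
STATUS_RE = re.compile(r"fs_writechunk returns status (\d+)")


def summarize(lines):
    """Return (timeouts per minute, fs_writechunk failures per status code)."""
    timeouts = Counter()
    statuses = Counter()
    last = {}  # daemon name -> (counter, key) of its previous counted message
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        daemon, msg = m.group("daemon"), m.group("msg")
        repeat = REPEAT_RE.search(msg)
        if repeat:
            # Assumption: "repeated N times" belongs to the previous message
            # logged by the same daemon.
            if last.get(daemon):
                counter, key = last[daemon]
                counter[key] += int(repeat.group(1))
            continue
        status = STATUS_RE.search(msg)
        if "register error" in msg and "ETIMEDOUT" in msg:
            event = (timeouts, m.group("minute"))
        elif status:
            event = (statuses, int(status.group(1)))
        else:
            event = None
        last[daemon] = event
        if event:
            counter, key = event
            counter[key] += 1
    return timeouts, statuses


# A few lines taken from the log in this message.
SAMPLE = [
    "May 28 14:59:24 backup1 mfsmount[12917]: master: tcp recv error: ETIMEDOUT (Operation timed out) (1)",
    "May 28 14:59:25 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))",
    "May 28 14:59:35 backup1 mfsmount[12917]: last message repeated 2 times",
    "May 28 15:02:26 backup1 mfsmount[12917]: file: 4691391, index: 0 - fs_writechunk returns status 12",
    "May 28 15:02:26 backup1 mfsmount[12917]: last message repeated 2 times",
    "May 28 15:02:26 backup1 mfsmount[12917]: file: 107222, index: 0 - fs_writechunk returns status 8",
]
```

Calling `summarize(open("/var/log/syslog"))` (the path is an assumption about the host) gives two counters that show when the timeouts cluster and which status codes dominate. In the log above, status 12 is eventually reported as ENOSPC and status 8 as EIO.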
From: Papp T. <to...@ma...> - 2011-05-30 09:08:42
|
On 05/30/2011 09:22 AM, Michal Borychowski wrote:
> Hi!
>
> It looks like a problem we already know about - the master can be stuck when
> a group of chunkservers disconnects what causes big data flow upon their
> reconnection which leads to network timeouts what again causes chunkservers
> disconnections... We have some ideas to improve this behaviour.

hi!

I'm sad to hear the reason, but it's good to hear you know about the problem.

My server is swapping almost 5GB, so I decreased swappiness, but it didn't help.

FYI, I don't know how much it matters:

/data/backup:
 inodes: 33Mi
 directories: 4.2Mi
 files: 29Mi
 chunks: 29Mi
 length: 72TiB
 size: 73TiB
 realsize: 73TiB

/dev/sda6       10T 7.3T 2.8T 73% /mnt/mfschunk1
mfsmaster:9421  10T 7.3T 2.8T 73% /data/backup

So it counts the size and the real size wrongly because of hardlinks.

Thanks for your help and your work,
tamas
|
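Editorial sketch: the swap usage mentioned above can be read from the standard Linux file `/proc/meminfo`, and the swappiness setting from `/proc/sys/vm/swappiness`. The helper names below (`parse_meminfo`, `swap_used_gib`, `read_host`) are hypothetical and chosen only for illustration. This is a minimal sketch, assuming a Linux host.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name -> value in KiB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_gib(fields):
    """Swap in use, in GiB, from the SwapTotal and SwapFree fields."""
    return (fields["SwapTotal"] - fields["SwapFree"]) / (1024 ** 2)


def read_host():
    """Read the live values; only meaningful on a Linux machine."""
    with open("/proc/meminfo") as f:
        fields = parse_meminfo(f.read())
    with open("/proc/sys/vm/swappiness") as f:
        swappiness = int(f.read())
    return fields["MemTotal"] / (1024 ** 2), swap_used_gib(fields), swappiness
```

`read_host()` returns total RAM in GiB, swap in use in GiB, and the current swappiness value, which is enough to check whether the master process is being pushed into swap while the timeouts occur.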
From: Papp T. <to...@ma...> - 2011-05-30 09:23:49
|
On 05/30/2011 11:13 AM, Michal Borychowski wrote:
> Hi!
>
> Just start your chunkservers one by one in such a situation. But this
> happens very rarely.

In this case it is not the same situation. I have only one chunkserver, which is the same machine as the master server. It starts this behaviour after a day or two of uptime.

> I don't see much differences in the sizes - what do you mean exactly?

> /data/backup:
> inodes: 33Mi
> directories: 4.2Mi
> files: 29Mi
> chunks: 29Mi
> length: 72TiB
> size: 73TiB
> realsize: 73TiB
>
>
> /dev/sda6 10T 7.3T 2.8T 73% /mnt/mfschunk1
> mfsmaster:9421 10T 7.3T 2.8T 73% /data/backup

mfsdirinfo shows the volume size as 73T while df shows the real one, which is 7.3T, or do I misunderstand something?

Thanks,
tamas
|
From: Michal B. <mic...@ge...> - 2011-05-30 09:29:02
|
On 05/30/2011 11:13 AM, Michal Borychowski wrote:
> Hi!
>
> Just start your chunkservers one by one in such a situation. But this
> happens very rarely.

In this case this is not the same situation. I have only one chunkserver which the same as the master server. It starts this behaviour after a day or two days uptime.

[MB] We'll look into this again

> I don't see much differences in the sizes - what do you mean exactly?

> /data/backup:
> inodes: 33Mi
> directories: 4.2Mi
> files: 29Mi
> chunks: 29Mi
> length: 72TiB
> size: 73TiB
> realsize: 73TiB
>
>
> /dev/sda6 10T 7.3T 2.8T 73% /mnt/mfschunk1
> mfsmaster:9421 10T 7.3T 2.8T 73% /data/backup

msdirinfo shows the volume size 73T while df shows the real one, which is 7.3T, or do I misunderstand something?

[MB] Too much 7(.)3 ;)

Regards
Michal
|
From: Papp T. <to...@ma...> - 2011-05-30 10:25:16
|
On 05/30/2011 11:28 AM, Michal Borychowski wrote:
> length: 72TiB
> > size: 73TiB
> > realsize: 73TiB

Do you mean this should be read as:

length: 7.2TiB
size: 7.3TiB
realsize: 7.3TiB

?

Thanks,
tamas
|
From: Michal B. <mic...@ge...> - 2011-05-31 08:51:42
|
Hi!

Hmmm... How many chunks do you have on this test machine? Maybe there are so many chunks that processing them causes timeouts - but this is rather improbable.

Maybe you just have too little RAM and your swap is overused, which causes timeouts? You should have about 12GB of RAM, more or less.

Regarding sizes of hardlinks - "fixing" it would be too demanding for the CPU of the master. Similarly, mfsmakesnapshot causes multiple counting of the same data.

Best regards
-Michał

-----Original Message-----
From: Papp Tamas [mailto:to...@ma...]
Sent: Monday, May 30, 2011 11:24 AM
To: Michal Borychowski; moo...@li...
Subject: Re: [Moosefs-users] timeout

On 05/30/2011 11:13 AM, Michal Borychowski wrote:
> Hi!
>
> Just start your chunkservers one by one in such a situation. But this
> happens very rarely.

In this case this is not the same situation. I have only one chunkserver which the same as the master server. It starts this behaviour after a day or two days uptime.

> I don't see much differences in the sizes - what do you mean exactly?

> /data/backup:
> inodes: 33Mi
> directories: 4.2Mi
> files: 29Mi
> chunks: 29Mi
> length: 72TiB
> size: 73TiB
> realsize: 73TiB
>
>
> /dev/sda6 10T 7.3T 2.8T 73% /mnt/mfschunk1
> mfsmaster:9421 10T 7.3T 2.8T 73% /data/backup

msdirinfo shows the volume size 73T while df shows the real one, which is 7.3T, or do I misunderstand something?

Thanks,
tamas
|
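Editorial sketch: the gap between the directory totals (about 72TiB) and the `df` figure (about 7.3T) discussed in this thread can be reproduced on any POSIX filesystem. The Python below is not how mfsdirinfo is implemented. It only illustrates that summing file lengths per directory entry counts hardlinked data once per link, while tracking unique inodes counts it once. The function name `tree_sizes` is hypothetical.

```python
import os
import stat


def tree_sizes(root):
    """Return (bytes summed per name, bytes summed per unique inode) under root."""
    per_name = 0
    per_inode = 0
    seen = set()  # (device, inode) pairs already counted
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets and other special files
            per_name += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                per_inode += st.st_size
    return per_name, per_inode
```

With a dirvish-style tree where each backup image hardlinks unchanged files, the first number grows with every image while the second stays close to the space actually used. The `seen` set is also a hint at the cost mentioned above: deduplicating means remembering every inode visited, which for the 29Mi files reported here is a large amount of bookkeeping.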
From: Papp T. <to...@ma...> - 2011-05-31 10:03:08
|
On 05/31/2011 10:51 AM, Michal Borychowski wrote:
> Hi!

hi!

> Hmmm... How many chunks do you have on this test machine? Maybe there are so
> many many chunks that their processing causes timeouts - but this is rather
> unprobable.

/data/backup:
 inodes: 30Mi
 directories: 4.0Mi
 files: 26Mi
 chunks: 27Mi
 length: 67TiB
 size: 68TiB
 realsize: 68TiB

I think this is not a really huge number, just a smaller backup server with dirvish.

> Maybe you just have too little RAM and your swap is overused which causes
> timeouts? More or less you should have about 12GB of RAM.

I'm sure the RAM should be bigger, but I expect it to work, just at a lower rate.

> Regarding sizes of hardlinks - "fixing" it would be too demanding for the
> CPU of the master. Similarly mfsmakesnapshot causes multiple counting of the
> same data.

I think I'll set a rate limit for rsync, maybe that helps...

Thanks,
tamas
|