From: Papp T. <to...@ma...> - 2011-05-28 17:49:31
On 05/28/2011 03:12 PM, Papp Tamas wrote:
> Does somebody have any idea?

Again, more info on this.

The mfsmaster died totally, strace shows nothing and the process is in state 'D':

12869 ? D< 199:42 /usr/sbin/mfsmaster

At the moment I'm removing files from trash. After the rm job the master node came back.

log:

May 28 19:30:00 backup1 mfsmaster[12869]: chunkservers status:
May 28 19:30:00 backup1 mfsmaster[12869]: server 1 (ip: 192.168.3.21, port: 9422): usedspace: 7726711619584 (7196.06 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.60%
May 28 19:30:00 backup1 mfsmaster[12869]: total: usedspace: 7726711619584 (7196.06 GiB), totalspace: 10944744390656 (10193.09 GiB), usage: 70.60%
May 28 19:30:08 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/10/chunk_0000000000008A10_00000001.mfs
May 28 19:30:10 backup1 mfsmount[3621]: master: tcp recv error: ETIMEDOUT (Operation timed out) (1)
May 28 19:30:10 backup1 mfsmount[12917]: master: tcp recv error: ETIMEDOUT (Operation timed out) (1)
May 28 19:30:11 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:11 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:14 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:14 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:17 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:17 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:18 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/11/chunk_0000000000008A11_00000001.mfs
May 28 19:30:20 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:20 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:23 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:23 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:26 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:26 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:28 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/12/chunk_0000000000008A12_00000001.mfs
May 28 19:30:29 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:29 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:32 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:32 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:35 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:35 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:38 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:38 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:39 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/13/chunk_0000000000008A13_00000001.mfs
May 28 19:30:41 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:41 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:44 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:44 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:47 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:47 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:49 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/14/chunk_0000000000008A14_00000001.mfs
May 28 19:30:50 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:50 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:53 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:53 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:56 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:56 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:59 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/15/chunk_0000000000008A15_00000001.mfs
May 28 19:30:59 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:30:59 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:02 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:02 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:05 backup1 mfschunkserver[2556]: connecting ...
May 28 19:31:05 backup1 mfschunkserver[2556]: connected to Master
May 28 19:31:05 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:05 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:08 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:08 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:11 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:11 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:14 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:14 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:17 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:17 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:20 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:20 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:23 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:23 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:26 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:26 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:29 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:29 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:32 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:32 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:35 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:35 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:38 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:38 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:41 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:41 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:44 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:44 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:47 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:47 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:50 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:50 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:53 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:53 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:56 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:56 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:59 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:31:59 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:02 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:02 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:05 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:05 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:08 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:08 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:11 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:11 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:14 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:14 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:15 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/16/chunk_0000000000008A16_00000001.mfs
May 28 19:32:17 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:17 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:20 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:20 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:23 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:23 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:25 backup1 mfschunkserver[2556]: connecting ...
May 28 19:32:25 backup1 mfschunkserver[2556]: connected to Master
May 28 19:32:26 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/17/chunk_0000000000008A17_00000001.mfs
May 28 19:32:26 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:26 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:29 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:29 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:32 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:32 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:35 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:35 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:36 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/18/chunk_0000000000008A18_00000001.mfs
May 28 19:32:38 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:38 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:41 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:41 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:44 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:46 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/19/chunk_0000000000008A19_00000001.mfs
May 28 19:32:47 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:53 backup1 mfsmount[3621]: last message repeated 2 times
May 28 19:32:53 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:32:56 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/1A/chunk_0000000000008A1A_00000001.mfs
May 28 19:32:59 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:33:00 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:33:05 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:33:06 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:33:06 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/1B/chunk_0000000000008A1B_00000001.mfs
May 28 19:33:11 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:33:12 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:33:16 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/1C/chunk_0000000000008A1C_00000001.mfs
May 28 19:33:17 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:33:18 backup1 mfsmount[12917]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:33:18 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 19:33:18 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 19:33:18 backup1 mfsmaster[12869]: connection with CS(192.168.3.21) has been closed by peer
May 28 19:33:18 backup1 mfsmaster[12869]: chunkserver disconnected - ip: 192.168.3.21, port: 9422, usedspace: 7726711480320 (7196.06 GiB), totalspace: 10944744390656 (10193.09 GiB)
May 28 19:33:20 backup1 mfsmount[3621]: master: register error (read header: ETIMEDOUT (Operation timed out))
May 28 19:33:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.20) has been closed by peer
May 28 19:33:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.13) has been closed by peer
May 28 19:33:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 19:33:23 backup1 mfsmaster[12869]: chunkserver register begin (packet version: 5) - ip: 192.168.3.21, port: 9422
May 28 19:33:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.13) has been closed by peer
May 28 19:33:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 19:33:23 backup1 mfsmaster[12869]: chunk-server already connected !!!
May 28 19:33:23 backup1 mfsmaster[12869]: connection with CS(192.168.3.21) has been closed by peer
May 28 19:33:23 backup1 mfsmaster[12869]: connection with CS(192.168.3.21) has been closed by peer
May 28 19:33:23 backup1 mfsmaster[12869]: chunkserver disconnected - ip: 192.168.3.21, port: 9422, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
May 28 19:33:23 backup1 mfsmaster[12869]: chunkserver disconnected - ip: 192.168.3.21, port: 9422, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
May 28 19:33:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.20) has been closed by peer
May 28 19:33:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 19:33:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.13) has been closed by peer
May 28 19:33:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 19:33:23 backup1 mfsmaster[12869]: connection with ML(192.168.3.20) has been closed by peer
May 28 19:33:23 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 19:33:23 backup1 mfsmaster[12869]: last message repeated 98 times
May 28 19:33:23 backup1 mfsmount[12917]: registered to master
May 28 19:33:23 backup1 mfsmount[3621]: registered to master
May 28 19:33:23 backup1 mfsmount[12917]: file: 122301, index: 0, chunk: 4817595, version: 1 - there are no valid copies
May 28 19:33:23 backup1 mfsmount[12917]: file: 122301, index: 0 - can't connect to proper chunkserver (try counter: 1)
May 28 19:33:24 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 19:33:25 backup1 mfschunkserver[2556]: connecting ...
May 28 19:33:25 backup1 mfschunkserver[2556]: connected to Master
May 28 19:33:25 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 19:33:25 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 19:33:26 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/1D/chunk_0000000000008A1D_00000001.mfs
May 28 19:33:26 backup1 mfsmaster[12869]: chunkserver register begin (packet version: 5) - ip: 192.168.3.21, port: 9422
May 28 19:33:28 backup1 mfsmaster[12869]: chunkserver register end (packet version: 5) - ip: 192.168.3.21, port: 9422, usedspace: 7726711480320 (7196.06 GiB), totalspace: 10944744390656 (10193.09 GiB)
May 28 19:33:28 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 19:33:28 backup1 mfsmaster[12869]: connection with client(ip:192.168.3.21) has been closed by peer
May 28 19:33:36 backup1 mfschunkserver[2556]: testing chunk: /mnt/mfschunk1/1E/chunk_0000000000008A1E_00000001.mfs

Once again, mfsmaster cannot be stopped. After running /etc/init.mfs-master stop I started it again, but it got stuck, and the master is in state 'D' again.

tamas
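
For anyone who wants to watch for this condition, here is a minimal Python sketch of the check described above: reading the process state and testing for 'D' (uninterruptible sleep). A process in that state is blocked inside the kernel, which is consistent with strace showing nothing. The function names are made up for illustration, and the column layout (PID TTY STAT TIME COMMAND) is assumed from the ps line quoted earlier. On Linux the same state letter can also be read from /proc/<pid>/stat.

```python
def stat_from_ps_line(ps_line: str) -> str:
    """Return the STAT column from a line laid out as: PID TTY STAT TIME COMMAND."""
    fields = ps_line.split()
    if len(fields) < 5:
        raise ValueError(f"unexpected ps line: {ps_line!r}")
    return fields[2]


def state_from_proc_stat(stat_text: str) -> str:
    """Return the one-letter state from the contents of /proc/<pid>/stat.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so split on the last ')' before reading the state field.
    """
    return stat_text.rsplit(")", 1)[1].split()[0]


def is_uninterruptible(state: str) -> bool:
    """'D' means uninterruptible sleep; modifiers such as '<' may follow it."""
    return state.startswith("D")


if __name__ == "__main__":
    # The ps line quoted in the message above.
    line = "12869 ? D< 199:42 /usr/sbin/mfsmaster"
    print(is_uninterruptible(stat_from_ps_line(line)))
```

If the check stays true for a long time, /proc/<pid>/wchan (where the kernel provides it) may hint at which kernel function the process is waiting in.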