From: Michal B. <mic...@ge...> - 2011-03-23 12:13:13
Hi Thomas!

You have bad chunk headers (but we don't know why). You can either delete the bad chunks or temporarily change these constants in the mfschunkserver/hddspacemgr.c file:

#define LASTERRSIZE 3
#define LASTERRTIME 60

to:

#define LASTERRSIZE 10
#define LASTERRTIME 1

Then recompile the chunkserver (CS) and run it again. The CS will stop "unlinking" the disks and will remove the bad chunks by itself.

Regards
-Michal

From: Thomas S Hatch [mailto:tha...@gm...]
Sent: Wednesday, March 23, 2011 6:10 AM
To: moosefs-users
Subject: [Moosefs-users] Failing Chunkserver

I am having some trouble with a chunkserver: it errors out, then stops working and reports 0% on the mfs cgi page. Here is the error in the logs.

2011-03-22T16:18:38+00:00 node10 mfschunkserver[25905]: set gid to 70003
2011-03-22T16:18:38+00:00 node10 mfschunkserver[25905]: set uid to 70003
2011-03-22T16:18:38+00:00 node10 mfschunkserver[25783]: closing 172.11.1.110:9422
2011-03-22T16:18:47+00:00 node10 mfschunkserver[25905]: main server module: listen on 172.11.1.110:9422
2011-03-22T16:18:48+00:00 node10 mfschunkserver[25905]: connecting ...
2011-03-22T16:18:48+00:00 node10 mfschunkserver[25905]: stats file has been loaded
2011-03-22T16:18:48+00:00 node10 mfschunkserver[25905]: open files limit: 10000
2011-03-22T16:18:48+00:00 node10 mfschunkserver[25905]: connected to Master
2011-03-22T16:18:52+00:00 node10 mfschunkserver[25905]: testing chunk: /mnt/moose1/11/chunk_00000000019A9511_00000001.mfs
2011-03-22T16:18:52+00:00 node10 mfschunkserver[25905]: chunk_readcrc: file:/mnt/moose1/11/chunk_00000000019A9511_00000001.mfs - wrong id/version in header (00000000019A9511_00000000)
2011-03-22T16:18:52+00:00 node10 mfschunkserver[25905]: hdd_io_begin: file:/mnt/moose1/11/chunk_00000000019A9511_00000001.mfs - read error: Unknown error
2011-03-22T16:19:02+00:00 node10 mfschunkserver[25905]: testing chunk: /mnt/moose1/4A/chunk_0000000001EFFE4A_00000001.mfs
2011-03-22T16:19:03+00:00 node10 mfschunkserver[25905]: chunk_readcrc: file:/mnt/moose1/4A/chunk_0000000001EFFE4A_00000001.mfs - wrong id/version in header (0000000001EFFE4A_00000000)
2011-03-22T16:19:03+00:00 node10 mfschunkserver[25905]: hdd_io_begin: file:/mnt/moose1/4A/chunk_0000000001EFFE4A_00000001.mfs - read error: Unknown error
2011-03-22T16:19:13+00:00 node10 mfschunkserver[25905]: testing chunk: /mnt/moose1/83/chunk_0000000001776783_00000001.mfs
2011-03-22T16:19:13+00:00 node10 mfschunkserver[25905]: chunk_readcrc: file:/mnt/moose1/83/chunk_0000000001776783_00000001.mfs - wrong id/version in header (0000000001776783_00000000)
2011-03-22T16:19:13+00:00 node10 mfschunkserver[25905]: hdd_io_begin: file:/mnt/moose1/83/chunk_0000000001776783_00000001.mfs - read error: Unknown error
2011-03-22T16:19:13+00:00 node10 mfschunkserver[25905]: 3 errors occurred in 60 seconds on folder: /mnt/moose1/
2011-03-22T16:19:15+00:00 node10 mfschunkserver[25905]: replicator: hdd_create status: 21

What do these errors mean? And what is the best way to recover?
If worse comes to worst, we of course have replicated chunks, so we can format the chunkserver and start it back up, but I am very curious how best to approach the situation.

-Thomas S Hatch
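
To illustrate why changing the two constants in the reply keeps the disk from being dropped, here is a minimal sketch of a per-folder error window. It is not the code from mfschunkserver/hddspacemgr.c: the `err_window` type and the `err_window_add` and `err_window_tripped` functions are hypothetical names, and the rule "the folder is marked as failing once LASTERRSIZE errors fall within LASTERRTIME seconds" is an assumption inferred from the log message "3 errors occurred in 60 seconds on folder".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values quoted in the reply as the defaults of that build.
 * The suggested temporary change is LASTERRSIZE 10 and LASTERRTIME 1. */
#define LASTERRSIZE 3
#define LASTERRTIME 60

/* Hypothetical per-folder error history: a ring buffer holding the
 * timestamps (in seconds) of the most recent LASTERRSIZE I/O errors.
 * A timestamp of 0 marks a slot that has never been filled. */
typedef struct {
    uint32_t stamps[LASTERRSIZE];
    size_t next; /* slot that the next error will overwrite */
} err_window;

/* Record one I/O error that happened at time `now` (must be nonzero). */
void err_window_add(err_window *w, uint32_t now) {
    assert(w != NULL && now != 0);
    w->stamps[w->next] = now;
    w->next = (w->next + 1) % LASTERRSIZE;
}

/* Return 1 when all LASTERRSIZE remembered errors are at most
 * LASTERRTIME seconds old at time `now`, otherwise 0. */
int err_window_tripped(const err_window *w, uint32_t now) {
    assert(w != NULL);
    for (size_t i = 0; i < LASTERRSIZE; i++) {
        if (w->stamps[i] == 0 || now - w->stamps[i] > LASTERRTIME) {
            return 0;
        }
    }
    return 1;
}
```

In the log above the three read errors are roughly ten seconds apart (16:18:52, 16:19:03, 16:19:13), so with a window of 3 errors in 60 seconds the check trips on the third one. Under the same assumed rule, requiring 10 errors within 1 second would not trip at that error rate, which matches the reply's statement that the chunkserver then stops "unlinking" the disk.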