From: Thomas S H. <tha...@gm...> - 2011-03-23 14:48:32
Thanks Michal! We were having some hardware issues on the node, and I suspect this is a residual problem. I will give your suggestion a try!

2011/3/23 Michal Borychowski <mic...@ge...>
> Hi Thomas!
>
> You have bad chunk headers (but we don’t know why). You can either erase the
> wrong chunks or temporarily change these constants:
>
> #define LASTERRSIZE 3
> #define LASTERRTIME 60
>
> to:
>
> #define LASTERRSIZE 10
> #define LASTERRTIME 1
>
> in the mfschunkserver/hddspacemgr.c file, then recompile the CS and run it
> again. The CS will stop “unlinking” the disks and will remove the wrong
> chunks by itself.
>
> Regards
> -Michal
>
> *From:* Thomas S Hatch [mailto:tha...@gm...]
> *Sent:* Wednesday, March 23, 2011 6:10 AM
> *To:* moosefs-users
> *Subject:* [Moosefs-users] Failing Chunkserver
>
> I am having some trouble with a chunkserver: it errors out, then stops
> working and reports 0% on the mfs cgi page.
>
> Here is the error in the logs:
>
> 2011-03-22T16:18:38+00:00 node10 mfschunkserver[25905]: set gid to 70003
> 2011-03-22T16:18:38+00:00 node10 mfschunkserver[25905]: set uid to 70003
> 2011-03-22T16:18:38+00:00 node10 mfschunkserver[25783]: closing 172.11.1.110:9422
> 2011-03-22T16:18:47+00:00 node10 mfschunkserver[25905]: main server module: listen on 172.11.1.110:9422
> 2011-03-22T16:18:48+00:00 node10 mfschunkserver[25905]: connecting ...
> 2011-03-22T16:18:48+00:00 node10 mfschunkserver[25905]: stats file has been loaded
> 2011-03-22T16:18:48+00:00 node10 mfschunkserver[25905]: open files limit: 10000
> 2011-03-22T16:18:48+00:00 node10 mfschunkserver[25905]: connected to Master
> 2011-03-22T16:18:52+00:00 node10 mfschunkserver[25905]: testing chunk: /mnt/moose1/11/chunk_00000000019A9511_00000001.mfs
> 2011-03-22T16:18:52+00:00 node10 mfschunkserver[25905]: chunk_readcrc: file:/mnt/moose1/11/chunk_00000000019A9511_00000001.mfs - wrong id/version in header (00000000019A9511_00000000)
> 2011-03-22T16:18:52+00:00 node10 mfschunkserver[25905]: hdd_io_begin: file:/mnt/moose1/11/chunk_00000000019A9511_00000001.mfs - read error: Unknown error
> 2011-03-22T16:19:02+00:00 node10 mfschunkserver[25905]: testing chunk: /mnt/moose1/4A/chunk_0000000001EFFE4A_00000001.mfs
> 2011-03-22T16:19:03+00:00 node10 mfschunkserver[25905]: chunk_readcrc: file:/mnt/moose1/4A/chunk_0000000001EFFE4A_00000001.mfs - wrong id/version in header (0000000001EFFE4A_00000000)
> 2011-03-22T16:19:03+00:00 node10 mfschunkserver[25905]: hdd_io_begin: file:/mnt/moose1/4A/chunk_0000000001EFFE4A_00000001.mfs - read error: Unknown error
> 2011-03-22T16:19:13+00:00 node10 mfschunkserver[25905]: testing chunk: /mnt/moose1/83/chunk_0000000001776783_00000001.mfs
> 2011-03-22T16:19:13+00:00 node10 mfschunkserver[25905]: chunk_readcrc: file:/mnt/moose1/83/chunk_0000000001776783_00000001.mfs - wrong id/version in header (0000000001776783_00000000)
> 2011-03-22T16:19:13+00:00 node10 mfschunkserver[25905]: hdd_io_begin: file:/mnt/moose1/83/chunk_0000000001776783_00000001.mfs - read error: Unknown error
> 2011-03-22T16:19:13+00:00 node10 mfschunkserver[25905]: 3 errors occurred in 60 seconds on folder: /mnt/moose1/
> 2011-03-22T16:19:15+00:00 node10 mfschunkserver[25905]: replicator: hdd_create status: 21
>
> What do these errors mean? And what is the best way to recover?
>
> If worse comes to worst, we do of course have replicated chunks, so we can
> format the chunkserver and start it back up, but I am very curious how best
> to approach the situation.
>
> -Thomas S Hatch
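Michal's workaround hinges on the log line "3 errors occurred in 60 seconds on folder: /mnt/moose1/". The sketch below is a guess at how such a threshold could work, assuming the chunkserver remembers the timestamps of the last LASTERRSIZE errors per folder and takes the folder out of use once all of them fall within LASTERRTIME seconds. The `folder_errors` struct and `record_error` function are hypothetical names; this is not the code in mfschunkserver/hddspacemgr.c, and the exact comparison (`<` vs `<=`) there may differ.

```c
#include <time.h>

/* Hypothetical sketch, NOT the actual code in mfschunkserver/hddspacemgr.c:
   it only illustrates one plausible way the two constants could interact. */
#define LASTERRSIZE 3   /* how many recent errors to remember per folder */
#define LASTERRTIME 60  /* window (seconds) those errors must fall within */

typedef struct {
    time_t errtime[LASTERRSIZE]; /* circular buffer of error timestamps */
    int filled;                  /* number of slots holding a real timestamp */
    int next;                    /* slot to overwrite on the next error */
} folder_errors;

/* Record one I/O error seen at time `now` on a folder.
   Returns 1 when the last LASTERRSIZE errors all happened within
   LASTERRTIME seconds, i.e. when the folder would be taken out of use. */
static int record_error(folder_errors *fe, time_t now) {
    fe->errtime[fe->next] = now;
    fe->next = (fe->next + 1) % LASTERRSIZE;
    if (fe->filled < LASTERRSIZE) {
        fe->filled++;
        if (fe->filled < LASTERRSIZE) {
            return 0; /* not enough errors recorded yet */
        }
    }
    /* After advancing, `next` indexes the oldest remembered error. */
    return (now - fe->errtime[fe->next]) < LASTERRTIME;
}
```

Under this model, feeding in the three hdd_io_begin errors from the log (16:18:52, 16:19:03 and 16:19:13, i.e. 0, 11 and 21 seconds) makes the third call return 1, because the oldest remembered error is only 21 seconds old. With LASTERRSIZE 10 and LASTERRTIME 1, ten errors would have to land within a single second, which would explain why the folder stays in use while the bad chunks are being dealt with.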
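As for what the chunk_readcrc lines mean: they appear to report that the id/version stored in each chunk file's header disagrees with the one encoded in its file name. In all three cases the name ends in _00000001 (version 1) while the header reports _00000000 (version 0). The sketch below only illustrates that comparison. `parse_chunk_name` and `chunk_header_mismatch` are hypothetical helpers, and the header values are passed in (for example, copied from the log) rather than read from disk, since the on-disk header layout is not described in this thread.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the chunk id and version encoded in a chunk
   file name such as "chunk_00000000019A9511_00000001.mfs". The path may
   include directories. Returns 1 on success, 0 if the name does not fit. */
static int parse_chunk_name(const char *path, uint64_t *id, uint32_t *version) {
    const char *base = strrchr(path, '/');
    int consumed = 0;
    base = base ? base + 1 : path;
    if (sscanf(base, "chunk_%16" SCNx64 "_%8" SCNx32 ".mfs%n",
               id, version, &consumed) != 2) {
        return 0;
    }
    /* %n is only stored if ".mfs" matched; also reject trailing junk. */
    return consumed > 0 && base[consumed] == '\0';
}

/* Compare the name-encoded id/version with the values taken from the header
   (here simply passed in, e.g. copied from a chunk_readcrc log line).
   Returns 1 on mismatch, 0 if they agree, -1 if the name cannot be parsed. */
static int chunk_header_mismatch(const char *path, uint64_t hdr_id,
                                 uint32_t hdr_version) {
    uint64_t id;
    uint32_t version;
    if (!parse_chunk_name(path, &id, &version)) {
        return -1;
    }
    return id != hdr_id || version != hdr_version;
}
```

For example, `chunk_header_mismatch("/mnt/moose1/11/chunk_00000000019A9511_00000001.mfs", 0x19A9511, 0)` reports a mismatch, matching the first chunk_readcrc line in the log; such a check could help confirm which files are the "wrong chunks" before erasing them.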