From: Michał B. <mic...@ge...> - 2010-12-07 08:53:25
Hi Thomas!

These errors were caused by a "disconnected" HDD. If you had looked in the CGI monitor, you would have seen a disk with "damaged" status. The strange thing is that this bad chunk was retested after 10 seconds; it should have been removed after the first test. Unfortunately, in this case the repeated errors caused MooseFS to mark the HDD as damaged. But this was a "logical" error, not a physical one. You should probably run "fsck" on this hard drive. On our side, we will make a patch so that the system doesn't test the same chunk in a loop.

Kind regards
Michal

From: Thomas S Hatch [mailto:tha...@gm...]
Sent: Friday, December 03, 2010 6:05 PM
To: moosefs-users
Subject: [Moosefs-users] Errors and then "crash"

This is the second time a chunkserver has issued this type of failure in our environment. After giving this log message the chunkserver does not crash, but all files on the chunk become unavailable and it shows 0% usage on the mfsmaster:

Dec 3 14:58:08 localhost mfschunkserver[6969]: testing chunk: /mnt/moose1/6C/chunk_000000000008F96C_00000001.mfs
Dec 3 14:58:18 localhost mfschunkserver[6969]: testing chunk: /mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs
Dec 3 14:58:18 localhost mfschunkserver[6969]: chunk_readcrc: file:/mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs - wrong id/version in header (000000000001BB0D_00000000)
Dec 3 14:58:18 localhost mfschunkserver[6969]: hdd_io_begin: file:/mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs - read error: Unknown error
Dec 3 14:58:28 localhost mfschunkserver[6969]: testing chunk: /mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs
Dec 3 14:58:28 localhost mfschunkserver[6969]: chunk_readcrc: file:/mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs - wrong id/version in header (000000000001BB0D_00000000)
Dec 3 14:58:28 localhost mfschunkserver[6969]: hdd_io_begin: file:/mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs - read error: Unknown error
Dec 3 14:58:38 localhost mfschunkserver[6969]: testing chunk: /mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs
Dec 3 14:58:38 localhost mfschunkserver[6969]: chunk_readcrc: file:/mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs - wrong id/version in header (000000000001BB0D_00000000)
Dec 3 14:58:38 localhost mfschunkserver[6969]: hdd_io_begin: file:/mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs - read error: Unknown error
Dec 3 14:58:38 localhost mfschunkserver[6969]: 3 errors occurred in 60 seconds on folder: /mnt/moose1/
Dec 3 14:58:39 localhost mfschunkserver[6969]: replicator: hdd_create status: 21

I am running the prerelease of 1.6.18 on Ubuntu 10.04. After restarting the chunkserver everything comes back online without problems.

Any ideas as to what could be causing this?

-Tom Hatch
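
For anyone who wants to check their own syslog for the same pattern (one bad chunk tested again and again until the disk is marked damaged), here is a minimal Python sketch. It is not part of MooseFS. The regular expressions are assumptions based only on the log lines quoted in this thread, so other versions may log differently, and the helper names `summarize` and `retested_chunks` are made up for this example.

```python
import re
from collections import Counter

# Patterns are assumptions derived from the log excerpt in this thread.
TESTING_RE = re.compile(r"mfschunkserver\[\d+\]: testing chunk: (\S+)")
ERROR_RE = re.compile(
    r"mfschunkserver\[\d+\]: (?:chunk_readcrc|hdd_io_begin): file:(\S+) - "
)


def summarize(lines):
    """Count 'testing chunk' lines and error lines per chunk file path."""
    tests = Counter()
    errors = Counter()
    for line in lines:
        match = TESTING_RE.search(line)
        if match:
            tests[match.group(1)] += 1
            continue
        match = ERROR_RE.search(line)
        if match:
            errors[match.group(1)] += 1
    return tests, errors


def retested_chunks(lines):
    """Return {path: (test_count, error_count)} for chunks that were
    tested more than once and produced at least one error line."""
    tests, errors = summarize(lines)
    return {
        path: (tests[path], errors[path])
        for path in tests
        if tests[path] > 1 and errors[path] > 0
    }


CHUNK_6C = "/mnt/moose1/6C/chunk_000000000008F96C_00000001.mfs"
CHUNK_0D = "/mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs"
PREFIX = "localhost mfschunkserver[6969]:"

# Sample input rebuilt from the log lines quoted in this thread.
SAMPLE = [f"Dec 3 14:58:08 {PREFIX} testing chunk: {CHUNK_6C}"]
for stamp in ("14:58:18", "14:58:28", "14:58:38"):
    SAMPLE += [
        f"Dec 3 {stamp} {PREFIX} testing chunk: {CHUNK_0D}",
        f"Dec 3 {stamp} {PREFIX} chunk_readcrc: file:{CHUNK_0D} - "
        "wrong id/version in header (000000000001BB0D_00000000)",
        f"Dec 3 {stamp} {PREFIX} hdd_io_begin: file:{CHUNK_0D} - "
        "read error: Unknown error",
    ]
SAMPLE += [
    f"Dec 3 14:58:38 {PREFIX} 3 errors occurred in 60 seconds on folder: /mnt/moose1/",
    f"Dec 3 14:58:39 {PREFIX} replicator: hdd_create status: 21",
]

if __name__ == "__main__":
    for path, (n_tests, n_errors) in retested_chunks(SAMPLE).items():
        print(f"{path}: tested {n_tests} times, {n_errors} error lines")
```

On the sample above it reports only the 0D chunk, tested 3 times with 6 error lines. To scan a real log, pass an open file object (for example the system log file) to `retested_chunks` instead of `SAMPLE`.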