[Moosefs-users] mfsmaster crashes during metadata rebuild

I've been running MooseFS as a scratch space on a small computing cluster
to aggregate unused storage.  It has been very robust, far more so
than my experiments with OrangeFS and GlusterFS.  The total size
of the filesystem is 6.4TB with goal set to 2, so the effective
usable space is 3.2TB.

Last week the node running mfsmaster had to be rebooted because of
a runaway process unrelated to MooseFS.  The process spiked the system
load until the node became unresponsive and mfschunkserver and mfsmount
connections began to time out.  The filesystem where mfsmaster
stores its metadata/changelogs did not fill up, nor did the system
run out of physical memory.

After the reboot, mfsmetarestore rebuilt the metadata.mfs file
without incident.  But when the mfsmaster process was started, it logged
numerous messages of the type:

master mfsmaster[26378]: chunkserver has nonexistent chunk (0000000002CC8487_00000001), so create it for future deletion

After about 24-48 hours of this, mfsmaster abruptly terminates.
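
To gauge how many chunks are involved, the messages can be tallied by
chunk.  The following is only a minimal sketch: it assumes each message
has the form quoted above, with a 16-digit hex chunk ID and an 8-digit
hex version separated by an underscore (inferred from the sample line,
not from MooseFS documentation).  The function name is hypothetical.

```python
import re
from collections import Counter

# Assumed format, taken from the sample message: (<16 hex>_<8 hex>).
CHUNK_RE = re.compile(
    r"chunkserver has nonexistent chunk "
    r"\(([0-9A-Fa-f]{16})_([0-9A-Fa-f]{8})\)"
)

def tally_nonexistent_chunks(lines):
    """Count reports per (chunk_id, version) pair found in log lines."""
    counts = Counter()
    for line in lines:
        match = CHUNK_RE.search(line)
        if match:
            key = (match.group(1).upper(), match.group(2).upper())
            counts[key] += 1
    return counts

# Demo on the message quoted above; a real run would pass an open
# syslog file instead of this list.
sample = [
    "master mfsmaster[26378]: chunkserver has nonexistent chunk "
    "(0000000002CC8487_00000001), so create it for future deletion",
]
counts = tally_nonexistent_chunks(sample)
print(len(counts), "distinct chunks in", sum(counts.values()), "messages")
```

Comparing the number of distinct chunks against the total number of
chunks on the chunkservers would show whether most of the filesystem is
affected or only a small part of it.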

I have tried restoring from an earlier subset of the changelog files,
and from the changelogs and metadata backup files on the metadata
loggers, without success.  Each attempt produces the same messages,
followed by mfsmaster crashing after 24+ hours.

Is there anything else I can try?  Since this is a scratch system it's
not necessary to recover it, but I'd like to find out what went wrong.

-- 
C. Chan <c-chan at uchicago.edu>
GPG Public Key registered at pgp.mit.edu
