From: C. C. <c-...@uc...> - 2012-03-09 22:15:23
I've been running MooseFS as a scratch space on a small computing cluster to aggregate unused storage. It has been very robust, far more so than my experiments with OrangeFS and GlusterFS. The total size of the filesystem is 6.4TB with goal set to 2, so the effective usable space is 3.2TB.

Last week the node running mfsmaster had to be rebooted because of a runaway process. The process had nothing to do with MooseFS, but it spiked the system load until the node became unresponsive and mfschunkserver and mfsmount connections began to time out. The filesystem where mfsmaster stores its metadata and changelogs did not fill up, nor did the system run out of physical memory.

After the reboot, mfsmetarestore rebuilt the metadata.mfs file without incident. But when the mfsmaster process is started, there are numerous messages of this type:

  master mfsmaster[26378]: chunkserver has nonexistent chunk (0000000002CC8487_00000001), so create it for future deletion

After about 24-48 hours of this, mfsmaster abruptly terminates.

I have tried using an earlier subset of the changelog files, and I have tried using the changelogs and metadata backup files from the metadata loggers. Neither helped: the same messages appear and mfsmaster crashes after 24+ hours.

Is there anything else I can try? Since this is a scratch system it's not necessary to recover it, but I'd like to find out what went wrong.

-- 
C. Chan <c-chan at uchicago.edu>
GPG Public Key registered at pgp.mit.edu