[MooseFS-Users] Chunk lost but metadata healthy


Hi all

I want to share a recent experience with you.

About two days ago we had an unexpected power loss. The master 
and 1 chunkserver shut down properly, but the remaining 2 chunkservers 
had a degraded UPS battery that gave out too early...

Once power was restored the metadata was healthy (as expected) but 1 data 
chunk was lost. 1 file was affected and, even with mfsfilerepair, I had no 
chance of restoring it.

At first I thought it was due to the write cache on the chunk disks: if 
that specific chunk was being written to the servers with the faulty UPS 
when power went out, the write cache was most likely still holding the 
data, which was then lost for good.
Unfortunately, a quick check of the disk configuration ruled that out: 
write cache is disabled on those two servers. Moreover, no battery-backed 
RAID controllers are used for the chunk disks.
I'm quite sure write barriers are enabled by default on CentOS (the only 
distro we use here).

How can I reduce the chance of running into this problem again 
(apart from changing the UPS batteries... :-)?
The goal is currently set to 2. Should I increase it to 3?

Our system:
Server1: master+chunk (2 dedicated HDs - XFS filesystem - write cache 
enabled)
Server2: metalog+chunk (2 dedicated HDs - XFS filesystem - write cache 
disabled)
Server3: metalog+chunk (2 dedicated HDs - XFS filesystem - write cache 
disabled)
Server4: metalog+chunk (2 dedicated HDs - XFS filesystem - write cache 
disabled)
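
To put the goal question in numbers, here is a rough back-of-the-envelope sketch in Python. It counts how many possible copy placements across the four chunkservers above would land entirely on the two servers that lost power uncleanly. The server labels, the choice of which two servers were affected, and the idea that every placement is equally likely are all assumptions made for illustration; this is not how MooseFS actually chooses chunk locations.

```python
from itertools import combinations

# Hypothetical labels for the four chunkservers listed above.
SERVERS = ["server1", "server2", "server3", "server4"]

# Assumption: these two are the chunkservers whose UPS failed early.
UNCLEAN = {"server3", "server4"}


def exposed_fraction(goal):
    """Fraction of possible copy placements that sit only on unclean servers."""
    placements = list(combinations(SERVERS, goal))
    exposed = [p for p in placements if set(p) <= UNCLEAN]
    return len(exposed) / len(placements)


for goal in (2, 3):
    print(f"goal={goal}: {exposed_fraction(goal):.3f}")
# prints:
# goal=2: 0.167
# goal=3: 0.000
```

Under these assumptions, with goal 2 one placement out of six has both copies on the affected servers, while with goal 3 every placement keeps at least one copy on a server that shut down cleanly. It only says something about this particular failure pattern (two of four servers going down uncleanly at the same time).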

Thanks for reading

Bye

Raffaello
