From: <wk...@bn...> - 2012-01-14 19:55:59
On 1/13/12 12:47 AM, ro...@mm... wrote:
> Sorry for intervening and excuse a moosefs newbie question.
> Why are you concerned so much about mfsmaster failing? How often does this
> happen?
>
> I am considering moosefs for a small lan of 15 users, mainly for
> aggregating unused storage space from various machines. Googling suggested
> moosefs is rather robust, but this thread suggests otherwise.
> Have I misunderstood something?

The Master is a single point of failure: if it fails, your data is not available until you bring it back up. The MooseFS software itself is very reliable; we run several clusters and have only seen failures due to human error or hardware (we started off testing with old, cast-off kit).

The good news is that if you have MetaLoggers, recovery is very easy and very reliable. We have never seen data loss from a recovery (except for data "on the fly"), and we have seen some rather "inelegant" failures while playing around with the system. So use good kit (a server-class chassis, dual power supplies, a UPS, and ECC memory) and you dramatically reduce the chance of an outage. Make sure one of the MetaLoggers is capable of being a Master, so you can promote it if needed. There are lots of reasons for an outage, and MooseFS is pretty minor on the list.

Because it is a rare issue (with good kit) AND our application can tolerate some downtime, we elected not to have automated failover; instead a human identifies what the real issue is and handles it. If the Master fails, a staff member simply promotes a MetaLogger to take over the role until we can fix the real master and switch back at a convenient time. Downtime for that is 5-15 minutes once you figure in identifying the issue, recovering the metadata, moving over the IP, clearing ARPs, and maybe restarting chunkservers, depending on what happened. When you do recover, there will be garbage files (from the writes that were in flight) in the Control Panel that eventually get cleaned out automatically.
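For what it's worth, the manual promotion described above can be sketched as a dry-run runbook script. The data directory, service IP, and interface name below are assumptions for illustration; mfsmetarestore, ip, and arping are the usual tools, but check paths and options against your own install before running anything for real:

```shell
#!/bin/sh
# Dry-run sketch of promoting a MetaLogger to Master.
# DATADIR, MASTER_IP and IFACE are assumptions -- adjust for your setup.
DATADIR=/var/lib/mfs       # metalogger data directory (assumption)
MASTER_IP=192.168.1.10/24  # floating service IP of the master (assumption)
IFACE=eth0                 # interface carrying that IP (assumption)

# With DRY_RUN=1 (the default here) each step is only printed,
# so you can review the plan before executing it for real.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "WOULD RUN: $*"
    else
        "$@"
    fi
}

# 1. Rebuild metadata.mfs from the metalogger's backup + changelogs.
run mfsmetarestore -a -d "$DATADIR"
# 2. Take over the master's service IP and send gratuitous ARPs
#    so clients and chunkservers drop their stale ARP entries.
run ip addr add "$MASTER_IP" dev "$IFACE"
run arping -U -c 3 -I "$IFACE" "${MASTER_IP%/*}"
# 3. Start the master process on the promoted machine.
run mfsmaster start
```

Run it once in dry-run mode to sanity-check the plan, then set DRY_RUN=0. Depending on what broke, you may still need to restart chunkservers afterwards so they reconnect promptly.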
As mentioned, there ARE better procedures, and we would love to have a more automated (reliable) failover, but it's not quite there yet.